
extracting numbers from file title and references

Posted: Sun Feb 26, 2012 5:00 am
by DrPL
Hi,
I am not sure exactly where to put this, as it is mainly a regex question that crosses over slightly into Perl syntax.
Hopefully all will become clear.

At the moment, I have a directory full of files called
chapter1.txt, chapter2.txt and so on. Within each of these files are references encased in square brackets which I am trying
to link to external files. The format of the link is c1f1.html for chapter 1 reference 1, c3f5.html for chapter 3 reference 5.

So, in chapter 1,
[1]
becomes <a href="c1f1.html">[1]</a> and so on.

I have come up with a bit of code below

Code:

opendir (DIR, "/home/paul/work/") or die "$!";
my @files = grep {/chapter*txt/}  readdir DIR;
foreach my $file (@files) 
{
   open(FH,"/home/paul/work/$file") or die "$!";

   my ($chapnumber) = ($file =~/chapter(\d+).txt/);
	
   while (<FH>)
   { 
    	$dummyvar = ~s/\[(\d+\)]/<a href=\"c.$chapnumber.f.$1\.html\">\[$1\]<\/a>/g; 
   }
   close(FH);
}
- but it falls over when it reaches the substitution containing the angle brackets (the line starting $dummyvar = ...).
As far as I can see, I'm extracting the chapter number from the file name correctly, and the regex for replacing within
the file looks OK.
Can someone please suggest what might be wrong?

Many thanks

Paul

Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 5:40 am
by abareplace
You should not escape ) in the regular expression:

Code:

\[(\d+)]
If I remember Perl syntax correctly, the dot before html and the brackets [] in the replacement do not need to be escaped either.
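
To make the escaping point concrete, here is a small sketch that checks which variants of the pattern compile. The helper name compiles is made up for illustration. Escaping ")" turns it into a literal parenthesis, so the group opened by "(" is never closed; an unpaired "]" outside a character class is taken literally, so escaping it is optional.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the pattern string compiles as a regex, 0 otherwise.
sub compiles {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };   # a bad pattern dies at runtime; eval catches it
    return defined $re ? 1 : 0;
}

# Original attempt: "\)" is a literal ")", so the "(" is left unmatched.
print compiles('\[(\d+\)]'), "\n";    # prints 0

# Both of these compile and match "[12]", capturing 12.
print compiles('\[(\d+)]'), "\n";     # prints 1
print compiles('\[(\d+)\]'), "\n";    # prints 1
```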

Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 6:02 am
by DrPL
Looks like I got the backslash in the wrong place. It should have been

Code:


$dummyvar = ~s/\[(\d+)\]/<a href=\"c.$chapnumber.f.$1\.html\">\[$1\]<\/a>/g; 


Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 6:04 am
by DrPL
abareplace wrote:You should not escape ) in the regular expression:

Code:

\[(\d+)]
If I remember Perl syntax correctly, the dot before html and the brackets [] in the replacement do not need to be escaped either.
I think I need the escape, otherwise the dot would be treated as a concatenation operator (?). I need it to be a punctuation delimiter, as in "blahblah.html".

Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 8:23 am
by abareplace
It's inside the string, so there is no concatenation operator. The variables are interpolated. You need the following code:

Code:

~s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;
See http://ideone.com/moBAf
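
In case the link goes stale, here is a self-contained sketch of the same idea. The sub name link_refs and the sample text are only for illustration. The replacement side of s/// behaves like a double-quoted string, so variables interpolate and the dot is an ordinary character.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps each [N] in $text in a link for chapter $chapnumber.
sub link_refs {
    my ($text, $chapnumber) = @_;
    # The replacement is interpolated like a double-quoted string:
    # "." is a literal dot here, not the concatenation operator.
    $text =~ s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;
    return $text;
}

print link_refs("See [1] and [5].", 3), "\n";
# prints: See <a href="c3f1.html">[1]</a> and <a href="c3f5.html">[5]</a>.
```

Only the "/" in the closing tag is escaped, and only because "/" is the delimiter of the substitution.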

Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 9:20 am
by DrPL
abareplace wrote:It's inside the string, so there is no concatenation operator. The variables are interpolated. You need the following code:

Code:

~s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;
See http://ideone.com/moBAf
Thanks, very interesting. Is this why in your code the ] isn't escaped? I'm also a bit confused about why "chapnumber" is in curly brackets to separate it from the "$", but the grouped $1 isn't.

Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 11:40 am
by DrPL
I made a mistake: rather than the link being of the form <a href="blah.html">, it should have been <a href="#blah">.
I've modified my code and included a few print statements to confirm that the chapter numbers are being extracted.
They are, but the replacement regex is still not working.

Code:


#!/usr/bin/perl

@files = <*>;

foreach $file (@files)
{
   open(FH,"/home/paul/kp/$file") or die "cannot open file";

   print $file . "\n";

   my ($chapnumber) = ($file =~/chapter(\d+).txt/);

        print $chapnumber . "\n";
	
   while (<FH>)
   { 
	$dummyvar = ~s/\[(\d+)\]/<a href="#c${chapnumber}f$1">[$1]<\/a>/g;

   }
   close(FH);
}
closedir(DIR);


Re: extracting numbers from file title and references

Posted: Sun Feb 26, 2012 6:25 pm
by abareplace
${chapnumber} is in curly brackets to separate it from f. Without the brackets, Perl would read it as $chapnumberf.
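
A short sketch of the difference, with made-up helper names (anchor_name and append_zero are not from your script). $1 needs no braces in your replacement because the character after it (a dot or a quote) cannot be part of a variable name; it would need them only if a digit followed.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the anchor name for a chapter/reference pair.
sub anchor_name {
    my ($chapnumber, $ref) = @_;
    # The braces end the variable name before the literal "f".
    # Written without them, Perl would look for a variable called $chapnumberf.
    return "c${chapnumber}f$ref";
}

# Hypothetical helper: appends a literal 0 after each captured number.
sub append_zero {
    my ($text) = @_;
    # Braces are needed here: "$10" would mean capture group 10.
    $text =~ s/\[(\d+)]/[${1}0]/g;
    return $text;
}

print anchor_name(3, 5), "\n";     # prints c3f5
print append_zero("[7]"), "\n";    # prints [70]
```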

AFAIK, $dummyvar is not needed.
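
As far as I understand the parsing, "$dummyvar = ~s/.../" with a space is not the binding operator =~ at all: it runs s/// on $_ and assigns the bitwise complement of its result to $dummyvar. The sketch below (sub names invented for the demo) compares the two forms. The same reasoning suggests the leading ~ in my shortened line is unnecessary too, and a plain s/.../.../g; is enough.

```perl
use strict;
use warnings;

# Hypothetical demo of "$dummyvar = ~s/.../.../g" (note the space after "=").
sub spaced_form {
    my ($line) = @_;
    my $dummyvar;
    for ($line) {    # aliases $_ to $line, much as while (<FH>) fills $_
        # Parsed as: $dummyvar = ~( s/.../.../g applied to $_ )
        $dummyvar = ~s/\[(\d+)]/<$1>/g;
    }
    return ($line, $dummyvar);
}

# The binding operator =~ (no space) applies s/// to the named variable
# and returns the number of substitutions made.
sub bound_form {
    my ($line) = @_;
    my $count = ($line =~ s/\[(\d+)]/<$1>/g);
    return ($line, $count);
}

my ($text1, $dummy) = spaced_form("see [1]");
my ($text2, $count) = bound_form("see [1]");
print "$text1 / $text2 / $count\n";    # prints: see <1> / see <1> / 1
```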

The replacement regex is working (as you can see from the program at ideone.com), but you don't write the result anywhere. The file is opened in read-only mode, and the substitution only changes $_ in memory; the result is never printed or written back.
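
Here is a minimal sketch of one way to write the result out. It is untested against your real files and makes several assumptions: the sub name link_chapter_files is invented, and the output goes to a new file named chapterN.linked.txt rather than over the original. It also uses an anchored file name pattern, since /chapter*txt/ from your first script would not match "chapter1.txt".

```perl
use strict;
use warnings;

# Hypothetical helper: for every chapterN.txt in $dir, writes chapterN.linked.txt
# (an assumed output name) with each [M] wrapped in a link to #cNfM.
# Returns the number of chapter files processed.
sub link_chapter_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @files = grep { /^chapter\d+\.txt$/ } readdir $dh;
    closedir($dh);

    foreach my $file (@files) {
        my ($chapnumber) = ($file =~ /^chapter(\d+)\.txt$/);
        (my $outname = $file) =~ s/\.txt$/.linked.txt/;

        open(my $in,  '<', "$dir/$file")    or die "cannot read $file: $!";
        open(my $out, '>', "$dir/$outname") or die "cannot write $outname: $!";
        while (my $line = <$in>) {
            $line =~ s/\[(\d+)]/<a href="#c${chapnumber}f$1">[$1]<\/a>/g;
            print {$out} $line;    # the step the original loop was missing
        }
        close($in);
        close($out) or die "cannot close $outname: $!";
    }
    return scalar @files;
}
```

You would call it as link_chapter_files("/home/paul/kp"). Writing to a separate file keeps the originals intact while you check the output.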