extracting numbers from file title and references

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

DrPL
Forum Commoner
Posts: 26
Joined: Wed Oct 07, 2009 4:22 pm

extracting numbers from file title and references

Post by DrPL »

Hi,
I am not sure exactly where to put this, as it is mainly a regex question but crosses over slightly into Perl syntax.
Hopefully all will become clear.

At the moment, I have a directory full of files called chapter1.txt, chapter2.txt and so on. Within each of these files are references enclosed in square brackets, which I am trying to link to external files. The format of the link is c1f1.html for chapter 1 reference 1, c3f5.html for chapter 3 reference 5.

So, in chapter 1,
[1]
becomes <a href="c1f1.html">[1]</a> and so on.

I have come up with a bit of code below

Code: Select all

opendir (DIR, "/home/paul/work/") or die "$!";
my @files = grep {/chapter*txt/}  readdir DIR;
foreach my $file (@files) 
{
   open(FH,"/home/paul/work/$file") or die "$!";

   my ($chapnumber) = ($file =~/chapter(\d+).txt/);
	
   while (<FH>)
   { 
    	$dummyvar = ~s/\[(\d+\)]/<a href=\"c.$chapnumber.f.$1\.html\">\[$1\]<\/a>/g; 
   }
   close(FH);
}
- but it falls over when it gets to the regex containing the angle brackets (the line starting $dummyvar = ...).
As far as I can see, I'm extracting the chapter number from the title correctly, and the regex for replacing within the file looks OK.
Can someone please suggest what might be wrong?

Many thanks

Paul
abareplace
Forum Newbie
Posts: 9
Joined: Fri Jan 06, 2012 1:43 am

Re: extracting numbers from file title and references

Post by abareplace »

You should not escape ) in the regular expression:

Code: Select all

\[(\d+)]
If I remember Perl syntax correctly, the dot before html and the brackets [] in the replacement should NOT be escaped either.
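
A minimal sketch of the point above, using a sample line invented for illustration. With \) the closing parenthesis becomes a literal character, so the group opened by ( is never closed and Perl refuses to compile the pattern. With (\d+) the digits are captured as expected.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the original chapter files.
my $line = "See [1] and [23].";

# Pattern from the question: \) is a literal ")", so "(" is left unclosed.
my $broken = '\[(\d+\)]';
my $ok = eval { qr/$broken/; 1 };   # compile at run time so the error can be caught
print "broken pattern compiles: ", ($ok ? "yes" : "no"), "\n";

# Corrected pattern: "[" must be escaped; "]" outside a character class need not be.
my @refs = $line =~ /\[(\d+)]/g;
print "captured: @refs\n";
```

Escaping the closing ] as well, as in \[(\d+)\], is harmless and matches the same text.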
DrPL

Re: extracting numbers from file title and references

Post by DrPL »

Looks like I got the backslash in the wrong place. It should have been

Code: Select all


$dummyvar = ~s/\[(\d+)\]/<a href=\"c.$chapnumber.f.$1\.html\">\[$1\]<\/a>/g; 

DrPL

Re: extracting numbers from file title and references

Post by DrPL »

abareplace wrote:You should not escape ) in the regular expression:

Code: Select all

\[(\d+)]
If I remember Perl syntax correctly, the dot before html and the brackets [] in the replacement should NOT be escaped either.

I think I need the escape, otherwise the dot would be treated as a concatenation operator (?). I need it to be a literal punctuation character, as in "blahblah.html".
abareplace

Re: extracting numbers from file title and references

Post by abareplace »

It's inside the string, so there is no concatenation operator. The variables are interpolated. You need the following code:

Code: Select all

~s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;
See http://ideone.com/moBAf
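
As a small illustration of the interpolation described above (the chapter number and sample text are assumptions made up for the example): the replacement half of s/// behaves like a double-quoted string, so a dot there is just a dot.

```perl
use strict;
use warnings;

my $chapnumber = 3;                    # hypothetical chapter number
my $line       = "As shown in [5].";   # hypothetical sample text

# In the replacement, "." is a literal dot, not the concatenation operator.
# The braces in ${chapnumber} stop Perl from reading the name as $chapnumberf.
(my $linked = $line) =~ s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;

print "$linked\n";
```

Only the / in </a> needs a backslash, because / is the delimiter of the s/// operator itself.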
DrPL

Re: extracting numbers from file title and references

Post by DrPL »

abareplace wrote:It's inside the string, so there is no concatenation operator. The variables are interpolated. You need the following code:

Code: Select all

~s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;
See http://ideone.com/moBAf

Thanks, very interesting. Is this why the ] isn't escaped in your code? I'm also a bit confused about why "chapnumber" is in curly brackets to separate it from the "$", but the captured group $1 isn't.
DrPL

Re: extracting numbers from file title and references

Post by DrPL »

I made a mistake: rather than the link being of the form <a href="blah.html">, it should have been <a href="#blah">.
I've modified my code and included a few print statements to confirm that the chapter numbers are being extracted, and they are, but the replacement regex is still not working.

Code: Select all


#!/usr/bin/perl

@files = <*>;

foreach $file (@files)
{
   open(FH,"/home/paul/kp/$file") or die "cannot open file";

   print $file . "\n";

   my ($chapnumber) = ($file =~/chapter(\d+).txt/);

   print $chapnumber . "\n";

   while (<FH>)
   {
      $dummyvar = ~s/\[(\d+)\]/<a href="#c${chapnumber}f$1">[$1]<\/a>/g;
   }
   close(FH);
}
closedir(DIR);

abareplace

Re: extracting numbers from file title and references

Post by abareplace »

${chapnumber} is in curly brackets to separate it from the f that follows. Without the brackets, Perl would read it as the variable $chapnumberf.

AFAIK, $dummyvar is not needed.

The replacement regex is working (as you can see from the program at ideone.com), but you don't write the result anywhere. The file is opened in read-only mode; the substitution changes $_, but you never print the result.
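
A rough sketch of one way to act on that, not the poster's actual script. The helper name link_refs, the ".out" suffix for the output file and taking the directory from the command line are all assumptions chosen for the example. Two details worth noting: "$dummyvar = ~s/.../" with a space is parsed as assigning the bitwise negation (~) of the substitution result, whereas =~ binds a substitution to a variable; and as a regex, chapter*txt means "chapte", any number of "r", then "txt", so it does not match chapter1.txt.

```perl
use strict;
use warnings;

# Turn [n] into <a href="#c<chapter>f<n>">[n]</a> in one line of text.
sub link_refs {
    my ($line, $chapnumber) = @_;
    $line =~ s/\[(\d+)\]/<a href="#c${chapnumber}f$1">[$1]<\/a>/g;
    return $line;
}

# Directory comes from the command line (an assumption); default is ".".
my $dir = @ARGV ? $ARGV[0] : '.';

opendir(my $dh, $dir) or die "cannot open $dir: $!";
# Anchored pattern with an escaped dot, so only chapterN.txt matches.
my @files = grep { /^chapter\d+\.txt$/ } readdir $dh;
closedir($dh);

foreach my $file (@files) {
    my ($chapnumber) = $file =~ /^chapter(\d+)\.txt$/;

    open(my $in,  '<', "$dir/$file")     or die "cannot read $file: $!";
    # Hypothetical output name; the input file is left untouched.
    open(my $out, '>', "$dir/$file.out") or die "cannot write $file.out: $!";

    while (my $line = <$in>) {
        # Writing the substituted line out is the step the loop was missing.
        print {$out} link_refs($line, $chapnumber);
    }

    close($in);
    close($out) or die "cannot close $file.out: $!";
}
```

Writing to a separate file keeps the originals intact, so the result can be checked before anything is renamed over the chapter files.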