Hi All,
I have an input string that is multi-line (i.e. I have slurped in an entire file) -- I am trying to match the data within some xml-like tags, for example:
$input =~ /<header>(.+)<\/header>/
I am using Perl to evaluate the expression above, expecting that everything between the "header" tags will be returned in $1. My understanding is that this should work by default because we are not in multi-line mode. However, the match fails. When I remove all the carriage returns from the input string, it works (which suggests multi-line mode is on by default). What am I doing wrong?
thanks in advance!
regex not working as expected
Re: regex not working as expected
Oh, I think I figured it out maybe -- I didn't notice the /s requirement since "." does not match a carriage return (strange).
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: regex not working as expected
That is not what multi-line is about. The multi-line option (m) causes the ^ and $ anchors to match at the start and end of each line in the input string, instead of only at the start and end of the entire string.
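To make the anchor behaviour concrete, here is a minimal sketch. The three-line sample string and the variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical three-line input, made up for this illustration.
my $text = "one\ntwo\nthree";

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after each newline.
my @multi = $text =~ /^(\w+)/mg;

# /m does not change what "." matches: it still stops at a newline.
my ($dot) = $text =~ /^(.+)/m;

print scalar(@plain), " match without /m: @plain\n";
print scalar(@multi), " matches with /m: @multi\n";
print "dot matched: $dot\n";
```

The last match shows that the multi-line option only affects the anchors; whether the dot crosses line boundaries is a separate setting.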
What you need is to enable the dot-all option: s.
Demo:
#!/usr/bin/perl
use strict;
use warnings;
my $s = "...<header>ab\ncd</header>...";
# /s lets "." match newlines as well
if ($s =~ /<header>(.*?)<\/header>/s) {
    print "$1\n";
}
Note that I added a question mark after the quantifier (so your greedy .+ became the reluctant .*?). To understand why this is generally a good idea, see: http://www.regular-expressions.info/repeat.html, specifically the paragraph "Watch Out for The Greediness!".
But if you're parsing (X)HTML or XML files, I recommend using an XML/HTML parser instead of trying to do this with regex: regex is a poor HTML parser.
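To illustrate the greediness point from this post, here is a small sketch comparing a greedy and a reluctant quantifier. The input with two header blocks is a hypothetical example, not data from the original question.

```perl
use strict;
use warnings;

# Hypothetical input with two header blocks, made up for this illustration.
my $s = "<header>first</header>\n<header>second</header>";

# Greedy: .* runs to the LAST </header>, swallowing both blocks.
my ($greedy) = $s =~ /<header>(.*)<\/header>/s;

# Reluctant: .*? stops at the FIRST </header>.
my ($lazy) = $s =~ /<header>(.*?)<\/header>/s;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

With only one header block in the input both patterns capture the same text, so the difference only shows up once the closing tag occurs more than once.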
- prometheuzz
Re: regex not working as expected
andersod2 wrote: Oh, I think I figured it out maybe -- I didn't notice the /s requirement since "." does not match a carriage return (strange).
Not strange at all. In almost all regex engines, the DOT does not match newline characters by default.