Regex help with catching the right end point?

Eric!
DevNet Resident
Posts: 1146
Joined: Sun Jun 14, 2009 3:13 pm

Regex help with catching the right end point?

Post by Eric! »

I'm terrible at regex and I must be missing something fundamental here. I'm trying to capture only the "I ONLY WANT THIS STRING" part in the following example:

Code:

$html='<li>CAPTURE: I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE</li></ul>';
$pattern="/capture:\s(.*)(\<\/LI>)/i";
preg_match($pattern, $html, $results);

echo htmlentities($results[1]);
But I keep getting multiple parts:
Output:
I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE

I've tried a variety of patterns, but the closing tag is eluding me as a stopping point for the pattern. Can someone clue me in?
Eric!
DevNet Resident
Posts: 1146
Joined: Sun Jun 14, 2009 3:13 pm

Re: Regex help with catching the right end point?

Post by Eric! »

Ok, I think this is because it is greedy by default. If I make the pattern

Code:

$pattern="/capture:\s(.*?)(\<\/LI>)/i";
It works. I'm not sure how it determines where to end in the previous case. I pulled that test string out of a large HTML file that had many more </LI> tags for the regex to choose from, but it stopped after the third....
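
For anyone comparing the two patterns side by side, here is a minimal sketch using the sample string from the first post. The third pattern, a negated character class, is not something tried in this thread; it is shown only as an assumed alternative that works when the wanted text contains no "<".

```php
<?php
// Sample string from the first post.
$html = '<li>CAPTURE: I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE</li></ul>';

// Greedy: .* takes as much as it can, then backtracks to the LAST </li>.
preg_match('/capture:\s(.*)<\/li>/i', $html, $greedy);

// Lazy: .*? takes as little as it can, so it stops at the FIRST </li>.
preg_match('/capture:\s(.*?)<\/li>/i', $html, $lazy);

// Assumed alternative (not from the thread): forbid "<" inside the capture.
preg_match('/capture:\s([^<]*)<\/li>/i', $html, $negated);

echo htmlentities($greedy[1]), "\n";
echo htmlentities($lazy[1]), "\n";
echo htmlentities($negated[1]), "\n";
```

The greedy capture reproduces the unwanted output above, while the lazy and negated-class captures both contain only "I ONLY WANT THIS STRING".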
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: Regex help with catching the right end point?

Post by Jonah Bron »

Eric! wrote:Ok, I think this is because it is greedy by default. If I make the pattern

...

It works. I'm not sure how it determines where to end in the previous case.
Correct. A greedy .* consumes as much as it can, so the match extends to the last </li> it can reach.
Eric! wrote: I pulled that test string out of a large HTML file that had many more </LI> tags for the regex to choose from, but it stopped after the third....
There are likely newlines farther into the HTML, and by default (without the s modifier) the dot character doesn't match newlines. So the match stops at the last </li> before the first newline.
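
A rough sketch of that newline behavior, assuming a made-up two-line input (the real HTML file isn't shown in this thread, so $html here is hypothetical):

```php
<?php
// Hypothetical input: three items on the first line, one more after a newline.
$html = "<li>CAPTURE: A</li><li>B</li><li>C</li>\n<li>D</li>";

// Default: the dot does not match "\n", so greedy .* cannot leave the first line.
preg_match('/capture:\s(.*)<\/li>/i', $html, $default);

// With the s (PCRE_DOTALL) modifier the dot also matches "\n",
// so greedy .* runs on to the last </li> in the whole string.
preg_match('/capture:\s(.*)<\/li>/is', $html, $dotall);

echo htmlentities($default[1]), "\n";
echo htmlentities($dotall[1]), "\n";
```

Without the s modifier the capture ends at the third item on the first line, which matches the "stopped after the third" behavior described above. With it, the capture continues across the newline to the last </li> in the string.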
Eric!
DevNet Resident
Posts: 1146
Joined: Sun Jun 14, 2009 3:13 pm

Re: Regex help with catching the right end point?

Post by Eric! »

Thanks for clarifying that. Sometimes I feel like I'm taking crazy pills with regex stuff.