Regex help with catching the right end point?

Posted: Wed May 18, 2011 2:01 pm
by Eric!
I'm terrible at regex and I must be missing something fundamental here. I'm just trying to get the "I ONLY WANT THIS STRING" part out of the following example:

Code:

$html='<li>CAPTURE: I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE</li></ul>';
$pattern="/capture:\s(.*)(\<\/LI>)/i";
preg_match($pattern, $html, $results);

echo htmlentities($results[1]);

But I keep getting the other list items as well:
[text]Output:
I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE
[/text]

I've tried a variety of patterns, but I can't get the closing tag to act as the stopping point for the pattern. Can someone clue me in?

Re: Regex help with catching the right end point?

Posted: Wed May 18, 2011 8:26 pm
by Eric!
OK, I think this is because .* is greedy by default. If I change the pattern to

Code:

$pattern="/capture:\s(.*?)(\<\/LI>)/i";
It works. I'm not sure how it decides where to stop in the previous case, though. I pulled that test string out of a large HTML file that had many more </LI> tags for the regex to choose from, but it stopped after the third one...
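For reference, a complete, runnable version of the lazy fix might look like the sketch below. It uses the same test string; I've dropped the capture group around </li> and the unneeded \< escape (neither changes what is matched), and only echo when preg_match actually finds something, so a miss doesn't trigger an undefined-key warning.

```php
<?php
// Same test string as in the question.
$html = '<li>CAPTURE: I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE</li></ul>';

// Lazy quantifier: .*? grows one character at a time and stops at the
// first </li> that lets the rest of the pattern match. The i modifier
// makes "capture:" and "</li>" match regardless of case.
$pattern = '/capture:\s(.*?)<\/li>/i';

// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match($pattern, $html, $results) === 1) {
    echo htmlentities($results[1]), "\n";
}
```

The only difference from the original pattern that matters is the ? after .*, which switches the quantifier from greedy to lazy.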

Re: Regex help with catching the right end point?

Posted: Sun May 22, 2011 4:39 pm
by Jonah Bron
Eric! wrote: Ok, I think this is because it is greedy by default. If I make the pattern

...

It works. I'm not sure how it determines where to end in the previous case.
Correct. A greedy .* first consumes as much as it can, then backtracks until the rest of the pattern can match, so it ends at the last </li> it can reach.
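You can see where each version stops by asking preg_match for byte offsets with the PREG_OFFSET_CAPTURE flag and comparing the offset of the matched </li> against strpos (first occurrence) and strrpos (last occurrence). This is a minimal sketch using the same test string from the first post:

```php
<?php
$html = '<li>CAPTURE: I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li><li>OR THIS ONE</li></ul>';

// With PREG_OFFSET_CAPTURE, each group becomes [matched text, byte offset].
preg_match('/capture:\s(.*)(<\/li>)/i', $html, $greedy, PREG_OFFSET_CAPTURE);
preg_match('/capture:\s(.*?)(<\/li>)/i', $html, $lazy, PREG_OFFSET_CAPTURE);

// Group 2 is the </li> each pattern ended on.
$greedyEnd = $greedy[2][1];
$lazyEnd   = $lazy[2][1];

// Greedy ends on the last </li> in the string, lazy on the first one.
echo "greedy ends at offset $greedyEnd, last </li> is at " . strrpos($html, '</li>') . "\n";
echo "lazy ends at offset $lazyEnd, first </li> is at " . strpos($html, '</li>') . "\n";
```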
Eric! wrote: I pulled that test string out of a large HTML file that had many more </LI> tags for the regex to choose from, but it stopped after the third....
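There are likely newlines farther into the HTML, and by default the dot doesn't match a newline. So .* can't cross the line break, and the match ends at the last </li> before the first newline that follows "CAPTURE:". Adding the s modifier (e.g. /is) makes the dot match newlines as well.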
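The newline behavior can be reproduced with a made-up variant of the test string (hypothetical, not taken from the real file) that has a line break after the second item. The first pattern is the original greedy one; the second adds the s modifier:

```php
<?php
// Hypothetical string: a "\n" sits between the second and third list items.
$html = "<li>CAPTURE: I ONLY WANT THIS STRING</li><li>NOT THIS ONE</li>\n<li>OR THIS ONE</li></ul>";

// Without s, . never matches "\n", so the greedy .* stops at the line
// break and backtracks to the last </li> on the first line.
preg_match('/capture:\s(.*)<\/li>/i', $html, $noDotAll);

// With s, . also matches "\n", so .* runs to the end of the string and
// backtracks to the very last </li>.
preg_match('/capture:\s(.*)<\/li>/is', $html, $dotAll);

echo htmlentities($noDotAll[1]), "\n";
echo htmlentities($dotAll[1]), "\n";
```

The first capture ends just before the </li> after "NOT THIS ONE"; the second runs through "OR THIS ONE". Either way, the lazy .*? shown earlier is what actually stops at the first </li>.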

Re: Regex help with catching the right end point?

Posted: Thu Jun 02, 2011 6:01 pm
by Eric!
Thanks for clarifying that. Sometimes I feel like I'm taking crazy pills with regex stuff.