matched based on element id

Posted: Sat May 30, 2009 8:16 pm
by SidewinderX
Since element ids should be unique, I am trying to write an expression that matches the content between a pair of HTML tags, given only the element's id. Here is what I have come up with (it works):

 
$html = '<div class="bar"><span class="foo" id="foo">bat</span></div>';
$id = "foo";
preg_match("#<([a-z]+)[^<>]*id=\"$id\"[^>]*>(.*)</\\1>#is", $html, $matches);
// $matches[1] is 'span', $matches[2] is 'bat'
 
Does anyone have any tips for optimizing?

Re: matched based on element id

Posted: Sun May 31, 2009 5:44 am
by GeertDD
Before anybody else says it, an HTML parser would probably be a better choice to pull off this job.
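
For comparison, here is a minimal sketch of the parser route using PHP's bundled DOM extension. It assumes the extension is enabled and that $id contains no double quote, since the id is pasted straight into the XPath string. The variable names are just for illustration.

```php
<?php
$html = '<div class="bar"><span class="foo" id="foo">bat</span></div>';
$id   = 'foo';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Assumption: $id has no double quote in it, so it is safe inside the XPath literal.
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//*[@id="' . $id . '"]')->item(0);

// textContent is the text inside the element, or null if no element has that id.
$content = $node !== null ? $node->textContent : null;
```

Note that textContent drops any nested markup and returns only the text, so this is not a drop-in replacement if you need the inner HTML.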

That said, here are some remarks about your regex. Before you start to optimize it, you should double check whether it matches all the right things.

<([a-z]+) does not account for element names that contain a digit, like h1.

[^<>]*id= also matches attributes whose names merely end in "id". For example, it would wrongly accept <h1 otherid="boo">.

\"$id\" does not take single quotes into account.
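
One way to address the three remarks above could look like the sketch below. The helper name matchById is made up for this example, and passing the id through preg_quote is an extra precaution that was not in the original post. It still shares the usual regex-on-HTML limits, such as nested elements of the same name or an unquoted id value.

```php
<?php
// matchById is a hypothetical helper name, not from the original post.
function matchById(string $html, string $id): ?string
{
    $pattern = '#<([a-z][a-z0-9]*)\b'      // tag names may contain digits, e.g. h1
             . '[^<>]*\s'                   // require whitespace right before the attribute name
             . 'id=(["\'])'                 // accept either quote style
             . preg_quote($id, '#') . '\2'  // escaped id, then the same quote again
             . '[^>]*>(.*)</\1>#is';        // rest of the tag, content, matching close tag

    // Group 3 holds the content because group 2 now captures the quote character.
    return preg_match($pattern, $html, $m) === 1 ? $m[3] : null;
}
```

Requiring \s before id= is what rules out otherid and similar attribute names.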

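Finally, a general tip for optimizing is to use possessive quantifiers (such as [a-z]++ or [^>]*+) where possible. They stop the regex engine from backtracking into what they have already matched.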
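
As a rough illustration applied to the original pattern, the sketch below makes two quantifiers possessive and shows one place where doing so would break the match. The variable names are made up for this example.

```php
<?php
$html = '<div class="bar"><span class="foo" id="foo">bat</span></div>';
$id   = preg_quote('foo', '#');

// Safe: nothing that follows [a-z]++ or [^>]*+ needs characters they consumed.
$safe = "#<([a-z]++)[^<>]*id=\"$id\"[^>]*+>(.*)</\\1>#is";

// Unsafe: [^<>]*+ swallows the id attribute itself and cannot give it back,
// so the id= part of the pattern never gets a chance to match.
$unsafe = "#<([a-z]++)[^<>]*+id=\"$id\"[^>]*>(.*)</\\1>#is";

$safeHit   = preg_match($safe, $html, $m);   // 1, with $m[2] === 'bat'
$unsafeHit = preg_match($unsafe, $html);     // 0
```

So "where possible" matters: a quantifier can only be made possessive when the rest of the pattern never needs part of what it matched.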