by SidewinderX » Sat May 30, 2009 8:16 pm
Since element IDs should be unique, I am trying to write an expression that matches the content between a pair of HTML tags, given only the element's id. Here is what I have come up with (it works):
Code:
$html = '<div class="bar"><span class="foo" id="foo">bat</span></div>';
$id = "foo";
preg_match("#<([a-z]+)[^<>]*id=\"$id\"[^>]*>(.*)</\\1>#is", $html, $matches);
// $matches[2] is "bat" ($matches[1] is the tag name, "span")
Does anyone have any tips for optimizing?
by GeertDD » Sun May 31, 2009 5:44 am
Before anybody else says it: an HTML parser would probably be a better choice for this job.
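For reference, here is a minimal sketch of the parser route using PHP's bundled DOM extension. It assumes $id contains no double quote (it is pasted straight into the XPath string), and it returns the element's text content, not its inner markup. The variable names are only for illustration.

```php
<?php
// Sketch only: look the element up by id with DOMDocument + DOMXPath.
$html = '<div class="bar"><span class="foo" id="foo">bat</span></div>';
$id   = 'foo'; // assumption: no double quote inside the id

// Collect libxml warnings internally instead of printing them;
// real-world markup is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select any element whose id attribute equals $id.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="' . $id . '"]');

// Text content of the first hit, or null when nothing matched.
$content = $nodes->length > 0 ? $nodes->item(0)->textContent : null;

echo $content, "\n"; // prints "bat"
```

If you need the inner HTML rather than the text, you would have to serialize the child nodes (for example with saveHTML() on each child) instead of reading textContent.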
That said, here are some remarks about your regex. Before you start to optimize it, you should double check whether it matches all the right things.
<([a-z]+) does not account for element names that contain a digit, such as h1.
[^<>]*id= also matches attributes whose names merely end in "id". For example: <h1 otherid="boo">.
\"$id\" only matches double-quoted values, so id='foo' is missed.
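One way to address those three remarks is sketched below. This is just an illustration, not a drop-in replacement: contentById() is a hypothetical helper name, the id is run through preg_quote() as an extra precaution, and nested elements with the same tag name are still not handled properly.

```php
<?php
// Sketch only: contentById() is a hypothetical helper, not a library function.
function contentById($html, $id)
{
    // Escape regex metacharacters (and the # delimiter) in the id.
    $quotedId = preg_quote($id, '#');

    $pattern = '#<([a-z][a-z0-9]*)\b'   // tag name, digits allowed (h1, h2, ...)
             . '[^<>]*\sid='            // whitespace before "id" rules out otherid=
             . '(["\'])' . $quotedId . '\2' // single or double quotes, same on both sides
             . '[^>]*>(.*)</\1>#is';    // content up to the matching closing tag

    // The content is capture group 3; return null when nothing matches.
    return preg_match($pattern, $html, $m) ? $m[3] : null;
}

$a = contentById('<div class="bar"><span class="foo" id="foo">bat</span></div>', 'foo');
$b = contentById("<h1 class='x' id='foo'>bat</h1>", 'foo');
$c = contentById('<h1 otherid="foo">bat</h1>', 'foo');

var_dump($a, $b, $c); // "bat", "bat", NULL
```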
Finally, a general tip for optimizing is to use possessive quantifiers where possible.
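To illustrate with your own pattern (a rough sketch, nothing benchmarked): a possessive quantifier such as *+ never gives back what it has matched. That is safe for [^>]* right before the closing >, because the class can never consume that > anyway. It is not safe for the [^<>]* in front of id=, because that part has to back off again to let id="foo" match.

```php
<?php
// Sketch only: where a possessive quantifier helps and where it breaks the match.
$html = '<div class="bar"><span class="foo" id="foo">bat</span></div>';
$id   = preg_quote('foo', '#');

// Safe: [^>]*+ is followed by ">", which the class itself cannot match,
// so backtracking into it could never produce a match.
$ok = preg_match("#<([a-z]+)[^<>]*id=\"$id\"[^>]*+>(.*)</\\1>#is", $html, $m);

// Unsafe: [^<>]*+ swallows id="foo" as well and refuses to give it back,
// so the literal id= that follows can never match.
$broken = preg_match("#<([a-z]+)[^<>]*+id=\"$id\"[^>]*>(.*)</\\1>#is", $html, $m2);

var_dump($ok, $m[2], $broken); // int(1), "bat", int(0)
```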