match content between two html tags based on id

SidewinderX
Forum Contributor
Posts: 407
Joined: Fri Jul 16, 2004 9:04 pm
Location: NY

match content between two html tags based on id

Post by SidewinderX »

Say I am given this string:

Code: Select all

<span id="some_id">some_data</span>

I want to match some_data. I can easily use the expression

Code: Select all

/<span id="some_id">(.*)<\/span>/is

and generalize it to

Code: Select all

/<span id="$id">(.*)<\/span>/is

However, I would like to generalize it even further, to the point that it ignores the "span" and any other attributes that may be present. In other words, something like this expression does the job:

Code: Select all

preg_match("/<(.*)id=\"$id\"(.*)>(.*)<\/(.*)>/is", $in, $out);

This has two issues: 1) it matches and returns data that I do not want. I only want some_data, which is $out[3]; $out[1], $out[2], and $out[4] are unneeded overhead. 2) If some_data happens to contain a </tag>, that tag will be matched, returning incorrect data. I think the latter issue can be solved using a named capture(?)/backreference, but I am not sure how that would work, despite reading regular-expressions.info/named.html
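
To make the two issues concrete, here is a minimal sketch that runs the broad pattern against a sample string. The sample input (a span whose content contains a nested <b> tag) is an assumption for illustration and does not come from the post itself.

```php
<?php
// Hypothetical sample input: the content of the span contains a nested tag.
$id = 'some_id';
$in = '<span id="some_id">some <b>bold</b> data</span>';

// The broad pattern from the post, unchanged.
preg_match("/<(.*)id=\"$id\"(.*)>(.*)<\/(.*)>/is", $in, $out);

// Issue 1: $out holds the full match plus four groups, although only one is wanted.
// Issue 2: the greedy second group runs past the opening tag and into the content,
// so $out[3] holds only the text after the nested </b>, not the whole content.
print_r($out);
```

Printing $out shows all four groups side by side, which makes it easier to see which greedy group consumed the nested tag.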

Would some regex wiz enlighten me as to how to solve the two issues outlined above?

Thank you

Re: match content between two html tags based on id

Post by SidewinderX »

Got it.

Code: Select all

preg_match("/<([\w]+)[^>]+id=\"$id\"[^>]*>([^>]*)<\/\\1>/", $in, $out);

Any suggestions?
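
One possible refinement, offered as a sketch rather than a definitive answer: escape the id with preg_quote(), make the attribute group non-capturing, and use a lazy content group so that content containing other tags still matches. The helper name match_by_id and its signature are assumptions made for illustration.

```php
<?php
// Hypothetical helper; the name and signature are assumptions for illustration.
function match_by_id(string $in, string $id): ?string
{
    // preg_quote() escapes regex metacharacters in the id, including the "/" delimiter.
    $quoted = preg_quote($id, '/');

    // (\w+)         tag name, captured only so the close tag can refer back to it
    // (?:\s[^>]*)?  optional attributes before id (non-capturing)
    // \sid="..."    the id attribute, preceded by whitespace
    // [^>]*         optional attributes after id
    // (.*?)         content, lazy so it stops at the first close tag of the same name
    // <\/\1>        close tag whose name equals the captured open tag name
    $pattern = '/<(\w+)(?:\s[^>]*)?\sid="' . $quoted . '"[^>]*>(.*?)<\/\1>/is';

    // Return the content group, or null when nothing matched.
    return preg_match($pattern, $in, $out) === 1 ? $out[2] : null;
}

echo match_by_id('<span class="x" id="some_id">some <b>bold</b> data</span>', 'some_id'), "\n";
```

This still assumes double-quoted id attributes and stops at the first close tag with the same name, so content that nests the same tag (a span inside the span) would be cut short.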