extracting value from an HTML tag
Posted: Tue May 24, 2005 3:09 am
by jasongr
Hello
I have a string that may be as simple as "Hello"
or may be embedded as a value inside HTML tags, like so:
"<b><span ...>Hello</span></b>"
I need some way to extract the "Hello" part from the string
Is there any HTML processing function in PHP that can help me here?
regards
Posted: Tue May 24, 2005 3:31 am
by malcolmboston
Someone will be able to write the regex for you.
What you're looking for, basically, is:
preg_match everything between <> and </>
Posted: Tue May 24, 2005 7:00 am
by John Cartwright
Moved to Regex.
Posted: Tue May 24, 2005 1:38 pm
by Skara
Code: Select all
preg_match('|<(.+)>(.+?)</\\1>|',$data,$matches);
I think that will work. Not sure if you can put \\1 there or not. ^^;
In the following:
Code: Select all
<b><i>text</i></b>
the second captured group would return "<i>text</i>".
So, if you just want to strip the tags,
Code: Select all
$data = preg_replace('|<[^>]+>|','',$data);
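As a rough sketch of the tag-stripping approach applied to the two kinds of string from the original question: the helper name extractText() and the class="x" attribute are made up for this example (the original post elided the span's attributes). PHP's built-in strip_tags() is shown alongside for comparison, since it does the same job without a regex.

```php
<?php
// Hypothetical helper (not from the thread): remove anything that looks like a tag.
function extractText($data)
{
    // '<[^>]+>' matches '<', then one or more characters that are not '>', then '>'.
    return preg_replace('#<[^>]+>#', '', $data);
}

$plain  = 'Hello';
// class="x" is an assumed attribute standing in for the elided "..." in the question.
$nested = '<b><span class="x">Hello</span></b>';

echo extractText($plain), "\n";  // Hello
echo extractText($nested), "\n"; // Hello
echo strip_tags($nested), "\n";  // Hello (built-in alternative)
```

Both approaches leave a plain string untouched, so they should cover the "Hello" case as well as the wrapped one.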
Posted: Wed May 25, 2005 4:35 am
by Chris Corbyn
You can indeed use \\1 within the match itself. Can I just say, however, that because of the way numeric escape sequences work, it is now recommended to use $1 everywhere else (in a replacement string, for example); inside the regex itself, as it is used here, \\1 is the correct form.
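To make the distinction concrete, here is a minimal sketch with made-up input and a made-up output format: \\1 inside the pattern is a backreference to the first group, while $1 and $2 in the replacement string refer to the captured groups.

```php
<?php
// Example input invented for this sketch.
$data = '<b>Hello</b> <i>world</i>';

// In the pattern: '\\1' (the regex backreference \1) must repeat whatever group 1 matched,
// so '<b>' has to be closed by '</b>' and '<i>' by '</i>'.
// In the replacement: $1 is the tag name, $2 is the text between the tags.
$result = preg_replace('#<(b|i)>(.+?)</\\1>#', '[$1: $2]', $data);

echo $result, "\n"; // [b: Hello] [i: world]
```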
Another point: don't delimit the regex with "|"; use "#" instead. The bar character ("|") is a regex operator (alternation), so it's a nasty habit to get into.
There's also a problem with the regex you wrote, however.
Take for example:
Code: Select all
<span style="color:red">Hello world</span>
Now, your regex is looking for
Code: Select all
<span style="color:red">Hello world</span style="color:red">
Do this instead....
Code: Select all
preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $data, $matches);
print_r($matches);
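A quick side-by-side sketch of the two patterns on the span example above. The variable names are invented here, the earlier pattern is written with "#" delimiters instead of "|" (which does not change what it matches), and the class="x" attribute in the nested string is an assumption standing in for the attributes elided in the original question.

```php
<?php
$data = '<span style="color:red">Hello world</span>';

// Earlier pattern: group 1 captures 'span style="color:red"', so the backreference
// demands a closing tag that repeats the attributes as well. No match.
$old = preg_match('#<(.+)>(.+?)</\\1>#', $data, $oldMatches);

// Revised pattern: group 1 is only the tag name; [^>]* skips over any attributes.
$new = preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $data, $newMatches);

var_dump($old);            // int(0)
echo $newMatches[1], "\n"; // span
echo $newMatches[2], "\n"; // Hello world

// Nested tags: the outer element matches, so group 2 still contains the inner tag.
$nested = '<b><span class="x">Hello</span></b>';
preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $nested, $nestedMatches);
echo $nestedMatches[2], "\n"; // <span class="x">Hello</span>
```

For nested markup like the last case, the match would need to be applied again to the captured group, or the tags stripped outright, to get down to the bare text.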