extracting value from an HTML tag

Posted: Tue May 24, 2005 3:09 am
by jasongr
Hello

I have a string that may be as simple as "Hello",
or it may be embedded as the value inside HTML tags, like so:
"<b><span ...>Hello</span></b>"
I need some way to extract the "Hello" part from the string.

Is there any HTML processing function in PHP that can help me here?

regards

Posted: Tue May 24, 2005 3:31 am
by malcolmboston
Someone will be able to write the regex for you.
What you're looking for, basically, is:

preg_match everything between <> and </>

Posted: Tue May 24, 2005 7:00 am
by John Cartwright
Moved to Regex.

Posted: Tue May 24, 2005 1:38 pm
by Skara

Code:

preg_match('|<(.+)>(.+?)</\\1>|',$data,$matches);
I think that will work. Not sure if you can put \\1 there or not. ^^;

In the following:

Code:

<b><i>text</i></b>
the second group ($matches[2]) would contain "<i>text</i>".
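To check that behaviour, here is a minimal sketch that runs the pattern above against the nested sample. The variable names ($html, $m) are only for illustration.

```php
<?php
// Sample input from the post above: a <b> tag wrapping an <i> tag.
$html = '<b><i>text</i></b>';

// \\1 inside the single-quoted string reaches PCRE as \1,
// a backreference to whatever the first group captured.
preg_match('|<(.+)>(.+?)</\\1>|', $html, $m);

echo $m[1], "\n"; // b
echo $m[2], "\n"; // <i>text</i>
```

The greedy (.+) first tries to swallow the whole string and only backs off to "b" because that is the only capture for which a matching closing tag exists.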

So, if you just want to strip the tags,

Code:

$data = preg_replace('|<[^>]+>|','',$data);
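As a quick sketch of the stripping approach, the snippet below applies that replacement to a sample string. The attribute in the sample is made up, since the original question elides it. PHP also ships a built-in strip_tags() function, which is probably the closest thing to the "HTML processing function" asked about at the top of the thread, so both calls are shown for comparison.

```php
<?php
// Hypothetical sample; the class attribute is an assumption for illustration.
$data = '<b><span class="x">Hello</span></b>';

// Remove anything that looks like a tag: '<', one or more non-'>' chars, '>'.
$viaRegex = preg_replace('|<[^>]+>|', '', $data);

// Built-in alternative that strips HTML and PHP tags from a string.
$viaBuiltin = strip_tags($data);

echo $viaRegex, "\n";   // Hello
echo $viaBuiltin, "\n"; // Hello
```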

Posted: Wed May 25, 2005 4:35 am
by Chris Corbyn
You can indeed use \\1 within the pattern itself. I would add, however, that because of the way numeric escape sequences are handled, the recommendation now is to use $1 everywhere other than inside the regex itself, where \\1 is used as it is here.
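A small sketch of the distinction, assuming the remark about $1 refers to replacement strings (where preg_replace accepts both notations): inside the pattern the backreference is written \\1, while the replacement uses the $n form. The sample markup is made up.

```php
<?php
// Hypothetical input.
$html = '<b>Hello</b> and <i>world</i>';

// Pattern side: \\1 (\1 once PHP has parsed the string) must repeat the tag name.
// Replacement side: $2 refers to the second captured group, the tag content.
$text = preg_replace('#<(\w+)>(.+?)</\\1>#', '[$2]', $html);

echo $text, "\n"; // [Hello] and [world]
```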

Another point: don't delimit the regex with "|"; use "#" instead. The bar ( "|" ) character is a regex operator, so delimiting with it is a nasty habit to get into.
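To illustrate why, here is a sketch with an alternation in the pattern. With # as the delimiter the | keeps its usual meaning. With | as the delimiter, the first | inside the pattern ends it early and the rest is read as modifiers, so the call fails. The sample strings are assumptions.

```php
<?php
$subject = 'hot dog';

// '#' as delimiter: '|' is plain alternation, so "cat" or "dog" matches.
$withHash = preg_match('#cat|dog#', $subject);

// '|' as delimiter: the pattern ends at the second '|', and "dog|" is then
// parsed as modifiers, which is invalid. '@' hides the resulting warning.
$withBar = @preg_match('|cat|dog|', $subject);

var_dump($withHash); // int(1)
var_dump($withBar);  // bool(false)
```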

There's a problem with the regex you wrote, however.

Take for example:

Code:

<span style="color:red">Hello world</span>
Now, your regex is looking for

Code:

<span style="color:red">Hello world</span style="color:red">
Do this instead....

Code:

preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $data, $matches);

print_r($matches);
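Putting it together for the original question, here is a minimal sketch that applies this pattern repeatedly until no enclosing tag is left, so that both a plain "Hello" and a nested string come back as "Hello". The function name extract_inner_text and the sample attribute are hypothetical, and this is only meant for simple, well-formed fragments, not general HTML.

```php
<?php
// Hypothetical helper: peel matching open/close tag pairs off a string.
function extract_inner_text(string $data): string
{
    // Each pass replaces $data with the content of the first matched pair;
    // the loop ends when no <tag ...>...</tag> pair is found.
    while (preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $data, $matches)) {
        $data = $matches[2];
    }
    return $data;
}

echo extract_inner_text('Hello'), "\n";                                // Hello
echo extract_inner_text('<b><span class="x">Hello</span></b>'), "\n"; // Hello
```

Note that any text outside the matched pair is discarded on each pass, so this suits the case where the whole string is one wrapped value.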