extracting value from an HTML tag

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."


jasongr
Forum Contributor
Posts: 206
Joined: Tue Jul 27, 2004 6:19 am


Post by jasongr »

Hello

I have a string that may be as simple as "Hello"
or may be embedded as a value inside HTML tags, like so:
"<b><span ...>Hello</span></b>"
I need some way to extract the "Hello" part from the string.

Is there any HTML processing function in PHP that can help me here?

regards
malcolmboston
DevNet Resident
Posts: 1826
Joined: Tue Nov 18, 2003 1:09 pm
Location: Middlesbrough, UK

Post by malcolmboston »

Someone will be able to write the regex for you.
What you're looking for, basically, is:

preg_match everything between the opening <...> and the closing </...>
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Moved to Regex.
Skara
Forum Regular
Posts: 703
Joined: Sat Mar 12, 2005 7:13 pm
Location: US

Post by Skara »

Code:

preg_match('|<(.+)>(.+?)</\\1>|',$data,$matches);
I think that will work. Not sure if you can put \\1 there or not. ^^;

In the following:

Code:

<b><i>text</i></b>
the match would return "<i>text</i>" (in $matches[2]).
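A minimal PHP sketch to check that claim: it runs the same pattern against the same input and prints group 2. Only the sample string from the post is used; nothing else is assumed.

```php
<?php
$data = '<b><i>text</i></b>';

// (.+) is greedy, but it backtracks until \1 can be closed by a matching
// "</...>"; here that leaves group 1 = "b", so group 2 is the body of <b>.
preg_match('|<(.+)>(.+?)</\\1>|', $data, $matches);

echo $matches[1], "\n"; // prints "b"
echo $matches[2], "\n"; // prints "<i>text</i>"
```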

So, if you just want to strip the tags:

Code:

$data = preg_replace('|<[^>]+>|','',$data);
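For the "just strip the tags" case, PHP also ships a built-in function, strip_tags(), which answers the original question about an HTML processing function. The sketch below compares it with the regex above; the style="color:red" attribute is an illustrative stand-in for the elided "<span ...>" in the first post. The regex version assumes no ">" appears inside an attribute value.

```php
<?php
// Illustrative input: the original "<span ...>" attributes were elided,
// so style="color:red" is a made-up stand-in.
$data = '<b><span style="color:red">Hello</span></b>';

// Regex approach from the post: delete every "<...>" run.
$viaRegex = preg_replace('|<[^>]+>|', '', $data);

// PHP's built-in strip_tags() removes the tags as well.
$viaStripTags = strip_tags($data);

echo $viaRegex, "\n";     // prints "Hello"
echo $viaStripTags, "\n"; // prints "Hello"
```

A plain "Hello" with no tags passes through both unchanged, so the same call covers both forms of input described in the question.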
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

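You can indeed use \\1 within the pattern itself. I should point out, however, that because numeric escape sequences can be ambiguous, the recommended form is now $1 rather than \\1 wherever a backreference appears outside the regex itself (for example, in a preg_replace() replacement string). Inside the regex, as used here, \\1 is the right form.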
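A small sketch of that distinction, based on PHP's documented preg_replace() replacement syntax: \\1 acts as a backreference inside the pattern, while the replacement string uses $1, or ${1} when a literal digit follows the group number. The BBCode-style output is just an illustration.

```php
<?php
// Inside the pattern, \1 (written '\\1' in a PHP string) is a backreference.
// In the replacement, $1 and $2 insert the captured groups.
$bb = preg_replace('#<(\w+)>(.*?)</\\1>#', '[$1]$2[/$1]', '<b>bold</b>');

// A digit right after a backreference is ambiguous ('$10' would mean
// group 10), so ${1} marks where the group number ends.
$padded = preg_replace('#(\d+)#', '${1}0', 'x5');

echo $bb, "\n";     // prints "[b]bold[/b]"
echo $padded, "\n"; // prints "x50"
```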

Another point: don't delimit the regex with "|"; use # instead. The bar ("|") character is the regex alternation operator, so using it as a delimiter is a nasty habit to get into.
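As a quick illustration (the sample markup is made up), keeping "|" free for alternation lets a single pattern match either of two tag names:

```php
<?php
$data = '<b><i>x</i></b>';

// With # as the delimiter, | is free to act as the alternation operator,
// so this matches both "<b>" and "<i>" (closing tags start with "</").
$count = preg_match_all('#<(b|i)>#', $data, $matches);

// Written as '|<(b|i)>|' instead, the second | would end the pattern
// early and PHP would warn about an unknown modifier.

echo $count, "\n";                   // prints "2"
echo implode(',', $matches[1]), "\n"; // prints "b,i"
```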

There's a problem with the regex you wrote, though.

Take for example:

Code:

<span style="color:red">Hello world</span>
Now, because (.+) captures the attributes as well as the tag name, your regex is looking for a closing tag like this:

Code:

<span style="color:red">Hello world</span style="color:red">
Do this instead:
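A runnable check of the point above, as a sketch: feeding that span to the original '|<(.+)>(.+?)</\\1>|' pattern finds no match at all, because no closing tag in the string repeats the attributes.

```php
<?php
$data = '<span style="color:red">Hello world</span>';

// The original pattern: (.+) captures 'span style="color:red"', so \1
// would require '</span style="color:red">', which never occurs.
$found = preg_match('|<(.+)>(.+?)</\\1>|', $data);

var_dump($found); // prints "int(0)": no match
```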

Code:

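// (\w+) captures only the tag name; [^>]*> skips over any attributes.
// The s modifier lets . match newlines inside the element.
preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $data, $matches);

print_r($matches); // [1] => tag name, [2] => inner content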
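Applied to the original question, the fixed pattern peels off one outer tag per call, so repeating it until nothing matches should reduce "<b><span ...>Hello</span></b>" to "Hello". The helper name extract_innermost_text() is hypothetical, and the style attribute stands in for the elided one. This is a sketch, not a parser: text outside the first matched element is dropped, and nested tags with the same name (e.g. <b><b>x</b></b>) defeat the lazy match, so a real HTML parser such as DOMDocument is the safer tool for arbitrary markup.

```php
<?php
// Hypothetical helper: repeatedly applies the pattern from the post,
// replacing the string with the inner content of its first element,
// until no tag pair matches. Each pass strictly shortens the string,
// so the loop always terminates.
function extract_innermost_text(string $data): string
{
    while (preg_match('#<(\w+)[^>]*>(.+?)</\\1>#s', $data, $matches)) {
        $data = $matches[2];
    }
    return $data;
}

// Illustrative input; the original "<span ...>" attributes were elided.
echo extract_innermost_text('<b><span style="color:red">Hello</span></b>'), "\n"; // prints "Hello"
echo extract_innermost_text('Hello'), "\n"; // prints "Hello"
```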