Preg match help
Posted: Fri Dec 26, 2008 6:24 am
by it2051229
I'm trying to extract data from an HTML file based on a certain tag.
First I wanted to extract the content of the H2 tag using preg_match with this pattern: /(<h2.*>)(.*)(<\/h2>)/isxmU
and it works.
Now I want to extract the content of an H2 tag that has the attribute class='title', so I tried this: /(<h2 class='title'.*>)(.*)(<\/h2>)/isxmU
and it does not work.
I have to admit I don't know much about preg match and regular expressions.
Re: Preg match help
Posted: Sat Dec 27, 2008 2:42 pm
by cptnwinky
It's been a long time since I've messed with regular expressions, so forgive me if this is wrong, but try escaping the single quotes around title.
Re: Preg match help
Posted: Sat Dec 27, 2008 2:46 pm
by jaoudestudios
It might be easier if you show your line of code with the regular expression.
And don't forget that some people use single quotes while others use double quotes, so try to make your regular expression flexible. If I remember correctly, you can do an OR in there with the pipe (|).
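A rough sketch of that idea, in case it helps. The sample HTML and the variable names are made up for illustration. The pipe picks either quote style, and the backreference \1 requires the closing quote to be the same as the opening one:

```php
<?php
// Hypothetical sample input: one heading per quoting style.
$html = '<h2 class=\'title\'>First</h2> <h2 class="title">Second</h2>';

// (\'|") captures whichever quote opens the attribute value;
// \1 then demands the same quote character to close it.
$pattern = '@<h2\s+class=(\'|")title\1[^>]*>([^<]*)</h2>@i';

preg_match_all($pattern, $html, $matches);

// Group 2 holds the text between the tags.
print_r($matches[2]);
```

This only covers headings where class is the first attribute and "title" is its only value, so treat it as a starting point rather than a finished pattern.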
Re: Preg match help
Posted: Sat Dec 27, 2008 4:18 pm
by prometheuzz
it2051229 wrote:...
I have to admit I don't know much about preg match and regular expressions.
No offence, but that shows... ; )
The DOT-STARs are dangerous things: only use them as a last resort (or when you know what you're doing). By default they are greedy and first grab everything up to the end of the line (or up to the end of the entire string with the s-flag), and when there is still a part of your regex that needs to be matched, the engine then starts backtracking. If that happens too often, and your input string is rather large, performance will drop like a, err, stone.
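To make that a bit more concrete, here is a small sketch with a made-up input string. It compares a greedy dot-star against a negated character class; the variable names are only for illustration:

```php
<?php
// Hypothetical input with two headings.
$html = "<h2>one</h2> filler <h2>two</h2>";

// Greedy dot-star: runs to the end of the string first,
// then backtracks until it finds the LAST </h2>.
preg_match('@<h2>(.*)</h2>@is', $html, $greedy);

// Negated character class: can never run past the first '<',
// so it stops at the closing tag of the first heading.
preg_match('@<h2>([^<]*)</h2>@i', $html, $tight);

echo $greedy[1], "\n"; // one</h2> filler <h2>two
echo $tight[1], "\n";  // one
```

On a string this short the difference in speed is not noticeable; the point is only that the dot-star captures more than intended and has to backtrack to get there.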
Perhaps the remarks above are all a bit over your head, in which case my proposed solution will look like voodoo to you, but I encourage you to look at it carefully and try to find out how it works. When you have tried and have questions about it, feel free to post back and I'll gladly explain them. Here's a way to do what you asked:
Code: Select all
$html = "ignore <h2 class='AAA'>some text</h2> ignore";
if (preg_match("@<h2(?=[^>]*class='AAA')[^>]*>([^<]*)</h2>@i", $html, $match)) {
    print_r($match);
}
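One way to poke at the pattern is to feed it a few variations. The sketch below uses made-up sample strings; the pattern itself is the one from the code above. The lookahead (?=...) only checks that class='AAA' appears somewhere before the closing > of the tag, so other attributes in front of it should not matter:

```php
<?php
// Same pattern as in the code above.
$pattern = "@<h2(?=[^>]*class='AAA')[^>]*>([^<]*)</h2>@i";

// Hypothetical inputs: plain, extra attribute first, wrong class.
$samples = [
    "<h2 class='AAA'>first</h2>",
    "<h2 id='x' class='AAA'>second</h2>",
    "<h2 class='BBB'>third</h2>",
];

$found = [];
foreach ($samples as $html) {
    if (preg_match($pattern, $html, $match)) {
        $found[] = $match[1]; // text between the tags
    }
}

print_r($found); // first and second match, third does not
```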
HTH.
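A side note on why the original second pattern may have failed: the x modifier (PCRE_EXTENDED) makes the engine ignore unescaped whitespace in the pattern, so the space between h2 and class is dropped and the regex effectively looks for <h2class='title'. The sketch below assumes the pattern was delimited with / and uses a made-up input string:

```php
<?php
// Hypothetical input.
$html = "<h2 class='title'>some text</h2>";

// With x: the literal space in the pattern is ignored, so nothing matches.
$withX = preg_match("/(<h2 class='title'.*>)(.*)(<\\/h2>)/isxmU", $html);

// Same pattern without x: the space is matched literally.
$withoutX = preg_match("/(<h2 class='title'.*>)(.*)(<\\/h2>)/ismU", $html, $m);

var_dump($withX);    // int(0)
var_dump($withoutX); // int(1)
echo $m[2], "\n";    // some text
```

If the x modifier is wanted for other reasons, writing the space as \s or as an escaped space should have the same effect as removing the modifier.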