Preg match help



it2051229
Forum Contributor
Posts: 312
Joined: Tue Dec 25, 2007 8:34 pm


Post by it2051229

I'm trying to extract data from an HTML file using a certain tag.

First I wanted to extract the content of an H2 tag using preg_match with this pattern: /(<h2.*>)(.*)(<\/h2>)/isxmU
and it works.

Now I want to extract the content of an H2 tag that has the attribute class='title', so I tried this: /(<h2 class='title'.*>)(.*)(<\/h2>)/isxmU
but it does not work.

I have to admit I don't know much about preg match and regular expressions.
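One likely culprit worth checking: the x modifier (extended mode) makes PCRE ignore unescaped whitespace in the pattern, so `<h2 class='title'` is read as `<h2class='title'`, which cannot match a real tag. The first pattern contains no spaces, so x never affected it. The sketch below compares the question's pattern with two variants; the `$html` sample is made up for illustration.

```php
<?php
// Hypothetical sample input containing one heading with class='title'.
$html = "<p>intro</p><h2 class='title'>Hello</h2><p>body</p>";

// Pattern from the question. With the x flag, PCRE ignores unescaped
// whitespace, so "<h2 class" is read as "<h2class" and never matches.
$withX = "/(<h2 class='title'.*>)(.*)(<\/h2>)/isxmU";

// Option 1: drop x, so the space is literal again.
$withoutX = "/(<h2 class='title'.*>)(.*)(<\/h2>)/ismU";

// Option 2: keep x, but spell the space as \s (any whitespace character).
$escaped = "/(<h2\sclass='title'.*>)(.*)(<\/h2>)/isxmU";

$foundWithX    = preg_match($withX, $html, $m1);    // 0
$foundWithoutX = preg_match($withoutX, $html, $m2); // 1
$foundEscaped  = preg_match($escaped, $html, $m3);  // 1

echo $m2[2], "\n"; // Hello
```

If you do not rely on x for comments or layout, simply removing it is the smaller change.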
cptnwinky
Forum Commoner
Posts: 84
Joined: Sat Dec 27, 2008 10:58 am
Location: Williamstown, MA

Re: Preg match help

Post by cptnwinky

It's been a long time since I've messed with regular expressions, so forgive me if this is wrong, but try escaping the single quotes around title.
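Escaping the quotes only matters at the PHP string level, not in the regex itself: inside a single-quoted PHP string a bare ' ends the string, so the quotes around title need a backslash, while inside a double-quoted string they do not. Either way PCRE receives the same pattern, since ' has no special meaning to the regex engine. A minimal comparison, assuming the pattern is written as a PHP string literal:

```php
<?php
// Single-quoted PHP string: the inner quotes must be escaped as \'
$single = '/<h2 class=\'title\'>/';

// Double-quoted PHP string: single quotes need no escaping.
$double = "/<h2 class='title'>/";

// Both literals produce the identical pattern string for PCRE.
var_dump($single === $double); // bool(true)

echo preg_match($single, "<h2 class='title'>Hi</h2>"), "\n"; // 1
```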
jaoudestudios
DevNet Resident
Posts: 1483
Joined: Wed Jun 18, 2008 8:32 am
Location: Surrey

Re: Preg match help

Post by jaoudestudios

It might be easier if you show your line of code with the regular expression.

And don't forget that some people use single quotes while others use double quotes, so try to make your regular expression flexible. If I remember correctly, you can do an OR in there with a pipe (|).
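Taking up the pipe suggestion, a sketch that accepts either quote style through the alternation (?:'title'|"title"). The attribute value title comes from the question; the two sample headings are made up for illustration.

```php
<?php
// Hypothetical input: one heading uses single quotes, the other double quotes.
$html = "<h2 class='title'>First</h2>\n<h2 class=\"title\">Second</h2>";

// (?:'title'|"title") is a non-capturing group; the pipe means "either/or".
// \b keeps "class" from matching inside a longer name such as "subclass".
$pattern = '@<h2[^>]*\bclass=(?:\'title\'|"title")[^>]*>([^<]*)</h2>@i';

preg_match_all($pattern, $html, $matches);
echo implode(', ', $matches[1]), "\n"; // First, Second
```

A character class such as ['"] would work too, but it would also accept mismatched quotes like 'title".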
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Preg match help

Post by prometheuzz

it2051229 wrote:...

I have to admit I don't know much about preg match and regular expressions.
No offence, but that shows... ; )
The DOT-STARs are dangerous things: only use them as a last resort (or when you know what you're doing). They are greedy: .* first matches everything up to the end of the string (across line breaks too, with the s flag), and when there is still part of your regex left to match, the engine has to start backtracking. If that happens too often and your input string is rather large, performance will drop like a, err, stone.
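To see what a greedy DOT-STAR does, compare a pattern built on .* with one built on negated character classes ([^>]* and [^<]*), which cannot run past the end of the tag or the text. The two-heading input is made up for illustration, and the patterns leave out the question's U flag so the default greedy behavior shows.

```php
<?php
$html = "<h2>A</h2> <h2>B</h2>";

// Greedy: the first .* runs to the end of the string, then backtracks
// to the last '>' that still lets the rest of the pattern match.
preg_match('/<h2.*>(.*)<\/h2>/s', $html, $greedy);
echo $greedy[1], "\n"; // B  (and $greedy[0] is the whole string)

// Negated classes: [^>]* stops at the tag's '>', [^<]* stops at the next '<'.
preg_match('/<h2[^>]*>([^<]*)<\/h2>/', $html, $strict);
echo $strict[1], "\n"; // A

// The strict pattern also works with preg_match_all to get every heading.
preg_match_all('/<h2[^>]*>([^<]*)<\/h2>/', $html, $all);
echo implode(', ', $all[1]), "\n"; // A, B
```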

Perhaps the remarks above are all a bit over your head, in which case my proposed solution will look like voodoo to you, but I encourage you to look at it carefully and try to find out how it works. Once you have tried and still have questions, feel free to post back and I'll gladly explain. Here's a way to do what you asked:

Code:

$html = "ignore <h2 class='AAA'>some text</h2> ignore";
if(preg_match("@<h2(?=[^>]*class='AAA')[^>]*>([^<]*)</h2>@i", $html, $match)) {
  print_r($match);
}
HTH.
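Adapted to the class='title' case from the question, the same lookahead also works with preg_match_all to collect every matching heading, even when class is not the first attribute. The sample HTML and its id attribute are made up for illustration; like the original, ([^<]*) assumes the heading text contains no nested tags.

```php
<?php
$html = "<h2 class='title'>One</h2>"
      . "<h2 class='other'>Skip me</h2>"
      . "<h2 id='x' class='title'>Two</h2>";

// (?=[^>]*class='title') checks the opening tag for the attribute
// without consuming it; [^>]*> then consumes the rest of the tag.
$pattern = "@<h2(?=[^>]*class='title')[^>]*>([^<]*)</h2>@i";

if (preg_match_all($pattern, $html, $matches)) {
    echo implode(', ', $matches[1]), "\n"; // One, Two
}
```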