Regular Expression - How to extract html tags and info

simonk
Forum Newbie
Posts: 3
Joined: Mon Apr 03, 2006 11:58 am

Post by simonk »

I have the following code stored in an HTML file:

<div class="class1">
<a href="http://www.yahoo.com">Yahoo</a>
A search engine
<div class="category">search</div>
</div>

How can I use a regular expression to extract the link 'www.yahoo.com', the name 'Yahoo', the description, and the category 'search' into one array?

I know I should use preg_match and {}, but I just can't get this to work.

Please help,
Many many thanks.

CF K
anthony88guy
Forum Contributor
Posts: 246
Joined: Thu Jan 20, 2005 8:22 pm

Re: Regular Expression - How to extract html tags and info

Post by anthony88guy »

Code:

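$link = 'http://www.blahblah.com/blah.html';
$pagecontents = file_get_contents($link);

// [^"]* and the lazy .*? stop at the first closing delimiter instead of
// running to the last one on the line; \s already covers newlines.
$pattern = '#<a href="([^"]*)">(.*?)</a>\s*A search engine\s*<div class="category">(.*?)</div>#';

// PREG_SET_ORDER gives one array per match, so $match[1..3] are strings.
// preg_match_all() returns the number of matches (0 if there are none).
if (preg_match_all($pattern, $pagecontents, $matches, PREG_SET_ORDER))
{
     foreach ($matches as $match)
     {
          echo 'Match1: ' . $match[1] . '<br>';
          echo 'Match2: ' . $match[2] . '<br>';
          echo 'Match3: ' . $match[3] . '<br>';
     }
}else{
     echo 'No matches...';
}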
It's not tested and probably has some errors, but that's the gist of it. BTW, I believe there is a forum specifically for regex.
simonk
Forum Newbie
Posts: 3
Joined: Mon Apr 03, 2006 11:58 am

Post by simonk »

Thanks :) But what if the link, name and description are variable (unknown before the regex is run)? How would I do it then? The only thing I know is the <div class="xxx"></div> wrapper.
I need to make an automatic update page that captures the information within this div tag.

Thank you so much.
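
One way to handle unknown values is to capture the description as well, and anchor the pattern on the wrapper div's class instead of on any fixed text. The following is only a rough sketch under some assumptions: the function name extractEntries() and the array keys are made up for illustration, the inline $html stands in for whatever file_get_contents() would return, and the markup is assumed to always follow the layout from the first post (link, then description text, then the category div).

```php
<?php
// Sample markup from the first post; on a real page this string would
// come from file_get_contents().
$html = <<<'HTML'
<div class="class1">
<a href="http://www.yahoo.com">Yahoo</a>
A search engine
<div class="category">search</div>
</div>
HTML;

// Hypothetical helper (not from the thread): returns one associative
// array per block wrapped in <div class="$class">.
function extractEntries(string $html, string $class): array
{
    // Named groups capture every unknown value. The "s" modifier lets
    // the lazy description group span several lines.
    $pattern = '#<div class="' . preg_quote($class, '#') . '">\s*'
             . '<a href="(?P<link>[^"]*)">(?P<name>.*?)</a>\s*'
             . '(?P<description>.*?)\s*'
             . '<div class="category">(?P<category>.*?)</div>\s*'
             . '</div>#s';

    $entries = [];
    // PREG_SET_ORDER groups the captures by match rather than by group.
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $entries[] = [
                'link'        => $m['link'],
                'name'        => $m['name'],
                'description' => trim($m['description']),
                'category'    => $m['category'],
            ];
        }
    }
    return $entries;
}

print_r(extractEntries($html, 'class1'));
```

For the sample above this should give a single entry with the link, name, description and category in one array. The captured link still includes the "http://" part; parse_url($entry['link'], PHP_URL_HOST) would reduce it to 'www.yahoo.com'. The pattern is brittle if attributes are reordered or a block is missing its category div, so for pages you do not control, PHP's DOMDocument may be a sturdier choice than a regex.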