preg_match_all problem.
Posted: Fri Jul 01, 2011 9:17 am
Hey, I'm trying to extract certain links from the Telegraph news website. I'm using preg_match_all because the links I want to extract follow a consistent pattern.
But for some reason the output is just: 'Array ( [0] => Array ( ) ) '. I've even tested my expression in an online regular-expression tester, and there it picks up the link.
Here is a sample of the source I want to extract the link from:
<h3>
<a href="/finance/dominique-strauss-kahn/8610673/Dominique-Strauss-Kahn-sexual-assault-case-on-verge-of-collapse-amid-doubts-over-maid.html">Dominique Strauss-Kahn 'could still enter French presidential race'</a>
</h3>
<div class="picleft containerdiv ">
<a href="/finance/dominique-strauss-kahn/8610673/Dominique-Strauss-Kahn-sexual-assault-case-on-verge-of-collapse-amid-doubts-over-maid.html"><img src="http://i.telegraph.co.uk/multimedia/arc ... 01927g.jpg" alt="Dominique Strauss-Kahn at Manhattan Criminal Court " border="0" width="140" height="87" />
<span class="cornerimageleft"> </span></a>
</div>
As you can see, the links have a 7-digit identifier, so my code so far goes like this:
Code:
$html = file_get_contents("http://www.telegraph.co.uk");
preg_match_all("#href=\"[a-z|A-Z|0-9|\/|\.\-]+[0-9]{7}.+a>$#", $html, $link);
print_r($link);
Does anyone have any idea why my expression will not pick out the links in the above page source?
Thanks a lot,
Phil
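A likely explanation, offered as a sketch rather than a confirmed diagnosis: without the `m` modifier, the trailing `$` only matches at the very end of the whole string, and `.` does not cross newlines. On a multi-line page the pattern can therefore only match a link sitting on the last line, whereas an online tester fed a single line will happily match. The example below uses a trimmed copy of the sample markup instead of fetching the live page, and the replacement pattern is a hypothetical one that assumes every wanted link has the shape `/section/.../7 digits/slug.html`.

```php
<?php
// Trimmed copy of the sample markup from the post (the <img> tag is left out).
// A page fetched with file_get_contents() is a multi-line string like this one.
$html = <<<'HTML'
<h3>
<a href="/finance/dominique-strauss-kahn/8610673/Dominique-Strauss-Kahn-sexual-assault-case-on-verge-of-collapse-amid-doubts-over-maid.html">Dominique Strauss-Kahn 'could still enter French presidential race'</a>
</h3>
<div class="picleft containerdiv ">
<a href="/finance/dominique-strauss-kahn/8610673/Dominique-Strauss-Kahn-sexual-assault-case-on-verge-of-collapse-amid-doubts-over-maid.html"><span class="cornerimageleft"> </span></a>
</div>
HTML;

// Pattern from the post: the trailing $ anchors the match to the end of the
// whole subject, so nothing matches unless the link is on the final line.
$original = "#href=\"[a-z|A-Z|0-9|\/|\.\-]+[0-9]{7}.+a>$#";
$originalCount = preg_match_all($original, $html, $unused);

// Hypothetical replacement (an assumption, not the only possible fix):
// no end anchor, and a capture group around the path inside href="...".
$pattern = '#href="(/[A-Za-z0-9/.-]+/[0-9]{7}/[^"]+\.html)"#';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured paths; the same link appears twice in the
// sample (headline and thumbnail), so duplicates are dropped.
$links = array_values(array_unique($matches[1]));

print_r($links);
```

If the end-of-line anchor is really wanted, adding the `m` modifier to the original pattern is another option, but it would still only match links whose closing `</a>` ends the line. For anything beyond a quick scrape, an HTML parser such as DOMDocument is usually more robust than a regular expression.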