I am trying to write a PHP script that fetches a web page and extracts some content from it. I need it to grab the link URL and the link text inside each <h3 id="title"> element.
A sample of what it looks like is:
Code:
<h3 id="title">
<a href="http://www.testsite.com/test.html">
This is a test
</a>
</h3>
<h3 id="title">
<a href="http://www.testsite.com/123456.html">
This is another test
</a>
</h3>
This is the code that I have so far, but the problem is that it does not loop through all of the matches; it only gives me the first one. How do I make it return every instance?
PHP class being used (ExtractTextBetweenTags.class.php):
Code:
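<?php
class ExtractTextBetweenTags
{
    // Returns the text between the first $ot (opening tag) and the
    // first $ct (closing tag) found in $string.
    function extract($string, $ot, $ct)
    {
        $string = trim($string);
        // Position just past the first opening tag.
        $start = intval(strpos($string, $ot) + strlen($ot));
        // Length runs up to the first closing tag.
        $mytext = substr($string, $start, intval(strpos($string, $ct) - $start));
        return $mytext;
    }
}
?>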
The code that I have written so far:
Code:
<?php
include('ExtractTextBetweenTags.class.php');
$ext = new ExtractTextBetweenTags();
$userAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.0.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
$target_url = "http://www.testsite.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
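$html = curl_exec($ch);
if ($html === false) {
    // curl_exec() returns false on failure; stop before parsing.
    die('cURL error: ' . curl_error($ch));
}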
$header = $ext->extract($html,'<h3 id="title">','</h3>');
echo($header);
?>
Any help is very much appreciated.
Thank you.
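One possible way to get every instance, sketched here rather than offered as a definitive fix: extract() calls strpos() without an offset, so it can only ever find the first <h3 id="title">. Instead of looping over string positions, the page could be parsed with PHP's built-in DOMDocument and DOMXPath classes, which return all matching links in one query. The function name extractTitleLinks and the inline $sample string are assumptions made for illustration; in the script above, $html from curl_exec() would be passed in instead.

```php
<?php
// Hypothetical helper (not part of ExtractTextBetweenTags): returns the
// href and text of every <a> directly inside an <h3 id="title">.
function extractTitleLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML (repeated ids, for example), so
    // collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//h3[@id="title"]/a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// Sample markup copied from the question; replace with $html from cURL.
$sample = '<h3 id="title">
<a href="http://www.testsite.com/test.html">
This is a test
</a>
</h3>
<h3 id="title">
<a href="http://www.testsite.com/123456.html">
This is another test
</a>
</h3>';

foreach (extractTitleLinks($sample) as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

Each array entry holds one link, so the foreach loop visits every <h3 id="title"> block rather than only the first.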