Extract content

slesniak123
Forum Newbie
Posts: 1
Joined: Mon Mar 17, 2008 4:21 am

Post by slesniak123

Hello,

I am trying to write a PHP script that fetches a web page and extracts some content from it. For every <h3 id="title"> element on the page, I need to grab the link URL and the link text inside it.

A sample of the markup:

Code:

<h3 id="title">
  <a href="http://www.testsite.com/test.html">
    This is a test
  </a>
</h3>
 
 
<h3 id="title">
  <a href="http://www.testsite.com/123456.html">
   This is another test
  </a>
</h3>

This is the code that I have so far. The problem is that it does not loop through all of the text; it only returns the first instance. How do I make it return every instance?

The PHP class being used, ExtractTextBetweenTags.class.php:

Code:

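<?php

    class ExtractTextBetweenTags
    {
        // Returns the text between the first $ot (opening tag) and the
        // first $ct (closing tag) found in $string.
        function extract($string, $ot, $ct)
        {
            $string = trim($string);

            // Position just past the first opening tag. strpos() searches
            // from the start of the string, so only the first match is seen.
            $start = strpos($string, $ot) + strlen($ot);

            // Take everything from there up to the first closing tag.
            $mytext = substr($string, $start, strpos($string, $ct) - $start);

            return $mytext;
        }
    }

?>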
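
The extract() method above stops at the first match because both strpos() calls search from the beginning of the string. One possible way to collect every instance is to pass a moving offset as the third argument to strpos() and loop until the opening tag is no longer found. This is only a sketch: extractAll() is a hypothetical helper that is not part of the original class, and $sample is a hard-coded stand-in for the real page.

```php
<?php
// Hypothetical helper (not part of the original class): collects every
// substring found between $ot and $ct instead of only the first one.
function extractAll($string, $ot, $ct)
{
    $results = array();
    $offset  = 0;

    // The third argument of strpos() is the position to start searching from.
    while (($start = strpos($string, $ot, $offset)) !== false) {
        $start += strlen($ot);                 // skip past the opening tag
        $end = strpos($string, $ct, $start);   // first closing tag after it
        if ($end === false) {
            break;                             // no closing tag left: stop
        }
        $results[] = trim(substr($string, $start, $end - $start));
        $offset = $end + strlen($ct);          // resume after the closing tag
    }

    return $results;
}

// Assumed sample input, modelled on the markup shown in the post.
$sample = '<h3 id="title"><a href="http://www.testsite.com/test.html">This is a test</a></h3>'
        . '<h3 id="title"><a href="http://www.testsite.com/123456.html">This is another test</a></h3>';

print_r(extractAll($sample, '<h3 id="title">', '</h3>'));
```

Each array entry still contains the whole <a> element, so the link and its text would need a further step to separate.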
 

The code that I have written so far:

Code:

 
<?php
include('ExtractTextBetweenTags.class.php');
$ext = new ExtractTextBetweenTags();

$userAgent  = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.0.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
$target_url = "http://www.testsite.com";

// Fetch the page with cURL and return the body as a string.
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

// Only the first <h3 id="title"> block is returned here.
$header = $ext->extract($html, '<h3 id="title">', '</h3>');
echo($header);

?>
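
Since the goal is the link and its text rather than the raw markup, another option may be to parse the fetched HTML with PHP's DOM extension instead of searching for strings. The following is a minimal sketch, assuming the DOM extension is enabled; the hard-coded $html is a stand-in for the cURL response, and the $links array is a name chosen purely for illustration.

```php
<?php
// Stand-in for the cURL response (assumption: the real page uses this markup).
$html = '<h3 id="title"><a href="http://www.testsite.com/test.html"> This is a test </a></h3>'
      . '<h3 id="title"><a href="http://www.testsite.com/123456.html"> This is another test </a></h3>';

// Real-world HTML is often invalid (here: duplicate ids), so collect
// parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = array();

// Select every <a> element that sits inside an <h3 id="title">.
foreach ($xpath->query('//h3[@id="title"]//a') as $a) {
    $links[] = array(
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    );
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

The XPath query matches all such headings in one pass, so no manual looping over string offsets is needed.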

Any help is very much appreciated.

Thank you.