RegEx in PHP Assistance

Posted: Thu May 06, 2010 2:24 pm
by Assorro
I've been studying regular expressions for the past two days before looking for a community to ask, and I simply need help with this.

I have a simple table that I've scraped from a webpage that displays 10 records at a time. Only certain records show an image of a small football to the left of the displayed name, and those are the only records I need the script to display after processing each record's name into a database.

The required data sits in the following HTML, where "data I need is here" is written:

//class="smallimage" alt="Football" title="Football"> data I need is here <img src="/images/football.gif
and this is the pattern I've become so frustrated with:

Code: Select all

$pattern = '@title=["\']Football["\'][^>]*>([^>]+)<img [^>]*src=["\'][^/]/images[^/]/football.gif["\']@i';
I sure would appreciate it if someone could show me how to properly write this. The full script is below. There is no DB processing yet as I want to focus on the pattern. Thanks.

Code: Select all

<?php
function GetBetween($content,$start,$end){
    $r = explode($start, $content);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}
//class="smallimage" alt="Football" title="Football">, <img src="/images/football.gif
$scrape_table = GetBetween(file_get_contents('http://www.somepage.com'), '<tbody>', '</tbody>'); 
$pattern = '@title=["\']Football["\'][^>]*>([^>]+)<img [^>]*src=["\'][^/]/images[^/]/football.gif["\']@i';
// preg_match_all() returns the number of full matches found.
$result = preg_match_all($pattern, $scrape_table, $matches);
// Use < rather than <= so the loop stops at the last valid index.
for ($counter = 0; $counter < $result; $counter += 1)
{
	echo $matches[1][$counter];
	echo "<br />";
}
?>

Re: RegEx in PHP Assistance

Posted: Thu May 06, 2010 5:40 pm
by ridgerunner
Given the very limited example data you provided, try this:

Code: Select all

$pattern = '%title=["\']Football["\'][^>]*>([^<]+?)<img [^>]*src=["\']/images/football\.gif["\']%i';
This does require that the image file always be exactly '/images/football.gif'. Also, no other tags can appear in the 'data I need is here' area.
Your regex looks much better than most I see. But watch out, they are addicting!
:)
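
For anyone comparing the two patterns, here is a minimal sketch that runs both against a single row. The `$html` string is an assumption, rebuilt from the fragment quoted in the question, since the real page markup was never posted. The original pattern seems to fail because each `[^/]` demands one extra non-slash character, first between the opening quote and `/images`, then again before `/football`, and the sample has neither. The variable names and the `trim` step are illustrative only, and the dot in `football\.gif` is escaped so that it matches only a literal period.

```php
<?php
// Assumed sample row, reconstructed from the fragment quoted in the question.
$html = '<img class="smallimage" alt="Football" title="Football"> data I need is here <img src="/images/football.gif">';

// Pattern from the question: each [^/] requires one extra non-slash
// character before "/images" and before "/football".
$original = '@title=["\']Football["\'][^>]*>([^>]+)<img [^>]*src=["\'][^/]/images[^/]/football.gif["\']@i';

// Pattern from the reply, with the dot escaped to match a literal ".".
$suggested = '%title=["\']Football["\'][^>]*>([^<]+?)<img [^>]*src=["\']/images/football\.gif["\']%i';

// preg_match_all() returns the number of full matches.
$originalCount  = preg_match_all($original, $html, $unused);
$suggestedCount = preg_match_all($suggested, $html, $matches);

// Group 1 holds the text between the two tags; trim the surrounding spaces.
$names = array_map('trim', $matches[1]);

echo "original: $originalCount, suggested: $suggestedCount\n";
foreach ($names as $name) {
    echo $name, "\n";
}
```

On the real page, the same loop over `$matches[1]` should list one name per football row, provided the markup matches the two restrictions mentioned above.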