Scraping script loop & array query

frilioth
Forum Newbie
Posts: 4
Joined: Fri Jun 08, 2007 11:49 am

Scraping script loop & array query

Post by frilioth »

Hi All

I need a little help with my Amazon price-scraping script. I currently sell over 3000 items on Amazon and need to keep up to date when prices go down. The script reads each item from a MySQL database, goes to the relevant Amazon page, then scrapes the lowest price. I can then check the prices against my inventory to make sure I'm not too expensive. The problem is that it keeps reporting the price from the first URL scraped and never updates. If anyone can help I'd be very grateful.

Code: Select all

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>
      Untitled Document
    </title>
  </head>
  <body>
    <?php
    // listing script

    // connect to the server
    mysql_connect( 'localhost','XXXXXXXXX','XXXXXXXXXX' )
    or die( "Error! Could not connect to database: " . mysql_error() );

    // select the database
    mysql_select_db( 'XXXXXXXX' )
    or die( "Error! Could not select the database: " . mysql_error() );
        

    // retrieve all the rows from the database
    $query = "SELECT * FROM `XXXXXXX`";

    $results = mysql_query( $query );

    // print out the results
    if( $results )
    {
        while( $contact = mysql_fetch_object( $results ) )
        {
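            // copy the row's fields into local variables
            $id = $contact -> id;
            $sku = $contact -> sku;
            $ASIN = $contact -> ASIN;
            // build the offer-listing URL once so it can be fetched and displayed
            $url = "http://www.amazon.co.uk/gp/offer-listing/$ASIN/";
            if( !( $data = file_get_contents( $url ) ) )
            {
              die("Could not create a connection to Amazon.co.uk");
            }
            // capture the first price on the page; leave the cell empty if none is found
            $found = preg_match("/<span class=\"price\">£(.*)\s/i", $data, $match );
            $result = $found ? $match[1] : '';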
            ?>
            <table width="100%" border="0" cellspacing="0" cellpadding="0">
              <tr>
                <td width="5%">
                  <?php echo($id) ?>
                </td>
                <td width="23%">
                  <?php echo($sku) ?>
                </td>
                <td width="18%">
                  <?php echo($ASIN) ?>
                </td>
                <td width="39%">
                  <?php echo($url) ?>
                </td>
                <td width="20%">
                  <?php echo($result) ?>
                </td>
              </tr>
            </table>
            <?php
        }
    }
    else
    {
        die( "Trouble getting info from database: " . mysql_error() );
    }

    ?>
  </body>
</html>
Please find a working link below.

http://www.amazon.co.uk/gp/offer-listing/B0001K9W9Y/

I think I'm OK using preg_match() as I only want the first price on the page (I need to know what the lowest marketplace price is). I think the problem is near $match[1]: the array doesn't seem to update when a new product is selected from my database.
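
For reference, here is a minimal sketch of the "first price on the page" idea as a standalone function. It is not a confirmed fix for the problem in this thread. The function name extractFirstPrice() and the sample markup are made up for illustration, and the pattern only assumes the same <span class="price"> markup used in the script above. Because the script declares ISO-8859-1 while a page may arrive in another encoding, the sketch accepts the pound sign as UTF-8 bytes, as a single ISO-8859-1 byte, or as an HTML entity. It assumes PHP 7.1 or later for the type declarations.

```php
<?php
// Hypothetical helper (not from the original script): returns the first
// price found in an offer-listing page, or null when no price is present.
function extractFirstPrice(string $html): ?string
{
    // Accept the pound sign as UTF-8 bytes, as a single ISO-8859-1 byte,
    // or as an HTML entity, then capture digits with an optional fraction.
    $pattern = '/<span class="price">\s*(?:\xC2\xA3|\xA3|&pound;|&#163;)\s*([0-9]+(?:\.[0-9]+)?)/i';

    if (preg_match($pattern, $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Made-up markup standing in for a fetched page.
$sample = '<span class="price">&pound;4.50 </span> <span class="price">&pound;9.99 </span>';
var_dump(extractFirstPrice($sample));
```

Returning null instead of an empty string makes a page with no price easy to tell apart from a page with a real price.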

The result I get at the moment is:

Code: Select all

1	POBSIGN	B000PZGFPE	http://www.amazon.co.uk/gp/offer-listing/B000PZGFPE/	2.99 
2	POOH&TIGGERSHADE	B000PZ9L6Y	http://www.amazon.co.uk/gp/offer-listing/B000PZ9L6Y/	2.99 
3	PINKSTEERING	B000PZG9UK	http://www.amazon.co.uk/gp/offer-listing/B000PZG9UK/	2.99 
4	56027	B0001K9W9Y	http://www.amazon.co.uk/gp/offer-listing/B0001K9W9Y/	2.99 
5	13962	B0001K9PQ4	http://www.amazon.co.uk/gp/offer-listing/B0001K9PQ4/	2.99 
6	BATMANORGANISER	B000P8XM7K	http://www.amazon.co.uk/gp/offer-listing/B000P8XM7K/	2.99 
7	TWWETYSEATPROTECTOR	B000P8XM70	http://www.amazon.co.uk/gp/offer-listing/B000P8XM70/	2.99 
8	22436	B000P5QEZU	http://www.amazon.co.uk/gp/offer-listing/B000P5QEZU/	2.99
Thanks
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

Post by afbase »

What happens when you print $match[1]?
frilioth
Forum Newbie
Posts: 4
Joined: Fri Jun 08, 2007 11:49 am

Post by frilioth »

It returns 2.99
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

We might be in the same boat

Post by afbase »

OK, I'm basically stuck on the same problem as you. Somehow, when I pass in a link from an array or object, my function freaks out and doesn't give me the URL's HTML/data I expect. When I pass a string variable into my function, it works fine.

Here is my post.

Yeah, I'm not sure what is wrong with the code. Sorry.
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Hmm... Have you tried using $match[0]?
frilioth
Forum Newbie
Posts: 4
Joined: Fri Jun 08, 2007 11:49 am

Post by frilioth »

I use the PHP preg_match() function, passing it the regular expression, the variable $data that holds the page, and $match, which is where the function stores the results. preg_match() fills $match as an array, where the first index ($match[0]) contains the full text that matched the pattern and the second ($match[1]) contains the extracted data. Unfortunately, even though the URL updates, $match[1] doesn't and keeps returning the result from the first URL scanned. To answer your question, if I use $match[0] it returns the same result.
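
To make the behaviour of $match concrete, here is a small sketch that runs preg_match() over several strings in a loop. The keys ASIN-A, ASIN-B and ASIN-C and the markup are invented for the demonstration and are not real listings. preg_match() itself returns 1, 0 or false; the captured text only arrives through the third argument, which is overwritten on every call and left empty when nothing matches.

```php
<?php
// Hypothetical identifiers and markup, used only to show how $match behaves.
$pages = [
    'ASIN-A' => '<span class="price">&pound;2.99 </span>',
    'ASIN-B' => '<span class="price">&pound;7.49 </span>',
    'ASIN-C' => '<p>No offers</p>',
];

$prices = [];
foreach ($pages as $asin => $html) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $found = preg_match('/<span class="price">&pound;([0-9.]+)/', $html, $match);

    // $match[0] is the whole matched text; $match[1] is the first group.
    $prices[$asin] = ($found === 1) ? $match[1] : null;
}

// The last call did not match, so $match was reset to an empty array.
$matchAfterMiss = $match;
print_r($prices);
```

If every row still shows the same price with a check like this in place, the text being searched is probably the same on every pass, which points at the fetched data and not at $match.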
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

Post by afbase »

What happens if you print $data?
frilioth
Forum Newbie
Posts: 4
Joined: Fri Jun 08, 2007 11:49 am

Post by frilioth »

nothing :cry:
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Wouldn't the Amazon API be a faster, easier and less fragile way to do this?
afbase
Forum Contributor
Posts: 113
Joined: Tue Aug 15, 2006 1:29 pm
Location: SoCAL!!!!

Post by afbase »

frilioth wrote: nothing :cry:
That might be the problem.
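
A rough sketch of how the fetch step could report an empty page instead of passing it on to preg_match(). The helper name fetchPage() and its return shape are assumptions for illustration, not part of the original script, and it assumes PHP 7 or later. It is demonstrated on a local temporary file so it runs without a network connection; file_get_contents() accepts both URLs and local paths.

```php
<?php
// Hypothetical helper: fetches a URL (or local path) and reports why the
// body is unusable instead of silently carrying on with empty data.
function fetchPage(string $url): array
{
    // The @ suppresses the warning; the return value is checked instead.
    $data = @file_get_contents($url);

    if ($data === false) {
        return ['ok' => false, 'reason' => 'request failed', 'body' => ''];
    }
    if (trim($data) === '') {
        return ['ok' => false, 'reason' => 'empty response', 'body' => ''];
    }
    return ['ok' => true, 'reason' => '', 'body' => $data];
}

// Offline demonstration: an empty temporary file stands in for a page
// that came back with no content.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '');
$emptyResult = fetchPage($tmp);
unlink($tmp);

echo $emptyResult['reason'], PHP_EOL;
```

In the scraping loop, the 'reason' value could be printed in the price column for that row, so one bad page does not stop the other rows from being checked.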