Problems reading in an amazon file

maneesh
Forum Newbie
Posts: 3
Joined: Fri Jul 29, 2005 12:44 pm

Post by maneesh »

Hello

I am trying to write a script that will show only the amazon sales ranking for a site. I tried using this code to get my bearings:

Code: Select all

<?php 

$lines = file('http://www.amazon.com/exec/obidos/tg/detail/-/1592006078/qid=1104460675/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/104-6302267-7125537?v=glance&s=books&n=507846');

foreach ($lines as $line) {
   echo $line;
}
  
?>
The page mostly displays fine; the only part that is not shown is the Amazon Sales Ranking. Do you know why the sales ranking doesn't show? You can see the difference by comparing these two pages:

1) The Amazon Page
2) The page I created


Why does the Sales Ranking not appear?
Thanks

-Maneesh

JCART | Please use code tags when posting PHP code. Review "Posting Code in the Forums": http://forums.devnetwork.net/viewtopic.php?t=21171
anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

If you look carefully, you'll find that many things from the actual site are missing in your page.
Besides the sales ranking that you mentioned, one is the 5 stars and the text "5 customer reviews" at the top.
You'll also notice that the Amazon page has "Web Design for Teens (Paperback)" while yours has "Web Design for Teens"; the "(Paperback)" text is missing!

This is because Amazon either uses JavaScript to populate the page text dynamically, or has some server-side method of detecting where exactly the page is being requested from.

Same issue with Alexa: everything except the ranking and traffic details is shown when the page is pulled using code.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

Web Design for Teens.. what a strange idea for a book.
maneesh
Forum Newbie
Posts: 3
Joined: Fri Jul 29, 2005 12:44 pm

Post by maneesh »

Hahahah, I wrote that book. Is there any way to pull that information from the page? I'm really trying to get it out.

Thanks

-Maneesh
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Apparently Amazon sends a different page depending on your User-Agent header...

I've used the scriptable web browser that is included in http://simpletest.sf.net (the actual documentation is at http://www.lastcraft.com/simple_test.php).

Code: Select all

require_once('simpletest/browser.php');

// new browser (plain assignment; the old "=& new" form no longer works in PHP 7+)
$ua = new SimpleBrowser();

// fake User-Agent
$ua->addHeader('User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6');

// request page
$ua->get('http://www.amazon.com/exec/obidos/tg/detail/-/1592006078/qid=1104460675/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/104-6302267-7125537?v=glance&s=books&n=507846');

// this is what we get 
echo $ua->getContent();

(PS: It's at least remarkable that, for the author of a web design book, the HTML on your website(s) doesn't validate.)
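
Once the full page comes back, the ranking still has to be picked out of the HTML. A minimal sketch of that step follows. The function name `extractSalesRank()` and the "Sales Rank: #12,345" text pattern are assumptions made for illustration, and the sample fragment is made up; the real Amazon markup would need to be inspected and the pattern adjusted to match.

```php
<?php
// Hypothetical helper (not part of PHP or SimpleTest): pulls a sales rank
// out of fetched HTML. The pattern below is an ASSUMPTION about the page
// text ("Sales Rank: #12,345"); adapt it to the markup actually returned.
function extractSalesRank($html)
{
    // Drop tags so markup such as <b>...</b> around the label
    // does not get in the way of the match.
    $text = strip_tags($html);

    if (preg_match('/Sales\s+Rank:\s*#?([\d,]+)/i', $text, $matches)) {
        // Remove thousands separators and return a plain integer.
        return (int) str_replace(',', '', $matches[1]);
    }

    return null; // pattern not found
}

// Made-up sample fragment, NOT real Amazon output.
$sample = '<li><b>Amazon.com Sales Rank:</b> #12,345 in Books</li>';
var_dump(extractSalesRank($sample)); // int(12345)
```

With the SimpleBrowser code above, the call would be `extractSalesRank($ua->getContent())`. A regular expression like this is brittle against layout changes, so the `null` return should be checked.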
maneesh
Forum Newbie
Posts: 3
Joined: Fri Jul 29, 2005 12:44 pm

Post by maneesh »

Hey thanks a lot! I'm actually redoing my website now, but thanks for the great help!!!

The funny thing is, I'm writing a book on PHP now, and I can't even do this. Those who can't do, teach I guess.

-Maneesh
anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

timvw - How do I send a header like
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
using fsockopen, cURL or any other direct method?
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Using cURL:

Code: Select all

// $ch is the handle returned by curl_init()
curl_setopt($ch, CURLOPT_USERAGENT, 'anjanesh browser rules the world');

Using the HTTP(S) wrapper:

Code: Select all

ini_set('user_agent', 'anjanesh browser rules the world');

Using sockets (writing HTTP yourself):

Code: Select all

// $fp is the socket handle returned by fsockopen(); HTTP header lines end in \r\n
fwrite($fp, "User-Agent: anjanesh browser rules the world\r\n");
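
For the socket route, the User-Agent line is only one part of the request: a request line, a Host header and a terminating empty line are needed as well. Here is a rough sketch of assembling the whole thing. `buildGetRequest()` is a hypothetical helper name, and HTTP/1.0 with `Connection: close` is chosen here only to keep the response simple to read.

```php
<?php
// Hypothetical helper: builds a minimal HTTP/1.0 GET request by hand.
function buildGetRequest($host, $path, $userAgent)
{
    $lines = [
        "GET $path HTTP/1.0",
        "Host: $host",
        "User-Agent: $userAgent",
        "Connection: close",
    ];

    // Every line ends in CRLF; an empty line terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

$request = buildGetRequest('example.com', '/', 'anjanesh browser rules the world');
echo $request;

// Sending it would look roughly like this (left commented out because it
// needs network access):
// $fp = fsockopen('example.com', 80, $errno, $errstr, 10);
// if ($fp) {
//     fwrite($fp, $request);
//     while (!feof($fp)) {
//         echo fgets($fp, 4096);
//     }
//     fclose($fp);
// }
```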


Btw, instead of inventing a User-Agent string, you could pass along the one your browser sends to the script that is going to retrieve the data ;) The code could look like:

Code: Select all

// the request header arrives as HTTP_USER_AGENT in $_SERVER
$useragent = $_SERVER['HTTP_USER_AGENT'];
ini_set('user_agent', $useragent);
$file = file_get_contents('http://example.com');
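
Since `ini_set('user_agent', ...)` changes the setting for every later HTTP(S) wrapper call in the script, a per-request alternative is a stream context. The sketch below assumes a hypothetical helper name `makeUserAgentContext()` and a made-up fallback string for when the script is run without a browser (for example from the command line, where `HTTP_USER_AGENT` is not set).

```php
<?php
// Hypothetical helper: returns a stream context whose HTTP requests
// carry the given User-Agent string.
function makeUserAgentContext($userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; an arbitrary choice for this sketch
        ],
    ]);
}

// Fall back to a made-up string when no browser sent the header.
$useragent = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'Mozilla/5.0 (compatible; example-fetcher)';

$context = makeUserAgentContext($useragent);

// Show what the context will send.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], "\n";

// The actual fetch (left commented out because it needs network access):
// $file = file_get_contents('http://example.com', false, $context);
```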
anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

Thanks. Cool!

So that covers sending. Is it also possible to retrieve HTTP headers, like cookies and session info?
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Sockets: you read the headers the same way as you read the data you retrieve.
HTTP(S) wrapper: http://www.php.net/stream_get_meta_data
cURL: http://www.php.net/curl_getinfo
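
For the socket case, the headers arrive in the same stream as the body, separated from it by an empty line. The following is a minimal sketch of splitting them apart and collecting cookies. `parseHttpResponse()` is a hypothetical helper, and the response string is made-up sample data rather than anything a real server sent. Note that session data itself stays on the server; only the session cookie shows up in the headers.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into status line,
// headers and body.
function parseHttpResponse($raw)
{
    // Headers and body are separated by the first empty line.
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $body = isset($parts[1]) ? $parts[1] : '';

    // The first line is the status line, e.g. "HTTP/1.0 200 OK".
    $status = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed lines
        }
        list($name, $value) = explode(':', $line, 2);
        // Header names are case-insensitive, and a header such as
        // Set-Cookie may repeat, so each name maps to a list of values.
        $headers[strtolower(trim($name))][] = trim($value);
    }

    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}

// Made-up sample response.
$raw = "HTTP/1.0 200 OK\r\n"
     . "Content-Type: text/html\r\n"
     . "Set-Cookie: session-id=abc123; path=/\r\n"
     . "Set-Cookie: lang=en\r\n"
     . "\r\n"
     . "<html>hello</html>";

$response = parseHttpResponse($raw);
print_r($response['headers']['set-cookie']);
```

With the HTTP(S) wrapper, the same header lines are available through `stream_get_meta_data()` on a handle opened with `fopen()`, so no manual splitting is needed there.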