Problems reading in an amazon file
Posted: Fri Jul 29, 2005 12:50 pm
by maneesh
Hello
I am trying to write a script that will show only the Amazon Sales Ranking for a site. I tried using this code to get my bearings:
Code: Select all
<?php
$lines = file('http://www.amazon.com/exec/obidos/tg/detail/-/1592006078/qid=1104460675/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/104-6302267-7125537?v=glance&s=books&n=507846');
foreach ($lines as $line) {
echo $line;
}
?>
The page mostly displays fine; the only part that is not shown is the Amazon Sales Ranking. Do you know why the sales ranking doesn't show? You can see the difference between the two pages at
1)
The Amazon Page
2)
The page I created
Why does the Sales Ranking not appear?
Thanks
-Maneesh
JCART | Please use code tags when posting PHP code. Review Posting Code in the Forums: http://forums.devnetwork.net/viewtopic.php?t=21171
Posted: Fri Jul 29, 2005 1:13 pm
by anjanesh
If you look carefully, you'll find that many things from the actual site are missing in your page.
One is the 5 stars and the "5 customer reviews" text at the top that you mentioned.
You'll also notice that the Amazon page has "Web Design for Teens (Paperback)" while yours has "Web Design for Teens": the "(Paperback)" text is missing!
This is because Amazon either uses JavaScript to populate the page text dynamically, or has some server-side method of detecting exactly where the page is being requested from.
Alexa has the same issue: everything except the ranking and traffic details is shown when the page is pulled using code.
Posted: Fri Jul 29, 2005 1:46 pm
by onion2k
Web Design for Teens.. what a strange idea for a book.
Posted: Fri Jul 29, 2005 1:47 pm
by maneesh
Hahahah, I wrote that book. Is there any way to pull that information from the page? I'm really trying to get it out.
Thanks
-Maneesh
Posted: Fri Jul 29, 2005 8:13 pm
by timvw
Apparently Amazon sends a different page depending on your User-Agent header...
I've used the scriptable web browser that is included in
http://simpletest.sf.net (actual documentation is at
http://www.lastcraft.com/simple_test.php).
Code: Select all
require_once('simpletest/browser.php');
// new browser
$ua = new SimpleBrowser(); // plain assignment; "=& new" is deprecated in PHP 5 and an error in PHP 7
// fake User-Agent
$ua->addHeader('User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6');
// request page
$ua->get('http://www.amazon.com/exec/obidos/tg/detail/-/1592006078/qid=1104460675/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/104-6302267-7125537?v=glance&s=books&n=507846');
// this is what we get
echo $ua->getContent();
(PS: I find it at least remarkable that, for the author of a web design book, the HTML on your website(s) doesn't validate.)
Posted: Sat Jul 30, 2005 12:27 am
by maneesh
Hey thanks a lot! I'm actually redoing my website now, but thanks for the great help!!!
The funny thing is, I'm writing a book on PHP now, and I can't even do this. Those who can't do, teach I guess.
-Maneesh
Posted: Sat Jul 30, 2005 12:38 am
by anjanesh
timvw - how do I send a header like
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
with fsockopen, cURL, or any other direct method?
Posted: Sat Jul 30, 2005 7:20 am
by timvw
Using CURL:
Code: Select all
curl_setopt($ch, CURLOPT_USERAGENT, 'anjanesh browser rules the world');
Using HTTP(S) wrapper:
Code: Select all
ini_set('user_agent', 'anjanesh browser rules the world');
Using sockets (writing HTTP yourself):
Code: Select all
// $fp is assumed to be a handle returned by fsockopen(); HTTP header lines end in \r\n
fwrite($fp, "User-Agent: anjanesh browser rules the world\r\n");
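To make the sockets variant more concrete, here is a minimal sketch of where that User-Agent line sits in a hand-written request. The helper names `build_get_request()` and `fetch_via_socket()` are made up for illustration, and the sketch assumes plain HTTP on port 80 and PHP 7 or later. It sends HTTP/1.0 so the server should not reply with chunked encoding, which would need extra decoding.

```php
<?php
// Hypothetical helper: assemble a minimal HTTP/1.0 GET request by hand.
// Every header line ends in \r\n, and a blank line ends the header block.
function build_get_request(string $host, string $path, string $userAgent): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "User-Agent: {$userAgent}\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Hypothetical helper: send the request over a socket and return the raw reply.
function fetch_via_socket(string $host, string $path, string $userAgent): string
{
    $fp = fsockopen($host, 80, $errno, $errstr, 10); // 10 second connect timeout
    if ($fp === false) {
        throw new RuntimeException("Connection failed: {$errstr} ({$errno})");
    }
    fwrite($fp, build_get_request($host, $path, $userAgent));
    $response = stream_get_contents($fp);
    fclose($fp);
    // Raw response: status line, headers, a blank line, then the body.
    return $response === false ? '' : $response;
}
```

Calling `fetch_via_socket('example.com', '/', 'some browser string')` would return the headers and the body together, so the caller has to split them at the first blank line.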
Btw, instead of inventing a User-Agent string, you could pass along the one your browser sends to the script that is going to retrieve the data.

Code could look like:
Code: Select all
$useragent = $_SERVER['HTTP_USER_AGENT']; // request headers appear in $_SERVER with an HTTP_ prefix
ini_set('user_agent', $useragent);
$file = file_get_contents('http://example.com');
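As an alternative to changing the `user_agent` ini setting for the whole script, the header can be set per request through a stream context. The sketch below assumes PHP 7 or later; the helper names and the fallback string are invented for the example. The fallback is there because `$_SERVER['HTTP_USER_AGENT']` is not set when the script runs from the command line or when the client sends no such header.

```php
<?php
// Hypothetical helper: build HTTP stream context options carrying a User-Agent.
// Falls back to a placeholder string when no browser User-Agent is available.
function user_agent_context_options(
    ?string $userAgent,
    string $fallback = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'
): array {
    $ua = ($userAgent !== null && $userAgent !== '') ? $userAgent : $fallback;
    return ['http' => ['user_agent' => $ua, 'timeout' => 10]];
}

// Hypothetical helper: fetch a URL, forwarding the visitor's User-Agent if any.
function fetch_page(string $url)
{
    $options = user_agent_context_options($_SERVER['HTTP_USER_AGENT'] ?? null);
    $context = stream_context_create($options);
    // Returns the page as a string, or false on failure.
    return file_get_contents($url, false, $context);
}
```

Because the context is passed to a single `file_get_contents()` call, other requests made by the same script keep their default User-Agent.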
Posted: Sat Jul 30, 2005 7:27 am
by anjanesh
Thanks. Cool !
So that covers sending. Is it also possible to retrieve the HTTP headers? Like cookie and session info?
Posted: Sat Jul 30, 2005 7:37 am
by timvw
sockets: you read the headers the same way as you read the rest of the data you retrieve
http(s) wrapper:
http://www.php.net/stream_get_meta_data
curl:
http://www.php.net/curl_getinfo
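For the HTTP(S) wrapper case, a rough sketch of how this could look: after `fopen()` on a URL, `stream_get_meta_data()` returns an array whose `wrapper_data` entry holds the raw response header lines. The helper names below are invented for the example and PHP 7 or later is assumed. Note that the server never sends session data itself; at most the client sees a session cookie in a `Set-Cookie` header.

```php
<?php
// Hypothetical helper: turn raw header lines into a lowercase-name => values map.
// A header such as Set-Cookie may occur several times, so each name maps to a list.
function parse_header_lines(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        // Skip status lines ("HTTP/1.1 200 OK") and anything without a colon.
        if (strncmp($line, 'HTTP/', 5) === 0 || strpos($line, ':') === false) {
            continue;
        }
        list($name, $value) = explode(':', $line, 2);
        $headers[strtolower(trim($name))][] = trim($value);
    }
    return $headers;
}

// Hypothetical helper: open a URL through the HTTP wrapper and return its headers.
function fetch_headers(string $url): array
{
    $fp = fopen($url, 'r');
    if ($fp === false) {
        return [];
    }
    $meta = stream_get_meta_data($fp);
    fclose($fp);
    return parse_header_lines($meta['wrapper_data'] ?? []);
}
```

With this, `fetch_headers($url)['set-cookie'] ?? []` would list whatever cookies the server tried to set on that request.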