Crawl Amazon.com

Posted: Mon May 08, 2006 10:53 am
by btm
Hey hey,
I've been approached with an interesting project and I was wondering if anyone out there has any knowledge of this.

I have a potential client who wants to grab basic book data from amazon.com (title, ISBN, price) and store it on his computer so it can be accessed when there is no internet access. It looks like I could use one of the paid Amazon Web Services, but I was curious whether it is possible to write a program that pulls that data down without paying Amazon for it.

I've never written any kind of spider script, so I might be overlooking an obvious solution...

Any data is greatly appreciated!

Posted: Mon May 08, 2006 10:57 am
by JayBird
Possible: Yes

Illegal: Probably Yes

Posted: Mon May 08, 2006 12:19 pm
by cj5
Do you work for Bill Gates?

Amazon's data schema is copyrighted (from their copyright notice):
All content included on this site, such as text, graphics, logos, button icons, images, audio clips, digital downloads, data compilations, and software, is the property of Amazon.com or its content suppliers and protected by United States and international copyright laws. The compilation of all content on this site is the exclusive property of Amazon.com and protected by U.S. and international copyright laws. All software used on this site is the property of Amazon.com or its software suppliers and protected by United States and international copyright laws.
If you can crawl all their info and dump it into your client's database, all the more power to you. That would be quite the accomplishment. But I would make sure the client is aware of the legal repercussions.

As far as grabbing the data, I could only suggest doing exactly what it is you're asking, and that is CRAWL the information. Try to think up an approach to grabbing categories of info, and cycle thru them without duplicating info. Best of luck!
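To make the "cycle through categories without duplicating" idea concrete, here is a rough sketch. It assumes you can supply two functions of your own, one that lists a category's sub-categories and one that lists its books; the names `crawl_categories`, `get_children` and `get_items`, and the sample data, are all made up for illustration and are not anything Amazon provides.

```python
from collections import deque


def crawl_categories(root, get_children, get_items):
    """Walk a category tree breadth-first, keeping each book only once."""
    seen_categories = {root}   # categories already queued, so none is visited twice
    books = {}                 # keyed by ISBN so repeated listings collapse
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for book in get_items(category):
            # setdefault keeps the first copy of a book and ignores repeats
            books.setdefault(book["isbn"], book)
        for child in get_children(category):
            if child not in seen_categories:
                seen_categories.add(child)
                queue.append(child)
    return books


# Hypothetical in-memory data standing in for real category listings.
# "Science" is reachable two ways and one book is listed in two categories.
TREE = {"Books": ["Fiction", "Science"], "Fiction": ["Science"], "Science": []}
ITEMS = {
    "Books": [],
    "Fiction": [{"isbn": "111", "title": "A", "price": 9.99}],
    "Science": [
        {"isbn": "111", "title": "A", "price": 9.99},
        {"isbn": "222", "title": "B", "price": 19.99},
    ],
}

books = crawl_categories("Books", TREE.__getitem__, ITEMS.__getitem__)
```

Keying on ISBN is just one choice; any identifier that is stable across categories would do the same job.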

Posted: Mon May 08, 2006 12:57 pm
by dethron
Amazon is willing to give data via XML ;)

Here is an example :

http://lmap.co.nr/Amazon1.htm

Posted: Wed Jul 19, 2006 1:39 pm
by anjanesh
As long as you spider using the XML interface it should be fine. The only condition is that you may not make more than one ECS request per second per IP address (see the last line here).
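One simple way to stay under a one-request-per-second limit is to put a small throttle in front of every call. This is only a sketch: the `Throttle` class is my own invention, not part of any Amazon library, and the clock and sleep functions are passed in only so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: sleep off the difference
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would create one `Throttle(1.0)` per IP address and call `wait()` right before each request goes out.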

Btw, where can I find the docs on retrieving information in XML format, the way it's shown on lmap.co.nr/Amazon1.htm?
Parameter docs?

EDIT : Get category list from browsenodes