Crawl Amazon.com


btm
Forum Newbie
Posts: 11
Joined: Wed Sep 14, 2005 7:21 am

Post by btm »

Hey hey,
I've been approached with an interesting project and I was wondering if anyone out there has any experience with this.

I have a potential client who wants to grab basic book data from Amazon.com (title, ISBN, price) and store it on his computer so it can be accessed when there is no internet connection. It looks like I could use one of the paid Amazon Web Services, but I was curious whether it is possible to write a program that pulls that data down without paying Amazon for it.

I've never created any kind of spider script, so I might be overlooking an obvious solution...

Any data is greatly appreciated!
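
For the offline part, here is a minimal sketch of what the local store might look like, using Python's built-in sqlite3 module. The table layout, the function names (open_store, save_book, find_book) and the sample ISBN are all made up for illustration, and prices are kept as integer cents on the assumption that only one currency is involved.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the local book database. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "isbn TEXT PRIMARY KEY, title TEXT NOT NULL, price_cents INTEGER)"
    )
    return conn

def save_book(conn, isbn, title, price_cents):
    # INSERT OR REPLACE keeps exactly one row per ISBN, so re-running
    # an import refreshes the price instead of duplicating the book.
    conn.execute(
        "INSERT OR REPLACE INTO books (isbn, title, price_cents) VALUES (?, ?, ?)",
        (isbn, title, price_cents),
    )
    conn.commit()

def find_book(conn, isbn):
    # Works with no network access: everything is read from the local file.
    return conn.execute(
        "SELECT isbn, title, price_cents FROM books WHERE isbn = ?", (isbn,)
    ).fetchone()

# Placeholder data, not a real book.
conn = open_store()
save_book(conn, "0000000000", "Example Title", 1999)
save_book(conn, "0000000000", "Example Title", 1799)  # price update
```

Passing a file name instead of ":memory:" to open_store would keep the data on disk between runs.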
JayBird
Admin
Posts: 4524
Joined: Wed Aug 13, 2003 7:02 am
Location: York, UK

Post by JayBird »

Possible: Yes

Illegal: Probably Yes
cj5
Forum Commoner
Posts: 60
Joined: Tue Jan 17, 2006 3:38 pm
Location: Long Island, NY, USA

Post by cj5 »

Do you work for Bill Gates?

Amazon's data schema is copyrighted (from their copyright notice):
All content included on this site, such as text, graphics, logos, button icons, images, audio clips, digital downloads, data compilations, and software, is the property of Amazon.com or its content suppliers and protected by United States and international copyright laws. The compilation of all content on this site is the exclusive property of Amazon.com and protected by U.S. and international copyright laws. All software used on this site is the property of Amazon.com or its software suppliers and protected by United States and international copyright laws.
If you can crawl all their info and create a dump to your client's database, all the more power to you. That would be quite the accomplishment. But I would make sure the client is aware of the legal repercussions.

As far as grabbing the data goes, I can only suggest doing exactly what you're asking about, and that is to CRAWL the information. Try to think up an approach for grabbing categories of info and cycling through them without duplicating info. Best of luck!
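
To make the "cycle through categories without duplicating info" idea concrete, here is a rough sketch: a breadth-first walk that remembers which categories and which ISBNs it has already seen. The crawl function and the fetch_children / fetch_items callbacks are hypothetical stand-ins for whatever actually retrieves the data, and the toy dictionaries below are invented for the example.

```python
from collections import deque

def crawl(start, fetch_children, fetch_items):
    """Breadth-first walk over categories; yields each book (by ISBN) once."""
    seen_categories = {start}
    seen_isbns = set()
    queue = deque([start])
    while queue:
        category = queue.popleft()
        for item in fetch_items(category):
            # A book listed under several categories is only yielded once.
            if item["isbn"] not in seen_isbns:
                seen_isbns.add(item["isbn"])
                yield item
        for child in fetch_children(category):
            # Categories can be reachable by more than one path; visit once.
            if child not in seen_categories:
                seen_categories.add(child)
                queue.append(child)

# Toy in-memory data standing in for real lookups (all values invented).
CATEGORIES = {"books": ["fiction", "science"], "fiction": ["science"], "science": []}
ITEMS = {
    "books": [],
    "fiction": [{"isbn": "111", "title": "A", "price": 9.99}],
    "science": [
        {"isbn": "111", "title": "A", "price": 9.99},
        {"isbn": "222", "title": "B", "price": 14.50},
    ],
}

results = list(crawl("books", CATEGORIES.__getitem__, ITEMS.__getitem__))
```

Keying the duplicate check on ISBN is a choice made here because the ISBN is one of the fields being collected anyway.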
dethron
Forum Contributor
Posts: 370
Joined: Sat Apr 27, 2002 11:39 am
Location: Istanbul

Post by dethron »

Amazon is willing to give data via XML ;)

Here is an example:

http://lmap.co.nr/Amazon1.htm
anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

As long as you spider using the XML interface it should be fine. The only condition is that you may not make more than one ECS request per second per IP address. Last line here.
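
One way to stay under that one-request-per-second limit is to put a small throttle in front of every request. This is only a sketch: the Throttle class and its parameters are made up for illustration, and it only covers a single process, so it assumes nothing else is sending requests from the same IP address.

```python
import time

class Throttle:
    """Blocks so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable so the class can be tested without waiting
        self.sleep = sleep
        self._last = None     # time of the previous request, None before the first

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now

# Usage: call throttle.wait() right before each request is sent.
throttle = Throttle(min_interval=1.0)
throttle.wait()  # first call returns immediately
```

time.monotonic is used instead of time.time so that system clock adjustments cannot shorten the gap between requests.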

Btw, where can I find the docs on retrieving information in XML format, the way it's shown on lmap.co.nr/Amazon1.htm? Are there parameter docs?

EDIT: Get the category list from browsenodes.