Stopping Robots/Crawlers

Ye' old general discussion board. Basically, for everything that isn't covered elsewhere. Come here to shoot the breeze, shoot your mouth off, or whatever suits your fancy.
This forum is not for asking programming related questions.


php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Stopping Robots/Crawlers

Post by php_east »

I have a site which specifically does not want to be indexed by any sort of crawler, from any organisation. The reason is commercial: access is for paying clients only, so robots coming in to index the site would waste bandwidth and CPU.

How does one stop robots from coming in?
My initial thought is to simply look at the HTTP headers and divert accordingly.
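Something along these lines in an `.htaccess`, perhaps (just a sketch; the bot names below are only examples, not a complete list, and anything that lies about its User-Agent slips straight through):

```apache
# Flag requests whose User-Agent looks like a known crawler
SetEnvIfNoCase User-Agent "Googlebot" crawler
SetEnvIfNoCase User-Agent "bingbot"   crawler
SetEnvIfNoCase User-Agent "Slurp"     crawler

# Apache 2.2-style access control: refuse flagged requests
Order Allow,Deny
Allow from all
Deny from env=crawler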
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Stopping Robots/Crawlers

Post by Benjamin »

JavaScript links?
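i.e. links that only exist once a script runs, so a crawler that doesn't execute JavaScript never sees them. A minimal sketch (the `/members/index.php` path is just an example):

```html
<!-- The href gives a non-scripting crawler nothing to follow; the real
     URL is only reached when a browser runs the onclick handler -->
<a href="#" onclick="location.href='/members/index.php'; return false;">Members area</a>
```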
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: Stopping Robots/Crawlers

Post by php_east »

I would certainly give it a try. I thought crawlers were educated in JavaScript, but maybe not quite :)

Thanks.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Re: Stopping Robots/Crawlers

Post by matthijs »

I think you can use a robots.txt file as well to deny them access.
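For example, a robots.txt in the document root that tells every crawler to stay out of the whole site:

```
User-agent: *
Disallow: /
```

Bear in mind that compliance is voluntary; well-behaved bots honour it, badly-behaved ones ignore it.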
Inkyskin
Forum Contributor
Posts: 282
Joined: Mon Nov 19, 2007 10:15 am
Location: UK

Re: Stopping Robots/Crawlers

Post by Inkyskin »

Like matthijs says, a standard robots.txt file should be all you need to stop normal crawlers and bots, really.

Why are you concerned about bandwidth and CPU usage, though? The amount they would use would be minimal, if even noticeable at all...
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: Stopping Robots/Crawlers

Post by php_east »

Inkyskin wrote:Like matthijs says, a standard robots.txt file should be all you need to stop normal crawlers and bots, really.
I don't believe robots.txt is sufficient, or that robots will follow it. There are so many organisations now, with so many robots/crawlers, that I doubt very much they would all abide by a simple robots.txt. And the number of crawlers is on the rise, if you have not noticed.

If I were asked to make a crawler, I would certainly not listen to robots.txt. I would crawl the entire site as deep as possible to get as much data as possible. For advertising purposes this may be good, and you welcome search engines crawling your site. For a paid service, this is hell. With crawlers on the rise, my concern is valid, IMHO. It's fine if you don't believe me now, but sooner or later someone will blog about it :idea:

Yes, I know, I'm about the only person in the world who hates robots. And I'm beginning to sound like someone predicting the end of the world, so I had better stop.
Inkyskin wrote:Why are you concerned about bandwidth and CPU usage, though? The amount they would use would be minimal, if even noticeable at all...
They leave paw prints all over and I don't like that :evil:
They have no business being on the site at all. It's a paying clients' site.
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Re: Stopping Robots/Crawlers

Post by pickle »

Make it secure behind a login.
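A sketch of what that gate might look like at the top of every protected page (the `user_id` session key and `/login.php` URL are placeholders for whatever your login system uses):

```php
<?php
// Require an authenticated session before serving anything.
// A crawler has no login cookie, so it is bounced to the login
// page and never reaches the paid content.
session_start();
if (empty($_SESSION['user_id'])) {   // 'user_id' is a placeholder key
    header('Location: /login.php');  // hypothetical login page
    exit;
}
// ...protected, clients-only content below...
```

No User-Agent sniffing, no robots.txt goodwill needed: unauthenticated requests simply never see the content.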
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: Stopping Robots/Crawlers

Post by php_east »

Thank you. So simple. :banghead: