
Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 3:30 am
by php_east
I have a site which should not be indexed by any sort of crawler, from whatever organisation. The reason is commercial: access is for paying clients only, so robots coming in to index the site would waste bandwidth and CPU time.

How does one stop robots from coming in?
My initial thought is simply to check the HTTP headers and divert accordingly.
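A minimal sketch of the header-based idea, assuming the site runs as a Python WSGI application. The marker list and the names `looks_like_bot`, `block_bots` and `site` are hypothetical. It only catches crawlers that identify themselves honestly, because the client chooses what to send in the User-Agent header. Substring matching is also crude and can misfire on unusual browser strings.

```python
# Hypothetical substrings; real crawler User-Agent strings vary, and any
# client can send whatever User-Agent it likes.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def looks_like_bot(user_agent):
    """Return True if the User-Agent header contains a crawler marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def block_bots(app):
    """WSGI middleware: answer 403 to requests that look like crawlers."""
    def middleware(environ, start_response):
        if looks_like_bot(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawlers are not welcome here.\n"]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    # Stand-in for the real members-only application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Members area\n"]


application = block_bots(site)
```

Instead of a 403, the middleware could just as well send a redirect to a public page; the check itself stays the same.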

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 3:52 am
by Benjamin
JavaScript links?
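To illustrate why JavaScript links can hide pages, here is a sketch of what a simple crawler that does not run JavaScript would see. It uses Python's standard `html.parser`; the page content and `LinkCollector` class are hypothetical. Crawlers that do execute JavaScript would still find the second link, so this deters only simple bots and is not access control.

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary link, and one whose URL is only assembled
# when a browser runs the onclick JavaScript.
PAGE = """
<a href="/public.html">Public page</a>
<a href="#" onclick="location.href='/members' + '/report.html'; return false;">Report</a>
"""


class LinkCollector(HTMLParser):
    """Collect href values the way a simple, non-JavaScript crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip empty and placeholder "#" hrefs; onclick code is ignored.
                if name == "href" and value and value != "#":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/public.html']
```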

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 4:28 am
by php_east
I'll certainly give it a try. I thought crawlers understood JavaScript, but maybe not quite :)

Thanks.

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 6:07 am
by matthijs
I think you can also use a robots.txt file to deny them access.
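A robots.txt at the root of the site that asks every crawler to stay out of everything needs only two lines. The sketch below embeds it in Python and uses the standard `urllib.robotparser` module to show how a compliant crawler would interpret it. The bot name and example.com URL are placeholders. Honouring the file is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# robots.txt asking every crawler ("*") to stay out of the whole site ("/").
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that chooses to honour robots.txt checks before each fetch.
print(parser.can_fetch("ExampleBot", "https://example.com/members/report.html"))  # False
```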

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 8:10 am
by Inkyskin
Like matthijs says, a standard robots.txt file should really be all you need to stop normal crawlers and bots.

Why are you concerned about bandwidth and CPU usage, though? The amount they would use would be minimal, if noticeable at all...

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 8:54 am
by php_east
Inkyskin wrote: like matthijs says, a standard robots.txt file should be all you need to stop normal crawlers and bots really.
I don't believe something like robots.txt is sufficient, or that robots will follow it. There are so many organisations now with so many robots/crawlers that I very much doubt they would all abide by a simple robots.txt. And the number of crawlers is on the rise, if you have not noticed.

If I were asked to make a crawler, I would certainly not listen to robots.txt. I would crawl the entire site as deep as possible to get as much data as possible. For advertising purposes this may be good, and you would welcome search engines crawling your site. For service purposes, it is hell. With crawlers on the rise, my concern is valid IMHO. It's fine if you don't believe me now, but sooner or later someone will blog about it :idea:

Yes, I know, I'm about the only person in the world who hates robots. And I'm beginning to sound like someone predicting the end of the world, so I'd better stop.
Inkyskin wrote: How come you are concerned about bandwidth and CPU usage though, the amount that they would use really would be minimal, if even noticeable at all...
They leave paw prints all over and I don't like that :evil:
They have no business at all being on the site. It's a site for paying clients.

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 9:44 am
by pickle
Make it secure behind a login.
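A minimal sketch of putting the whole site behind a login, assuming a Python WSGI application and HTTP Basic authentication (a form and session login would serve the same purpose). The `USERS` store and function names are hypothetical. A real site should serve this over HTTPS and store salted password hashes rather than plaintext passwords. A crawler without credentials only ever receives a 401, so there is nothing for it to index.

```python
import base64
import hmac

# Hypothetical credential store, for illustration only. Use salted
# password hashes in a real deployment.
USERS = {"client1": "s3cret"}


def authorised(header):
    """Check an HTTP Basic 'Authorization' header against USERS."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    expected = USERS.get(user)
    if not sep or expected is None:
        return False
    # Constant-time comparison of the password bytes.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))


def require_login(app):
    """WSGI middleware: only authenticated clients reach the wrapped app."""
    def middleware(environ, start_response):
        if authorised(environ.get("HTTP_AUTHORIZATION")):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="Clients"'),
        ])
        return [b"Login required.\n"]
    return middleware


def site(environ, start_response):
    # Stand-in for the real members-only application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Members area\n"]


application = require_login(site)
```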

Re: Stopping Robots/Crawlers

Posted: Fri Mar 20, 2009 9:59 am
by php_east
Thank you. So simple. :banghead: