I have a site that must not be indexed by any crawler, from any organisation. The reason is commercial: access is for paying clients only, so robots coming in to index the site would waste bandwidth and CPU.
How does one stop robots from coming in?
My initial thought is simply to check the HTTP headers and divert accordingly.
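To make the header idea concrete, here is a rough sketch in Python of checking the User-Agent header and refusing suspected crawlers. The marker list and the function names are made up for illustration, and treating a missing header as a crawler is only an assumption.

```python
# Hypothetical substrings to look for; real crawlers use many other names.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def looks_like_crawler(user_agent):
    """Return True if the User-Agent value contains a known crawler marker."""
    if not user_agent:
        # Assumption: a request with no User-Agent is treated as a crawler.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def status_for(user_agent):
    """Pick an HTTP status: 403 for suspected crawlers, 200 otherwise."""
    return 403 if looks_like_crawler(user_agent) else 200
```

The header is supplied by the client, so a crawler that pretends to be a browser would pass this check. It probably only filters out robots that identify themselves honestly.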
Stopping Robots/Crawlers
Re: Stopping Robots/Crawlers
JavaScript links?
Re: Stopping Robots/Crawlers
I would certainly give it a try. I thought crawlers could handle JavaScript, but maybe not quite.
thanks.
Re: Stopping Robots/Crawlers
I think you can also use a robots.txt file to deny them access.
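For reference, a robots.txt that asks every robot to stay out of the whole site is two lines. The Python sketch below embeds those two lines and uses the standard library's `urllib.robotparser` to show how a well-behaved crawler would read them; the `allowed` helper and the example URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two lines that would go in /robots.txt to ask all robots to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that honours the file would skip every page on the site.
print(allowed("ExampleBot", "https://example.com/members/report"))
```

Note that the check runs on the crawler's side. The file is a request, and nothing on the server enforces it.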
Re: Stopping Robots/Crawlers
Like matthijs says, a standard robots.txt file should be all you need to stop normal crawlers and bots.
Why are you concerned about bandwidth and CPU usage, though? The amount they use would be minimal, if noticeable at all.
Re: Stopping Robots/Crawlers
Inkyskin wrote: Like matthijs says, a standard robots.txt file should be all you need to stop normal crawlers and bots.
I don't believe robots.txt is sufficient, or that robots will follow it. There are now so many organisations running so many robots and crawlers that I doubt very much they would all abide by a simple robots.txt. And the number of crawlers is on the rise, if you have not noticed.
If I were asked to write a crawler, I would certainly not listen to robots.txt. I would crawl the entire site as deep as possible to get as much data as possible. For advertising purposes this may be good, and you welcome search engines crawling your site. For service purposes, this is hell. With crawlers on the rise, my concern is valid IMHO. It's fine if you don't believe me now, but sooner or later someone will blog about it.
Yes, I know, I'm about the only person in the world who hates robots. And I'm beginning to sound like someone predicting the end of the world, so I had better stop.
Inkyskin wrote: Why are you concerned about bandwidth and CPU usage, though? The amount they use would be minimal, if noticeable at all.
They leave paw prints all over and I don't like that.
They have no business being in the site at all. It's a site for paying clients.
Re: Stopping Robots/Crawlers
Make it secure behind a login.
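As a rough sketch of what "behind a login" could mean at the HTTP level, the Python below validates an HTTP Basic `Authorization` header and answers 401 to anything without valid credentials, crawlers included. The `CLIENTS` table, the plain-text password and the function names are hypothetical; a real site would store salted password hashes and serve everything over HTTPS.

```python
import base64
import hmac

# Hypothetical credential store, for illustration only.
CLIENTS = {"alice": "s3cret"}


def is_authorised(auth_header):
    """Return True if auth_header is a valid 'Basic' credential for a known client."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    username, sep, password = decoded.partition(":")
    if not sep:
        return False
    expected = CLIENTS.get(username)
    if expected is None:
        return False
    # Constant-time comparison to avoid leaking information through timing.
    return hmac.compare_digest(password.encode(), expected.encode())


def status_for_request(auth_header):
    """Return 200 for authorised clients and 401 for everyone else."""
    return 200 if is_authorised(auth_header) else 401
```

Unlike robots.txt or a User-Agent check, this does not depend on the robot cooperating: a request without credentials never reaches the paid content.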
Re: Stopping Robots/Crawlers
Thank you. So simple.