Detecting search engine bots and crawlers

Posted: Thu Oct 25, 2007 6:52 pm
by nolanpro
I was wondering if anyone out there has experience detecting search engine crawlers and bots.

We have a site where we charge our members per click, but we don't want to charge them when it's just a crawler or bot accessing a link.

Keeping a good list of IP addresses to check against would require a lot of maintenance.

I could track how fast a visitor is crawling (links clicked per minute or something); once a threshold is reached, the system would credit back the clicks that had already been charged.

However, that assumes the bot sends the session ID, and I doubt many do.
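To make the threshold idea concrete, here is a minimal Python sketch of a sliding-window click counter. It is only an illustration: the `ClickTracker` class, the 60-second window, and the 20-click limit are all hypothetical, and the visitor key is assumed to be something like the IP address, since a session ID may never arrive.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed window length
MAX_CLICKS = 20       # assumed threshold; tune against real traffic


class ClickTracker:
    """Hypothetical helper: flags visitors that click too fast."""

    def __init__(self, window=WINDOW_SECONDS, max_clicks=MAX_CLICKS):
        self.window = window
        self.max_clicks = max_clicks
        # visitor key -> deque of (timestamp, link_id) still inside the window
        self.recent = defaultdict(deque)
        self.flagged = set()

    def record_click(self, visitor, link_id, now):
        """Return the link ids that should be credited back (not charged).

        An empty list means the click is charged as normal.
        """
        if visitor in self.flagged:
            # Already judged to be a bot: never charge this click.
            return [link_id]

        clicks = self.recent[visitor]
        # Forget clicks that have fallen out of the window.
        while clicks and now - clicks[0][0] > self.window:
            clicks.popleft()
        clicks.append((now, link_id))

        if len(clicks) > self.max_clicks:
            # Threshold crossed: flag the visitor and refund everything
            # charged inside the window, including the current click.
            self.flagged.add(visitor)
            refunds = [lid for _, lid in clicks]
            clicks.clear()
            return refunds
        return []
```

Calling `record_click(ip, link_id, time.time())` on every click would return the list of clicks to refund, which stays empty for visitors browsing at a human pace.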

I wonder what the pay-per-click ad companies do?

Any ideas would be greatly appreciated! Thanks!!!

Posted: Thu Oct 25, 2007 6:55 pm
by feyd
Although it is easy for the more technically minded to spoof, you could use the results of get_browser(), which classifies the visitor from its user-agent string.
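As a rough illustration of what a user-agent check boils down to, here is a small Python sketch. It does not reproduce get_browser() or its browscap data; the `looks_like_bot` function and the token list are made up for the example and are far from exhaustive.

```python
# Illustrative tokens only; real crawler lists are much longer.
BOT_TOKENS = ("bot", "crawl", "spider", "slurp")


def looks_like_bot(user_agent):
    """Guess from the User-Agent header whether the visitor is a crawler."""
    if not user_agent:
        # Assumption: a missing user-agent is treated as automated traffic.
        return True
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)
```

Anything that sets its own user-agent header can defeat this, so it is best treated as a first filter rather than proof.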

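edit: You can also use gethostbyaddr() to look up the hostname for the visitor's IP (and gethostbyname() to check that the hostname resolves back to it)... but that can be spoofed too, although it is more technically challenging for most people. It's certainly higher-hanging fruit than the user-agent.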
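The hostname check is stronger if it goes both ways: reverse-resolve the IP to a hostname, make sure the hostname belongs to a known crawler domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch follows, assuming a hypothetical `is_verified_crawler` helper and a short, illustrative suffix list; the resolver functions are parameters so the logic can be exercised without touching the network.

```python
import socket

# Illustrative suffixes only; check each search engine's own documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex,
                        suffixes=TRUSTED_SUFFIXES):
    """Forward-confirmed reverse DNS check for a visitor IP."""
    try:
        # Step 1: reverse lookup (IP -> hostname).
        hostname = reverse(ip)[0].rstrip(".").lower()
        # Step 2: the hostname must sit under a trusted crawler domain.
        if not hostname.endswith(suffixes):
            return False
        # Step 3: forward lookup (hostname -> list of IPs).
        addresses = forward(hostname)[2]
    except OSError:
        # No PTR record or a failed lookup: treat as unverified.
        return False
    # Step 4: the original IP must be among the forward results.
    return ip in addresses
```

The forward step is what makes this harder to fake than the user-agent: whoever controls the reverse DNS for an address can return any hostname they like, but they cannot make the real crawler domain resolve to their IP. DNS lookups are slow, so the result would normally be cached per IP.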