How to distinguish human and crawler
Posted: Wed Jan 19, 2005 12:24 pm
by papieros
Hi,
Does anybody of you guys know how to distinguish a human from a crawler browsing the webpage? Is this possible? I need a filter function that filters out ANY crawler. Thanks in advance

Posted: Wed Jan 19, 2005 12:32 pm
by feyd
Most crawlers announce themselves in their user-agent string.
You could use get_browser() with a newer data set to easily tell "normal" browser strings from bots. Just note that some may send "normal" agent strings.. then it gets a LOT harder to tell.. other than a large number of page requests arriving at a very, very fast rate..
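Feyd's suggestion can be sketched roughly as follows. This is a minimal illustration, not a complete filter: the function name and the token list are assumptions, and the list covers only a handful of well-known bots of the era.

```php
<?php
// Hedged sketch: flag a request whose User-Agent contains a known
// crawler token. The token list is illustrative and far from complete.
function looks_like_crawler($userAgent)
{
    $botTokens = array('googlebot', 'slurp', 'msnbot', 'spider',
                       'crawler', 'curl', 'wget');
    $ua = strtolower($userAgent);
    foreach ($botTokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true; // matched a known bot signature
        }
    }
    return false; // looks like a "normal" browser string
}

// With a browscap.ini configured in php.ini, get_browser() can do the
// same job with a maintained data set; its result includes a 'crawler'
// flag (availability depends on the browscap data you install):
// $info  = get_browser($_SERVER['HTTP_USER_AGENT'], true);
// $isBot = !empty($info['crawler']);
```

As feyd notes, this only catches bots that identify themselves honestly; anything sending a faked browser string slips straight through.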
Posted: Wed Jan 19, 2005 3:57 pm
by JAM
As feyd says, there are ways to catch most of them, but you won't ever get them all - unless you want to be the only one browsing the site, that is.
psychedelix is just one of many pages that list spiders/crawlers. Then there are certain modules for Apache that use the logs to trap non-users, but they all work more or less the same way (using the user-agent string). If you are up for a challenge, you will certainly get one if you proceed with this. :wink:
Good luck though and do post conclusions and thoughts.
Posted: Thu Jan 20, 2005 4:04 am
by papieros
Thanks for the advice - very helpful
I solved the problem by forcing the 'human' user to reach the 'critical' page through another page that sets a POST variable; the 'critical' page then examines that variable.
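Papieros' gate could look something like the sketch below. All names here (gateway.php, critical.php, the human_gate field) are assumptions for illustration; the idea is simply that a crawler following plain GET links never sends the expected POST variable.

```php
<?php
// gateway.php would link to the critical page only via a POST form, e.g.:
//
//   <form action="critical.php" method="post">
//     <input type="hidden" name="human_gate" value="1">
//     <input type="submit" value="Continue">
//   </form>
//
// critical.php then checks for the variable before rendering anything.
function came_through_gate($post)
{
    return !empty($post['human_gate']);
}

// At the top of critical.php:
// if (!came_through_gate($_POST)) {
//     header('Location: gateway.php'); // bounce link-following crawlers
//     exit;
// }
// ...render the protected content...
```

Note the limitation: a bot that submits forms, or anyone crafting a POST request by hand, still gets through, so this filters out link-following crawlers rather than ALL of them.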