How to distinguish human and crawler

papieros
Forum Newbie
Posts: 8
Joined: Wed Jan 19, 2005 12:15 pm

Post by papieros »

Hi,
Does anybody know how to distinguish a human from a crawler browsing a webpage? Is this possible? I need a filter function that filters out ANY crawler. Thanks in advance :)
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Most crawlers announce themselves in their user agent string.

You could use get_browser() with a newer data set to easily tell "normal" browser strings from bots. Just note that some bots send "normal" agent strings, and then it gets a LOT harder to tell, other than by noticing a large number of page requests made at a very fast rate.
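As a rough illustration of the user agent approach, here is a minimal sketch that does not rely on get_browser() or a browscap data set. The function name looks_like_crawler() and the keyword list are assumptions made for this example only; a real list would need to be much longer, and a bot that sends a browser-like agent string will still slip through.

```php
<?php
// Hypothetical helper: guesses whether a User-Agent string belongs to a crawler.
// The keyword list is an illustrative assumption, not a complete data set.
function looks_like_crawler(string $userAgent): bool
{
    // An empty agent string is unusual for a real browser, so treat it as suspicious.
    if (trim($userAgent) === '') {
        return true;
    }

    // Substrings that many self-announcing crawlers include in their agent string.
    $keywords = ['bot', 'crawl', 'spider', 'slurp'];

    foreach ($keywords as $keyword) {
        // stripos() is case-insensitive and returns false when there is no match.
        if (stripos($userAgent, $keyword) !== false) {
            return true;
        }
    }

    return false;
}
```

On a live page this could be called as looks_like_crawler($_SERVER['HTTP_USER_AGENT'] ?? ''), since the header may be missing entirely.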
JAM
DevNet Resident
Posts: 2101
Joined: Fri Aug 08, 2003 6:53 pm
Location: Sweden

Post by JAM »

As feyd says, there are ways to catch most of them, but you won't ever catch them all. Unless you want to be the only one browsing the site, that is.

psychedelix is just one of many pages that list spiders/crawlers. There are also Apache modules that use logs to trap non-human visitors, but they all work more or less the same way (by checking the user agent). If you are up for a challenge, you are certainly getting one if you proceed with this. :wink:

Good luck though and do post conclusions and thoughts.
papieros
Forum Newbie
Posts: 8
Joined: Wed Jan 19, 2005 12:15 pm

Post by papieros »

Thanks for the advice, it was very helpful.
I solved the problem by forcing the 'human' user to reach the 'critical' page through another page that sets a POST variable; the 'critical' page then checks that variable.
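The gate-page idea above could look roughly like the sketch below. The post does not show any code, so everything here is an assumption: the field name via_gate, the two function names, and the use of a per-visitor token (the post only says the variable is examined, not that its value is compared).

```php
<?php
// Hypothetical field name; the original post does not name the POST variable.
const GATE_FIELD = 'via_gate';

// Builds the form shown on the intermediate page.
// Submitting it sends the POST variable to the critical page.
function render_gate_form(string $criticalUrl, string $token): string
{
    return '<form method="post" action="' . htmlspecialchars($criticalUrl, ENT_QUOTES) . '">'
         . '<input type="hidden" name="' . GATE_FIELD . '" value="' . htmlspecialchars($token, ENT_QUOTES) . '">'
         . '<button type="submit">Continue</button>'
         . '</form>';
}

// Called on the critical page: true only if the POST variable is present
// and matches the token that the gate page issued.
function came_through_gate(array $post, string $expectedToken): bool
{
    $sent = $post[GATE_FIELD] ?? null;

    return is_string($sent)
        && $expectedToken !== ''
        && hash_equals($expectedToken, $sent);
}
```

One way to wire this up, again as an assumption, is to store bin2hex(random_bytes(16)) in $_SESSION on the gate page, pass it to render_gate_form(), and call came_through_gate($_POST, $_SESSION['gate_token'] ?? '') on the critical page. Most crawlers only follow links and do not submit forms, which is why this tends to work, but a crawler that does submit forms would still get through.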