How to distinguish human and crawler
Hi,
Does anyone know how to distinguish a human from a crawler browsing a webpage? Is this possible? I need a filter function that filters out ANY crawler. Thanks in advance.
- feyd
Most crawlers announce themselves in their user agent string.
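A minimal sketch of that idea, written in Python only to keep it short (the same substring test ports to any language). The token list and the name `looks_like_crawler` are made up for illustration and are nowhere near a complete catalogue, so treat this as a starting point, not a finished filter:

```python
import re

# Hypothetical, deliberately short list of tokens that self-identifying
# crawlers tend to put in their User-Agent header. This is an assumption
# for illustration; extend it from a maintained crawler list.
BOT_TOKENS = ("bot", "crawl", "spider", "slurp")

# One case-insensitive pattern that matches any of the tokens.
_BOT_RE = re.compile("|".join(re.escape(t) for t in BOT_TOKENS), re.IGNORECASE)


def looks_like_crawler(user_agent):
    """Return True if the User-Agent looks like a self-declared crawler."""
    if not user_agent:
        # Design choice (assumption): a missing or empty User-Agent is
        # treated as suspicious, since ordinary browsers send one.
        return True
    return _BOT_RE.search(user_agent) is not None
```

This only catches crawlers that are honest about what they are. Anything that sends a browser-like string passes straight through.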
You could use get_browser() with a newer data set (an up-to-date browscap.ini) to easily tell "normal" browser strings from bots. Just note that some bots send "normal" agent strings, and then it gets a LOT harder to tell. About the only remaining sign is a large number of page requests arriving at a very fast rate.
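For the request-rate angle, here is a rough sliding-window counter, again sketched in Python. The class name `RateWatcher`, the default thresholds, and the idea of keying on a client id such as the IP address are all assumptions for illustration; sensible limits depend on your site, and this keeps everything in memory:

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flag clients that make more than max_hits requests within window seconds."""

    def __init__(self, max_hits=20, window=10.0):
        # Both defaults are arbitrary placeholders, not recommended values.
        self.max_hits = max_hits
        self.window = window
        # client id (e.g. IP address) -> timestamps of its recent requests
        self._hits = defaultdict(deque)

    def record(self, client_id, now):
        """Record one request at time `now` (seconds).

        Returns True if the client is over the limit after this request.
        """
        hits = self._hits[client_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.max_hits
```

In a real handler you would pass something like `time.time()` as `now` and the remote address as `client_id`. Keep in mind that many humans can share one address behind a proxy, and a patient crawler can simply slow down, so this is a hint and not proof.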
As feyd says, there are ways to catch most of them, but you won't ever catch them all, unless you want to be the only one browsing the site, that is.
psychedelix is just one of many pages that list spiders/crawlers. Then there are certain modules for Apache that use logs to trap non-human visitors, but they all work more or less the same way (using the user agent). If you are up for a challenge, you are certainly getting one if you proceed with this. :wink:
Good luck though, and do post your conclusions and thoughts.