I was looking for a way to keep crawlers out of my users online figures (if they hit in one go they tend to distort things a bit - up to 50 on one occasion). So I did a bit of reading and ran a few tests, and found that my host doesn't put the browscap.ini file on their servers.
As my users online figures are stored in a MySQL database I was thinking of a query along the following lines:
Hmm... did a search on that in the PHP manual, couldn't find anything. If you're talking about asking my host (I just have a virtual server) to put a .txt file on their server I might as well ask them to put the browscap.ini file there too. Sorry if I've got confused about what you meant, Mark.
Did some more searching. This is from a page dated 2000, so I'll need to research it a bit more before implementing anything along these lines. I'll adapt it and test it later and let you all know how I got on.
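// $HTTP_USER_AGENT is only set when register_globals is on (removed in PHP 5.4),
// so read the agent string from $_SERVER instead.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// stristr() matches case-insensitively, so "Spider" and "spider" are both caught.
if (stristr($agent, "htdig") || stristr($agent, "Wget") || stristr($agent, "Bench") ||
    stristr($agent, "spider") || stristr($agent, "crawler"))
{
    // carry out this option if the visitor is a spider/crawler
    page_open(array());
}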
Additional: I certainly don't wish to stop search engine spiders/crawlers indexing my site. Besides anything else, from my logs it looks as if the crawlers come onto one page, then do nothing else in that session. If they're hitting in droves they tend to start a new session every 4-5 seconds.
Once anyone comes onto my site a session is started, and the start time, session id and $HTTP_USER_AGENT are stored in a MySQL database. If the above script works and I can filter out anything shown to be a spider/crawler, I can hopefully have a users online display which shows the number of real users accurately.
You see I'd love to think that 50 people at once could be on my site, but I'm realistic enough to know it isn't very likely.
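For what it's worth, the check described above could be wrapped in a small helper so the session is only recorded when the visitor doesn't look like a crawler. This is just a rough sketch: the function name looks_like_crawler() and the keyword list are my own assumptions, pulled together from the agent strings mentioned in this thread, and a crawler that sends an ordinary browser string will still slip through.

```php
<?php
// Sketch only: the function name and keyword list are assumptions,
// built from the agent strings mentioned in this thread.
function looks_like_crawler($userAgent, array $keywords = array(
    'htdig', 'wget', 'bench', 'spider', 'crawl', 'inktomi', 'googlebot'
)) {
    // An empty agent string gives nothing to match; treat it as a normal visitor here.
    if ($userAgent === null || $userAgent === '') {
        return false;
    }
    foreach ($keywords as $keyword) {
        // stripos() is case-insensitive, so "Googlebot" and "googlebot" both match.
        if (stripos($userAgent, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Usage: only record the session when the visitor does not look like a crawler.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!looks_like_crawler($agent)) {
    // ... insert the session row into the users online table here ...
}
```

Filtering at insert time like this keeps crawler sessions out of the table altogether, whereas filtering in the SELECT keeps them available for the logs. Which is better probably depends on whether you still want to see the crawler visits somewhere.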
Or use another method to count the visitors online. It's not as elegant, but I store every visitor to any page by IP for 15 minutes in a database. Each request deletes the old entries and adds the current one. This also lets me leave my own visits out of the log.
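To show the idea, here is a rough sketch of that 15-minute window using a plain PHP array in place of the database table. Everything named here (record_visit(), the $ignoredIps list, the example addresses) is made up for illustration; in practice the purge and the insert would be two queries against the real table.

```php
<?php
// Sketch only: an in-memory stand-in for the database table described above.
// $visits maps IP address => timestamp of that IP's most recent page hit.
function record_visit(array $visits, $ip, $now, $window = 900, array $ignoredIps = array())
{
    // Drop every entry older than the window (900 seconds = 15 minutes).
    foreach ($visits as $seenIp => $lastSeen) {
        if ($now - $lastSeen > $window) {
            unset($visits[$seenIp]);
        }
    }
    // Skip IPs that should not be counted, such as the site owner's own address.
    if (!in_array($ip, $ignoredIps, true)) {
        $visits[$ip] = $now;
    }
    return $visits;
}

$visits = array();
$visits = record_visit($visits, '192.0.2.10', 1000);
$visits = record_visit($visits, '192.0.2.11', 1300);
$visits = record_visit($visits, '192.0.2.99', 1400, 900, array('192.0.2.99')); // owner, not counted
$visits = record_visit($visits, '192.0.2.11', 2000); // 192.0.2.10 has expired by now
echo count($visits), " visitor(s) online\n";
```

Because each IP only ever has one entry, a crawler opening a new session every few seconds from the same address still counts as a single visitor, which may be enough on its own to stop the figures jumping.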
Okay, forget what I posted earlier. It seems the only way to do this is to look for certain words in the $HTTP_USER_AGENT variable and eliminate them accordingly, a bit like this:
SELECT * FROM usersunique WHERE browser NOT LIKE '%inktomi%' AND browser NOT LIKE '%googlebot%' AND browser NOT LIKE '%crawl%' ORDER BY visitorid DESC;
(Apologies if the code is lousy, I don't happen to think the MySql.com site is as easy to find your way around as the PHP equivalent.)
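If the list of words grows, the query could be built from an array rather than typed out by hand. A minimal sketch, assuming the same usersunique table with its browser and visitorid columns; build_human_visitors_query() is a hypothetical helper name. The column in ORDER BY is left unquoted here, because MySQL treats a name in single quotes as a string literal rather than a column.

```php
<?php
// Sketch only: build_human_visitors_query() is a hypothetical helper.
// The table and column names (usersunique, browser, visitorid) come from the query above.
function build_human_visitors_query(array $keywords)
{
    $conditions = array();
    $params = array();
    foreach ($keywords as $keyword) {
        // One placeholder per keyword keeps the values out of the SQL string.
        $conditions[] = 'browser NOT LIKE ?';
        $params[] = '%' . $keyword . '%';
    }
    $sql = 'SELECT * FROM usersunique';
    if ($conditions) {
        $sql .= ' WHERE ' . implode(' AND ', $conditions);
    }
    $sql .= ' ORDER BY visitorid DESC';
    return array($sql, $params);
}

list($sql, $params) = build_human_visitors_query(array('inktomi', 'googlebot', 'crawl'));
echo $sql, "\n";
```

The returned SQL and parameter list are meant to be handed to a prepared statement, for example PDO's prepare() followed by execute($params). One thing to watch: a row whose browser column is NULL fails every NOT LIKE test, so those sessions would drop out of the result as well.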
The query seems to cover most things. In my log I've got one user which looks like a spider, but the $HTTP_USER_AGENT gives no clues away. Yet there are about twenty different sessions opened in quick succession by the same IP address, so that is possibly something to work on.
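One possible way to catch that kind of visitor is to look for IP addresses that open a lot of sessions in a short time, whatever their agent string says. The sketch below works on an array of session rows rather than the real table. The function name find_burst_ips(), the row layout and the thresholds (10 sessions within 60 seconds) are all assumptions that would need tuning against real logs.

```php
<?php
// Sketch only: find_burst_ips() and the row layout are assumptions.
// Each session row is array('ip' => string, 'start' => unix timestamp).
function find_burst_ips(array $sessions, $minSessions = 10, $windowSeconds = 60)
{
    // Group session start times by IP address.
    $startsByIp = array();
    foreach ($sessions as $session) {
        $startsByIp[$session['ip']][] = $session['start'];
    }

    $suspects = array();
    foreach ($startsByIp as $ip => $starts) {
        sort($starts);
        $count = count($starts);
        // Slide a window of $minSessions consecutive sessions over the sorted times.
        for ($i = 0; $i + $minSessions - 1 < $count; $i++) {
            if ($starts[$i + $minSessions - 1] - $starts[$i] <= $windowSeconds) {
                $suspects[] = $ip;
                break;
            }
        }
    }
    return $suspects;
}

// Twenty sessions from one IP, five seconds apart, plus one ordinary visitor.
$sessions = array(array('ip' => '198.51.100.7', 'start' => 5000));
for ($n = 0; $n < 20; $n++) {
    $sessions[] = array('ip' => '192.0.2.50', 'start' => 1000 + 5 * $n);
}
print_r(find_burst_ips($sessions));
```

Bear in mind that several real users can share one IP address behind a proxy, so a flagged address is only a hint and not proof of a spider.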