stats-spiders
I am storing the IP address and user agent of every visitor to my website in a MySQL table, and I want to filter out the spiders and crawlers when displaying the totals for my web stats.
I have a list of spider user agents. When a new hit comes in, I currently just dump it into a separate table; when the stats are displayed, I retrieve each hit and compare its user agent against the spider user-agent list.
Say I have a table Spiders with columns A, B, C, D through Z. Every spider is written into the column matching the first character of its user agent. That way my query doesn't search all fields when checking for a spider, only the column matching the first character.
I figured this would decrease the load and thus the waiting time, but it still takes ages to load.
Any suggestions?
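For reference, the per-first-character bucket lookup described above might look roughly like this. This is a minimal sketch: the `Spiders` table layout, the column names, and the use of PDO are assumptions, not the poster's actual code.

```php
<?php
// Pick the bucket column: the uppercased first character of the UA.
function bucket_for(string $ua): string
{
    $first = strtoupper(substr($ua, 0, 1));
    return ($first >= 'A' && $first <= 'Z') ? $first : 'A'; // fallback bucket
}

// Query only the one column that matches the first character.
function is_spider_by_bucket(PDO $db, string $ua): bool
{
    $col  = bucket_for($ua); // safe to interpolate: always a single A-Z letter
    $stmt = $db->prepare("SELECT 1 FROM Spiders WHERE `$col` = ? LIMIT 1");
    $stmt->execute([$ua]);
    return (bool) $stmt->fetchColumn();
}
```

Note that even with the bucketing, each lookup still scans a whole column unless it is indexed, which is part of why this scheme doesn't buy much.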
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
It doesn't really decrease the load (likely the opposite), but it does require your database to store a lot of data that isn't being used.
I'd generally use get_browser() or a pure PHP version of it.
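A minimal get_browser() sketch follows. Note that get_browser() only works if the "browscap" directive in php.ini points at a browscap.ini file; otherwise it returns false. The sample user agent is just an illustration.

```php
<?php
$ua   = 'Googlebot/2.1 (+http://www.google.com/bot.html)';
$info = get_browser($ua, true); // true => return an associative array

if ($info !== false) {
    // browscap exposes, among other properties, "browser",
    // "platform" and "crawler".
    echo $info['browser'], "\n";
    echo !empty($info['crawler']) ? "spider/crawler\n" : "regular browser\n";
}
```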
Sorry feyd, I don't follow. What do you mean? I already have a list of the user agents for each hit. The problem is when I am extracting that data and comparing each UA with the spiders' UAs.
- feyd
It has nothing to do with having the user agents; it's about knowing which agents are what. get_browser() builds detailed information about the user agent given to it.
If I want to store a flag, there are two issues:
1) I will need to do this when a new hit is registered, which means more waiting time for my user.
And, more importantly,
2) The "decide if it is a spider" step takes a lot of time anyway, whether I do it on my user's time or when the stats display is being generated. I have tried three things for deciding what is a spider:
a) Just do a text search (that's 557 strings being searched every time a hit is picked up) in plain PHP code.
b) Put the whole spider list in a database and query the database (which obviously takes longer than the first option).
c) Divide the spiders table into A, B, C ... Z, and when a new hit comes in, check which letter it begins with and query only that column. This divides the load.
But it's still taking hours! I REALLY appreciate your help on this. I have been trying to fix this for the past two months!
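As an aside, option (a) need not mean 557 sequential comparisons: if the list is loaded into an associative array once, each hit becomes a single O(1) isset() test. A sketch, assuming exact user-agent matches; the agent list here is a made-up stand-in for the real 557-entry list:

```php
<?php
// Build the lookup table once (e.g. at script start, or cache it).
$spider_agents = ['Googlebot', 'msnbot', 'Yahoo! Slurp']; // stand-in list
$spider_lookup = array_flip(array_map('strtolower', $spider_agents));

// One isset() per hit instead of hundreds of string comparisons.
function is_spider(string $ua, array $lookup): bool
{
    return isset($lookup[strtolower($ua)]);
}
```

If the real list is matched by substring rather than exactly, a hash lookup alone won't do, but classifying each hit once at logging time (rather than at every stats display) still removes the repeated cost.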
- feyd
As I said, use get_browser(). You don't have to search through your own list of spiders: it has a huge list built in and can determine whether an agent is a spider very quickly.
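Putting the two ideas together, one could classify the hit once, at logging time, and store the result as a flag, so the stats page only filters on that flag. A sketch; the $db connection, the hits table, and its columns are assumptions:

```php
<?php
// Decide once whether this hit is a crawler (1) or not (0), based on
// the "crawler" property that browscap-backed get_browser() reports.
function spider_flag($browscap_info): int
{
    return ($browscap_info !== false && !empty($browscap_info->crawler)) ? 1 : 0;
}

$ua   = $_SERVER['HTTP_USER_AGENT'] ?? '';
$flag = spider_flag(get_browser($ua));

// Hypothetical hits table: the stats query can then just do
// "... WHERE is_spider = 0" instead of re-matching every row.
$stmt = $db->prepare('INSERT INTO hits (ip, user_agent, is_spider) VALUES (?, ?, ?)');
$stmt->execute([$_SERVER['REMOTE_ADDR'] ?? '', $ua, $flag]);
```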
- feyd
dal_oscar wrote: OK, so when someone visits my website, I go via get_browser() and then store them in two different tables, normal people vs. spiders?
Personally, I'd take a normalized approach: one main table, plus several related tables for things like "platform," "browser," etcetera.
via AOL translation, dal_oscar wrote: Could you give me an example of how to use get_browser() for such a thing?
Example outputs from the function are given in both pages I have linked to in this thread already. At a minimum, store most, if not all, of the elements returned by get_browser() (or my function, posted earlier) as separate fields in the table. For statistics you'll likely want this data broken apart already anyway, since it makes grouping much easier.
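As a sketch of "store the elements as separate fields": flatten the get_browser() result into the per-hit columns. The field names follow browscap's usual properties; the schema itself is hypothetical.

```php
<?php
// Pull out the fields worth storing per hit from a get_browser()
// result (associative-array form). Missing keys default sensibly.
function hit_fields(array $info): array
{
    return [
        'browser'  => $info['browser']  ?? '',
        'version'  => $info['version']  ?? '',
        'platform' => $info['platform'] ?? '',
        'crawler'  => !empty($info['crawler']) ? 1 : 0,
    ];
}
// For a fully normalized schema, browser and platform could instead be
// ids pointing into their own lookup tables.
```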