Page 1 of 1
Design for hit tracking
Posted: Fri May 04, 2007 10:42 pm
by GeXus
I want to create a simple script that will track impressions of each page. It will be dynamic, in a sense that the reports will show the page URL and the number of impressions.
Pretty simple, however I see two ways of doing this.
1. One table "tracking" with columns (page_url, impressions, date)
2. Two tables "tracking" and "pages" with columns are (page_id, impressions, date) for 'tracking' and (page_id, page_url) for pages.
3. Either of the above two, but without an impressions column, and simply add a new record for each impression with the page_id and date... issue here is that this will get VERY large (billions of rows)
Basically the impressions column could either be done by updating it +1 each time.. or I could simply remove it and do a count based on the date and the page... the issue there is with the date not always being accurate, so I could do do it per day.
Seems like a very simple task and it is.. but i'm curious what you feel is the best approach.
Thanks!
Posted: Fri May 04, 2007 10:54 pm
by GeXus
I should also add.. that this will be used for many sites, so there will be a site_id as well.. with that said, I could also do it on the file system, and break it up by site_id... so each site has it's own file (site_id.txt) .. that would basically be setup like (page_id|date) each on it's own line... I would use page_id and still store the actual page url's in the db.. because I will need to reference them for reporting later on and would like the lookups that sql can provide.
Posted: Fri May 04, 2007 10:54 pm
by sentback
I wouldn go with the first option. It is simple and it does its job. As for the date... Why would date be inaccurate?
Posted: Fri May 04, 2007 10:56 pm
by GeXus
Because if the impression is updated + 1, then the date should be updated, in which case there is no historical data.. only the most recent impression.. which is pretty useless...
Posted: Fri May 04, 2007 11:04 pm
by sentback
I see. You need the time of every visit? I don't see how this can be so important but if this is the case, you should go with the second option. But as you said, if this pages will be visited alot you will soon have numerus rows. Maybe the idea with the .txt files is the best. It has its limits but should do the work just fine. Just be sure to number the lines so you won't have problems counting the hits

Posted: Fri May 04, 2007 11:46 pm
by GeXus
Yes I need the time otherwise the only stats I can have is total... still not so sure about file system... i'd rather keep it in the db, but would like to hear if anyone has done this.. im sure some of you have

Posted: Sat May 05, 2007 4:29 am
by Chris Corbyn
I do this:
Three tables: Pages, Hits, IPs
I know (or I might do, but it's hardly critical) the IP of the visitor. I see if there's a row in the IPs table for it, if not I add it. I then do the same for the current page URI. Then I log a record in hits which links the two. Simple and saves wasting database space by saving the same page URI over and over. You can also analyse unique hits this way.
In reality my schema looks like this:
Code: Select all
stats_module:
_attributes: { phpName: StatsModule }
id:
name: varchar(50)
stats_ip:
_attributes: { phpName: StatsIp }
id:
ip: varchar(15)
created_at:
stats_page:
_attributes: { phpName: StatsPage }
id:
module_id: { type: integer, foreignTable: stats_module, foreignReference: id }
action: varchar(255)
stats_hit:
_attributes: { phpName: <span style='color:blue' title='I'm naughty, are you naughty?'>smurf</span> }
id:
page_id: { type: integer, foreignTable: stats_page, foreignReference: id }
ip_id: { type: integer, foreignTable: stats_ip, foreignReference: id }
created_at:

The language filter filterd <span style='color:blue' title='I'm naughty, are you naughty?'>smurf</span>
EDIT | You can also analyze by date range this way, or by IP by date, or by page by date or whatever.