Design for hit tracking

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Design for hit tracking

Post by GeXus »

I want to create a simple script that will track impressions of each page. It will be dynamic, in a sense that the reports will show the page URL and the number of impressions.

Pretty simple, however I see two ways of doing this.

1. One table "tracking" with columns (page_url, impressions, date)

2. Two tables "tracking" and "pages" with columns are (page_id, impressions, date) for 'tracking' and (page_id, page_url) for pages.

3. Either of the above two, but without an impressions column, and simply add a new record for each impression with the page_id and date... issue here is that this will get VERY large (billions of rows)

Basically the impressions column could either be done by updating it +1 each time.. or I could simply remove it and do a count based on the date and the page... the issue there is with the date not always being accurate, so I could do do it per day.

Seems like a very simple task and it is.. but i'm curious what you feel is the best approach.

Thanks!
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

I should also add.. that this will be used for many sites, so there will be a site_id as well.. with that said, I could also do it on the file system, and break it up by site_id... so each site has it's own file (site_id.txt) .. that would basically be setup like (page_id|date) each on it's own line... I would use page_id and still store the actual page url's in the db.. because I will need to reference them for reporting later on and would like the lookups that sql can provide.
Last edited by GeXus on Fri May 04, 2007 10:55 pm, edited 1 time in total.
sentback
Forum Newbie
Posts: 24
Joined: Fri May 04, 2007 9:46 am

Post by sentback »

I wouldn go with the first option. It is simple and it does its job. As for the date... Why would date be inaccurate?
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

Because if the impression is updated + 1, then the date should be updated, in which case there is no historical data.. only the most recent impression.. which is pretty useless...
sentback
Forum Newbie
Posts: 24
Joined: Fri May 04, 2007 9:46 am

Post by sentback »

I see. You need the time of every visit? I don't see how this can be so important but if this is the case, you should go with the second option. But as you said, if this pages will be visited alot you will soon have numerus rows. Maybe the idea with the .txt files is the best. It has its limits but should do the work just fine. Just be sure to number the lines so you won't have problems counting the hits ;)
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

Yes I need the time otherwise the only stats I can have is total... still not so sure about file system... i'd rather keep it in the db, but would like to hear if anyone has done this.. im sure some of you have :)
User avatar
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

I do this:

Three tables: Pages, Hits, IPs

I know (or I might do, but it's hardly critical) the IP of the visitor. I see if there's a row in the IPs table for it, if not I add it. I then do the same for the current page URI. Then I log a record in hits which links the two. Simple and saves wasting database space by saving the same page URI over and over. You can also analyse unique hits this way.

In reality my schema looks like this:

Code: Select all

  stats_module:
    _attributes: { phpName: StatsModule }
    id:
    name:        varchar(50)
  stats_ip:
    _attributes: { phpName: StatsIp }
    id:
    ip:          varchar(15)
    created_at:
  stats_page:
    _attributes: { phpName: StatsPage }
    id:
    module_id:   { type: integer, foreignTable: stats_module, foreignReference: id }
    action:      varchar(255)
  stats_hit:
    _attributes: { phpName: <span style='color:blue' title='I&#39;m naughty, are you naughty?'>smurf</span> }
    id:
    page_id:     { type: integer, foreignTable: stats_page, foreignReference: id }
    ip_id:       { type: integer, foreignTable: stats_ip, foreignReference: id }
    created_at:
:lol: The language filter filterd <span style='color:blue' title='I&#39;m naughty, are you naughty?'>smurf</span> :lol:

EDIT | You can also analyze by date range this way, or by IP by date, or by page by date or whatever.
Post Reply