I have a content management system that displays the 'most read article' at the side.
After a few weeks of actively using it, I noticed that the 'most read article' is always the same: the first article on the site has the highest read count simply because it has been there the longest.
The simplest (and only) solution I know is to rate each article by its average views per day: rating = total views / number of days since the article was published.
Can people comment on them or rate them? That would give you additional variables to use in your page rating, though using average views per day, as you say, is a nice straightforward option.
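For what it's worth, the views-per-day idea could be sketched roughly like this (Python here just for brevity; the arithmetic is the same in PHP). The function name, the dates, and the view counts are made up for illustration, and counting the publication day as day one is an assumption to avoid dividing by zero.

```python
from datetime import date


def views_per_day(total_views: int, published: date, today: date) -> float:
    """Average daily views since publication (hypothetical helper)."""
    # Count the publication day itself, so a brand-new article
    # is treated as one day old and never divides by zero.
    days_online = max((today - published).days + 1, 1)
    return total_views / days_online


# Illustrative numbers only: an old article with many total views
# versus a recent one with far fewer.
today = date(2009, 10, 27)
old_article = views_per_day(3200, date(2009, 1, 1), today)
new_article = views_per_day(400, date(2009, 10, 18), today)

# Rank by the daily average rather than the raw total.
most_read = "new" if new_article > old_article else "old"
```

With these numbers the newer article wins, even though its raw total is much lower.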
First you need to define 'popular', because to me it means 'most activity', in which case, if a page has 3200 hits and the others have half that, then indeed it will always be shown. Popular might also reflect an overall summation of values, such as comment activity, ratings, hits, etc.
The more entropy you can throw into the mix the more 'realistic' the results will be.
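A summed score along those lines might look something like the sketch below. The weights are arbitrary assumptions (nothing in this thread says what they should be), and the function name and sample figures are invented for the example.

```python
def popularity(hits: int, comments: int, avg_rating: float,
               w_hits: float = 1.0, w_comments: float = 5.0,
               w_rating: float = 20.0) -> float:
    """Weighted sum of several activity signals (weights are guesses)."""
    return w_hits * hits + w_comments * comments + w_rating * avg_rating


# Illustrative data: (hits, comments, average rating out of 5).
articles = {
    "old": (3200, 2, 3.0),
    "new": (1600, 400, 4.5),
}

scores = {name: popularity(*signals) for name, signals in articles.items()}

# Highest combined score first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here the article with half the hits still ranks first because of its comment activity; tuning the weights changes how much each signal matters.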
If you keep a record of the individual views, you could also assign different weights to different timescales, e.g. every view in the last 3 days = 10 points, every view in the last 3 - 7 days = 5 points, and anything older than 7 days = 1 point. That way really old views still count, but not by as much as new ones.
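A minimal sketch of that weighting, assuming each view is stored with its timestamp. The function names are hypothetical, and treating a view that is exactly 3 or 7 days old as belonging to the newer bucket is an arbitrary choice.

```python
from datetime import datetime, timedelta


def view_points(viewed_at: datetime, now: datetime) -> int:
    """Points for a single view, based on how old it is."""
    age = now - viewed_at
    if age <= timedelta(days=3):
        return 10  # last 3 days
    if age <= timedelta(days=7):
        return 5   # 3 to 7 days ago
    return 1       # older than 7 days


def article_score(view_times, now: datetime) -> int:
    """Sum the points of every recorded view of one article."""
    return sum(view_points(t, now) for t in view_times)


now = datetime(2009, 10, 27, 12, 0)

# One view from each bucket.
mixed = article_score(
    [now - timedelta(days=1), now - timedelta(days=5), now - timedelta(days=30)],
    now,
)

# Twelve month-old views versus two views from yesterday.
stale = article_score([now - timedelta(days=30)] * 12, now)
fresh = article_score([now - timedelta(days=1)] * 2, now)
```

Two fresh views outscore twelve stale ones here, which is the effect described above.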
It all depends on how much detail your database keeps about visitor activity.
Edit: you could even track URLs as your user moves around the site, allowing you to calculate the time spent on each page. You could then use that, taking into account the length of the page, to work out which pages people spend the most time on!
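As a rough sketch of the time-on-page idea: if each visitor's requests are logged in order, the time on a page can be approximated as the gap until that visitor's next request. The data layout, function names, and the use of word count as "page length" are all assumptions for illustration. Note that the last page of a session has no following request, so this approach cannot measure it.

```python
from collections import defaultdict
from datetime import datetime


def seconds_per_page(session):
    """session: list of (datetime, url) for ONE visitor, sorted by time."""
    totals = defaultdict(float)
    # Pair each request with the next one; the gap between them is
    # attributed to the earlier page. The final page is skipped.
    for (t0, url), (t1, _next_url) in zip(session, session[1:]):
        totals[url] += (t1 - t0).total_seconds()
    return dict(totals)


def seconds_per_word(totals, word_counts):
    """Normalise by page length so long pages are not favoured."""
    return {
        url: secs / word_counts[url]
        for url, secs in totals.items()
        if word_counts.get(url)  # skip pages with unknown or zero length
    }


session = [
    (datetime(2009, 10, 27, 12, 0, 0), "/a"),
    (datetime(2009, 10, 27, 12, 2, 0), "/b"),
    (datetime(2009, 10, 27, 12, 3, 0), "/a"),
    (datetime(2009, 10, 27, 12, 3, 30), "/c"),
]
totals = seconds_per_page(session)
normalised = seconds_per_word(totals, {"/a": 300, "/b": 60, "/c": 100})
```

In practice you would probably also want to cap very long gaps, since a visitor who walks away from the screen looks the same as one who reads slowly.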
iankent wrote: if you keep a record of the individual views you could also assign different weights to different timescales. e.g., every view in the last 3 days = 10 points, every view in the last 3 - 7 days = 5 points, and anything older than 7 days = 1 point.
iankent, PCSpectra: I'm not trying to make the result perfectly realistic, just as close as possible with minimal effort. Thanks.
josh: simple, yet very powerful idea. Now the most read item will update every week (a refresh on a regular basis).
iankent: this takes a lot of work (and storage), but I think the result will be amazing.
VladSun: the ability to convert any written solution into a mathematical formula .. I wish I could do that ..
sam4free wrote: iankent: this takes a lot of work (and storage), but I think the result will be amazing.
That's the problem, really. It's a trade-off between accuracy and storage capacity. The more metadata you can store, the more accurate you can make your algorithms.
Being able to store a count of pageviews is one thing, but being able to store the date and time along with the whole contents of $_SERVER and $_REQUEST, plus a few extra cookies passed from JavaScript (screen resolution etc.), gives you a lot more options for sorting and grouping the data into useful datasets.
The problem is that it takes up space quickly, and the only solution is money.
sam4free wrote:
josh: simple, yet very powerful idea. Now the most read item will update every week (a refresh on a regular basis).
Which is probably all the users care about in the first place.
I have done what iankent suggested. It worked well until the table exceeded 10 million rows. I suggest the book "The Data Warehouse Toolkit", which deals with algorithms for pruning redundant data (for example, do you really need details down to the individual IP and hour once that traffic is more than a year old?).
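One simple form of that pruning, sketched under assumptions (this is not taken from the book): keep full detail for recent views, and collapse anything older than a cutoff into one count per article per day, dropping the IP and hour. The row layout, function name, and one-year cutoff are illustrative only.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def roll_up(views, now: datetime, keep_detail_for=timedelta(days=365)):
    """views: list of (datetime, article_id, ip).

    Returns (recent_rows, daily_counts): recent rows are kept as-is,
    older rows are reduced to a count per (day, article_id).
    """
    cutoff = now - keep_detail_for
    recent = [v for v in views if v[0] >= cutoff]
    daily = Counter(
        (t.date(), article_id)
        for t, article_id, _ip in views
        if t < cutoff
    )
    return recent, daily


views = [
    (datetime(2008, 1, 5, 9, 0), 1, "10.0.0.1"),
    (datetime(2008, 1, 5, 17, 30), 1, "10.0.0.2"),
    (datetime(2008, 1, 6, 8, 0), 2, "10.0.0.1"),
    (datetime(2009, 10, 20, 10, 0), 1, "10.0.0.3"),
]
recent, daily = roll_up(views, now=datetime(2009, 10, 27))

# Old traffic is still countable per day, just without IP or hour.
jan5_article1 = daily[(date(2008, 1, 5), 1)]
```

In a real database this would be a periodic job that inserts the daily totals into a summary table and then deletes the old detail rows.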
Oh god .. this topic really needs a lot of study.
I think a small site can get by with a simple way of counting visits, but at larger scales we need to study the storage implementation carefully.
Thanks for the book, josh. It's a valuable one.