Page 1 of 1

Digg like popularity ranking but in one SQL query

Posted: Wed Mar 14, 2007 3:50 pm
by ed209
I'm not sure it's possible... but here's what I'd like to do.

On my homepage I want to display 12 products. On my site, all products can be rated 1 - 5. Each rating is stored in the rating table. Each rating has the date stored along with it.

The 12 products on the home page should be the most popular during the specified time. So, like digg, I want to display 'Most popular products this month' for example.

I have the following query which is only part way there:

Code: Select all

SELECT      p.*, 
            AVG(r.value) AS avg,

FROM        product p


LEFT JOIN   rating r
ON          p.product_id = r.product_id

WHERE       r.date_created
BETWEEN     '2007-03-14 11:39:09' AND '2007-04-14 11:39:10'


GROUP BY    p.product_id

ORDER BY    avg DESC

LIMIT 12
The problem with this query is that it only returns results that have been voted on in the specified time frame.

I would like to always return 12 results but I'm not sure if the logic of what I'm trying to do will enable that to happen? If only 3 products have been voted on in the time frame then I would like to pick out another 9 by their overall average rating (irrespective of time).

The reason is that I don't think my site will be popular enough initially to have products voted on regularly.

Posted: Wed Mar 14, 2007 4:03 pm
by John Cartwright
Is the timeframe x days from today?

If that`s the case then you just as well could do LIMIT 12 and do away with the date check.

Posted: Wed Mar 14, 2007 4:17 pm
by ed209
I'm not limiting the products by date, I'm limiting the average rating of that product by date. E.g.

say today is 1st March. I want the highest rated items in the past month. so I will only average ratings between 1st Feb and 1st March for a particular product. However, the fact that the product was added 2 years ago is not relevant. The only important factor is that it has been a top rated product over the last month.

Although it will be x days from today - I want to keep the date search as I might want to compare popular items from an earlier period i.e. compare popular products from last Feb with this Feb.

Basically, I think my logic is flawed. What I will have to do is use the above query and if the results are < 12, just fill the rest with 'pretend' results.

Re: Digg like popularity ranking but in one SQL query

Posted: Wed Mar 14, 2007 5:10 pm
by Mordred
ed209 wrote:If only 3 products have been voted on in the time frame then I would like to pick out another 9 by their overall average rating (irrespective of time).
It's good that you're able to vocalize your requirement, now just translate it to SQL:

(SELECT blabla timeframe blabla ORDER by ... LIMIT 12)
UNION
(SELECT blblbl not in timeframe .. LIMIT 12)

I am not sure (and lazy to test) if you can say this:
( (SELECT .. LIMIT 12) UNION (SELECT ... LIMIT 12) LIMIT 12)
This would guarantee 0-12 results

If not, the above scheme would guarantee 0-24 results, of which you can ignore the ones above #12


Btw, the "blblbl not in timeframe" part is important, otherwise you may get some rows twice.

(Good to know that UNION has other practical uses than SQL injection :P )

Posted: Thu Mar 15, 2007 3:16 am
by mikeq
what about something like this, not sure if works as haven't tested it

Code: Select all

SELECT p.*, productaverages.average
FROM product p 
LEFT JOIN   
      (SELECT AVG(r.value) AS average, r.product_id
       FROM rating r 
       WHERE r.date_created BETWEEN '2007-03-14 11:39:09' AND '2007-04-14 11:39:10'
       GROUP BY r.product_id) AS productaverages
ON  (p.product_id = productaverages.product_id)
ORDER BY productaverages.average DESC 
LIMIT 12

Posted: Thu Mar 15, 2007 4:15 am
by ed209
Thanks for the replies. Interesting ideas.

Mordred, backfilling the query if not enough results are found is a neat idea. I didn't know that could be done. I'd probably have more control over the criteria for grabbing another 12 results that way.

mikeq, very interesting. It does work, I didn't know it was possible to do a sub-query like that. Your sql will always return 12 results but lets say that only 1 product has been rated in that time frame. The subsequent 11 will just be grabbed as the first 11 from the product table, I have added a second ORDER BY parameter and I'm wondering if I can get that from ordering by all time ratings.... I will try that out today.

Additional Issue
Just to throw a spanner in the works, last night I was thinking about the weighting to give the 'rating'. If one person rates Product A at 5 and 4 people rate Product B at 4, 4, 5, 5, it is possible that Product B is the better product. However, this would not be reflected in the rating as the AVG would be lower than product A.

Is it possible to give some weighting to the number of votes as well as the AVG ? Maybe have the second ORDER BY parameter as COUNT(number of ratings):

Code: Select all

SELECT         p.*, 
               productaverages.average,
               productaverages.total

FROM           product p

LEFT JOIN   
        ( 
           SELECT    AVG(r.value) AS average, 
                     r.product_id,
                     COUNT(r.value) AS total

           FROM      rating r

           WHERE     r.date_created 
           BETWEEN   '2007-03-14 11:39:09' AND '2007-04-14 11:39:10'

           GROUP BY  r.product_id

        ) AS productaverages

ON             (p.product_id = productaverages.product_id)

ORDER BY       productaverages.average DESC,
               productaverages.total DESC
LIMIT 12 
The above works, but I'm starting to think that digg was not a good example as it uses total votes not ratings. Any suggestions on how to prevent gaming of that logic (i.e. owner of a bad product rates it 5, no one else rates it and so it appears to be a good product).

Posted: Tue Jun 26, 2007 9:48 am
by ed209
The sql query to achieve a Digg like home page has been taking up too much server resource. Is there a more efficient way of doing this?

Current Query:

Code: Select all

SELECT 	p.*,
        u.*
        product.average,
        product.total

FROM    products p
JOIN    usr u ON u.usr_id = p.usr_id


LEFT JOIN
		(
                     SELECT    ( AVG(r.value) + (COUNT(r.value) * 0.1) ) AS average,
                               r.entity_id,
                               COUNT(r.value) AS total

                     FROM      rating r

                     WHERE     r.date_created > '2007-05-26 02:38:45'

                     GROUP BY  r.entity_id
                     LIMIT 15

		)
AS      product

ON      (p.product_id = product.entity_id)


ORDER BY      product.average DESC,
              product.total DESC

LIMIT 15
I think the main problem is the sub-query.