Page 1 of 1

MySQL Group By selecting wrong row out of the group

Posted: Sat May 02, 2009 1:05 pm
by joshmaker
I'm looking for a way to have more control over the row of data selected by the MySQL Group By command. I have a table called Articles which is a list of news articles and the information about them. I would like to select the most recent article for each subject from the Articles table for a given AuthorID.

Here is a sample of three rows from my table, all from AuthorID 89:

Code: Select all

 
ID      |AuthorID|     Headline      | Subject   | URL                      | DatePublished
--------+--------+-------------------+-----------+--------------------------+--------------
1008    |     89 | Sample Headline 1 | Politics  | http://www.example.com/1 | 2009-04-16
735     |     89 | Sample Headline 2 | Politics  | http://www.example.com/2 | 2009-04-15
730     |     89 | Sample Headline 3 | Politics  | http://www.example.com/3 | 2009-04-13
651     |     89 | Something Else    | Economics | http://www.example.com/4 | 2009-03-26
Here is what my MySQL code looks like:

Code: Select all

 
SELECT Headline, Subject, URL, DatePublished 
FROM Articles
WHERE AuthorID = 89 
GROUP BY Subject
ORDER BY DatePublished DESC
And this is what my results look like:

Code: Select all

 
     Headline      | Subject   | URL                      | DatePublished
-------------------+-----------+--------------------------+--------------
 Sample Headline 3 | Politics  | http://www.example.com/3 | 2009-04-13
 Something Else    | Economics | http://www.example.com/4 | 2009-03-26
The problem here is that for the Subject 'Politics' the row that is being selected by the GROUP BY command is not the most recent article.

This is what I would like the results to look like:

Code: Select all

 
     Headline      | Subject   | URL                      | DatePublished
-------------------+-----------+--------------------------+--------------
 Sample Headline 1 | Politics  | http://www.example.com/1 | 2009-04-16
 Something Else    | Economics | http://www.example.com/4 | 2009-03-26
I've tried lots of variations, including selecting MAX(DatePublished) which will indeed get me the date that I want, but will not get me the rest of the information that corresponds to the article written on that date (Headline, URL, etc)

I've wasted a lot of time on this, and feel like there must be a simple solution. Can someone please help me out?

Re: MySQL Group By selecting wrong row out of the group

Posted: Sat May 02, 2009 4:17 pm
by Eran
Unfortunately the GROUP BY clause does not work this way. Strictly speaking, it should return only values unique to the group, however MySQL cheats a little and fills non-unique values with one of the rows in the group. There is no way to control which row data is used on a grouped set.

To solve your requirement you need to combine are better off using a subselect to pick the latest articles. The subselect will select the max time for each group, and the results will be joined with the original table to get the row data.

Code: Select all

SELECT Articles.Headline, Articles.Subject, Articles.URL, Articles.DatePublished 
FROM (
    SELECT subject,MAX(DatePublished) AS date_published
    FROM Articles
    WHERE AuthorID = 89
    GROUP BY Subject
) AS max_dates
INNER JOIN Articles ON Articles.DatePublished = max_dates.date_published AND Articles.Subject = max_dates.Subject
ORDER BY Articles.DatePublished DESC

Re: MySQL Group By selecting wrong row out of the group

Posted: Sat May 02, 2009 4:28 pm
by joshmaker
Awesome, thanks for helping me out. OK, to get this working I had to make one small change to ON clause of the code: AND Articles.AuthorID = max_dates.AuthorID