MySQL Group By selecting wrong row out of the group

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
joshmaker
Forum Commoner
Posts: 25
Joined: Mon May 15, 2006 2:53 pm
Location: Arlington VA

MySQL Group By selecting wrong row out of the group

Post by joshmaker »

I'm looking for a way to have more control over the row of data selected by the MySQL Group By command. I have a table called Articles which is a list of news articles and the information about them. I would like to select the most recent article for each subject from the Articles table for a given AuthorID.

Here is a sample of three rows from my table, all from AuthorID 89:

Code: Select all

 
ID      |AuthorID|     Headline      | Subject   | URL                      | DatePublished
--------+--------+-------------------+-----------+--------------------------+--------------
1008    |     89 | Sample Headline 1 | Politics  | http://www.example.com/1 | 2009-04-16
735     |     89 | Sample Headline 2 | Politics  | http://www.example.com/2 | 2009-04-15
730     |     89 | Sample Headline 3 | Politics  | http://www.example.com/3 | 2009-04-13
651     |     89 | Something Else    | Economics | http://www.example.com/4 | 2009-03-26
Here is what my MySQL code looks like:

Code: Select all

 
SELECT Headline, Subject, URL, DatePublished 
FROM Articles
WHERE AuthorID = 89 
GROUP BY Subject
ORDER BY DatePublished DESC
And this is what my results look like:

Code: Select all

 
     Headline      | Subject   | URL                      | DatePublished
-------------------+-----------+--------------------------+--------------
 Sample Headline 3 | Politics  | http://www.example.com/3 | 2009-04-13
 Something Else    | Economics | http://www.example.com/4 | 2009-03-26
The problem here is that for the Subject 'Politics' the row that is being selected by the GROUP BY command is not the most recent article.

This is what I would like the results to look like:

Code: Select all

 
     Headline      | Subject   | URL                      | DatePublished
-------------------+-----------+--------------------------+--------------
 Sample Headline 1 | Politics  | http://www.example.com/1 | 2009-04-16
 Something Else    | Economics | http://www.example.com/4 | 2009-03-26
I've tried lots of variations, including selecting MAX(DatePublished) which will indeed get me the date that I want, but will not get me the rest of the information that corresponds to the article written on that date (Headline, URL, etc)

I've wasted a lot of time on this, and feel like there must be a simple solution. Can someone please help me out?
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: MySQL Group By selecting wrong row out of the group

Post by Eran »

Unfortunately the GROUP BY clause does not work this way. Strictly speaking, it should return only values unique to the group, however MySQL cheats a little and fills non-unique values with one of the rows in the group. There is no way to control which row data is used on a grouped set.

To solve your requirement you need to combine are better off using a subselect to pick the latest articles. The subselect will select the max time for each group, and the results will be joined with the original table to get the row data.

Code: Select all

SELECT Articles.Headline, Articles.Subject, Articles.URL, Articles.DatePublished 
FROM (
    SELECT subject,MAX(DatePublished) AS date_published
    FROM Articles
    WHERE AuthorID = 89
    GROUP BY Subject
) AS max_dates
INNER JOIN Articles ON Articles.DatePublished = max_dates.date_published AND Articles.Subject = max_dates.Subject
ORDER BY Articles.DatePublished DESC
Last edited by Eran on Sat May 02, 2009 4:50 pm, edited 1 time in total.
joshmaker
Forum Commoner
Posts: 25
Joined: Mon May 15, 2006 2:53 pm
Location: Arlington VA

Re: MySQL Group By selecting wrong row out of the group

Post by joshmaker »

Awesome, thanks for helping me out. OK, to get this working I had to make one small change to ON clause of the code: AND Articles.AuthorID = max_dates.AuthorID
Post Reply