Controlling rolled up rows in GROUP'ed BY queries

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Controlling rolled up rows in GROUP'ed BY queries

Post by Eran »

I'm trying to control which row a GROUP BY clause will return, and have been struggling with it way too much. I probably forgot something super basic, but I just can't put my finger on it.

Consider the following schema and some sample rows:

Code: Select all

CREATE TABLE `attachments` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `parent_id` int(10) NOT NULL,
  `parent_type` tinyint(2) NOT NULL,
  `user_id` int(12) NOT NULL,
  `title` varchar(250) collate utf8_unicode_ci DEFAULT NULL,
  `content` text collate utf8_unicode_ci,
  `added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `parent_id` (`parent_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=13 ;
 
INSERT INTO `attachments` (`id`, `parent_id`, `parent_type`, `user_id`, `title`, `content`, `added`) VALUES 
(5, 1, 1, 1, 'test1', NULL, '2008-11-12 16:51:40'),
(6, 1, 1, 1, 'test2', NULL, '2008-11-12 17:20:46'),
(7, 2, 1, 1, 'test3', NULL, '2008-11-13 15:42:01'),
(8, 2, 1, 1, 'test4', 'ddfdf', '2008-11-13 15:42:28'),
(12, 3, 1, 1, 'test5', 'bla asdc', '2008-11-15 15:35:53');
I want to retrieve rows from attachments, GROUP'ed by their parent_id but the rows should be the latest for each parent_id (ie max(added) ). No matter how much I tried, I could not control the rows that that is rolled up by the GROUP BY.

A naive query to attempt that would be:

Code: Select all

SELECT `attachments`.`id` , `attachments`.`added`
FROM `attachments`
WHERE (
    parent_id
    IN ( 1, 2, 3 )
    AND parent_type = '1'
)
GROUP BY `attachments`.`parent_id`
ORDER BY `attachments`.`added` DESC
This returns:

Code: Select all

id     added
12      2008-11-15 15:35:53
7   2008-11-13 15:42:01
5   2008-11-12 16:51:40
Which are not the latest rows. So the order clause has no affect on the results of the grouping. In the end I came up with this gnarly query:

Code: Select all

SELECT `attachments`.`id` , `attachments`.`added`
FROM `attachments`
WHERE (
    `parent_id`
    IN ( 1, 2, 3 )
    AND `parent_type` = '1'
    AND `added` = (
        SELECT MAX( added )
        FROM `attachments` AS `a`
        WHERE `a`.`parent_id` = `attachments`.`parent_id` 
    )
)
GROUP BY `attachments`.`parent_id`
Which uses a dependent subquery to get the job done. This returns:

Code: Select all

id  added
6   2008-11-12 17:20:46
8   2008-11-13 15:42:28
12  2008-11-15 15:35:53
If anyone has a better idea on how to achieve this I would be grateful.
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Controlling rolled up rows in GROUP'ed BY queries

Post by VladSun »

Now you know why I keep repeating that "GROUP BY is to be used with aggregate functions and not as a DISTINCT tool" ;)
Subselect is the solution.
Last edited by VladSun on Mon Jan 19, 2009 8:41 am, edited 1 time in total.
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Controlling rolled up rows in GROUP'ed BY queries

Post by VladSun »

There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Controlling rolled up rows in GROUP'ed BY queries

Post by Eran »

Crap, I read that article once, totally forgot about it :)
So dependent subselects are the only way? kind of a bummer...
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Controlling rolled up rows in GROUP'ed BY queries

Post by Eran »

For completeness, I include the final solution I came up with (using the link Vlad gave as reference):

Code: Select all

SELECT `a`.`id`, `a`.`added` FROM (
    SELECT parent_id,MAX(added) AS maxadded 
    FROM attachments 
    WHERE parent_id IN (1,2,3) AND parent_type = 1
    GROUP BY parent_id
) AS b
INNER JOIN attachments AS a ON 
a.parent_id=b.parent_id AND a.added=b.maxadded
This uses a correlated subquery, instead of a dependent subquery which gives somewhat better performance. I tested this on a 1.5m rows table, and with the proper indexes (a composite on parent_id and parent_type) it completes at around 0.001 seconds. Pretty nice :)
Post Reply