Page 1 of 1
Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 3:32 pm
by Eran
I'm building a simple message-board like app. For every topic there could be several posts, and I would like to present the topic in a list with some content from the latest post in that topic.
Simplifying it, I have two tables:
Topics:
- id (INT / AI )
- subject (VARCHAR(150))
- created (TIMESTAMP)
Posts:
- id (INT / AI)
- topic_id (INT / FK)
- content (TEXT)
- created (TIMESTAMP)
I want to retrieve rows from the Topics table, along with the latest row for each topic from the Posts table.
Can anyone suggest an efficient way of accomplishing this without using a GROUP BY clause?
Re: Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 6:07 pm
by califdon
Hmm, not really. What do you have against GROUP BY?

Actually, there may be a way. but I don't immediately see it. Seriously, why can't you use GROUP BY?
Re: Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 6:21 pm
by Eran
Basically for performance reasons - with GROUP BY first all the related rows are selected and then grouped. Add to this that in order for the grouping to bring me the relevant content for the row with the max'ed timestamp I would need to add a subquery or another join.
This seems like a little trick could get it nicely... I'm still hoping someone shows me some wizardry

Re: Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 6:37 pm
by Benjamin
I wouldn't worry about performance for a small query like that. As far as I know your not going to get very far without either a UNION or at least 2 sub queries depending on what data your requiring.
Re: Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 6:59 pm
by Eran
Well thats actually only a part of the requirements for the query I'm building - it also joins user relationships via a pivot table etc. So performance is an issue...
Re: Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 9:07 pm
by superdezign
I've done something similar, and for simplicity's sake, I just did it in separate queries. As long a you aren't displaying a thousand topics per page, it shouldn't be much of a problem.
Re: Selecting rows with max timestamps in a join
Posted: Wed Jun 11, 2008 9:25 pm
by dml
Can't think of a way of avoiding the group by, apart from using application code or triggers to update a latest_post_id attribute for each topic, and that might be the way to go.
Otherwise, you should be able to index the query fairly efficiently. For a start, if you're assigning ids in order, then the latest post = the post with the highest id - this is easier to index and avoids duplicates. Then put an index on (topic_id, id) in the posts table (in fact just an index on (topic_id) should suffice in InnoDB). With this set up, if you do "select topic_id, max(id) from posts group by topic_id", it'll be an index scan - it'll still be a matter of going through every post and every topic, but it won't hit the underlying data tables and it won't require a filesort. And if you do "select topic_id, max(id) from posts where topic_id = 123", it'll jump to the topic in the index and iterate through to get the max id. And the indexes will probably be cached in memory.
This is all unspecific and untested, though. Have you populated with test data and run some EXPLAIN's on your queries?
Re: Selecting rows with max timestamps in a join
Posted: Thu Jun 12, 2008 3:27 am
by Eran
dml, I think you are right - denormalizing by updating a latest_post_id seems like the best solution. I think I will go that route, thanks!
