Selecting rows with max timestamps in a join

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Selecting rows with max timestamps in a join

Post by Eran »

I'm building a simple message-board like app. For every topic there could be several posts, and I would like to present the topic in a list with some content from the latest post in that topic.
Simplifying it, I have two tables:
Topics:
- id (INT / AI )
- subject (VARCHAR(150))
- created (TIMESTAMP)

Posts:
- id (INT / AI)
- topic_id (INT / FK)
- content (TEXT)
- created (TIMESTAMP)

I want to retrieve rows from the Topics table, along with the latest row for each topic from the Posts table.
Can anyone suggest an efficient way of accomplishing this without using a GROUP BY clause?
User avatar
califdon
Jack of Zircons
Posts: 4484
Joined: Thu Nov 09, 2006 8:30 pm
Location: California, USA

Re: Selecting rows with max timestamps in a join

Post by califdon »

Hmm, not really. What do you have against GROUP BY? :) Actually, there may be a way. but I don't immediately see it. Seriously, why can't you use GROUP BY?
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Selecting rows with max timestamps in a join

Post by Eran »

Basically for performance reasons - with GROUP BY first all the related rows are selected and then grouped. Add to this that in order for the grouping to bring me the relevant content for the row with the max'ed timestamp I would need to add a subquery or another join.

This seems like a little trick could get it nicely... I'm still hoping someone shows me some wizardry :)
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Selecting rows with max timestamps in a join

Post by Benjamin »

I wouldn't worry about performance for a small query like that. As far as I know your not going to get very far without either a UNION or at least 2 sub queries depending on what data your requiring.
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Selecting rows with max timestamps in a join

Post by Eran »

Well thats actually only a part of the requirements for the query I'm building - it also joins user relationships via a pivot table etc. So performance is an issue...
User avatar
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Re: Selecting rows with max timestamps in a join

Post by superdezign »

I've done something similar, and for simplicity's sake, I just did it in separate queries. As long a you aren't displaying a thousand topics per page, it shouldn't be much of a problem.
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: Selecting rows with max timestamps in a join

Post by dml »

Can't think of a way of avoiding the group by, apart from using application code or triggers to update a latest_post_id attribute for each topic, and that might be the way to go.

Otherwise, you should be able to index the query fairly efficiently. For a start, if you're assigning ids in order, then the latest post = the post with the highest id - this is easier to index and avoids duplicates. Then put an index on (topic_id, id) in the posts table (in fact just an index on (topic_id) should suffice in InnoDB). With this set up, if you do "select topic_id, max(id) from posts group by topic_id", it'll be an index scan - it'll still be a matter of going through every post and every topic, but it won't hit the underlying data tables and it won't require a filesort. And if you do "select topic_id, max(id) from posts where topic_id = 123", it'll jump to the topic in the index and iterate through to get the max id. And the indexes will probably be cached in memory.

This is all unspecific and untested, though. Have you populated with test data and run some EXPLAIN's on your queries?
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Selecting rows with max timestamps in a join

Post by Eran »

dml, I think you are right - denormalizing by updating a latest_post_id seems like the best solution. I think I will go that route, thanks! :)
Post Reply