Page 1 of 1

MySQL problem with order and grouping

Posted: Fri Jun 27, 2014 10:19 am
by charp23
I'm in over my head on this query. Save me.

What I'm after is a query that will group user comments (notes) by a submit_id, but order the groups by the most recent comment within each group. A good analogy is a threaded email inbox - when an older thread receives a new reply it moves to the top of the inbox.

The query below is properly grouping user comments and ordering them within each group, but the groups are ordered by submit_id desc instead of working like my email inbox analogy.

Is this possible with a single query or will I have to loop through the query to get what I'm after? Thanks in advance.

Code: Select all

$my_notes=mysql_query("
SELECT *
FROM (
    SELECT submit_id, timestamp as latest
    FROM tracker_notes
    GROUP BY submit_id
) AS a
JOIN
(
   SELECT *
   FROM tracker_notes
) AS b
ON a.submit_id = b.submit_id
ORDER BY latest desc, timestamp desc
");

Re: MySQL problem with order and grouping

Posted: Fri Jun 27, 2014 2:08 pm
by requinix
Doesn't look like you need any aggregate functions in there (eg, COUNT or MAX). You just want the most recent message for each submit_id, right?

Code: Select all

SELECT tn1.*
FROM tracker_notes tn1
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND tn1.timestamp < tn2.timestamp
WHERE tn2.id IS NULL
ORDER BY tn1.timestamp DESC
tn2 is used to find a message with the same submit_id and is later than the one in tn1. Then the WHERE makes sure that only rows from tn1 that did not match in tn2 are returned (assuming "id" is the primary key - really, any field from tn2 would work).

Re: MySQL problem with order and grouping

Posted: Sun Jun 29, 2014 9:36 am
by charp23
Thanks requinix. This looks to be really close to what I'm after. The query does order the posts (submit_id) by the most recent note, but it's not picking up all the notes for a given submit_id. In most cases it's returning only the 1 most recent note per submit_id, but in a few cases I'm getting 2 notes. What I'm after is all of the notes per submit_id. It occurs to me that this forum is another good example of what I'm after. When I post my reply, this thread will go to the top of the list. And when you drill into this thread you see all the posts in chronological order.

I should mention that I will need to add a WHERE clause so that the results include only the notes were the logged in user is either the sender or recipient. I can write the clause, but I'm not certain if I should add that to tn1 or tn2.

Re: MySQL problem with order and grouping

Posted: Sun Jun 29, 2014 5:43 pm
by requinix
Could it be the two notes have the same timestamp? I don't see any other way you'd get two or more for the same submit_id.

The forum thing you describe is two different queries. The forum listing is like the thing I posted where you're getting the most recent post/note for every thread/submit_id. Then when you go into a thread you get all the posts, which is the simple

Code: Select all

SELECT *
FROM tracker_notes
WHERE submit_id = ???
ORDER BY timestamp ASC /* or DESC */
As for the WHERE,
a) It goes in a WHERE if you want to filter out results from the set that it would normally have returned.
b) It goes in the JOIN if you want to adjust how you decide whether tn1 (the most recent note) is "better" than tn2 (a reference note).

Being logged in couple actually belong in both places. Using the forum analogy there are three basic possibilities: not logged in, logged in as a regular member, and logged in as a privileged member (eg, staff). You would add a condition
a) In the WHERE to filter out threads the user doesn't have access to. When not logged in the user shouldn't be able to see threads from "member-only" forums, and when logged in the user may not be able to see threads from "staff-only" forums.
b) In the JOIN to adjust how it determines what the most recent post it. When unprivileged the user should only be able to see posts they're allowed to - deleted or moderated posts, for example, should not appear. When privileged the user could see all posts made in the thread, whether deleted or not (if you wanted to show deleted posts as the most recent in the overview page, which actually you probably wouldn't).

After all that the condition sounds like it should go in a WHERE: the most recent note is still the most recent, it's just that you don't want to show the submit_id things when the user isn't one of the participants.

Re: MySQL problem with order and grouping

Posted: Sun Jun 29, 2014 10:59 pm
by charp23
Ah yes. The posts did have the same timestamp. I see where that's coming from and it's not an issue since my WHERE clause will filter it out.

Your explanation about where to place my WHERE clause makes sense and it makes me realize I wasn't correctly describing what I'm after. The first query you posted is the correct order I'm after, but I do want all of the notes for each submit_id. I'm thinking I can do this with a whole lot of queries - first find the order of the submit_ids based on your first query and then cycle through the submit_ids to find all related notes. However, I'm also thinking this could be done in a single query.

Re: MySQL problem with order and grouping

Posted: Sun Jun 29, 2014 11:27 pm
by requinix
You can modify the JOIN to account for duplicate timestamps.

Code: Select all

LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND (tn1.timestamp < tn2.timestamp OR tn1.timestamp = tn2.timestamp AND tn1.id < tn2.id)
If the timestamps are the same then it will defer to the IDs, which are guaranteed to be different.

So all the notes for all the submit_ids, grouped (visually) by the submit_id, starting with the submit_id with the most recent note?

Code: Select all

#1 Friday -- the latest note
#1 Thursday
#1 Wednesday
#2 Thursday -- second-latest
#2 Monday
#3 Monday -- third-latest

Re: MySQL problem with order and grouping

Posted: Mon Jun 30, 2014 10:05 am
by charp23
First off, thank you for your help on this. I really appreciate it.

Yes to your example of "all the notes for all the submit_ids, grouped (visually) by the submit_id, starting with the submit_id with the most recent note." In that example, if Post #3 gets a new note on Saturday, it moves to the top of the list.

Circling back around to my WHERE question... I played with this last night and could not get it to work at all. Here's what I tried:

Code: Select all

SELECT tn1.*
FROM tracker_notes tn1
WHERE tn1.recipient = 1
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND tn1.timestamp < tn2.timestamp
WHERE tn2.submit_id IS NULL
ORDER BY tn1.timestamp DESC
No matter what I did with the WHERE clause, the query returned no results. In the code above I included a very simple example of the sort of things I tried.

Re: MySQL problem with order and grouping

Posted: Mon Jun 30, 2014 1:32 pm
by requinix
With comments this time, and substituting "submit_id" for "id".

Code: Select all

SELECT tn3.*
/* tn1 will find the most recent note for every submit_id */
FROM tracker_notes tn1
/* tn2 looks up notes from the same submit_id that are more recent. we'll only keep the results where there aren't any */
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND (tn1.timestamp < tn2.timestamp OR tn1.timestamp = tn2.timestamp AND tn1.submit_id < tn2.submit_id)
/* tn3 finds all notes for that submit_id */
JOIN tracker_notes tn3 ON tn1.submit_id = tn3.submit_id
/* only include results where the tn1 note is the most recent */
WHERE tn2.submit_id IS NULL
/* sort by the most recent note's timestamp, then by the timestamp of each note individually */
ORDER BY tn1.timestamp DESC, tn3.timestamp DESC
I'm unsure of the performance on this though. Can you post an EXPLAIN of this (same query but with "EXPLAIN" at the beginning) as well as a "SHOW CREATE TABLE tracker_notes"?

For your query:

What you've posted isn't valid: there's a stray WHERE in the middle of everything. Did you mean this?

Code: Select all

SELECT tn1.*
FROM tracker_notes tn1
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND tn1.timestamp < tn2.timestamp
WHERE tn1.recipient = 1 AND tn2.submit_id IS NULL
ORDER BY tn1.timestamp DESC
That query won't return results when the recipient does not have the most recent note. In that case,
1. The query will find everything from each submit_id (as tn1)
2. Grab the most recent note (by using tn2 to filter out the less recent tn1-s)
3. See that the recipient of that note (tn1) is not #1
4. Discard the entire result

We need to adjust the JOIN a bit more. Currently it says to "find the most recent note in the conversation" but now we actually need it to say "find the most recent note in the conversation that the user received". In general, what you have in the WHERE regarding tn1 also needs to be applied to tn2 in the JOIN. Specifically, you want to show only what the user has received so you should only consider recent-ness for notes that the user has received.

Code: Select all

/* find a note: from the conversation, that is more recent than what we've seen so far, that the user has received */
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND tn1.timestamp < tn2.timestamp AND tn2.recipient = 1
Updating the query at the top of this post,

Code: Select all

LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND (tn1.timestamp < tn2.timestamp OR tn1.timestamp = tn2.timestamp AND tn1.submit_id < tn2.submit_id) AND tn2.recipient = 1

Re: MySQL problem with order and grouping

Posted: Mon Jun 30, 2014 10:09 pm
by charp23
Ug. This may just be due to all of this being over my head, but I can't get it working.

I start with this:

Code: Select all

SELECT tn3.*
FROM tracker_notes tn1
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND (tn1.timestamp < tn2.timestamp OR tn1.timestamp = tn2.timestamp AND tn1.note_id < tn2.note_id)
JOIN tracker_notes tn3 ON tn1.submit_id = tn3.submit_id
WHERE tn2.submit_id IS NULL
ORDER BY tn1.timestamp DESC, tn3.timestamp DESC
And I get what looks to be the right results. But when I try to filter it down to just those notes where a given user is the recipient of the note, I get zero results. Here's what I tried:

Code: Select all

SELECT tn3.*
FROM tracker_notes tn1
LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND (tn1.timestamp < tn2.timestamp OR tn1.timestamp = tn2.timestamp AND tn1.note_id < tn2.note_id) AND tn2.recipient = 1
JOIN tracker_notes tn3 ON tn1.submit_id = tn3.submit_id
WHERE tn2.submit_id IS NULL
ORDER BY tn1.timestamp DESC, tn3.timestamp DESC
Note that I changed one part of the LEFT JOIN: note_id is the unique ID for each row.

Re: MySQL problem with order and grouping

Posted: Mon Jun 30, 2014 10:34 pm
by requinix
I didn't actually say it but the recipient=1 also needs to be in the WHERE. Without (and with it in the JOIN condition) you should get results but not all of them will be for that recipient. But you shouldn't get no results. Unless maybe you have some very particular test data.

I should have asked this at the start: what data are you testing with? The other important question is what results you're expecting but we're already covered that.

Re: [SOLVED] MySQL problem with order and grouping

Posted: Tue Jul 01, 2014 7:45 pm
by charp23
Okay, I have it working now. Here's what ended up with:

Code: Select all

SELECT tn3.*

FROM tracker_notes tn1

LEFT JOIN tracker_notes tn2 ON tn1.submit_id = tn2.submit_id AND tn1.timestamp < tn2.timestamp AND MY_CONDITIONS

JOIN tracker_notes tn3 ON tn1.submit_id = tn3.submit_id AND MY_CONDITIONS

WHERE tn2.submit_id IS NULL AND MY_CONDITIONS

ORDER BY tn1.timestamp DESC, tn3.timestamp DESC
I added MY_CONDITIONS statements to both JOINs and the WHERE. Also, I was able to drop some of the conditions you introduced to avoid posts with the same timestamp because my conditions filtered them out anyway.

This was surprisingly difficult query (from my point of view, anyway) and it was amazing to have someone who really understands this stuff work it through to a solution. You rock!