DISTINCT or GROUP BY - Dropping Duplicates
Posted: Tue Feb 23, 2010 9:34 am
I have an annoying problem here.
I have a series of tables (about 6-7) that make up the core of our application. The primary table, that drives all others is called a procedure. A procedure has many states (Draft, Archive, Approved, etc).
When a procedure is in Approved state, it cannotbe changed directly but must be cloned and flagged as Draft, where edits may be made and then approved. Upon approval, an procedure then replaces the previous approved procedure.
This is all well and good and most of the system works fine, however one of the tables is a called a sequence table, which is simple in nature:
Because one or more procedures might reference the same sequence (via junction table) and these procedures may be under different states (approved, archive, draft, etc) the constraint I have implemented is to simply disallow any edits to any procedure (and thus sequences, etc) until it's cloned and flagged as 'Draft'.
What this means is, there are duplicates inside the sequences table, which normally would not pose an issue, but the sequences table is also shown indepently of any procedure association, as a master list, to allow procedures to be easily associated with ALL sequences in the system; via click and drag. Not to mention editing of a sequence which might be found via a search filter.
Here is the kicker and real problem I am facing.
The duplicates should not be shown in the sequence list which is currently populated with a default SELECT query:
[sql]SELECT * FROM sequences[/sql]
This will obviously show any duplicate entries, which in case I have no explained, are introduced when an procedure is cloned, all it's supporting table entries are also cloned. So if a procedure has 10 sequences and it's currently under active status and someone edits one sequence associated with it, those sequences are cloned in the table above, and thus you have two sequences with identical fields, except the primary key ID.
[sql]SELECT * FROM sequences GROUP BY description[/sql]
[sql]SELECT DISTINCT(description) FROM sequences[/sql]
Both of these will remove duplicates, based on description text. While this works, it also fails to meet my exact requirements.
Unfornately, DISTINCT and GROUP BY will stop duplicating on the first record found in the database, regardless of ORDER BY or other statements. What that means is that, the sequences that are associated with the 'Approved' procedures are shown, while the sequences just cloned, are not. Again not normally a problem, except that when populating my sequences master list, the ID's of the 'Approved' sequences are used and I actually need the ID's of the records just cloned, as 'Approved' sequences cannot be directly modified. The cloned sequences need to be listed, as they are direclty modified and only once approved, are the cloned entities bumped to 'Approved' and the original is deleted.
Adding a field to the sequences table as some sort of flag, is not an ideal solution. Without going into details, it feels hackish and I would rather find a SQL solution to meet my requirements.
Procedures, on the other hand, have a status field (id_status) which indicates whether they are in 'Draft' 'Approved' or whatever. So what I had yesterday was a little more complicated in terms of SQL queries.
[sql] SELECT * FROM procedures_sequencesLEFT JOIN proceduresON procedures.id_primary = procedure_sequences.id_procedureRIGHT JOIN sequencesON sequences.id_primary = procedure_sequences.id_sequenceWHERE procedures.id_status = 3 OR procedures.id_status IS NULL[/sql]
This essentially showed all sequences which were associated with Draft procedures and sequences which were NOT associated with any procedures. However this will NOT work because I also need to show all sequences associated with live procedures, just not duplicates, in which case the cloned version should be returned bot the original.
Lastly, one alternative option I considered is basically making the clone of a procedure, it's sequences, etc...and swapping the associations so the Approved procedure references the recently cloned sequences and the clone would reference the original sequences. This way DISTINCT and GROUP BY will always show the original sequences (which become clone vis-versa) and thus editing the sequences do not affect Approved procedures.
Cheers,
Alex
I have a series of tables (about 6-7) that make up the core of our application. The primary table, that drives all others is called a procedure. A procedure has many states (Draft, Archive, Approved, etc).
When a procedure is in Approved state, it cannotbe changed directly but must be cloned and flagged as Draft, where edits may be made and then approved. Upon approval, an procedure then replaces the previous approved procedure.
This is all well and good and most of the system works fine, however one of the tables is a called a sequence table, which is simple in nature:
Code: Select all
pkid, time_bench, time_machine, descriptionWhat this means is, there are duplicates inside the sequences table, which normally would not pose an issue, but the sequences table is also shown indepently of any procedure association, as a master list, to allow procedures to be easily associated with ALL sequences in the system; via click and drag. Not to mention editing of a sequence which might be found via a search filter.
Here is the kicker and real problem I am facing.
The duplicates should not be shown in the sequence list which is currently populated with a default SELECT query:
[sql]SELECT * FROM sequences[/sql]
This will obviously show any duplicate entries, which in case I have no explained, are introduced when an procedure is cloned, all it's supporting table entries are also cloned. So if a procedure has 10 sequences and it's currently under active status and someone edits one sequence associated with it, those sequences are cloned in the table above, and thus you have two sequences with identical fields, except the primary key ID.
[sql]SELECT * FROM sequences GROUP BY description[/sql]
[sql]SELECT DISTINCT(description) FROM sequences[/sql]
Both of these will remove duplicates, based on description text. While this works, it also fails to meet my exact requirements.
Unfornately, DISTINCT and GROUP BY will stop duplicating on the first record found in the database, regardless of ORDER BY or other statements. What that means is that, the sequences that are associated with the 'Approved' procedures are shown, while the sequences just cloned, are not. Again not normally a problem, except that when populating my sequences master list, the ID's of the 'Approved' sequences are used and I actually need the ID's of the records just cloned, as 'Approved' sequences cannot be directly modified. The cloned sequences need to be listed, as they are direclty modified and only once approved, are the cloned entities bumped to 'Approved' and the original is deleted.
Adding a field to the sequences table as some sort of flag, is not an ideal solution. Without going into details, it feels hackish and I would rather find a SQL solution to meet my requirements.
Procedures, on the other hand, have a status field (id_status) which indicates whether they are in 'Draft' 'Approved' or whatever. So what I had yesterday was a little more complicated in terms of SQL queries.
[sql] SELECT * FROM procedures_sequencesLEFT JOIN proceduresON procedures.id_primary = procedure_sequences.id_procedureRIGHT JOIN sequencesON sequences.id_primary = procedure_sequences.id_sequenceWHERE procedures.id_status = 3 OR procedures.id_status IS NULL[/sql]
This essentially showed all sequences which were associated with Draft procedures and sequences which were NOT associated with any procedures. However this will NOT work because I also need to show all sequences associated with live procedures, just not duplicates, in which case the cloned version should be returned bot the original.
Lastly, one alternative option I considered is basically making the clone of a procedure, it's sequences, etc...and swapping the associations so the Approved procedure references the recently cloned sequences and the clone would reference the original sequences. This way DISTINCT and GROUP BY will always show the original sequences (which become clone vis-versa) and thus editing the sequences do not affect Approved procedures.
Cheers,
Alex