Page 1 of 1

How do you find the second, of the two duplicates?

Posted: Mon Nov 05, 2012 9:45 am
by simonmlewis

Code: Select all

SELECT a,b
FROM tbl
GROUP BY a,b
HAVING COUNT(*)>1; 
This script works, but it always shows the FIRST entry to say "this has a duplicate".

How do I find "the duplicate", thus being able to delete all duplicates?

Re: How do you find the second, of the two duplicates?

Posted: Mon Nov 05, 2012 12:28 pm
by Weirdan
If you want to delete duplicates, you don't have to select them. The most efficient way I know is to just add unique key ignoring any errors (if you want to keep the first record from any set of duplicates):

Code: Select all

alter ignore table tbl add unique index a_b_uniq(a, b);
Or, with more control over the order of insertion:

Code: Select all

create table tbl_new like tbl;
alter table tbl_new add unique index a_b_uniq (a,b);
insert ignore into tbl_new select * from tbl order by id asc;
rename table tbl to tbl_old, tbl_new to tbl;
drop table tbl_old;
-- optionally, drop unique index
alter table tbl drop index a_b_uniq;

Re: How do you find the second, of the two duplicates?

Posted: Mon Nov 05, 2012 3:01 pm
by requinix
You can also LEFT JOIN the table against itself on the same data but a different primary key, then keep only the rows where the joined data is null.

Code: Select all

SELECT a, b FROM tbl t1 LEFT JOIN tbl t2 ON t1.a = t2.a AND t1.b = t2.b AND t1.pk != t2.pk WHERE t2.pk IS NULL

Re: How do you find the second, of the two duplicates?

Posted: Tue Nov 06, 2012 3:24 pm
by pickle
If they're duplicates, why does it matter which one you delete?

Re: How do you find the second, of the two duplicates?

Posted: Wed Nov 07, 2012 2:03 am
by simonmlewis
Because I made the mistake of allowing as many as they want, and want only to keep the first one.
Tho you do have a point; if they don't realise the others have gone, why does it matter.