[SOLVED] removing duplicate records..

Questions about MySQL, PostgreSQL, and most other databases, as well as using them with PHP, can be asked here.

Moderator: General Moderators

waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

[SOLVED] removing duplicate records..

Post by waskelton4 »

Hello all,

I have a database that I've accidentally created duplicate records in, identical right down to the timestamp field. There are probably 3000 of them, so it's too many to weed out by hand. Is there a way for me to delete the duplicates without deleting both rows?

thanks
Will
Last edited by waskelton4 on Fri Aug 26, 2005 10:14 am, edited 1 time in total.
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

What database are you using?

There are a large number of possible ways to fix this, but most require implementation-dependent workarounds.
waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

Post by waskelton4 »

Whoops..

MySQL ver 4.1
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

I don't know MySQL; typically the tricks rely on the database having a hidden "row identifier" -- sort of an extra/hidden primary key -- that you can GROUP BY with MIN(row_id) and then delete everything but the lowest row_id in each group.
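MySQL has no hidden row identifier, but the same trick can be approximated by bolting on an AUTO_INCREMENT column and keeping only the lowest id per group. A rough, untested sketch (`row_id` and the field names here are placeholders):

Code: Select all

ALTER TABLE `table1` ADD `row_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- keep only the lowest row_id of each duplicate group
DELETE FROM `table1`
 WHERE `row_id` NOT IN (
         SELECT MIN(`row_id`)
           FROM `table1`
          GROUP BY `someField1`, `someField2`
       );

One caveat: MySQL refuses to delete from a table that its own subquery reads (error 1093), so the MIN(row_id) list may have to be copied into a temporary table first.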
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

Using PHP you could select all the primary keys plus whatever value you want to check for duplicates, build an array of records to delete, and then run something like this:

Code: Select all

delete from table where `key` in (1,5,100,400,700)
like that
waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

Post by waskelton4 »

jshpro2 wrote:Using PHP you could select all the primary keys plus whatever value you want to check for duplicates, build an array of records to delete, and then run something like this:

Code: Select all

delete from table where `key` in (1,5,100,400,700)
like that
But wouldn't that delete both of the records and not just one of the copies? The table doesn't currently have an enforced primary key. (I know, real bad.) After I get all the duplicate records out I'm going to create one.

will
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

you should be able to find duplicates like so:

Code: Select all

SELECT t1.*, COUNT(t1.`someField`) AS siblings
  FROM `table1` t1
  NATURAL LEFT JOIN `table1` t2
 GROUP BY `someField1`, `someField2`
HAVING siblings > 1

(untested)
waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

Post by waskelton4 »

feyd wrote:you should be able to find duplicates like so:

Code: Select all

SELECT t1.*, COUNT(t1.`someField`) AS siblings
  FROM `table1` t1
  NATURAL LEFT JOIN `table1` t2
 GROUP BY `someField1`, `someField2`
HAVING siblings > 1
(untested)
Hrmm..

I tried the query above and the value of 'siblings' came out to be the square of what it should have been (i.e. all the 2s were 4s and the 3 was 9).

Would there be a working solution along these lines?

Code: Select all

SELECT ID, Count(ID) as counter FROM `table` group by ID where counter > 1
or

Code: Select all

SELECT ID, Count(ID) as counter FROM `table` group by ID where count(ID) > 1
Those both throw errors, but it seems to me that they should work.

Thanks for the help all..

will
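For what it's worth, the two failing queries above stumble on SQL's clause order: a condition on an aggregate (or its alias) belongs in a HAVING clause, which runs after GROUP BY, not in WHERE, which runs before it. An untested reworking of the same idea:

Code: Select all

SELECT ID, COUNT(ID) AS counter
  FROM `table`
 GROUP BY ID
HAVING counter > 1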
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Oops, change the > 1 to > 2. Yes, they will be squares, because the join matches each of the duplicate rows against every copy of itself, once for each one that exists ;)
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

If there's a unique bit of information such as an email address in the data, just select one row per group of that field and delete everything else:

delete from table where id not in (select id from table group by email)

That's not tested, by the way.
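One wrinkle with the subquery form: MySQL won't delete from a table the same statement also selects from, so the kept ids may need a temporary table as a stopover. A rough, untested sketch (`keepers` is a placeholder name, and the NOT IN is what spares one row per email):

Code: Select all

CREATE TEMPORARY TABLE `keepers`
  SELECT id FROM `table` GROUP BY email;

DELETE FROM `table`
 WHERE id NOT IN (SELECT id FROM `keepers`);

DROP TEMPORARY TABLE `keepers`;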
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

Just read all the entries into an array, filter out the duplicates, and write them to a new table.

Then empty the original table and rewrite it with the other table's data. Could be done in SQL or PHP.
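Since the rows are exact copies, that rebuild can stay entirely in SQL: SELECT DISTINCT into a scratch table, then swap it in. A rough, untested sketch (`table_dedup` is a placeholder name):

Code: Select all

CREATE TABLE `table_dedup` SELECT DISTINCT * FROM `table`;
DROP TABLE `table`;
RENAME TABLE `table_dedup` TO `table`;

Doing it this way also leaves a natural point to add the primary key before the rename.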
waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

Post by waskelton4 »

AGISB wrote:Just read all the entries into an array, filter out the duplicates, and write them to a new table.

Then empty the original table and rewrite it with the other table's data. Could be done in SQL or PHP.
I've thought of similar ways but I was hoping to do it with just SQL...

I think I will end up doing something similar to this.

Thanks for everyone's help!

will
waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

Post by waskelton4 »

feyd wrote:Oops, change the > 1 to > 2. Yes, they will be squares, because the join matches each of the duplicate rows against every copy of itself, once for each one that exists ;)
Actually, I think that > 1 works fine, because the square of 1 is still 1; the lowest value returned is for rows with a single duplicate (2 rows), and that value is 4.

Thanks again for the help all..

Now I just have to shut down the application for a few minutes while I remove these things.. blech..

ws
bg
Forum Contributor
Posts: 157
Joined: Fri Sep 12, 2003 11:01 am

Post by bg »

Use "LIMIT 1" so that only one record is deleted (assuming there is only one duplicate of each entry).
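MySQL's DELETE does accept a LIMIT clause, so once a duplicate pair has been identified, the delete can be capped at a single row. A sketch with placeholder column names and values:

Code: Select all

DELETE FROM `table`
 WHERE `someField1` = 'duplicated value'
   AND `ts` = '2005-08-26 10:14:00'
 LIMIT 1;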
waskelton4
Forum Contributor
Posts: 132
Joined: Mon Sep 09, 2002 6:42 pm

Post by waskelton4 »

bgzee wrote:Use "LIMIT 1" so that only one record is deleted (assuming there is only one duplicate of each entry)
That's the ticket...