Page 1 of 1

Remove Duplicates

Posted: Fri Apr 28, 2006 11:02 pm
by s.dot
Is there a query I can run to remove duplicate rows from a database table? The rows would be identical except for the primary key id. I could write a script for it but I'm afraid I would mess it up.

Posted: Sat Apr 29, 2006 2:03 am
by Chris Corbyn

Code: Select all

mysql> create table example (id int auto_increment primary key, col1 varchar(10), col2 varchar(10), col3 varchar(10)); Query OK, 0 rows affected (0.03 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'two', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'four', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'two', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'two', 'four'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('four', 'two', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> select * from example;
+----+------+------+-------+
| id | col1 | col2 | col3  |
+----+------+------+-------+
|  1 | one  | two  | three |
|  2 | one  | four | three |
|  3 | one  | two  | three |
|  4 | one  | two  | four  |
|  5 | four | two  | three |
+----+------+------+-------+
5 rows in set (0.00 sec)

mysql> select distinct col1, col2, col3  from example;
+------+------+-------+
| col1 | col2 | col3  |
+------+------+-------+
| one  | two  | three |
| one  | four | three |
| one  | two  | four  |
| four | two  | three |
+------+------+-------+
4 rows in set (0.00 sec)

mysql> create table example_copy (id int auto_increment primary key, col1 varchar(10), col2 varchar(10), col3 varchar(10));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into example_copy (col1, col2, col3) select distinct col1, col2, col3 from example;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from example_copy;
+----+------+------+-------+
| id | col1 | col2 | col3  |
+----+------+------+-------+
|  1 | one  | two  | three |
|  2 | one  | four | three |
|  3 | one  | two  | four  |
|  4 | four | two  | three |
+----+------+------+-------+
4 rows in set (0.00 sec)

mysql>
I'd do it like above. Make a new table ready to copy the distinct rows into. Perform the insert..select and then drop the old table, and rename the new one ;)

Posted: Sat Apr 29, 2006 2:56 am
by s.dot
I think I might have a problem with this, because my mysql version doesn't support subqueries. but i'll give it a try and let you know the results.

Posted: Sat Apr 29, 2006 3:08 am
by s.dot
Worked perfectly. Thanks :)

I also went command line with this. go 1337 me. :lol:

Posted: Sat Apr 29, 2006 3:46 am
by Chris Corbyn
scottayy wrote:Worked perfectly. Thanks :)

I also went command line with this. go 1337 me. :lol:
Glad it worked well for you :)

Posted: Sat Apr 29, 2006 5:25 am
by Benjamin
If the table was rather large, is their a more efficient method to do this using a query? Or would PHP be the best way to do it...

Posted: Sat Apr 29, 2006 6:21 am
by Chris Corbyn
agtlewis wrote:If the table was rather large, is their a more efficient method to do this using a query? Or would PHP be the best way to do it...
I'm not aware of a better way.... using PHP would actually be slower AFAIK. The more you get mysql to do, the faster it would be.

Posted: Sat Apr 29, 2006 6:22 am
by timvw
Would you really want to transfer all the rows to php.. Process them .. And send them back? I don't see how this can be more efficient than handling it at the database self...

Posted: Sat Apr 29, 2006 6:48 am
by s.dot
I had written a script before using PHP to try to loop through the records, check for a dupe, then insert into the new record if it wasn't found.

Ended up my script was flawed and I lost a few thousand rows and had to restore a backup table.

Doing it this way was quick (like 1.7 seconds) and painless (no script writing!).