Page 1 of 1
Remove Duplicates
Posted: Fri Apr 28, 2006 11:02 pm
by s.dot
Is there a query I can run to remove duplicate rows from a database table? The rows would be identical except for the primary key id. I could write a script for it but I'm afraid I would mess it up.
Posted: Sat Apr 29, 2006 2:03 am
by Chris Corbyn
Code: Select all
mysql> create table example (id int auto_increment primary key, col1 varchar(10), col2 varchar(10), col3 varchar(10)); Query OK, 0 rows affected (0.03 sec)
mysql> insert into example (col1, col2, col3) values ('one', 'two', 'three'); Query OK, 1 row affected (0.00 sec)
mysql> insert into example (col1, col2, col3) values ('one', 'four', 'three'); Query OK, 1 row affected (0.00 sec)
mysql> insert into example (col1, col2, col3) values ('one', 'two', 'three'); Query OK, 1 row affected (0.00 sec)
mysql> insert into example (col1, col2, col3) values ('one', 'two', 'four'); Query OK, 1 row affected (0.00 sec)
mysql> insert into example (col1, col2, col3) values ('four', 'two', 'three'); Query OK, 1 row affected (0.00 sec)
mysql> select * from example;
+----+------+------+-------+
| id | col1 | col2 | col3 |
+----+------+------+-------+
| 1 | one | two | three |
| 2 | one | four | three |
| 3 | one | two | three |
| 4 | one | two | four |
| 5 | four | two | three |
+----+------+------+-------+
5 rows in set (0.00 sec)
mysql> select distinct col1, col2, col3 from example;
+------+------+-------+
| col1 | col2 | col3 |
+------+------+-------+
| one | two | three |
| one | four | three |
| one | two | four |
| four | two | three |
+------+------+-------+
4 rows in set (0.00 sec)
mysql> create table example_copy (id int auto_increment primary key, col1 varchar(10), col2 varchar(10), col3 varchar(10));
Query OK, 0 rows affected (0.02 sec)
mysql> insert into example_copy (col1, col2, col3) select distinct col1, col2, col3 from example;
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from example_copy;
+----+------+------+-------+
| id | col1 | col2 | col3 |
+----+------+------+-------+
| 1 | one | two | three |
| 2 | one | four | three |
| 3 | one | two | four |
| 4 | four | two | three |
+----+------+------+-------+
4 rows in set (0.00 sec)
mysql>
I'd do it like above. Make a new table ready to copy the distinct rows into. Perform the insert..select and then drop the old table, and rename the new one

Posted: Sat Apr 29, 2006 2:56 am
by s.dot
I think I might have a problem with this, because my mysql version doesn't support subqueries. but i'll give it a try and let you know the results.
Posted: Sat Apr 29, 2006 3:08 am
by s.dot
Worked perfectly. Thanks
I also went command line with this. go 1337 me.

Posted: Sat Apr 29, 2006 3:46 am
by Chris Corbyn
scottayy wrote:Worked perfectly. Thanks
I also went command line with this. go 1337 me.

Glad it worked well for you

Posted: Sat Apr 29, 2006 5:25 am
by Benjamin
If the table was rather large, is their a more efficient method to do this using a query? Or would PHP be the best way to do it...
Posted: Sat Apr 29, 2006 6:21 am
by Chris Corbyn
agtlewis wrote:If the table was rather large, is their a more efficient method to do this using a query? Or would PHP be the best way to do it...
I'm not aware of a better way.... using PHP would actually be slower AFAIK. The more you get mysql to do, the faster it would be.
Posted: Sat Apr 29, 2006 6:22 am
by timvw
Would you really want to transfer all the rows to php.. Process them .. And send them back? I don't see how this can be more efficient than handling it at the database self...
Posted: Sat Apr 29, 2006 6:48 am
by s.dot
I had written a script before using PHP to try to loop through the records, check for a dupe, then insert into the new record if it wasn't found.
Ended up my script was flawed and I lost a few thousand rows and had to restore a backup table.
Doing it this way was quick (like 1.7 seconds) and painless (no script writing!).