Remove Duplicates

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Remove Duplicates

Post by s.dot »

Is there a query I can run to remove duplicate rows from a database table? The rows would be identical except for the primary key id. I could write a script for it but I'm afraid I would mess it up.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Code: Select all

mysql> create table example (id int auto_increment primary key, col1 varchar(10), col2 varchar(10), col3 varchar(10)); Query OK, 0 rows affected (0.03 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'two', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'four', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'two', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('one', 'two', 'four'); Query OK, 1 row affected (0.00 sec)

mysql> insert into example (col1, col2, col3) values ('four', 'two', 'three'); Query OK, 1 row affected (0.00 sec)

mysql> select * from example;
+----+------+------+-------+
| id | col1 | col2 | col3  |
+----+------+------+-------+
|  1 | one  | two  | three |
|  2 | one  | four | three |
|  3 | one  | two  | three |
|  4 | one  | two  | four  |
|  5 | four | two  | three |
+----+------+------+-------+
5 rows in set (0.00 sec)

mysql> select distinct col1, col2, col3  from example;
+------+------+-------+
| col1 | col2 | col3  |
+------+------+-------+
| one  | two  | three |
| one  | four | three |
| one  | two  | four  |
| four | two  | three |
+------+------+-------+
4 rows in set (0.00 sec)

mysql> create table example_copy (id int auto_increment primary key, col1 varchar(10), col2 varchar(10), col3 varchar(10));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into example_copy (col1, col2, col3) select distinct col1, col2, col3 from example;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from example_copy;
+----+------+------+-------+
| id | col1 | col2 | col3  |
+----+------+------+-------+
|  1 | one  | two  | three |
|  2 | one  | four | three |
|  3 | one  | two  | four  |
|  4 | four | two  | three |
+----+------+------+-------+
4 rows in set (0.00 sec)

mysql>
I'd do it like above. Make a new table ready to copy the distinct rows into. Perform the insert..select and then drop the old table, and rename the new one ;)
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I think I might have a problem with this, because my mysql version doesn't support subqueries. but i'll give it a try and let you know the results.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Worked perfectly. Thanks :)

I also went command line with this. go 1337 me. :lol:
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

scottayy wrote:Worked perfectly. Thanks :)

I also went command line with this. go 1337 me. :lol:
Glad it worked well for you :)
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

If the table was rather large, is their a more efficient method to do this using a query? Or would PHP be the best way to do it...
User avatar
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

agtlewis wrote:If the table was rather large, is their a more efficient method to do this using a query? Or would PHP be the best way to do it...
I'm not aware of a better way.... using PHP would actually be slower AFAIK. The more you get mysql to do, the faster it would be.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Would you really want to transfer all the rows to php.. Process them .. And send them back? I don't see how this can be more efficient than handling it at the database self...
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I had written a script before using PHP to try to loop through the records, check for a dupe, then insert into the new record if it wasn't found.

Ended up my script was flawed and I lost a few thousand rows and had to restore a backup table.

Doing it this way was quick (like 1.7 seconds) and painless (no script writing!).
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
Post Reply