Page 1 of 1

Quick Line for Duplicates

Posted: Fri Jan 30, 2009 8:30 am
by SheDesigns
I have a mailing list of over 500 records, some of which, I know are duplicates. I don't have time to through an monitor them all. The table only has one field so far "address".

How would I write a query to find the entries that are duplicate?


SELECT * FROM `mailinglist` WHERE address ....

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 11:27 am
by infolock

Code: Select all

 
$sql = "SELECT distinct `address` from `mailinglist`";
 

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 12:21 pm
by SheDesigns
I need to find the ones that aren't distinct however. :|

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 12:56 pm
by Christopher
I think this will show you how many of each email you have.

Code: Select all

SELECT email,COUNT(email) FROM mailinglist GROUP BY email ORDER BY COUNT(email) DESC

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 2:15 pm
by pickle
I'd spend a little time sorting removing the duplicates manually. Just sort by the field - the duplicates should show up. Then you could set that column as 'UNIQUE' and not have to worry about this again.

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 2:29 pm
by SheDesigns
From now on I will have them unique, I got these addresses out of my client's AOL address book that's been sloppily upkept since 2001.

Thank you though.

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 3:11 pm
by Eran
Extending upon arborint's idea:

Code: Select all

SELECT email,COUNT(email) AS count FROM mailinglist GROUP BY email HAVING count > 1
Or an alternative, which may or may not run faster (depending on indexes):

Code: Select all

SELECT email FROM mailinglist WHERE email NOT IN (SELECT DISTINCT email FROM mailinglist AS m)

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 3:26 pm
by Benjamin
Great way to remove duplicates:

Code: Select all

 
CREATE TABLE email_temp AS SELECT * FROM mailinglist WHERE 1 GROUP BY email;
DROP TABLE mailinglist;
RENAME TABLE email_temp TO mailinglist;
 

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 5:59 pm
by VladSun
pytrin wrote:Or an alternative, which may or may not run faster (depending on indexes):

Code: Select all

SELECT email FROM mailinglist WHERE email NOT IN (SELECT DISTINCT email FROM mailinglist AS m)
This one will not work :) It will always give an empty row set.

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 6:02 pm
by VladSun
astions wrote:Great way to remove duplicates:

Code: Select all

 
CREATE TABLE email_temp AS SELECT * FROM mailinglist WHERE 1 GROUP BY email;
DROP TABLE mailinglist;
RENAME TABLE email_temp TO mailinglist;
 
Nice :)
But why "WHERE 1" must present in the qyery?

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 6:09 pm
by Christopher
pytrin wrote:Extending upon arborint's idea:

Code: Select all

SELECT email,COUNT(email) AS count FROM mailinglist GROUP BY email HAVING count > 1
I knew I left something out. ;) Using HAVING is necessary instead of WHERE because HAVING operates after the GROUP BY, whereas WHERE operates before the GROUP BY. Important to remember, and very handy. As ever pytrin...excellent!

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 6:14 pm
by Eran
Actually the having is because count is the result of an aggregate function and this can't be used in the where clause.
This one will not work :) It will always give an empty row set.
:lol: oops.. my bad

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 6:18 pm
by Benjamin
VladSun wrote:But why "WHERE 1" must present
Probably for the GROUP BY clause to work.

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 6:28 pm
by VladSun
astions wrote:
VladSun wrote:But why "WHERE 1" must present
Probably for the GROUP BY clause to work.
The GROUP BY clause does not interfere with the WHERE clause (or it presence or absence) - try it.

Re: Quick Line for Duplicates

Posted: Fri Jan 30, 2009 6:29 pm
by Benjamin
Cool, it should work either way then.