Page 1 of 1

Fastest query to select dupes from two columns?

Posted: Thu Sep 29, 2005 6:41 pm
by voltrader
I have a database of over 100,000 records.

I want to flag dupes based on latitude and longitude.

I'm thinking of writing a script to loop through all records starting from the 1st; I would compare the latitude and longitude of the 1st record to every other record and do that for all records. This doesn't seem terribly efficient.

Is there a faster way to do this?

Perhaps sort by latitude AND longitude?

Posted: Thu Sep 29, 2005 7:18 pm
by feyd

Code: Select all

SELECT *, COUNT(`id`) `DUPLICATES` FROM `table` GROUP BY `long`, `lat` HAVING `DUPLICATES` > 1
I believe

Posted: Fri Sep 30, 2005 12:41 am
by ruchit
slightly twaeking feyd's query to make it even faster

Code: Select all

SELECT `id`, COUNT(`id`) `DUPLICATES` FROM `table` GROUP BY `id` HAVING `DUPLICATES` > 1
since we only need to find duplicates.. and in using *; the database internally has to identify column names & column count, also there would be overhead involved in retrieving multiple columns & i believe they are are not required in context of this issue.

Posted: Fri Sep 30, 2005 6:32 pm
by ryanlwh
that would work only if the duplicate records has the same ids... i believe he's asking about finding dupes of lat and long. so i guess feyd was right.