I have a database of over 100,000 records.
I want to flag dupes based on latitude and longitude.
I'm thinking of writing a script to loop through all records starting from the 1st; I would compare the latitude and longitude of the 1st record to every other record and do that for all records. This doesn't seem terribly efficient.
Is there a faster way to do this?
Perhaps sort by latitude AND longitude?
Fastest query to select dupes from two columns?
Moderator: General Moderators
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
Code: Select all
SELECT *, COUNT(`id`) `DUPLICATES` FROM `table` GROUP BY `long`, `lat` HAVING `DUPLICATES` > 1slightly twaeking feyd's query to make it even faster
since we only need to find duplicates.. and in using *; the database internally has to identify column names & column count, also there would be overhead involved in retrieving multiple columns & i believe they are are not required in context of this issue.
Code: Select all
SELECT `id`, COUNT(`id`) `DUPLICATES` FROM `table` GROUP BY `id` HAVING `DUPLICATES` > 1