How do I check for duplicates, without extracting everything

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

How do I check for duplicates, without extracting everything

Post by simonmlewis »

Code: Select all

$result = mysql_query ("SELECT id, categories, titletag, metadescription, metakeywords FROM categories WHERE titletag = '' OR titletag IS NULL OR metadescription = '' OR metadescription IS NULL OR metakeywords = '' OR metakeywords IS NULL");
This is my code to check for missing information.
But I now need to show if the information in these fields are duplicates... and of what they are duplicates.

ie. If Category "Shirts" has the same Meta Description as "Polo Shirts", it needs to flag it up.

I could easily just extract the lot, and then query each line in a separate query... and show. But then it would be doing more work than needed, as it would show each category, even if there are no duplicates.

(this will be done for products too... so you can see my problem).

Can it all be done within one query?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: How do I check for duplicates, without extracting everyt

Post by Christopher »

GROUP BY the field you want to find duplicates in, and then check if COUNT() is > 1.
(#10850)
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

Sorry.... how?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: How do I check for duplicates, without extracting everyt

Post by Celauran »

Code: Select all

SELECT metadescription, COUNT(id) AS instances
FROM categories
GROUP BY metadescription
ORDER BY instances DESC
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

Goodness me - never seen that code before, and certainly not "instances". That's a new one on me. I'll try it. Thanks.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: How do I check for duplicates, without extracting everyt

Post by Celauran »

'instances' has no special meaning. It's just an alias. You could call it anything you like.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

Ohhhh {idiot} yes of course. "AS..... " is just naming it into a variable.

Durr!!
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

I'm now looking at this, and I don't quite get it.

I need to show each row for a category if there are duplicates of that category.
Then, if there are, to state at least one category that the meta is duplicated with.

I think I perhaps have to accept that I need the system to go thru every row, as I don't think it's possible to query a row, only if it has a duplicate... as it need to find that row in order to do so!

But it would be better if the browser didn't have to echo the row (that has no dupes of missing) on screen. Saves a bit of PC power.

But without first querying the row, and asking it "is your titletag used by any other row in the database.".. and the same for metadesecrip and metakey, I don't see a good way thru this.

The result I want is something like this:

XL Shirt Black - <title tag> duplicated with XL Shirt White
XL Shirt Black - <Meta Desc> duplicated with XL Shirt White
L tshirt - <title tag> missing.

That kind of thing.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: How do I check for duplicates, without extracting everyt

Post by Celauran »

So start with the SQL I posted earlier, iterate over the results and build a list of duplicate tags (ie. those with instances > 1). You can then build a new query to return the records matching that tag. Repeat for each duplicated instance of each tag. It's a bit cumbersome, but should do precisely what you need.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

Code: Select all

$result = mysql_query ("
SELECT COUNT(*), id,
    titletag, metadescription, metakeywords, title, id, catname
FROM products
GROUP BY titletag
HAVING COUNT(*) > 1 ORDER BY title") or die(mysql_error());
This is my code now.
It has one problem.
Let's say XL SHIRTS has the same titletag etc, as L SHIRTS.
So you might expect there to be two rows shows in the results?

XL SHIRTS > L SHIRTS
L SHIRTS > XL SHIRTS

If not.. why not? Give you a low.... only the first row shows. But if one is a duplicate of the other, you essentially have two duplicates.

A is the same as B.
And B is of course the same as A.

I only really spotted this because A has the correct METAs, so I need to change B. But can't, because I am only showing a popup edit screen for A. lol.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: How do I check for duplicates, without extracting everyt

Post by Celauran »

You're aggregating your results, so this is the expected behaviour. You have the titletag that has duplicates, though, so you can now query against that to get the rows in question.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

Sorry I'm not with you.
What do you mean by "aggregating results"?
If it's found A to be duplicated with B, surely it show me showing it the other way around as well?
Why isn't it?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: How do I check for duplicates, without extracting everyt

Post by simonmlewis »

Code: Select all

GROUP BY titletag
Gotchya. I see now.

So without it making painful queries and taking longer as it did before, how do I do it?
Because A is using the correct tags, but B should be changed.

... Unless I give the option to change both....
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: How do I check for duplicates, without extracting everyt

Post by Celauran »

I'm assuming you'll need human eyes on this, at least to decide which of the two (or more) gets updated and which remains the same. I'd probably write a function to grab matches for each tag, get the elements with duplicated tags, and then display a list of some sort with links to the appropriate edit pages.
Post Reply