Preventing content/url duplication?
Moderator: General Moderators
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Preventing content/url duplication?
I found it a little surprising that I'm able to pull up the same content at both of the following URLs...
viewtopic.php?f=6&t=94788&p=527853
viewtopic.php?f=6&p=527853&t=94788
...or perhaps I've applied the right idea to the wrong scenario? Do Google and other search engines consider that duplicated content, or does this not apply to query strings?
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Re: Preventing content/url duplication?
They're the same URL? All you've done is re-ordered the parameters in the query portion of the URL.
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Preventing content/url duplication?
After I posted I came across Google's help page on duplicate content. I don't think this would count as duplicate content. It seems more like: if you don't implement an Apache rewrite (via .htaccess) to choose either http://example.com/ or http://www.example.com/, you risk creating duplicated content, for example.
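For the www vs. non-www case mentioned above, the usual fix is a small mod_rewrite rule in .htaccess. This is a sketch, with example.com as a placeholder hostname:

```apache
RewriteEngine On
# Redirect bare-domain requests to the www hostname with a permanent (301)
# redirect, so search engines only ever see one canonical host.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The 301 status matters: a permanent redirect tells crawlers to consolidate ranking signals onto the target URL.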
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
Re: Preventing content/url duplication?
I'm not sure if Google canonicalizes GET query strings, but an easy way to canonicalize URLs yourself is to check QUERY_STRING for the ordering of variables, and redirect the user to the "real" URL if necessary.
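Ambush Commander's approach can be sketched in a few lines. This is an illustrative Python version (the thread is about PHP, but the idea is language-agnostic); the parameter names f, t, p come from the example URLs earlier in the thread, and the chosen order is an assumption:

```python
from urllib.parse import parse_qsl, urlencode

# Canonical parameter order we want every URL to use (assumed for this sketch).
CANONICAL_ORDER = ["f", "t", "p"]

def canonical_query(query_string: str) -> str:
    """Return the query string with parameters sorted into the canonical order."""
    params = dict(parse_qsl(query_string))
    ordered = [(k, params[k]) for k in CANONICAL_ORDER if k in params]
    # Append any parameters not covered by the canonical list, sorted for stability.
    extras = sorted((k, v) for k, v in params.items() if k not in CANONICAL_ORDER)
    return urlencode(ordered + extras)

def needs_redirect(query_string: str) -> bool:
    """True if the incoming query string differs from its canonical form,
    i.e. the app should issue a 301 to the canonical URL."""
    return query_string != canonical_query(query_string)
```

So `f=6&p=527853&t=94788` would be detected as non-canonical and redirected to `f=6&t=94788&p=527853`.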
- allspiritseve
- DevNet Resident
- Posts: 1174
- Joined: Thu Mar 06, 2008 8:23 am
- Location: Ann Arbor, MI (USA)
Re: Preventing content/url duplication?
There's also Google's new canonical tag that should help with these types of problems.
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Preventing content/url duplication?
Thanks allspiritseve, I don't think I had found that. I've been using the base element (I use PHP to determine the base href from the domain name, so the same code works on my site and on localhost), so every single link on my site uses www, for example.
Ambush Commander, yeah, I considered that, though it seems to be a low priority if any. It's easy to explode the query string and ensure each parameter appears where it's required to make the URL valid. I do think Google would trust a site's internal links over external links, at least to a reasonable extent.
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Re: Preventing content/url duplication?
allspiritseve wrote:There's also Google's new canonical tag that should help with these types of problems.
We recently had an SEO meeting at work and I learned about this. Sounds like a useful concept.
- allspiritseve
- DevNet Resident
- Posts: 1174
- Joined: Thu Mar 06, 2008 8:23 am
- Location: Ann Arbor, MI (USA)
Re: Preventing content/url duplication?
Chris Corbyn wrote:We recently had an SEO meeting at work and I learned about this. Sounds like a useful concept.
Yeah, I guess Yahoo and MSN agreed to support the tag as well (at some point, dunno when).
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Preventing content/url duplication?
There's no such thing as a "tag" in XHTML, they're called elements.
Yahoo has supported the robots-nocontent class. Now why the heck would I want to start using all sorts of nonstandard XHTML elements that don't exist in any established standard?
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Preventing content/url duplication?
Oh it's a link element?
Code: Select all
<link href="http://example.com/page.html" rel="canonical" />
- allspiritseve
- DevNet Resident
- Posts: 1174
- Joined: Thu Mar 06, 2008 8:23 am
- Location: Ann Arbor, MI (USA)
Re: Preventing content/url duplication?
JAB Creations wrote:There's no such thing as a "tag" in XHTML, they're called elements.
Uh... ok?
Whatever you call it, you use
Code: Select all
<link rel="canonical" href="http://www.google.com" />
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Re: Preventing content/url duplication?
Yeah it's perfectly valid and a <link> is exactly the right place for it.
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Preventing content/url duplication?
I'm very strict about standards. There was a poor guy in another thread who wasted hours upon hours because he was missing a quote. By using application/xhtml+xml, while I wouldn't be made aware of low-priority validation errors like duplicate IDs, a missing quote would break the page, give me an error message, and I'd have the problem solved in half a minute at most. By holding ourselves to standards we ensure consistency, and we save a whole lot of time in the long term. That's why I make a huge effort to use the correct terminology. It also means I end up asking questions in a lot of unique situations, so I try to leave bread crumbs for those who search using standards-compliant terminology.
Google
http://google.com/support/webmasters/bi ... wer=139394
MSDN
http://blogs.msdn.com/webmaster/archive ... ssues.aspx
Yahoo
http://ysearchblog.com/2009/02/12/fight ... ur-quiver/
I can see how the canonical link element will be useful for HTTP query dependent pages such as my forums.
However, good practices can generally avoid the issue. A few Apache scripts can make a world of difference, as can the XHTML base element, for example. Every anchor and image on my site uses the base element as the first half of the URL and the element's href or src value as the second half. Having a consistent way to name files helps too.
This is definitely something I'll end up implementing in the 29th version of my site. Thanks for the heads up guys.
- allspiritseve
- DevNet Resident
- Posts: 1174
- Joined: Thu Mar 06, 2008 8:23 am
- Location: Ann Arbor, MI (USA)
Re: Preventing content/url duplication?
JAB Creations wrote:I'm very strict about standards.
Being strict about standards is fine... colloquial language has its place too, though. Even the W3C uses the word tag:
W3C wrote:Essentially this means that all elements must either have closing tags or be written in a special form
JAB Creations wrote:I can see how the canonical link element will be useful for HTTP query dependent pages such as my forums.
Duplicate content happens in other situations as well, even with clean URLs (for instance, /blog/latest/post-name/ and /blog/archives/post-name), to make up a trivial example.
JAB Creations wrote:However good practices can generally avoid the issue. A few Apache scripts can make a world of difference and using the base XHTML element in example. All my site's anchors and images all add the base element as the first half of the URL and then adds the anchor|image's href attribute's value as the second half. Having a consistent way to name files, etc.
I don't see how the base tag solves the issue. Can you elaborate?
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Preventing content/url duplication?
The base element actually rocks for a couple of reasons.
First off it makes running the same site both locally and live a snap! I have two PHP class variables (base1 and base2).
For example my current project has the following values for localhost...
base1 = http://localhost
base2 = /Version%202.9.A.3/
In a live environment it will end up being...
base1 = http://www.example.com
base2 = /
Now take an anchor's href or an image's src value...
images/logo.gif
Ignoring PHP and looking directly at the XHTML output, you simply add the address up like so...
base1.base2.img.src
So...
http://www.example.com/images/logo.gif
The only time I use absolute URLs is when I link externally.
Anyway, the base element is most useful to me for being able to run the same site in any environment regardless of the various file paths. By following these practices my site's URLs stay pretty clean.
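The concatenation described above can be sketched in a couple of lines. This is an illustrative Python version (the post describes a PHP class); the values are the post's own examples:

```python
def build_url(base1: str, base2: str, href: str) -> str:
    """Join scheme+host (base1), site root path (base2), and a relative href."""
    return base1 + base2 + href

# Local development vs. live, using the example values from this post:
local = build_url("http://localhost", "/Version%202.9.A.3/", "images/logo.gif")
live = build_url("http://www.example.com", "/", "images/logo.gif")
```

Only base1 and base2 change between environments; every relative href stays identical.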
I've been having a lot of fun building my site's new CMS, and it's nice timing to hear about this, as it'll be a snap to implement. My site has a new PHP CMS class that handles file paths (including the base element) and decides which HTTP status code to send in the headers (which, thankfully, Apache now logs to the server access log). So if the page is a 304 or 200 I'll serve the canonical link element; if it's not a 304 or 200, I change the robots meta element to "NOINDEX, NOFOLLOW".
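The status-dependent head logic described above could be sketched like this (Python rather than PHP, and the function name is hypothetical):

```python
def head_markup(status_code: int, canonical_url: str) -> str:
    """Emit a canonical link for 200/304 responses; otherwise tell robots
    to skip the page, per the scheme described in the post above."""
    if status_code in (200, 304):
        return f'<link rel="canonical" href="{canonical_url}" />'
    return '<meta name="robots" content="NOINDEX, NOFOLLOW" />'
```

For example, a 404 error page would get the noindex/nofollow meta element instead of a canonical link.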