Preventing content/url duplication?
Posted: Fri Mar 20, 2009 1:48 am
by JAB Creations
I found it a little surprising that I'm able to pull up the following content on both URLs...
viewtopic.php?f=6&t=94788&p=527853
viewtopic.php?f=6&p=527853&t=94788
...or perhaps I've applied the right idea to the wrong scenario? Do Google and other search engines consider that duplicated content, or does this not apply to HTTP queries?
Re: Preventing content/url duplication?
Posted: Fri Mar 20, 2009 2:25 am
by Chris Corbyn
They're the same URL? All you've done is re-ordered the parameters in the query portion of the URL.
Re: Preventing content/url duplication?
Posted: Fri Mar 20, 2009 2:30 am
by JAB Creations
After I posted, I came across this duplicate content page on Google. I don't think this would count as duplicate content. It seems more like: if you don't implement an Apache rule (via .htaccess) to choose either http://example.com/ or http://www.example.com/, you risk creating duplicated content, for example.
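For the www vs. non-www case, a minimal .htaccess sketch might look like this (assuming mod_rewrite is enabled and the www form is the preferred host; the domain is a placeholder):

```apache
# Redirect example.com to www.example.com with a permanent (301) redirect.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```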
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 6:16 pm
by Ambush Commander
I'm not sure if Google canonicalizes GET query strings, but an easy way to canonicalize URLs yourself is to check QUERY_STRING for the ordering of variables, and redirect the user to the "real" URL if necessary.
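One hedged sketch of that idea in PHP (not phpBB's actual code; the parameter order f, t, p is just an assumption based on the URLs above): rebuild the query string in one fixed order and 301-redirect when the incoming request differs.

```php
<?php
// Hypothetical sketch: produce the canonical ordering of query parameters.
function canonical_query(array $params, array $order)
{
    $pairs = [];
    foreach ($order as $key) {
        if (isset($params[$key])) {
            $pairs[] = $key . '=' . rawurlencode($params[$key]);
        }
    }
    return implode('&', $pairs);
}

// Usage in something like viewtopic.php might then be:
// $canonical = canonical_query($_GET, ['f', 't', 'p']);
// if ($canonical !== ($_SERVER['QUERY_STRING'] ?? '')) {
//     header('Location: /viewtopic.php?' . $canonical, true, 301);
//     exit;
// }
```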
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 9:17 pm
by allspiritseve
There's also Google's new canonical tag that should help with these types of problems.
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 9:34 pm
by JAB Creations
Thanks, allspiritseve; I don't think I had found that. I've been using the base element (I use PHP to determine the base href from the domain name, such as my site or localhost), so every single link on my site uses www, for example.
Ambush Commander, yeah, I considered that, though it seems to be a low priority, if any. It's easy to explode the HTTP query string and ensure that each parameter is in the required position to make the URL valid. I do think Google would trust a site's internal links over external site links, at least to a reasonable extent.
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 9:47 pm
by Chris Corbyn
allspiritseve wrote:There's also Google's new canonical tag that should help with these types of problems.
We recently had an SEO meeting at work and I learned about this. Sounds like a useful concept.
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 9:51 pm
by allspiritseve
Chris Corbyn wrote:allspiritseve wrote:There's also Google's new canonical tag that should help with these types of problems.
We recently had an SEO meeting at work and I learned about this. Sounds like a useful concept.
Yeah, I guess Yahoo and MSN agreed to support the tag as well (at some point, dunno when)
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 9:59 pm
by JAB Creations
There's no such thing as a "tag" in XHTML; they're called elements.
Yahoo has supported the robots-nocontent class. Now why the heck would I want to start using all sorts of non-standard XHTML elements that don't exist in any established standard?

Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 10:01 pm
by JAB Creations
Oh it's a link element?
Code: Select all
<link href="http://example.com/page.html" rel="canonical" />
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 10:04 pm
by allspiritseve
JAB Creations wrote:There's no such thing as a "tag" in XHTML, they're called elements.
Uh... ok?
Whatever you call it, you use
Code: Select all
<link rel="canonical" href="http://www.google.com" />
which is a tag/element that already exists, so not necessarily "non-standard".
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 10:17 pm
by Chris Corbyn
Yeah it's perfectly valid and a <link> is exactly the right place for it.
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 10:26 pm
by JAB Creations
I'm very strict about standards. There was this poor guy in another thread who wasted hours upon hours because he was missing a quote. By using application/xhtml+xml, while I wouldn't be made aware of low-priority validation errors like duplicate IDs, a missing quote would break the page, give me an error message, and I'd have the problem solved in half a minute at most. So by holding ourselves to standards we ensure consistency...and we save a whole lot of time in the long term. So I make a huge effort to use the correct terminology. This also means I end up in a lot of unique situations asking questions, so I try to leave breadcrumbs for those who do a search using standards-compliant terminology.
Google
http://google.com/support/webmasters/bi ... wer=139394
MSDN
http://blogs.msdn.com/webmaster/archive ... ssues.aspx
Yahoo
http://ysearchblog.com/2009/02/12/fight ... ur-quiver/
I can see how the canonical link element will be useful for HTTP query dependent pages such as my forums.
However, good practices can generally avoid the issue. A few Apache rules can make a world of difference, as can using the XHTML base element, for example. All of my site's anchors and images use the base element as the first half of the URL and the anchor's href (or image's src) attribute value as the second half. Having a consistent way to name files helps, too.
This is definitely something I'll end up implementing in the 29th version of my site. Thanks for the heads up guys.

Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 10:38 pm
by allspiritseve
JAB Creations wrote:I'm very strict about standards.
Being strict about standards is fine... colloquial language has its place too, though. Even the W3C uses the word tag:
W3C wrote:Essentially this means that all elements must either have closing tags or be written in a special form
JAB Creations wrote:I can see how the canonical link element will be useful for HTTP query dependent pages such as my forums.
Duplicate content happens in other situations as well, even with clean URLs (for instance, /blog/latest/post-name/ and /blog/archives/post-name/), to make up a trivial example.
JAB Creations wrote:However, good practices can generally avoid the issue. A few Apache rules can make a world of difference, as can using the XHTML base element, for example. All of my site's anchors and images use the base element as the first half of the URL and the anchor's href (or image's src) attribute value as the second half. Having a consistent way to name files helps, too.
I don't see how the base tag solves the issue. Can you elaborate?
Re: Preventing content/url duplication?
Posted: Tue Mar 24, 2009 11:16 pm
by JAB Creations
The base element actually rocks for a couple of reasons.
First off, it makes running the same site both locally and live a snap! I have two PHP class variables (base1 and base2).
For example my current project has the following values for localhost...
base1 = http://localhost
base2 = /Version%202.9.A.3/
In a live environment it will end up being...
base1 = http://www.example.com
base2 = /
Now take an anchor or image element's href attribute's value...
images/logo.gif
Well, ignoring PHP and looking directly at the XHTML output, you simply add the address up like so...
base1.base2.img.src
So...
http://www.example.com/images/logo.gif
The only time I use absolute URLs is when I link externally.
But anyway, the base element is most useful to me for being able to run the same site in any environment, regardless of the various file paths. By using these practices my site's URLs are pretty clean.
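The base1/base2 scheme described above could be sketched roughly like this (the class and the host check are illustrative, not the actual CMS code; the values are the ones from this post):

```php
<?php
// Hypothetical sketch of the base1/base2 idea: pick the base parts from
// the host so the same site runs locally and live without path changes.
class BaseHref
{
    public $base1;
    public $base2;

    public function __construct($host)
    {
        if ($host === 'localhost') {
            $this->base1 = 'http://localhost';
            $this->base2 = '/Version%202.9.A.3/';
        } else {
            $this->base1 = 'http://www.example.com';
            $this->base2 = '/';
        }
    }

    // Emitted as <base href="..." /> in the document head; relative
    // src/href values such as images/logo.gif resolve against it.
    public function href()
    {
        return $this->base1 . $this->base2;
    }
}
```

On the live host, a relative images/logo.gif then resolves to http://www.example.com/images/logo.gif, matching the base1.base2.img.src concatenation above.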
I've been having a lot of fun building my site's new CMS, and it's pretty nice timing to hear about this, as it'll be a snap to implement. My site has a new PHP CMS class that handles file paths (including the base element) and which HTTP status code I should send in the headers (which, thankfully, Apache now logs to the server access log). So if the page is a 304 or 200 I'll serve the canonical link element; however, if it's not a 304 or 200, I change the robots meta element to "NOINDEX, NOFOLLOW".
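That canonical-vs-noindex switch could be sketched as follows (a hypothetical helper, not the actual CMS class):

```php
<?php
// Hypothetical sketch: serve a canonical link element for 200/304
// responses, otherwise a NOINDEX, NOFOLLOW robots meta element.
function seo_head_element($status, $canonical_url)
{
    if ($status === 200 || $status === 304) {
        return '<link href="' . htmlspecialchars($canonical_url) . '" rel="canonical" />';
    }
    return '<meta name="robots" content="NOINDEX, NOFOLLOW" />';
}
```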