Is there a HTACCESS string that stops duplicate pages?

simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom

Post by simonmlewis

We have URLs that load the category pages whether there is a / on the end or not.
We are told this is bad for SEO, as it is seen as a duplicate page.

This is our .htaccess rule for the categories:
RewriteRule ^categ/([^/]+) /index.php?page=categ&cname=$1 [QSA]

Is there something simple I can add to this to allow only URLs with a / on the end?
If I add a /, the page loads but doesn't pick up the information from the URL.

I'm just wondering whether the RewriteRule should be written differently, so that URLs with a / are not treated the same as those without.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Post by Celauran

[quote="simonmlewis"]We have URLs that load the category pages whether there is a / on the end or not.
We are told this is bad for SEO, as it is seen as a duplicate page.
[/quote]
The flip side is that /category/t-shirts and /category/t-shirts/ going to different pages is bad for users. You could redirect one to the other, couldn't you?
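A minimal sketch of "redirect one to the other", assuming the trailing-slash form is the one you keep. It is written in PHP only so the decision logic can be tested; in .htaccess the usual equivalent is a RewriteCond %{REQUEST_FILENAME} !-f followed by RewriteRule ^(.*[^/])$ /$1/ [L,R=301]. The function name trailingSlashRedirect() and the $isFile callback (standing in for Apache's -f file test) are hypothetical.

```php
<?php
// Hypothetical helper: returns the path to 301-redirect to, or null when the
// path is already canonical. $isFile stands in for Apache's "-f" file test.
function trailingSlashRedirect(string $path, callable $isFile): ?string
{
    // The site root and paths already ending in "/" need no redirect.
    if ($path === '' || substr($path, -1) === '/') {
        return null;
    }
    // Real files (images, CSS, ...) are served as they are.
    if ($isFile($path)) {
        return null;
    }
    return $path . '/';
}

// Example use in a front controller (not run here):
// $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
// $target = trailingSlashRedirect($path, fn($p) => is_file(__DIR__ . $p));
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```

Only the path is handled here; the query string is left out of the sketch.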

Post by simonmlewis

Both pages are the same in content and layout; it's purely an SEO issue.
But if someone posts a link to /category/t-shirts/ and it loads the same page as the one without the slash, Google will cache both pages, which is bad because one is a duplicate.

So I don't know whether there is a global way to control this in our .htaccess.

Post by Celauran

[quote="Celauran"]You could redirect one to the other, couldn't you?[/quote]

Post by simonmlewis

http://stackoverflow.com/questions/1708 ... y-htaccess

The other issue we have is with double slashes.

If the URL has // after .uk and then just single / through the rest, the page loads and the bad // is not removed.

But if I put // at the start after .uk and // through the rest of the URL as well, it does rewrite it.

Code:

#remove double/more slashes in url
RewriteCond %{REQUEST_METHOD}  !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]
So it's not quite right.
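One property of that condition is worth checking. Apache and PHP both use PCRE, so preg_match() can reproduce what ^(.*?)(/{2,})(.*)$ captures: the target %1/%3 fixes only the first run of slashes, so a URL with several runs needs one 301 per run. The sketch below (both function names hypothetical) compares that with collapsing every run in one pass.

```php
<?php
// One pass of the posted rule: RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
// followed by a redirect to %1/%3. Returns null when the condition fails.
function oneRedirectStep(string $uri): ?string
{
    if (!preg_match('#^(.*?)(/{2,})(.*)$#', $uri, $m)) {
        return null; // no "//" left, so the rule would not fire
    }
    // $m[1] is everything before the first run, $m[3] everything after it.
    return $m[1] . '/' . $m[3];
}

// Collapses every run of two or more slashes at once: one redirect is enough.
function collapseSlashes(string $uri): string
{
    return preg_replace('#/{2,}#', '/', $uri);
}
```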

Post by Celauran

The . in the pattern matches any character, including /. You probably want to modify that first rule to exclude /.
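To see the difference in isolation (PHP's preg_match() uses PCRE, the same regex library Apache uses), compare . with the [^/] class already used in the category rules:

```php
<?php
// "." matches any single character, "/" included; "[^/]" matches any
// character except "/". preg_match() returns 1 on a match, 0 otherwise.
$dotMatchesSlash   = preg_match('#^.$#', '/');     // 1
$classMatchesSlash = preg_match('#^[^/]$#', '/');  // 0

// The same difference inside a capture group: ([^/]+) stops at the next
// slash, while (.+) runs on to the end of the path.
preg_match('#^categ/([^/]+)#', 'categ/t-shirts/2', $segment);
preg_match('#^categ/(.+)#', 'categ/t-shirts/2', $greedy);
// $segment[1] is "t-shirts", $greedy[1] is "t-shirts/2"
```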

Post by simonmlewis

Code:

RewriteEngine On

RewriteCond %{THE_REQUEST} \s/+(.*?)/+(/\S*) [NC]
RewriteRule ^ %1%2 [R=302,L,NE]

RewriteCond %{REQUEST_URI} ^(.*)/{2,}(.*)$
RewriteRule . %1/%2 [R=301,L]
This is where I'm up to. The first rule works and collapses multiple //// after .uk, but it throws a 403 error if you go to an internal page with //.

If I use only the second rule, the internal pages are rewritten, but the slashes after .uk are not.
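%{THE_REQUEST} holds the raw request line exactly as the client sent it (for example GET //categ//t-shirts/ HTTP/1.1). A sketch of taking the path from that line and collapsing every run of slashes, wherever it occurs, into a single redirect target. The helper name is hypothetical, and the query string is deliberately left untouched.

```php
<?php
// Hypothetical helper: given a raw request line such as
// "GET //categ//t-shirts/ HTTP/1.1", return the single-slash URL to
// 301 to, or null when the path is already clean.
function redirectFromRequestLine(string $requestLine): ?string
{
    // Method, whitespace, request target (no spaces), whitespace, protocol.
    if (!preg_match('#^[A-Z]+\s(\S+)\sHTTP/[0-9.]+$#', $requestLine, $m)) {
        return null;
    }
    // Split off the query string so only the path part is normalised.
    $parts = explode('?', $m[1], 2);
    $path  = preg_replace('#/{2,}#', '/', $parts[0]);
    if ($path === $parts[0]) {
        return null; // nothing to collapse
    }
    return isset($parts[1]) ? $path . '?' . $parts[1] : $path;
}
```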

Post by simonmlewis

Code:

# Remove multiple slashes after domain
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]
This works for the double slashes. But I still don't know what to do about / on the end of a URL.

Is the answer to code it so that it works ONLY with a / on the end, and then add manual 301s if Google caches pages without the slash? Or is there an .htaccess rule that can catch these for each of our rules?

Post by simonmlewis

We also have an issue where, if you enter /hello at the end of a product URL, the page still loads. This is the product rule in our .htaccess:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]

Post by Celauran

[quote="simonmlewis"]We also have an issue where if at the end of a product url, you enter /hello, the page still loads. This is the product htaccess:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
[/quote]
Toss a $ on the end of the rule's pattern, so that a URL with the extra /hello no longer matches the rule and falls through to the other rules.
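A quick check of that advice with preg_match(), which uses the same PCRE syntax as mod_rewrite. In .htaccess, the RewriteRule pattern is matched against the path with the leading / removed, so the test strings below have no leading slash.

```php
<?php
// The product pattern from the thread, without and with a closing "$".
$unanchored = '#^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)#';
$anchored   = '#^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$#';

$good  = 'product/foo/bar/123/baz';
$extra = 'product/foo/bar/123/baz/hello';

// Without "$" the pattern only needs to match a prefix, so /hello is ignored.
$unanchoredExtra = preg_match($unanchored, $extra);
// With "$" the match must end exactly where the fourth segment ends.
$anchoredGood  = preg_match($anchored, $good);
$anchoredExtra = preg_match($anchored, $extra);
```

The $ is regex syntax, so it only has an effect as the last character of the pattern itself (the first argument of RewriteRule).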

Post by simonmlewis

Where exactly do you mean?
If I do it like this:

Code:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Or like this:

Code:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
And if I put a word after the final /, the page still loads.

If I put it like this right at the end:

Code:

RewriteRule ^categ/([^/]+)/([0-9]+) /index.php?page=categ&cname=$1&pagenum=$2 [QSA] $
It still all loads.

Any ideas?

Post by Celauran

[quote="simonmlewis"]Where exactly do you mean?
If I do it like this:

Code:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Or like this:

Code:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
And put in a word after the final /, it still loads the page.
[/quote]
Must be another rule catching it, then.

.htaccess
[text]RewriteEngine on

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [QSA,L][/text]

index.php

Code:

<?php var_dump($_GET);
URI: /product/foo/bar/123/baz
[text]array (size=5)
'page' => string 'product' (length=7)
'cname' => string 'foo' (length=3)
'sname' => string 'bar' (length=3)
'product' => string '123' (length=3)
'h' => string 'baz' (length=3)[/text]

URI: /product/foo/bar/123/baz/hello
[text]array (size=0)
empty
[/text]

Post by simonmlewis

This is our current live .htaccess.
There is a lot at the top to do with the double slashes.

Look in the NEW URLS section.

Code:

DirectoryIndex index.php index.html index.htm
order allow,deny
allow from all 
Options +FollowSymLinks
Options +Indexes
RewriteEngine On


# Remove multiple slashes after domain
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]


RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]


RewriteRule ^(blog)($|/) - [QSA]


# old url rewrite
RewriteRule ^categ/([0-9]+)/([^/]+) /index.php?page=categ&c=$1&cname=$2 [QSA]
RewriteRule ^categ/page/([0-9]+)/([^/]+)/([0-9]+) /index.php?page=categ&c=$1&cname=$2&pagenum=$3 [QSA]
RewriteRule ^subcateg/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=subcateg&c=$1&cname=$2&s=$3&sname=$4&menu=sub [QSA]
RewriteRule ^subcateg/page/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+) /index.php?page=subcateg&c=$1&cname=$2&s=$3&sname=$4&pagenum=$5 [QSA]

RewriteRule ^product/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
# end of old url rewrite


# NEW URLS
RewriteRule ^categ/([^/]+)/([0-9]+) /index.php?page=categ&cname=$1&pagenum=$2 [QSA] 
RewriteRule ^categ/([^/]+) /index.php?page=categ&cname=$1 [QSA]

RewriteRule ^subcateg/([^/]+)/([^/]+)/([0-9]+) /index.php?page=subcateg&cname=$1&sname=$2&pagenum=$3 [QSA]
RewriteRule ^subcateg/([^/]+)/([^/]+) /index.php?page=subcateg&cname=$1&sname=$2 [QSA]

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
# END OF NEW URLS



RewriteRule ^knowledge/([0-9]+) /index.php?page=knowledge&id=$1 [QSA]
RewriteRule ^knowledge/answer/([0-9]+)/([0-9]+) /index.php?page=knowledge&id=$1&id_link=$2 [QSA]
RewriteRule ^pricedrop/page/([0-9]+)/ /index.php?page=pricedrop&pagenum=$1 [QSA]
RewriteRule ^productsnew/page/([0-9]+)/ /index.php?page=productsnew&pagenum=$1 [QSA]


RewriteRule ^productzoom/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=productzoom&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
RewriteRule ^loadout/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=loadout&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
RewriteRule ^pricematch/([0-9]+) /index.php?page=pricematch&id=$1 [QSA]

RewriteRule ^type/([^/]+) /index.php?page=type&type=$1 [QSA]
RewriteRule ^use/([^/]+) /index.php?page=use&use=$1 [QSA]

RewriteRule ^back-in-stock/page/([0-9]+)/([^/]+) /index.php?page=back-in-stock&pagenum=$1&power=$2 [L,QSA]

RewriteRule ^productsall/([0-9]+)/ /index.php?page=productsall/ [QSA]


RewriteRule ^productsall/page/([0-9]+)/ /index.php?page=productsall&pagenum=$1/ [QSA]
RewriteRule ^manufacturers/([^/]+) /index.php?page=manufacturers&manufacturer=$1 [QSA]
RewriteRule ^accessories-manufacturers/([^/]+) /index.php?page=accessories-manufacturers&manufacturer=$1 [QSA]
RewriteRule ^product-tags/page/([0-9]+)/([^/]+) /index.php?page=product-tags&pagenum=$1&producttag=$2 [QSA]
RewriteRule ^product-tags/([^/]+) /index.php?page=product-tags&producttag=$1 [QSA]
RewriteRule ^products-wrapped/page/([0-9]+)/ /index.php?page=products-wrapped&pagenum=$1/ [QSA]
RewriteRule ^videos/([0-9]+) /index.php?page=videos&catid=$1 [QSA]
RewriteRule ^videos/product/([0-9]+)/([0-9]+) /index.php?page=videos&catid=$1&id=$2 [QSA]
RewriteRule ^videos/product-search/([0-9]+)/([0-9]+)/([^/]+)/([^/]+) /index.php?page=videos&catid=$1&id=$2&search=$3&searchvideo=$4 [QSA]

RewriteRule ^([^/\.]+)/?$ index.php?page=$1 [QSA]

RewriteRule ^$ index.php?page=home [QSA]

RewriteRule ^robots.txt$ robots.php [QSA]

Post by Celauran

Adding /$ to the product rewrite rule results in a 404 if I add /hello to the URI. Shouldn't it?
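One possible reason the slash matters, assuming the trailing-slash redirect near the top of the live file (RewriteRule ^(.*[^/])$ /$1/ [L,R=301]) runs before the product rule: by the time the product rule is tested, non-file URLs already end in /, so a pattern ending in $ alone never matches them, while one ending in /$ matches the normal URL and rejects the /hello one.

```php
<?php
// Per-directory paths as the product rule would see them after the
// trailing-slash redirect (leading "/" stripped, trailing "/" added).
$dollarOnly  = '#^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$#';
$slashDollar = '#^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$#';

$afterRedirect      = 'product/foo/bar/123/baz/';
$helloAfterRedirect = 'product/foo/bar/123/baz/hello/';

$results = [
    // "[^/]+" cannot swallow the final "/", so "$" alone fails here.
    'dollar, normal'    => preg_match($dollarOnly, $afterRedirect),
    'slash, normal'     => preg_match($slashDollar, $afterRedirect),
    // "/$" must come straight after the fourth segment, so "hello/" is rejected.
    'slash, with hello' => preg_match($slashDollar, $helloAfterRedirect),
];
```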

Post by Celauran

Also, this is a really long and convoluted .htaccess. You should really consider implementing some routing in PHP.
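A minimal sketch of what routing in PHP could look like, assuming .htaccess is reduced to sending every non-file request to index.php (as in the two-rule example earlier in the thread). The route() function and ROUTES table are hypothetical; the URL shapes and parameter names come from the NEW URLS section, but each pattern here is anchored and expects the trailing slash, which is a deliberate change from the live rules.

```php
<?php
// Assumed .htaccess (from the earlier example in this thread):
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^ index.php [QSA,L]

// Hypothetical route table: anchored pattern => [page, parameter names].
const ROUTES = [
    '#^/categ/([^/]+)/([0-9]+)/$#'                   => ['categ', ['cname', 'pagenum']],
    '#^/categ/([^/]+)/$#'                            => ['categ', ['cname']],
    '#^/subcateg/([^/]+)/([^/]+)/([0-9]+)/$#'        => ['subcateg', ['cname', 'sname', 'pagenum']],
    '#^/subcateg/([^/]+)/([^/]+)/$#'                 => ['subcateg', ['cname', 'sname']],
    '#^/product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$#' => ['product', ['cname', 'sname', 'product', 'h']],
];

/** Returns the parameters for $path, or null when no route matches (send a 404). */
function route(string $path): ?array
{
    foreach (ROUTES as $pattern => [$page, $names]) {
        if (preg_match($pattern, $path, $m)) {
            // $m[0] is the whole match; the captures start at index 1.
            return ['page' => $page] + array_combine($names, array_slice($m, 1));
        }
    }
    return null;
}

// In index.php this might be used as (not run here):
// $params = route(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
// if ($params === null) { http_response_code(404); exit; }
```

Anything that doesn't match a route returns null, so a URL with extra segments such as /hello becomes a 404 instead of silently loading a page.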