Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 8:46 am
by simonmlewis
We have URLs that load the category pages whether there is a / on the end or not.
We are told this is bad for SEO, as it is seen as a duplicate page.

This is our HTACCESS for the categories.
RewriteRule ^categ/([^/]+) /index.php?page=categ&cname=$1 [QSA]

Is there something simple I can add to this to only allow URLs with a / on the end?
If I add a /, then the page loads but doesn't load up the information from the URL.

I'm wondering whether the RewriteRule should be written differently, so that URLs with a trailing / are not treated the same as those without.

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 8:53 am
by Celauran
simonmlewis wrote:We have URLs that load the category pages whether there is a / on the end or not.
We are told this is bad for SEO, as it is seen as a duplicate page.
Flip side of that is that /category/t-shirts and /category/t-shirts/ going to different pages is bad for users. You could redirect one to the other, couldn't you?
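
A minimal sketch of such a redirect, assuming the .htaccess sits in the document root and that the slash-terminated form is the one to keep (the categ/ prefix comes from the rule in the first post; the choice of canonical form is an assumption). The first rule sends slash-less category URLs to the slashed version with a 301, and the second rewrites only the slashed form internally, so each category ends up with a single indexable URL:

```apache
RewriteEngine On

# Externally redirect /categ/<name> to /categ/<name>/ (assumed canonical form).
RewriteRule ^categ/([^/]+)$ /categ/$1/ [R=301,L]

# Internally rewrite only the slash-terminated URL to the existing script.
RewriteRule ^categ/([^/]+)/$ /index.php?page=categ&cname=$1 [QSA,L]
```

Because the second pattern ends in /$, anything after the category name other than a single trailing slash no longer matches it.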

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:08 am
by simonmlewis
Both pages are the same in content and layout. It's purely an SEO concern.
But if someone posts a link to /category/t-shirts/, and it loads the same page as the one without the slash, Google will cache both pages, which is bad because they count as duplicates.

So I don't know if there is a global overriding way in our HTACCESS of controlling it.

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:12 am
by Celauran
Celauran wrote:You could redirect one to the other, couldn't you?

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:16 am
by simonmlewis
http://stackoverflow.com/questions/1708 ... y-htaccess

The other issue we have is with double slashes.

If the URL has // straight after .uk and single slashes through the rest, the page loads and the extra slash is not removed.

But if the URL has // straight after .uk and // throughout the rest of the path, it does get rewritten.

Code: Select all

#remove double/more slashes in url
RewriteCond %{REQUEST_METHOD}  !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]
So it's not quite right.

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:23 am
by Celauran
. matches any character, including /. You probably want to modify that first rule to exclude /
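
An alternative that avoids guessing what the pattern sees is to test the raw request line. This is a sketch only, assuming the file is a .htaccess in the document root. It relies on the commonly reported behaviour that, in per-directory context, the path handed to the RewriteRule pattern already has runs of slashes collapsed, which is worth verifying on the server in question:

```apache
RewriteEngine On

# THE_REQUEST is the unmodified request line, e.g. "GET //categ//foo HTTP/1.1".
# The second condition looks for "//" in the path part only (before any "?").
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*//
# $0 is the whole matched path; per the assumption above it has single slashes.
RewriteRule ^.*$ /$0 [R=301,L,NE]
```

Using ^.*$ rather than . lets the rule fire even when the path is empty after the slashes are collapsed, as with a request for // on its own.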

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:32 am
by simonmlewis

Code: Select all

RewriteEngine On

RewriteCond %{THE_REQUEST} \s/+(.*?)/+(/\S*) [NC]
RewriteRule ^ %1%2 [R=302,L,NE]

RewriteCond %{REQUEST_URI} ^(.*)/{2,}(.*)$
RewriteRule . %1/%2 [R=301,L]
This is where I am up to. The first rule works and collapses multiple slashes straight after .uk, but it fails with a 403 error if you go to an internal page with // in the URL.

If I use only the second rule, the internal pages are rewritten, but the slashes straight after .uk are not.

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:50 am
by simonmlewis

Code: Select all

# Remove multiple slashes after domain
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]
This works for the double slashes. But I still don't know what to do about / on the end of a URL.

Is the answer to code it so that it works ONLY with a / on the end and, if Google has cached pages without the slash, add manual 301s? Or is there an htaccess rule we can add alongside each of our rules to spot them?

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 9:58 am
by simonmlewis
We also have an issue where if at the end of a product url, you enter /hello, the page still loads. This is the product htaccess:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 10:33 am
by Celauran
simonmlewis wrote:We also have an issue where if at the end of a product url, you enter /hello, the page still loads. This is the product htaccess:

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Toss a $ on the end of the rule, causing the extra /hello to not match the rule and fall through to other rules.

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 10:38 am
by simonmlewis
Where exactly do you mean?
If I do it like this:

Code: Select all

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Or like this:

Code: Select all

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
And put in a word after the final /, it still loads the page.

If I put it like this right at the end:

Code: Select all

RewriteRule ^categ/([^/]+)/([0-9]+) /index.php?page=categ&cname=$1&pagenum=$2 [QSA] $
It still all loads.

Any ideas?
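
For reference, $ is a regex anchor, so it belongs at the end of the pattern (the first argument to RewriteRule), not after the flags. A hedged sketch of the two category rules with both ends anchored, assuming every category URL should end in a single slash:

```apache
# The pattern is the first argument; "$" closes it, before the substitution.
RewriteRule ^categ/([^/]+)/([0-9]+)/$ /index.php?page=categ&cname=$1&pagenum=$2 [QSA,L]
RewriteRule ^categ/([^/]+)/$ /index.php?page=categ&cname=$1 [QSA,L]
```

With these patterns, /categ/foo/2/hello/ matches neither rule and falls through to whatever rules follow.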

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 10:47 am
by Celauran
simonmlewis wrote:Where exactly do you mean?
If I do it like this:

Code: Select all

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Or like this:

Code: Select all

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
And put in a word after the final /, it still loads the page.
Must be another rule catching it, then.

.htaccess

Code: Select all

RewriteEngine on

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [QSA,L]

index.php

Code: Select all

<?php var_dump($_GET);

URI: /product/foo/bar/123/baz

array (size=5)
  'page' => string 'product' (length=7)
  'cname' => string 'foo' (length=3)
  'sname' => string 'bar' (length=3)
  'product' => string '123' (length=3)
  'h' => string 'baz' (length=3)

URI: /product/foo/bar/123/baz/hello

array (size=0)
  empty

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 10:48 am
by simonmlewis
This is our current live HTACCESS.
There is a lot at the top to do with the double slashes.

Look in the NEW URLS section.

Code: Select all

DirectoryIndex index.php index.html index.htm
order allow,deny
allow from all 
Options +FollowSymLinks
Options +Indexes
RewriteEngine On


# Remove multiple slashes after domain
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]


RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]


RewriteRule ^(blog)($|/) - [QSA]


# old url rewrite
RewriteRule ^categ/([0-9]+)/([^/]+) /index.php?page=categ&c=$1&cname=$2 [QSA]
RewriteRule ^categ/page/([0-9]+)/([^/]+)/([0-9]+) /index.php?page=categ&c=$1&cname=$2&pagenum=$3 [QSA]
RewriteRule ^subcateg/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=subcateg&c=$1&cname=$2&s=$3&sname=$4&menu=sub [QSA]
RewriteRule ^subcateg/page/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+) /index.php?page=subcateg&c=$1&cname=$2&s=$3&sname=$4&pagenum=$5 [QSA]

RewriteRule ^product/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
# end of old url rewrite


# NEW URLS
RewriteRule ^categ/([^/]+)/([0-9]+) /index.php?page=categ&cname=$1&pagenum=$2 [QSA] 
RewriteRule ^categ/([^/]+) /index.php?page=categ&cname=$1 [QSA]

RewriteRule ^subcateg/([^/]+)/([^/]+)/([0-9]+) /index.php?page=subcateg&cname=$1&sname=$2&pagenum=$3 [QSA]
RewriteRule ^subcateg/([^/]+)/([^/]+) /index.php?page=subcateg&cname=$1&sname=$2 [QSA]

RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
# END OF NEW URLS



RewriteRule ^knowledge/([0-9]+) /index.php?page=knowledge&id=$1 [QSA]
RewriteRule ^knowledge/answer/([0-9]+)/([0-9]+) /index.php?page=knowledge&id=$1&id_link=$2 [QSA]
RewriteRule ^pricedrop/page/([0-9]+)/ /index.php?page=pricedrop&pagenum=$1 [QSA]
RewriteRule ^productsnew/page/([0-9]+)/ /index.php?page=productsnew&pagenum=$1 [QSA]


RewriteRule ^productzoom/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=productzoom&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
RewriteRule ^loadout/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=loadout&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
RewriteRule ^pricematch/([0-9]+) /index.php?page=pricematch&id=$1 [QSA]

RewriteRule ^type/([^/]+) /index.php?page=type&type=$1 [QSA]
RewriteRule ^use/([^/]+) /index.php?page=use&use=$1 [QSA]

RewriteRule ^back-in-stock/page/([0-9]+)/([^/]+) /index.php?page=back-in-stock&pagenum=$1&power=$2 [L,QSA]

RewriteRule ^productsall/([0-9]+)/ /index.php?page=productsall/ [QSA]


RewriteRule ^productsall/page/([0-9]+)/ /index.php?page=productsall&pagenum=$1/ [QSA]
RewriteRule ^manufacturers/([^/]+) /index.php?page=manufacturers&manufacturer=$1 [QSA]
RewriteRule ^accessories-manufacturers/([^/]+) /index.php?page=accessories-manufacturers&manufacturer=$1 [QSA]
RewriteRule ^product-tags/page/([0-9]+)/([^/]+) /index.php?page=product-tags&pagenum=$1&producttag=$2 [QSA]
RewriteRule ^product-tags/([^/]+) /index.php?page=product-tags&producttag=$1 [QSA]
RewriteRule ^products-wrapped/page/([0-9]+)/ /index.php?page=products-wrapped&pagenum=$1/ [QSA]
RewriteRule ^videos/([0-9]+) /index.php?page=videos&catid=$1 [QSA]
RewriteRule ^videos/product/([0-9]+)/([0-9]+) /index.php?page=videos&catid=$1&id=$2 [QSA]
RewriteRule ^videos/product-search/([0-9]+)/([0-9]+)/([^/]+)/([^/]+) /index.php?page=videos&catid=$1&id=$2&search=$3&searchvideo=$4 [QSA]

RewriteRule ^([^/\.]+)/?$ index.php?page=$1 [QSA]

RewriteRule ^$ index.php?page=home [QSA]

RewriteRule ^robots.txt$ robots.php [QSA]

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 10:57 am
by Celauran
Adding /$ to the product rewrite rule results in a 404 if I add /hello to the URI. Shouldn't it?

Re: Is there a HTACCESS string that stops duplicate pages?

Posted: Mon Feb 01, 2016 10:59 am
by Celauran
Also, this is a really long and convoluted .htaccess. You should really consider implementing some routing in PHP.
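
If the routing does move into PHP, the rewrite section could shrink to a single catch-all, along the lines of the test file shown earlier in the thread. This is a sketch only: index.php would then need to inspect the requested path itself (for example via $_SERVER['REQUEST_URI']) and decide which page to render, and the slash-normalising redirects would still sit above it.

```apache
RewriteEngine On

# Serve real files and directories (images, CSS and so on) untouched.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes to the front controller, query string included.
RewriteRule ^ index.php [QSA,L]
```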