Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 8:46 am
by simonmlewis
We have URLs that load the category pages whether there is a / on the end or not.
We are told this is bad for SEO, as it is seen as a duplicate page.
This is our HTACCESS for the categories.
RewriteRule ^categ/([^/]+) /index.php?page=categ&cname=$1 [QSA]
Is there something simple I can add to this to only allow URLs with a / on the end?
If I add a /, then the page loads but doesn't load up the information from the URL.
Just wondering if the rewriterule should be different to stop those with a / being the same as those without.
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 8:53 am
by Celauran
simonmlewis wrote: We have URLs that load the category pages whether there is a / on the end or not.
We are told this is bad for SEO, as it is seen as a duplicate page.
Flip side of that is that /category/t-shirts and /category/t-shirts/ going to different pages is bad for users. You could redirect one to the other, couldn't you?
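A minimal sketch of that redirect, assuming the trailing-slash form is the one you want to keep and that the rules live in the .htaccess at the site root. The categ rule is the one from the first post with /$ added; treat it as illustrative rather than a drop-in fix.

```apache
RewriteEngine On

# Assumption: URLs ending in / are the canonical form.
# Send any request that does not end in / (and is not a real file)
# to the same path with a / appended, as a permanent redirect.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]

# Only the slash-terminated form is rewritten internally.
RewriteRule ^categ/([^/]+)/$ /index.php?page=categ&cname=$1 [QSA,L]
```

With this in place /categ/t-shirts answers with a 301 to /categ/t-shirts/, so only one URL serves the page. Browsers cache 301s, so testing with R=302 first is a reasonable precaution.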
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:08 am
by simonmlewis
Both URLs serve the same content and layout. The concern is purely SEO.
But if someone posts a link to /category/t-shirts/, and it loads the same page as the one without the slash, Google will index both URLs, which is bad because it counts as a duplicate.
So I don't know if there is a global overriding way in our HTACCESS of controlling it.
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:12 am
by Celauran
Celauran wrote: You could redirect one to the other, couldn't you?
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:16 am
by simonmlewis
http://stackoverflow.com/questions/1708 ... y-htaccess
The other issue we have is with double slashes.
If the URL has // straight after .uk and only single slashes through the rest, the page loads and the extra slashes are not removed.
But if I put // straight after .uk and // through the rest of the URL as well, it does get rewritten.
Code:
#remove double/more slashes in url
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]
So it's not quite right.
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:23 am
by Celauran
. matches any character, including /. You probably want to modify that first rule to exclude /
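Another option is to test the raw request line rather than REQUEST_URI. This is a sketch, assuming the rules sit in the root .htaccess. In that context mod_rewrite generally matches RewriteRule against a path in which repeated slashes have already been collapsed, so redirecting to the captured path drops the extras wherever they occur. That behaviour is worth verifying on your own server version.

```apache
RewriteEngine On

# Leave POST requests alone, as in the rule above.
RewriteCond %{REQUEST_METHOD} !=POST
# THE_REQUEST is the unmodified request line, e.g. "GET //a//b HTTP/1.1".
# Match // in the path part only (before any ? or whitespace).
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*//
# ^(.*)$ also matches an empty path, which a bare "." would not.
RewriteRule ^(.*)$ /$1 [R=301,L,NE]
```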
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:32 am
by simonmlewis
Code:
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/+(.*?)/+(/\S*) [NC]
RewriteRule ^ %1%2 [R=302,L,NE]
RewriteCond %{REQUEST_URI} ^(.*)/{2,}(.*)$
RewriteRule . %1/%2 [R=301,L]
This is where I am up to. The first rule works and replaces multiple slashes (////) straight after .uk, but it throws a 403 error if you go to an internal page with //.
If I use only the second rule, then the internal pages are rewritten, but the slashes straight after .uk are not.
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:50 am
by simonmlewis
Code:
# Remove multiple slashes after domain
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]
# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]
This works for the double slashes. But I still don't know what to do about / on the end of a URL.
Is the answer to code it so that pages load ONLY with a / on the end, and then add manual 301s for any slashless URLs Google has already cached? Or is there an htaccess rule that can catch them for each of our rewrite rules?
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 9:58 am
by simonmlewis
We also have an issue where if at the end of a product url, you enter /hello, the page still loads. This is the product htaccess:
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 10:33 am
by Celauran
simonmlewis wrote: We also have an issue where if at the end of a product url, you enter /hello, the page still loads. This is the product htaccess:
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Toss a $ on the end of the rule, causing the extra /hello to not match the rule and fall through to other rules.
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 10:38 am
by simonmlewis
Where exactly do you mean?
If I do it like this:
Code:
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Or like this:
Code:
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
And put in a word after the final /, it still loads the page.
If I put it like this right at the end:
Code:
RewriteRule ^categ/([^/]+)/([0-9]+) /index.php?page=categ&cname=$1&pagenum=$2 [QSA] $
It still all loads.
Any ideas?
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 10:47 am
by Celauran
simonmlewis wrote: Where exactly do you mean?
If I do it like this:
Code:
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
Or like this:
Code:
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)/$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
And put in a word after the final /, it still loads the page.
Must be another rule catching it, then.
.htaccess:
RewriteEngine on
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+)$ /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [QSA,L]
index.php output:
URI: /product/foo/bar/123/baz
array (size=5)
  'page' => string 'product' (length=7)
  'cname' => string 'foo' (length=3)
  'sname' => string 'bar' (length=3)
  'product' => string '123' (length=3)
  'h' => string 'baz' (length=3)
URI: /product/foo/bar/123/baz/hello
array (size=0)
  empty
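For the categ rule from the previous post, the $ has to sit at the end of the pattern (the first argument), not after the [flags]. A sketch, assuming category URLs end in a slash and that no earlier rule in the file should handle them:

```apache
# Paged category: /categ/<name>/<page>/
RewriteRule ^categ/([^/]+)/([0-9]+)/$ /index.php?page=categ&cname=$1&pagenum=$2 [QSA,L]

# Unpaged category: /categ/<name>/
RewriteRule ^categ/([^/]+)/$ /index.php?page=categ&cname=$1 [QSA,L]
```

With both patterns anchored, /categ/foo/hello/ matches neither rule and falls through to whatever comes later. The added L flag ends the current pass through the rules once one of them matches.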
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 10:48 am
by simonmlewis
This is our current live HTACCESS.
There is a lot at the top to do with the double slashes.
The relevant rules are in the NEW URLS section.
Code:
DirectoryIndex index.php index.html index.htm
order allow,deny
allow from all
Options +FollowSymLinks
Options +Indexes
RewriteEngine On
# Remove multiple slashes after domain
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]
# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
RewriteRule ^(blog)($|/) - [QSA]
# old url rewrite
RewriteRule ^categ/([0-9]+)/([^/]+) /index.php?page=categ&c=$1&cname=$2 [QSA]
RewriteRule ^categ/page/([0-9]+)/([^/]+)/([0-9]+) /index.php?page=categ&c=$1&cname=$2&pagenum=$3 [QSA]
RewriteRule ^subcateg/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=subcateg&c=$1&cname=$2&s=$3&sname=$4&menu=sub [QSA]
RewriteRule ^subcateg/page/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+) /index.php?page=subcateg&c=$1&cname=$2&s=$3&sname=$4&pagenum=$5 [QSA]
RewriteRule ^product/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
# end of old url rewrite
# NEW URLS
RewriteRule ^categ/([^/]+)/([0-9]+) /index.php?page=categ&cname=$1&pagenum=$2 [QSA]
RewriteRule ^categ/([^/]+) /index.php?page=categ&cname=$1 [QSA]
RewriteRule ^subcateg/([^/]+)/([^/]+)/([0-9]+) /index.php?page=subcateg&cname=$1&sname=$2&pagenum=$3 [QSA]
RewriteRule ^subcateg/([^/]+)/([^/]+) /index.php?page=subcateg&cname=$1&sname=$2 [QSA]
RewriteRule ^product/([^/]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=product&cname=$1&sname=$2&product=$3&h=$4 [L]
# END OF NEW URLS
RewriteRule ^knowledge/([0-9]+) /index.php?page=knowledge&id=$1 [QSA]
RewriteRule ^knowledge/answer/([0-9]+)/([0-9]+) /index.php?page=knowledge&id=$1&id_link=$2 [QSA]
RewriteRule ^pricedrop/page/([0-9]+)/ /index.php?page=pricedrop&pagenum=$1 [QSA]
RewriteRule ^productsnew/page/([0-9]+)/ /index.php?page=productsnew&pagenum=$1 [QSA]
RewriteRule ^productzoom/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=productzoom&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
RewriteRule ^loadout/([0-9]+)/([^/]+)/([0-9]+)/([^/]+)/([0-9]+)/([^/]+) /index.php?page=loadout&c=$1&cname=$2&s=$3&sname=$4&product=$5&h=$6 [QSA]
RewriteRule ^pricematch/([0-9]+) /index.php?page=pricematch&id=$1 [QSA]
RewriteRule ^type/([^/]+) /index.php?page=type&type=$1 [QSA]
RewriteRule ^use/([^/]+) /index.php?page=use&use=$1 [QSA]
RewriteRule ^back-in-stock/page/([0-9]+)/([^/]+) /index.php?page=back-in-stock&pagenum=$1&power=$2 [L,QSA]
RewriteRule ^productsall/([0-9]+)/ /index.php?page=productsall/ [QSA]
RewriteRule ^productsall/page/([0-9]+)/ /index.php?page=productsall&pagenum=$1/ [QSA]
RewriteRule ^manufacturers/([^/]+) /index.php?page=manufacturers&manufacturer=$1 [QSA]
RewriteRule ^accessories-manufacturers/([^/]+) /index.php?page=accessories-manufacturers&manufacturer=$1 [QSA]
RewriteRule ^product-tags/page/([0-9]+)/([^/]+) /index.php?page=product-tags&pagenum=$1&producttag=$2 [QSA]
RewriteRule ^product-tags/([^/]+) /index.php?page=product-tags&producttag=$1 [QSA]
RewriteRule ^products-wrapped/page/([0-9]+)/ /index.php?page=products-wrapped&pagenum=$1/ [QSA]
RewriteRule ^videos/([0-9]+) /index.php?page=videos&catid=$1 [QSA]
RewriteRule ^videos/product/([0-9]+)/([0-9]+) /index.php?page=videos&catid=$1&id=$2 [QSA]
RewriteRule ^videos/product-search/([0-9]+)/([0-9]+)/([^/]+)/([^/]+) /index.php?page=videos&catid=$1&id=$2&search=$3&searchvideo=$4 [QSA]
RewriteRule ^([^/\.]+)/?$ index.php?page=$1 [QSA]
RewriteRule ^$ index.php?page=home [QSA]
RewriteRule ^robots.txt$ robots.php [QSA]
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 10:57 am
by Celauran
Adding /$ to the product rewrite rule results in a 404 if I add /hello to the URI. Shouldn't it?
Re: Is there a HTACCESS string that stops duplicate pages?
Posted: Mon Feb 01, 2016 10:59 am
by Celauran
Also, this is a really long and convoluted .htaccess. You should really consider implementing some routing in PHP.
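As a rough sketch of what that could look like on the Apache side, assuming index.php is changed to read the path from $_SERVER['REQUEST_URI'] and pick the page itself (that PHP router is hypothetical and not shown), the whole list of rules could shrink to a front controller:

```apache
RewriteEngine On

# Serve real files and directories (images, CSS, etc.) directly.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Hand every other request to index.php, which does the routing.
RewriteRule ^ index.php [L]
```

The trailing-slash and unknown-segment checks then live in one place in PHP, where a URL that matches no route can return a real 404.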