HTACCESS 301 - how do you point, when current URL has // ?


simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Post by simonmlewis »

I think I have.
We have URLs that have somehow been posted elsewhere (i.e. in forums, or as bad old links) that contain a double slash (//) with nothing between the two slashes.

The expected behaviour is that my HTACCESS line of code takes that URL through to another page.
On some of them, it DOES seem to work, and goes to /selectpage, which is a custom page we run that is more useful to the customer. However, Google thinks it isn't, and says "Googlebot couldn't crawl this URL because it points to a non-existent page. Generally, 404s don't harm your site's performance in search, but you can use them to help improve the user experience".

For privacy reasons I've been asked not to provide all HTACCESS details here. Sorry.

I don't understand why some of the // URLs are going to my error page, but Google sees that as NOT correct.
I've just checked a few at random, and they all go to my custom page. So rather than some black and white 404 page, they go to an internal page. But Google doesn't see it like that, and I don't know why.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Post by Celauran »

It's going to be very challenging for anyone to help you when we've got incomplete information. Could be a malformed rule, could be a conflict with another rule, could be Google having stored a previous, erroneous redirect (since you're using 301).

Post by simonmlewis »

Ok. Yes I understand that.
We have over 1,500 301s in our HTACCESS, because Google had cached a ton of old URLs.
Do these have to stay in our file, or can they be cleared out?
What I am thinking now is this (with my SEO head on): if I remove them all, and start over, will that mean Google uses those old ones all over again, or has it (and other search engines) sorted them?

What I don't want is to remove them all, and suddenly our webmaster errors go through the roof.

But it might just be better now to clear the lot, see what happens, and then go through the 404s afresh when they appear on Google.

Sorry to make this so challenging, and I fully accept I am not making this easy.

Post by simonmlewis »

Let me try and explain a little better, as there may be an improved solution for this.

Imagine you have a product for sale on Amazon. It's sold or for whatever reason, you decide to remove it.
It will therefore no longer be in Amazon's sitemap.
Or, if it is a product on another web site, and that is removed or the category is removed (or category ID is changed), that URL is now dead.

So you have something in your script that says "if catid is not found, take the user to this page: 'Sorry, this page has moved.'"

This is what we do with our web sites. But we still get Soft 404 errors in Google's Webmaster tools when pages that changed long ago are somehow cached, or are still linked from forum posts elsewhere. Google then tells us "this is a Soft 404 because it's not on your site now".

So what I do (rightly or wrongly) is go into HTACCESS, and point the now defunct page to a nice error message page, or to the nearest similar page via a 301,L. So it's permanently pointed there.

Two issues I can see with that. Months or years on, if that page we are pointing them to (a "nearest page") is also now changed, it will cause further issues.
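
One way to limit the chaining problem described above is to keep every old URL pointing straight at the current target, so that no request passes through two redirects. A rough sketch only, assuming made-up paths (/old-widget, /widgets/blue and /widgets are hypothetical and not taken from this site):

```apache
# Hypothetical paths, for illustration only.
# Original redirect, added when /old-widget was removed:
#   RewriteRule ^old-widget$ /widgets/blue [R=301,L]
# If /widgets/blue is later retired too, update the existing rule to the
# current target instead of letting it lead to a second redirect:
RewriteRule ^old-widget$ /widgets [R=301,L]
RewriteRule ^widgets/blue$ /widgets [R=301,L]
```

This keeps each old URL one hop from its destination, at the cost of revisiting older rules whenever a target page is retired.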

What is the best solution?
  • Ignore Soft 404s? (surely that's a bad thing)
  • Keep an eye on the Soft 404s and update HTACCESS... but with what?
  • Something else... but what?
The // issue is one I would like to resolve with a topend HTACCESS script, like this:

Code:

RewriteRule ^(.*)//(.*) /selectpage [R=301,L]
So anything that has a // in it is taken elsewhere, and I don't need 100s of URLs all listed in there.
I'm sure there is a good solution for it, but I am nowhere near being an HTACCESS expert here.
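
A possible reason a rule like the one above matches only some of the time: in .htaccess context, the path that RewriteRule tests may already have had repeated slashes merged by Apache, so the pattern never sees the //. A commonly used workaround is to test the raw request line through %{THE_REQUEST} instead. The following is a minimal sketch, assuming mod_rewrite is enabled and that /selectpage is still the intended target; it has not been tested against this site's other rules:

```apache
RewriteEngine On

# THE_REQUEST holds the unmodified request line, e.g. "GET /a//b?x=1 HTTP/1.1".
# The condition looks for "//" in the path part only (before any "?").
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*//
RewriteRule ^ /selectpage [R=301,L]
```

Because /selectpage itself contains no double slash, the redirected request does not match the condition again, so this should not loop. Placing it above the individual 301 lines lets it catch the // URLs before any other rule runs.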

Code:

ErrorDocument 404 /404.php
Maybe if I created a 404.php file, and made it as similar as possible to our error page, this would stop ALL the 404 errors?
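
For what it's worth, a custom error page does not make the 404s disappear from Google's reports, but it can make them "proper" 404s rather than soft ones. When ErrorDocument is given a local path, Apache serves that page while keeping the error status on the response, whereas a 301 to a generic page ends in a normal 200 response, which is the usual trigger for a Soft 404 report. A sketch under those assumptions (the removed product path is hypothetical, and 404.php is assumed not to override the status code itself):

```apache
# Serve the custom page for missing URLs. With a local path (leading slash),
# Apache keeps the 404 status code on the response.
ErrorDocument 404 /404.php

# Hypothetical removed URL: answer 410 Gone instead of redirecting it.
Redirect gone /old-category/old-product

# Reuse the same friendly page for 410 responses.
ErrorDocument 410 /404.php
```

The 410 status tells crawlers the page was removed on purpose, which is an alternative to maintaining a 301 for every dead URL that has no close replacement.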


Thanks.