PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Post new topic Reply to topic  [ 13 posts ] 
Author Message
PostPosted: Fri Sep 30, 2016 8:24 am 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom
In Google Webmaster Tools you can "Fetch as Google" to see whether a URL is fine or whether it is being correctly blocked — for example, blocking URLs with & in them.
Why would the 404 Not Found list have a bunch of URLs with & in them?

Is it the case that Google shows you which URLs are being requested, and when, and whether they return a 404, even if its crawlers are being blocked from them?

So you can see a) what is being blocked and b) where visitors are being taken.

Or is there something fundamentally wrong if the crawlers are blocked for a URL with & in it, yet it still appears in the 404 list?
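For reference, Google's robots.txt matching treats '*' as a wildcard and a trailing '$' as an end anchor, which is how a single rule can block every URL containing an &. A minimal sketch of that matching logic in Python (the '/*&' rule and the sample paths here are made up for illustration, not taken from any real site):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a Google-style robots.txt pattern matches a URL path.

    '*' matches any run of characters; a trailing '$' anchors the match
    to the end of the path. Matching always starts at the path beginning.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as the wildcard '.*'
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Hypothetical rule blocking any URL whose path/query contains an '&'
print(robots_pattern_matches("/*&", "/index.php?page=selector&sort=asc"))  # True
print(robots_pattern_matches("/*&", "/index.php"))                          # False
```

Note that the path tested should include the query string, since that is where the & usually appears.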

_________________
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.


PostPosted: Fri Sep 30, 2016 12:28 pm 
Offline
Spammer :|

Joined: Wed Oct 15, 2008 2:35 am
Posts: 6617
Location: WA, USA
What are you talking about with this "&" thing? You mean ampersands in a query string? There's nothing wrong with URLs like that.


PostPosted: Fri Sep 30, 2016 12:38 pm 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom
We block them for our own reasons, partly to avoid duplicate URLs.
But that isn't the point of the question.



PostPosted: Fri Sep 30, 2016 10:47 pm 
Offline
Spammer :|

Joined: Wed Oct 15, 2008 2:35 am
Posts: 6617
Location: WA, USA
You have a history of asking odd questions.

For example, you just said that you're blocking them, but one of your original questions was why the 404 list would have them. The answer should be obvious: because you're blocking them. That's such an obvious answer I have to wonder if I am correctly understanding your question in the first place.

Then you ask about whether Google is showing you requested URLs. Saying "if they are going to 404" sounds like you think Google can predict what will happen. They can't. They'll crawl your site and log what happens, and you can see parts of that log.

My conclusion is that you don't know what your blocks are doing. Ask the people who set up those blocks how they are working.


PostPosted: Sat Oct 01, 2016 3:39 am 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom



PostPosted: Sat Oct 01, 2016 5:14 am 
Offline
Spammer :|

Joined: Wed Oct 15, 2008 2:35 am
Posts: 6617
Location: WA, USA


PostPosted: Sun Oct 02, 2016 2:45 pm 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom
Pardon me for being blunt, but I'm not sure why this is such a difficult thing to grasp.

I have URLs that I don't want cached or seen by Google. We were told that if a URL has, for example, an & in it, then adding a matching rule to robots.txt will stop Google crawling those URLs.

Perfect. Exactly what we want.

Yet it still crawls those pages, as they are showing up as 404s now that those URLs are dead. At some point the robots.txt file was ignored (or damaged), so rather a lot of pages were opened up to Google, and now there is a massive 404 list.

We figured that with robots.txt correctly in place, Google would not cache them any more. And yet suddenly, on another site, it has cached a TON of them, even with the robots file correctly in place. If I visit those URLs they correctly go to 404s, but why are they showing up in Webmaster Tools if we are telling Google not to cache them?

Hence my first question: is the 404 list a generic set of 404 pages that Google has found but NOT cached because of the robots file?
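One thing worth separating out: robots.txt controls crawling, not indexing, so a blocked URL can still be listed if Google discovers it through links elsewhere. You can at least sanity-check simple prefix rules with Python's standard library (a sketch; the rule and URLs are illustrative, and as far as I know the stdlib parser only does plain prefix matching — it does not implement Google's '*' wildcard, so a rule like '/*?' needs Google's own robots.txt tester instead):

```python
from urllib import robotparser

# Parse an in-memory robots.txt instead of fetching a real one.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /index.php",   # prefix rule: blocks /index.php and anything after it
])

# The query-string URL falls under the /index.php prefix, so it is blocked:
print(rp.can_fetch("Googlebot", "https://example.com/index.php?page=selector"))  # False
# An unrelated page is still crawlable:
print(rp.can_fetch("Googlebot", "https://example.com/about"))                    # True
```

This only tells you what the rules permit — whether a blocked URL still shows up in a report is a separate matter of how Google found it.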



PostPosted: Sun Oct 02, 2016 3:05 pm 
Offline
Spammer :|

Joined: Wed Oct 15, 2008 2:35 am
Posts: 6617
Location: WA, USA


PostPosted: Sun Oct 02, 2016 3:17 pm 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom
OK, for one site the robots.txt was messed up, so that explains why a ton of them came back. On a side note, I don't know why, now that robots.txt is set up again, we cannot clear the thousands of 404s to confirm they are resolved. Now that the site is blocking 'googlebot' from seeing them, why does the list only go down by 1000 a day??

The main reason I write here is that one of our other sites has had DOUBLE the amount of the previous site, and its robots.txt is correct and has always been so. We block /*?, so that a URL starting /index.php?page=selector.... (it's a long URL) doesn't get crawled.

Yet all of a sudden this site has tens of thousands of these things appearing as 404s, even though we tell robots.txt not to crawl them.
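Rather than inferring from the Search Console report, the server's access log will show whether Googlebot is actually fetching those query-string URLs. A rough sketch of that check (the log lines below are fabricated samples in Apache combined format; a real check should also verify the requester's IP by reverse DNS, since anyone can send a fake Googlebot user agent):

```python
# Scan access-log lines for Googlebot requests whose path has a query string.
sample_log = """\
66.249.66.1 - - [02/Oct/2016:14:01:07 +0000] "GET /index.php?page=selector&id=9 HTTP/1.1" 404 213 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.5 - - [02/Oct/2016:14:01:09 +0000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
"""

hits = []
for line in sample_log.splitlines():
    if "Googlebot" not in line:
        continue
    # The request line is the first quoted field: 'GET /path HTTP/1.1'
    request = line.split('"')[1]
    path = request.split()[1]
    if "?" in path:
        hits.append(path)

print(hits)  # ['/index.php?page=selector&id=9']
```

If the log shows no Googlebot fetches of those URLs, the 404 entries are old or discovered some other way rather than evidence that robots.txt is being ignored.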



PostPosted: Sun Oct 02, 2016 4:29 pm 
Offline
Spammer :|

Joined: Wed Oct 15, 2008 2:35 am
Posts: 6617
Location: WA, USA


PostPosted: Sun Oct 02, 2016 4:51 pm 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom
So do they not let you just "clear" all the thousands of 404s, even though I know full well those URLs are now blocked again? They only allow up to 1000 clearances a day?

The ones on the other site are 100% URLs that have not been there for a good 3-4 years. And for some reason, in the past week these tens of thousands have just appeared.

My question is: if someone is trying to cause us problems and has generated tens of thousands of these URLs, the robots.txt file should block Googlebot from crawling them, so they should not appear in the 404 list at all?!
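If those URLs really have been dead for years, one commonly suggested option is to serve 410 Gone rather than 404 for them, since 410 tells crawlers the removal is permanent and such entries tend to be dropped faster. A sketch of that status decision (the matching rule here is hypothetical, not your actual URL scheme):

```python
def status_for(path: str) -> int:
    """Pick an HTTP status code for a request path (illustrative rule only)."""
    # Hypothetical rule: any legacy index.php query-string URL is permanently gone.
    if path.startswith("/index.php") and "?" in path:
        return 410  # Gone: removed on purpose, don't keep retrying
    return 404      # Not Found: might exist again later

print(status_for("/index.php?page=selector&id=9"))  # 410
print(status_for("/missing-page"))                  # 404
```

The same decision could of course be made in the site's PHP front controller or in the server config; the logic is the point, not the language.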



PostPosted: Sun Oct 02, 2016 9:39 pm 
Offline
Spammer :|

Joined: Wed Oct 15, 2008 2:35 am
Posts: 6617
Location: WA, USA
I don't know what I can tell you that I haven't already said a few times.


PostPosted: Mon Oct 03, 2016 2:56 am 
Offline
DevNet Master

Joined: Wed Oct 08, 2008 3:39 pm
Posts: 4434
Location: United Kingdom
That's a problem, then.
If a URL is blocked via robots.txt, it shouldn't appear in the 404s.



Powered by phpBB® Forum Software © phpBB Group