Is Google going to get mad at me?

Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

I've read that in many places. The page I posted more or less said that (except without explicitly stating that they parse CSS).
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Whether Google checks automatically, at set intervals, or only when something looks suspicious is something I'd like to know. Regardless, Google wouldn't TELL us; they'd just scare us and have us assume that they always check. I'm sure always checking is a bit much for the spider, though.
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.

It can get annoying at times. ;)

A distant second is the MSNBot. It doesn't hit as often but it pulls the same image files over and over where Google apparently checks modification dates and doesn't download unchanged images.
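
The "checks modification dates" behaviour described above is what HTTP calls a conditional GET: the crawler sends an If-Modified-Since header and the server answers 304 Not Modified instead of resending the file. Here is a minimal sketch of the server-side decision in Python, not a description of how Googlebot or any particular web server implements it. The function name `conditional_status` and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # no cached copy advertised; send the full file
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header; fall back to a full response
    if cached.tzinfo is None:
        # Assumption: treat a date without a zone as UTC
        cached = cached.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds
    if last_modified.replace(microsecond=0) <= cached:
        return 304
    return 200


# Hypothetical image whose file was last changed at this moment
image_mtime = datetime(2007, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(image_mtime, usegmt=True)

print(conditional_status(None, image_mtime))    # first visit
print(conditional_status(header, image_mtime))  # repeat visit, file unchanged
```

A crawler that never sends the header (or a server that ignores it) ends up transferring the same image on every visit, which would match the MSNBot pattern described in the post.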
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

AKA Panama Jack wrote:Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.
Google accepts requests to limit the frequency of crawling. All you have to do is verify that the site is, indeed, yours and they'll do what you tell them to.
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

AKA Panama Jack wrote:Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.

It can get annoying at times. ;)

A distant second is the MSNBot. It doesn't hit as often but it pulls the same image files over and over where Google apparently checks modification dates and doesn't download unchanged images.
If you go to Google Sitemaps you can adjust the voraciousness of the GoogleBot.
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

yup
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

patrikG wrote:
AKA Panama Jack wrote:Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.

It can get annoying at times. ;)

A distant second is the MSNBot. It doesn't hit as often but it pulls the same image files over and over where Google apparently checks modification dates and doesn't download unchanged images.
If you go to Google Sitemaps you can adjust the voraciousness of the GoogleBot.
I tried that many months ago. Signed up, built everything, and it didn't make a difference. :)
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Googlebot does the same for my site. It's been so bad that when I generate website stats, I purposely exclude Googlebot from them.

On the plus side, most of my referrals come from Google. And I purposely told them to step up the indexing. :-P
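
For anyone wanting to exclude Googlebot from their stats the same way, here is a rough sketch in Python. It assumes the server writes Apache-style combined logs and that crawlers identify themselves honestly in the User-Agent string; the function name `count_human_hits`, the bot pattern, and the sample log lines are all invented for illustration and are not taken from any poster's setup.

```python
import re
from collections import Counter

# Assumption: these substrings are enough to spot the crawlers discussed here
BOT_PATTERN = re.compile(r"googlebot|msnbot", re.IGNORECASE)

# Combined log format: "METHOD path HTTP/x" status size "referer" "user agent"
LINE_PATTERN = re.compile(
    r'"(?:GET|POST|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def count_human_hits(log_lines):
    """Count hits per path, skipping requests whose user agent looks like a bot."""
    hits = Counter()
    for line in log_lines:
        match = LINE_PATTERN.search(line)
        if match is None:
            continue  # not a line we know how to parse
        path, agent = match.groups()
        if BOT_PATTERN.search(agent):
            continue  # crawler traffic; leave it out of the stats
        hits[path] += 1
    return hits


# Made-up sample lines using documentation-only IP addresses
sample = [
    '192.0.2.10 - - [01/Mar/2007:12:00:00 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Mar/2007:12:00:05 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows; U) Firefox/2.0"',
    '203.0.113.8 - - [01/Mar/2007:12:00:09 +0000] "GET /game.php HTTP/1.1" 200 900 "http://www.google.com/search?q=game" "Mozilla/4.0 (compatible; MSIE 7.0)"',
]

print(count_human_hits(sample))
```

Note that the filter looks only at the user agent, not the referer, so visitors who arrive from a Google search are still counted.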