
Posted: Wed Jun 20, 2007 10:38 pm
by Luke
I've read that in many places. The page I posted more or less said that (except without explicitly stating that they parse CSS).

Posted: Thu Jun 21, 2007 5:26 am
by superdezign
Whether it's a check Google does automatically, does at set intervals, or does only out of suspicion is something I'd like to know. But regardless, Google wouldn't TELL us; they'd just scare us and have us assume that they always check. I'm sure always checking is a bit much for the spider, though.

Posted: Thu Jun 21, 2007 5:46 am
by AKA Panama Jack
Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.

It can get annoying at times. ;)

A distant second is the MSNBot. It doesn't hit as often but it pulls the same image files over and over where Google apparently checks modification dates and doesn't download unchanged images.
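
For reference, the behaviour described above matches HTTP conditional requests: the client sends an If-Modified-Since header, and the server can answer 304 Not Modified with no body when the file hasn't changed. Here is a minimal Python sketch of the server-side decision, assuming you already know the file's modification time as a timezone-aware datetime. The name conditional_status is made up for illustration and is not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified must be timezone-aware (hypothetical helper, for illustration).
    """
    if not if_modified_since:
        return 200  # no conditional header: send the full file
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the file
    if cached_at.tzinfo is None:
        # Some date forms parse as naive; treat them as UTC.
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= cached_at:
        return 304
    return 200


# Example: an image last changed on 20 Jun 2007, 12:00 UTC.
image_mtime = datetime(2007, 6, 20, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(image_mtime, usegmt=True)
print(header, conditional_status(image_mtime, header))
```

A bot that never sends the header (or a server that ignores it) ends up transferring the same images over and over, which is the MSNBot pattern described above.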

Posted: Thu Jun 21, 2007 5:54 am
by superdezign
AKA Panama Jack wrote:Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.
Google accepts requests to limit the frequency of crawling. All you have to do is verify that the site is, indeed, yours and they'll do what you tell them to.

Posted: Thu Jun 21, 2007 6:29 am
by patrikG
AKA Panama Jack wrote:Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.

It can get annoying at times. ;)

A distant second is the MSNBot. It doesn't hit as often but it pulls the same image files over and over where Google apparently checks modification dates and doesn't download unchanged images.
If you go to Google Sitemaps you can adjust the voraciousness of the GoogleBot.

Posted: Thu Jun 21, 2007 11:19 am
by Luke
yup

Posted: Thu Jun 21, 2007 11:41 am
by AKA Panama Jack
patrikG wrote:
AKA Panama Jack wrote:Google is by far the most persistent web crawler. They hit our game web site over a hundred times a DAY. I think half our bandwidth usage is from Google bots. I have seen up to 10 different Google Bots hit the site at the same time. Yes, 10 different IP addresses that trace back to Google.

It can get annoying at times. ;)

A distant second is the MSNBot. It doesn't hit as often but it pulls the same image files over and over where Google apparently checks modification dates and doesn't download unchanged images.
If you go to Google Sitemaps you can adjust the voraciousness of the GoogleBot.
I tried that many months ago. Signed up, built everything, and it didn't make a difference. :)

Posted: Thu Jun 21, 2007 7:00 pm
by Ambush Commander
Googlebot does the same for my site. It's been so bad that when I generate stats for my web sites, I purposely exclude Googlebot from them.
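
Roughly what that exclusion can look like, as a minimal Python sketch. It assumes the hits have already been parsed into (path, user agent) pairs. The token list, the function names, and the sample hits are all made up for illustration. User agents can be spoofed, so this only filters bots that identify themselves honestly.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed substrings; extend the tuple for other crawlers you want to ignore.
CRAWLER_TOKENS = ("googlebot", "msnbot")


def is_crawler(user_agent: str) -> bool:
    """True if the user agent contains one of the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def count_human_hits(hits: Iterable[Tuple[str, str]]) -> Counter:
    """Count hits per path, skipping requests from self-identified crawlers."""
    counts = Counter()
    for path, user_agent in hits:
        if not is_crawler(user_agent):
            counts[path] += 1
    return counts


# Illustrative sample data, not real log lines.
sample = [
    ("/index.php",
     "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/index.php", "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"),
    ("/images/logo.png", "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"),
]
print(count_human_hits(sample))
```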

On the plus side, most of my referrals come from Google. And I purposely told them to step the indexing. :-P