Hey Guys!
I've been working on some sites in PHP but am still fairly new.
I was wondering if it's possible to set up some type of IF statement that checks whether the visitor is a crawler?
For example, if I had a counter on my site that increments an entry in the database for every visitor, I don't mind counting someone who gets to the site from a search engine, but I wouldn't want a search engine crawling my site to increment the counter.
Any ideas?
This is something I'm doing from scratch (trying, anyway).
Possible to setup If when search engine crawler?
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Possible to setup If when search engine crawler?
There's no sure way, but the best way would be to check the user agent ($_SERVER['HTTP_USER_AGENT']). Search it for "safari", "mozilla", "firefox", "chrome", "opera", "ie", etc. with stripos().
http://php.net/stripos
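One detail worth keeping in mind with stripos(): it returns the position of the match, and a match at the very start of the string is position 0, which PHP treats as false in a loose test. Most browser user agents begin with "Mozilla", so the result should be compared with !== false. A minimal sketch; the user agent string is just a sample value, not taken from a real request:

```php
<?php
// Sample user agent string (an assumption for illustration).
$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0';

// stripos() returns int(0) here because "mozilla" matches at the very start.
$position = stripos($userAgent, 'mozilla');

// Wrong: 0 is falsy, so a loose test misses the match.
$looseMatch = (bool) $position;

// Right: only a strict false means "not found".
$strictMatch = ($position !== false);

var_dump($position, $looseMatch, $strictMatch);
```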
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
Re: Possible to setup If when search engine crawler?
If you do a search for "PHP robot detection", you'll find quite a few libraries designed for this.
What they basically do is check the user agent in the request against a list of known robot user agents, so it's not 100% (unknown/new robots for instance), but should suffice for your purposes.
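As a rough sketch of that idea (not any particular library's code), the known-robot check can be done with one case-insensitive regular expression. The function name looks_like_robot() and the short token list are assumptions for illustration; a real library would ship a much longer list.

```php
<?php
// Hypothetical helper: true when the user agent contains a known robot token.
function looks_like_robot($userAgent)
{
    // Short illustrative list; real lists are far longer and change over time.
    $tokens = array('Googlebot', 'bingbot', 'msnbot', 'Slurp', 'ia_archiver');

    // Escape each token and join them into one alternation: /Googlebot|bingbot|.../i
    $quoted = array();
    foreach ($tokens as $token) {
        $quoted[] = preg_quote($token, '/');
    }
    $pattern = '/' . implode('|', $quoted) . '/i';

    // A missing user agent (null) becomes an empty string and never matches.
    return preg_match($pattern, (string) $userAgent) === 1;
}

var_dump(looks_like_robot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'));
var_dump(looks_like_robot('Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0'));
```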
Re: Possible to setup If when search engine crawler?
Ahh, I'm just about to read through those links, but going by the code you mentioned, do you think it would be as simple as the following? Or I guess, like you guys are saying, maybe I'll have to make it specific to each bot out there.
if (!$_SERVER['HTTP_USER_AGENT']) {
    // do whatever I need because you're not a crawler
} else {
    // do nothing because you ARE a crawler
}
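For what it's worth, a check like the one above only tests whether the header is empty, and most crawlers do send one. A small sketch with two sample strings (both are illustrative values; the first follows the format Googlebot identifies itself with):

```php
<?php
// Sample user agents, used here instead of $_SERVER['HTTP_USER_AGENT'].
$samples = array(
    'crawler' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'browser' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0',
);

$results = array();
foreach ($samples as $label => $userAgent) {
    // !$userAgent is true only for an empty (or missing) string.
    $results[$label] = !$userAgent;
}

// Both entries are false: the test cannot tell the crawler from the browser.
var_dump($results);
```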
- Jonah Bron
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Possible to setup If when search engine crawler?
No, more like this:
Code: Select all
$agents = array('mozilla', 'safari', 'ie', 'firefox', 'opera', 'chrome');
// Some clients send no user agent at all; treat that as an empty string.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isHuman = false;
foreach ($agents as $agent) {
    if (stripos($userAgent, $agent) !== false) {
        $isHuman = true;
        break;
    }
}
if ($isHuman) {
    // do whatever you need because it's not a crawler
} else {
    // do nothing because it is a crawler
}
But I like John's idea better: use a pre-fab solution.
Re: Possible to setup If when search engine crawler?
Ahh ok, I just found something similar to that too.
I think I understand it better now. I'm assuming the code in my last post doesn't work because every user or crawler will send an HTTP_USER_AGENT; for an actual user with a web browser it will say Internet Explorer or Firefox, and for a crawler it will be something like Googlebot. I found the following online and am going to try it out. Hopefully it works out for me. Thanks so much!
Should be able to just put:
if (!is_robot()) {
    // you're a user, so do whatever
} else {
    // you're a robot, so do nothing
}
Code: Select all
function is_robot(){
$robots = array(
"Accoona-AI-Agent",
"AOLspider",
"BlackBerry",
"bot@bot.bot",
"CazoodleBot",
"CFNetwork",
"ConveraCrawler",
"Cynthia",
"Dillo",
"discoveryengine.com",
"DoCoMo",
"ee://aol/http",
"exactseek.com",
"fast.no",
"FAST MetaWeb",
"FavOrg",
"FS-Web",
"Gigabot",
"GOFORITBOT",
"gonzo",
"Googlebot-Image",
"holmes",
"HTC_P4350",
"HTML2JPG Blackbox",
"http://www.uni-koblenz.de/~flocke/robot-info.txt",
"iArchitect",
"ia_archiver",
"ICCrawler",
"ichiro",
"IEAutoDiscovery",
"ilial",
"IRLbot",
"Keywen",
"kkliihoihn nlkio",
"larbin",
"libcurl-agent",
"libwww-perl",
"Mediapartners-Google",
"Metasearch Crawler",
"Microsoft URL Control",
"MJ12bot",
"T-H-U-N-D-E-R-S-T-O-N-E",
"voodoo-it",
"www.aramamotorusearchengine.com",
"archive.org_bot",
"Teoma",
"Ask Jeeves",
"AvantGo",
"Exabot-Images",
"Exabot",
"Google Keyword Tool",
"Googlebot",
"heritrix",
"www.livedir.net",
"iCab",
"Interseek",
"jobs.de",
"MJ12bot",
"pmoz.info",
"SnapPreviewBot",
"Slurp",
"Danger hiptop",
"MQBOT",
"msnbot-media",
"msnbot",
"MSRBOT",
"NetObjects Fusion",
"nicebot",
"nrsbot",
"Ocelli",
"Pagebull",
"PEAR HTTP_Request class",
"Pluggd/Nutch",
"psbot",
"Python-urllib",
"Regiochannel",
"SearchEngine",
"Seekbot",
"segelsuche.de",
"Semager",
"ShopWiki",
"Snappy",
"Speedy Spider",
"sproose",
"TurnitinBot",
"Twiceler",
"VB Project",
"VisBot",
"voyager",
"VWBOT",
"Wells Search",
"West Wind",
"Wget",
"WWW-Mechanize",
"www.show-tec.net",
"xxyyzz",
"yacybot",
"Yahoo-MMCrawler",
"yetibot",
);
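// Treat a missing User-Agent header as an empty string.
$user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
$from_spider = false;
foreach ($robots as $robot) {
    // stripos() returns false only when $robot is not found (case-insensitive).
    if (stripos($user_agent, $robot) !== false) {
        $from_spider = true;
        break;
    }
}
return $from_spider;
}
- Jonah Bron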
- DevNet Master
- Posts: 2764
- Joined: Thu Mar 15, 2007 6:28 pm
- Location: Redding, California
Re: Possible to setup If when search engine crawler?
If I were you, I'd go the other way. It's a lot easier to keep track of browser user agents than crawlers, and there are also crawlers that don't provide a user agent at all. You don't even need to include obscure browsers, because it's just a counter.
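A minimal sketch of that browser-first approach, under a few assumptions. The function name is_probably_human() and both token lists are made up for illustration. One caveat: several crawlers, Googlebot among them, also put "Mozilla" in their user agent, so this sketch rejects strings containing generic crawler hints before looking for browser tokens. It also uses "msie" rather than "ie", since a two-letter token matches too many unrelated words.

```php
<?php
// Hypothetical helper: guess whether a user agent belongs to a person's browser.
function is_probably_human($userAgent)
{
    // Crawlers that send no user agent at all are not counted.
    if (!is_string($userAgent) || $userAgent === '') {
        return false;
    }

    // Generic crawler hints; checked first because many bots also say "Mozilla".
    foreach (array('bot', 'crawl', 'spider', 'slurp') as $hint) {
        if (stripos($userAgent, $hint) !== false) {
            return false;
        }
    }

    // Common browser tokens, as suggested in the thread ("msie" instead of "ie").
    foreach (array('mozilla', 'safari', 'msie', 'firefox', 'opera', 'chrome') as $token) {
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }

    return false;
}

// Usage: only count the visit when the request looks like a browser.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (is_probably_human($userAgent)) {
    // increment the counter here
}
```

It still won't be 100%, since anything can send a browser-like user agent, but for a simple counter it should be close enough.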