PHP DIE for Multiple Personality Useragent

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

PHP DIE for Multiple Personality Useragent

Post by JAB Creations »

This useragent mocks my stats script...
NuSearch Spider (compatible; MSIE 6.0)
So what are you? You look like MSIE with a toolbar called "NuSearch" to my stats script!

I basically only want to block useragents (manually) that declare themselves as multiple useragents.

So I need a little help getting this to work...

Code:

$useragent = $_SERVER['HTTP_USER_AGENT'];
if ($useragent=="NuSearch" && $useragent=="MSIE"){header("HTTP/1.0 403");die();}
Once I get that to work their UA will be blocked until they fix their agent. I am 100% sure I want to do this...the script just isn't working for me the way I set it up, however.

John
Last edited by JAB Creations on Tue Apr 04, 2006 6:36 am, edited 2 times in total.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

For the record, any Windows application that uses an IE control to interface with the web will probably report its useragent as "Application Name (compatible; MSIE 6.0)". It's not 2 user agents; it's one with some extra information about what it's using to render HTML. Applications based on Gecko or Mozilla will do the same sort of thing.
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Post by JAB Creations »

I am 100% sure I want to do this.
Please don't post if you're not going to help. :|
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

onion2k wasn't helpful?
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Re: PHP DIE for Multiple Personality Useragent

Post by Roja »

JAB Creations wrote:I basically only want to block useragents (manually) that declare themselves as multiple useragents.
That is (almost) all useragents.

Here's a list of *stock* browsers, with no plugins or searchbars:

IE6: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
IE7: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)
AOL: Mozilla/4.0 (compatible; MSIE 4.01; AOL 4.0; Windows 98)
NS6.1: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.2) Gecko/20010726 Netscape6/6.1
Avant: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser [avantbrowser.com]; iOpus-I-M; QXW03416; .NET CLR 1.1.4322)
Konq: Mozilla/5.0 (compatible; Konqueror/3.1-rc3; i686 Linux; 20020515)
Opera (as IE): Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50
Safari: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/412 (KHTML, like Gecko) Safari/412

The *only* browsers that do not do so by default, as far as I know:

ELinks: ELinks (0.4pre5; Linux 2.4.27 i686; 80x25)
Opera: Opera/8.51 (Windows NT 5.1; U; en)

So, I understand that you are sure you want to block people with search bars installed, but the method you have chosen (blocking browsers with multiple user-agents) is going to actually block ALL browsers, save two.

Are you sure THAT is what you 100% want to do?
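A quick sketch backing up the list above: a "contains two or more browser names" test fires on almost every stock useragent, since nearly all of them embed Mozilla plus another browser or engine token. The name list and helper below are illustrative, not anything from the thread's code.

```php
<?php
// Count how many well-known browser/engine tokens appear in a UA string.
function count_browser_names(string $ua): int
{
    $count = 0;
    foreach (['Mozilla', 'MSIE', 'Netscape', 'Opera', 'Gecko', 'Konqueror', 'Safari'] as $name) {
        if (strpos($ua, $name) !== false) {
            $count++;
        }
    }
    return $count;
}

echo count_browser_names('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)'), "\n"; // 2
echo count_browser_names('Opera/8.51 (Windows NT 5.1; U; en)'), "\n";                 // 1
```

Only the Opera and ELinks strings above score 1; every other stock UA listed scores 2 or more.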
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Post by JAB Creations »

Dam it, this is why I do not post here often: people do not effectively what I post!
I basically only want to block useragents (manually)
Manually...and did my original guess of code include the and operator? It did...

MSIE 7.0b = spammers already using new useragent spoofs.

I know about the Mozilla double useragent issue. Thats why I said manually setting my restrictions. I was not talking about Mozilla.

Yes I am 100% sure I want to do what I was talking about. Now could someone please read my post without having to GUESS what I wrote? :evil: I think after going through 100mb access logs almost line by line for two years I have some clue as what I'm talking about. :roll:
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

JAB Creations wrote:MSIE 7.0b = spammers already using new useragent spoofs.
The only applications you'll block by restricting User Agents are genuine ones. Spammers, hackers, crackers, me .. we all spoof an Agent string using one from a real browser .. my preference is "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)". You wouldn't be able to tell the difference between my script and a user browsing from IE.

Anyway.. the answer to your question is trivial..

Code:

$useragent = $_SERVER['HTTP_USER_AGENT'];
if (strpos($useragent,"NuSearch")!==false && strpos($useragent,"MSIE")!==false){
    header("HTTP/1.0 403");
    exit; //Use exit to end a script if you're not reporting an error.
}
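A quick demonstration of why the equality version from the first post could never match anything: $useragent holds the whole header as one string, and == compares whole strings, so it can't equal "NuSearch" and "MSIE" at once (or either of them here at all). The strpos() substring checks are what's needed.

```php
<?php
// The full header is one string; equality compares the whole string,
// not substrings within it, so the && of two equalities is always false.
$ua = 'NuSearch Spider (compatible; MSIE 6.0)';
var_dump($ua == 'NuSearch' && $ua == 'MSIE');  // bool(false) -- always
var_dump(strpos($ua, 'NuSearch') !== false
      && strpos($ua, 'MSIE') !== false);       // bool(true)
```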
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

JAB Creations wrote:Dam it, this is why I do not post here often: people do not effectively what I post!
Do not effectively what? We can effectively understand you, only if you write clearly. (You left out a word between effectively and what).

Further, many times people focus too much on a particular method to solve a problem, without knowing why that method can be a bad thing. We try to help people on these forums, so we explain the situation - again, to ensure everyone effectively understands the situation.
JAB Creations wrote:
I basically only want to block useragents (manually)
What does that mean? code = automatic, not manual.
JAB Creations wrote:Manually...and did my original guess of code include the and operator? It did...
Your code isn't manual - it's automatic. It runs without manual intervention - without you pressing the button. That makes it automatic.

Further, the "and" operator in your example doesn't change the facts. All browsers (save two) have multiple user agents.
JAB Creations wrote:MSIE 7.0b = spammers already using new useragent spoofs.
Huh? IE7b is out, and is in use. Or do you mean that spammers already spoof as that useragent? (I'm sure they do, but what does that have to do with anything?)
JAB Creations wrote:I know about the Mozilla double useragent issue. Thats why I said manually setting my restrictions. I was not talking about Mozilla.
Our point was the way you stated the problem - "Browsers with multiple useragents" - is *ALL* browsers (save two). You were imprecise in your phrasing, and we were trying to help clarify it, so we could come up with a solution that does what you want, without blocking EVERY browser (save two).
JAB Creations wrote:Now could someone please read my post without having to GUESS what I wrote?
Only if you clearly write what you are trying to say. So far, you haven't, so we've had to ask for clarification - which is not the same as guessing.
JAB Creations wrote::evil: I think after going through 100mb access logs almost line by line for two years I have some clue as what I'm talking about. :roll:
You have a clue about the what; you don't seem to have mastered HOW to talk about it.
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Post by JAB Creations »

Thank you onion2k...

I pointed out MSIE 7.0b to make you guys aware that the real IE7 Beta does not declare itself as a Beta in the useragent string, for your benefit. It's an easy way to find associated IPs to block, for example.

While I am not naturally gifted at programming, I believe you guys have made false comparisons. For example, the whole argument over whether I really want to do what I said I wanted seems to be associated with filtering all useragents, which would be automatic. My filter concept is built to work manually.

My reasoning: useragents should declare their name, their version (if a browser), a build number (if applicable), and a valid URL (if a bot). If it isn't easy to track down, or it spoils my statistics script's browser shares because I have to make filters for every broken useragent, then I simply want it blocked and written off. We each have our goals as site/server admins.

Code:

$useragent = $_SERVER['HTTP_USER_AGENT'];
if (strpos($useragent,"234234234")!==false && strpos($useragent,"676767")!==false){header("HTTP/1.0 403");exit;}
elseif (strpos($useragent,"345345345")!==false && strpos($useragent,"787878")!==false){header("HTTP/1.0 403");exit;}
elseif (strpos($useragent,"456456456")!==false && strpos($useragent,"898989")!==false){header("HTTP/1.0 403");exit;}
I used numbers simply to avoid getting into another browser dispute. With the above example, only three specific combinations of UA strings will be 403ed. The list is manually set: the checks run automatically, in the sense that machines automate tasks, but the list is manual in the sense that, yes, bad agents can spoof as anything they desire, but I am talking about useragents built by programmers that do not satisfy my desire to know what they are immediately, without being coy.
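That manual list reads more cleanly as an array of substring pairs. A sketch of that shape, under the same assumptions: the rule pairs are placeholders like the numeric ones above, and the `ua_is_blocked` helper is made up for illustration.

```php
<?php
// Each rule is a pair of substrings that must BOTH appear in the
// useragent for the request to be 403ed. The pairs are placeholders.
$blockRules = [
    ['NuSearch', 'MSIE'],
    ['234234234', '676767'],
    ['345345345', '787878'],
];

function ua_is_blocked(string $ua, array $rules): bool
{
    foreach ($rules as [$needleA, $needleB]) {
        if (strpos($ua, $needleA) !== false && strpos($ua, $needleB) !== false) {
            return true;
        }
    }
    return false;
}

if (ua_is_blocked($_SERVER['HTTP_USER_AGENT'] ?? '', $blockRules)) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}
```

Adding a new blocked combination is then one line in $blockRules instead of another elseif.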
You wouldn't be able to tell the difference between my script and a user browsing from IE.
There are plenty of ways to detect the nature of an application, and even what type of browser it is, regardless of its useragent.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Why not use get_browser() or a pure PHP version of it? At least then you can categorize them into their own groups or disregard them, but I wouldn't deny them the request, as that can ignite bad attitudes from search engines (serving different content to a search engine is often grounds for delisting).
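A hedged sketch of that route: get_browser() only works when the browscap directive in php.ini points at a browscap.ini file, and returns false otherwise, hence the fallback below. The `browser_category` helper and its category names are made up for illustration.

```php
<?php
// Categorize a useragent via browscap when available, else group roughly.
function browser_category(string $ua): string
{
    $info = @get_browser($ua, true); // array of capabilities, or false
    if (is_array($info) && !empty($info['browser'])) {
        return $info['browser'];     // e.g. "IE", per the browscap.ini data
    }
    // Fallback grouping when browscap isn't configured
    return strpos($ua, 'compatible;') !== false ? 'compatible-tagged' : 'other';
}
```

Grouping rather than 403ing keeps the stats clean without the delisting risk mentioned above.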
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

JAB Creations wrote:
You wouldn't be able to tell the difference between my script and a user browsing from IE.
There are plenty of ways to detect the nature of an application, and even what type of browser it is, regardless of its useragent.
True. And a half decent hacker can spoof them all.
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Post by JAB Creations »

Without a useragent standard I have to manually track things; it's the only reliable way to get the job done correctly.
True. And a half decent hacker can spoof them all.
I didn't say you couldn't spoof; surely you can. All I need to know is whether it's a legit bot, a legit browser, or an abuser. With abusers I don't need to worry about what the application is; I just need to (and do) know how to block them without blocking legit browsers. Even legit browsers that spoof can be detected as their true identity. I'm not trying to say I can outsmart you; just that a speck of dust doesn't need to move much on my server for me to take notice. :wink:
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

JAB Creations wrote:All I need to know is whether it's a legit bot, a legit browser, or an abuser. With abusers I don't need to worry about what the application is; I just need to (and do) know how to block them without blocking legit browsers.
This is exactly the point I was asking about for clarification: your definition of what "non-legit" is, and how you can block them without blocking legit browsers.

Your signature is more than inflammatory - it's a cheap shot attempting to make a one-sided argument. I'm taking the bait. You didn't (and don't) know what you want - you've said so above. You want to know how to block one group of browsers and not another, and we've asked, stated, and implied that you can't reliably do so.

Once again - until and unless you have a definition of "legit" v. "non-legit", we can't help you differentiate between the two. Worse, despite your certainty that you can - I can provably show that you cannot. Virtually every browser uses multiple useragents. Most non-legitimate spiders spoof as legitimate useragents.

Please - if you know of a method that allows you to identify the difference between the two, educate us. If we knew that, we could write the code to do it. So far, your answers have amounted to "Multiple UA's in the UA", which we've shown isn't an effective differentiator.
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Post by JAB Creations »

Legit - Declares itself as only itself and not something else.

We all know the problem (most won't admit it as a problem) of the Mozilla compatible string in almost every UA.

The goal was to block only specific UAs with certain combinations, and for me to add them manually. Perhaps if I had asked how to execute it as an array it might have clarified my position better. Manual in part because my stats script will not catch things explicitly, though it does alert me vaguely in certain ways.

If you read the original useragent string, I specified the exact spider name as the other string. If MSIE is detected as true but not the spider, then nothing happens to any application with MSIE in its UA string. So it would only affect UAs I target.

Now the concern would be valid if I was trying to block UAs because I did not like Mozilla in IE's UA string. But I'm not doing that.

I define legit bots as those with their own name (and not another bot's or browser's name) and a valid, working URL. While the original UA does not have a URL in it, the URL was easy to find online, so it's legit.
Please - if you know of a method that allows you to identify the difference between the two, educate us.
For clarification, are you asking how to tell the difference between a legit bot and an abuser, or how to tell what an application is even if it's spoofing? I haven't explained either in the thread.

Non-legit bots will have a UA that makes no sense, no hits on robots.txt, constant and direct hits on hot files (guestbooks and contact pages, for example), and will read URLs off a map a sister program gives them to crawl, or have referrals either not programmed as supported or turned off in order to hide how they found the file. However, Apache is capable of detecting direct requests versus referrals even if the referral is disabled on any application.
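Those heuristics can be sketched as a simple suspicion score. The visit fields below are hypothetical; a real version would derive them from the access log rather than take them as an array.

```php
<?php
// Score a visitor against the bot heuristics described above.
function suspicion_score(array $visit): int
{
    $score = 0;
    if (!$visit['fetched_robots_txt']) $score++;      // legit bots request robots.txt
    if ($visit['direct_hot_file_hits'] > 0) $score++; // guestbook/contact pages hit directly
    if ($visit['referer'] === '') $score++;           // referral hidden or missing
    return $score;
}
```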

Detecting the correct UA when the UA is spoofed is a cinch. What was the biggest headache coders faced, especially in the old days? Add the fact that IE 5+ supports conditional comments: if you stop and think of how a site with serverside scripting and themes would work, you would also include external stylesheets...via IECC. You can't disable access to the IECC stylesheet in IE, so MSIE spoofers are the absolute EASIEST to catch. Most people code clientside for crap, but a well coded page will use IECC stylesheets. Add the fact that a spammer wants to get as many email addys ASAP: they aren't going to request tons of files, so hits on pages without related files are automatically highly suspect. If the spammer tries to get around that, then, to protect hot files say, one could program for proprietary issues when the spoofer is detected requesting a hot file, causing trouble for the spammer spoofing and attempting to make it look like a normal request (look as in when you look at your access log). I won't go too deep into it, but if you understand proprietary issues between browsers, you'll know how to detect them for what they are no matter how well they try to spoof.
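The conditional-comment trap described above can be sketched like this: the page links a stylesheet inside `<!--[if IE]> ... <![endif]-->`, which only real IE downloads, so a session claiming MSIE that never fetches it is flagged. The filename and the per-session request list are hypothetical.

```php
<?php
// Flag sessions that claim MSIE but never fetched the IE-only stylesheet
// referenced inside the conditional comment (iecc.css is a made-up name).
function is_suspected_msie_spoofer(string $ua, array $requestedFiles): bool
{
    $claimsMsie = strpos($ua, 'MSIE') !== false;
    $fetchedIeccCss = in_array('iecc.css', $requestedFiles, true);
    return $claimsMsie && !$fetchedIeccCss;
}
```

This only catches agents that claim MSIE without behaving like it; as onion2k notes next, a determined spoofer can fetch the stylesheet too.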
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

JAB Creations wrote:Add the fact that a spammer wants to get as many email addys ASAP: they aren't going to request tons of files, so hits on pages without related files are automatically highly suspect.
Slightly off topic here. When I was at Uni (6 years ago now) I wrote a Perl bot that scanned websites for images and downloaded them. It was designed for leeching porn sites :). My script would multiplex between 50 different websites at a time, and add a small random wait (between 0 and 1.5 seconds) to each request operation. Every hit was accompanied with the proper HTTP referrer, proper UA string, everything. During testing I would often run it on my website and then compare the logs from normal user activity .. there was literally no difference. It was impossible to statistically tell which activity was the leech script and which was users .. simply because the time between each request was 50*(0~1.5) seconds. My script was continuously downloading stuff, but each site only saw the activity from it once every 50 cycles.

That was me 6 years ago. Lord only knows what spammers do these days. But if you think a spike in your logs indicates a spammer harvesting your site you're entirely wrong.