Detecting user's proxy

josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

On one of my web sites we have a group of users who seem to dedicate their lives to posting needless spam. We have set up scripts to lock accounts, detect spam, and ban by IP, but that doesn't seem to stop them. They use proxy lists and continue to clutter the web site. Is there any way I can detect if a user is using a proxy, or any simple way I can stop this spam automatically instead of having to go through posts every night and ban people?

I have seen other web sites where, once you are banned, proxies won't work. How is it done? The only possible way is some kind of client-side code, but what would be easiest? Wouldn't this still be able to be bypassed?

Thanks for any advice.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

I think generally if the $_SERVER['HTTP_X_FORWARDED_FOR'] header is sent then they have come via a proxy. In which case $_SERVER['REMOTE_ADDR'] is the address of the proxy :wink:
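A minimal sketch of that check, assuming the request data is passed in as an array shaped like $_SERVER. The function name describeRequest() and the returned keys are hypothetical, chosen for illustration. The header can carry a comma-separated chain of addresses, and a proxy is free to omit or forge it, so the result is a hint rather than proof.

```php
<?php
// Hypothetical helper: inspects a $_SERVER-like array for X-Forwarded-For.
function describeRequest(array $server): array
{
    $remote    = $server['REMOTE_ADDR'] ?? null;
    $forwarded = trim($server['HTTP_X_FORWARDED_FOR'] ?? '');

    if ($forwarded === '') {
        // No header: either a direct connection or a proxy that does not announce itself.
        return ['via_proxy' => false, 'client_ip' => $remote, 'proxy_ip' => null];
    }

    // The header may list several hops; the first entry is the claimed original client.
    $hops = explode(',', $forwarded);

    return [
        'via_proxy' => true,
        'client_ip' => trim($hops[0]), // claimed by the proxy, not verified
        'proxy_ip'  => $remote,        // the address that actually connected
    ];
}
```

In a page this would be called as describeRequest($_SERVER).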
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

I just tested that out and it works for about half the proxies I try. Are there any other ways, in addition to this one, that I can use?
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Re: Detecting user's proxy

Post by Roja »

jshpro2 wrote:On one of my web sites we have a group of users who seem to dedicate their lives to posting needless spam. We have set up scripts to lock accounts, detect spam, and ban by IP, but that doesn't seem to stop them. They use proxy lists and continue to clutter the web site.
Require each account to be verified with an email address. It will at least slow them down, and possibly force them to stop long enough to develop a script that works with your script. Best of all, it forces them to use up an email address (which you can report to the provider) for each spam, if you can catch them fast enough.
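One way the email step could be sketched: generate an unguessable token at sign-up, mail it to the address as part of a confirmation link, and activate the account only when the same token comes back. The function names below are hypothetical, and storing the token and sending the mail are left out.

```php
<?php
// Hypothetical helper: a random token to embed in the confirmation link.
function newVerificationToken(): string
{
    // 16 random bytes, hex-encoded to a 32-character string.
    return bin2hex(random_bytes(16));
}

// Hypothetical helper: compare the stored token with the one from the link.
function tokenMatches(string $stored, string $supplied): bool
{
    // hash_equals() compares in constant time, so timing does not leak the token.
    return hash_equals($stored, $supplied);
}
```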

Also look at keyword filtering. Spam is usually fairly consistent, repeating the same keywords across multiple postings ("grilled spam", "make money fast!") that aren't likely to be used by normal humans. Have a script that detects those keywords in a post and marks the post for moderation, so it's not visible until you approve it.

That way, you lose no content from real humans (you can moderate it), but the spammers get no gain until you give it to them. (Which may be enough discouragement to leave)
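The keyword check described above might look roughly like this. It is a sketch under assumptions: needsModeration() is a hypothetical name, the keyword list is supplied by the caller, and matching is a plain case-insensitive substring search, which spammers can dodge with creative spelling.

```php
<?php
// Hypothetical helper: true if the post contains any of the given keywords.
function needsModeration(string $post, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        // Skip empty keywords; stripos() does a case-insensitive substring search.
        if ($keyword !== '' && stripos($post, $keyword) !== false) {
            return true;
        }
    }
    return false;
}
```

A post for which this returns true would be saved as hidden and queued for a moderator rather than rejected outright.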

Finally, I virtually guarantee they are making a link to their website, so have your script rewrite any links in comments to carry the rel="nofollow" attribute. That removes the PageRank advantage they get from spamming your board.
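A rough sketch of that rewrite, assuming the comment is already available as an HTML string. The name addNoFollow() is hypothetical. The regular expression handles ordinary anchor tags but not every edge case (for example an attribute value containing ">"), so a DOM parser would be the more robust choice.

```php
<?php
// Hypothetical helper: force rel="nofollow" onto every <a> tag in an HTML fragment.
function addNoFollow(string $html): string
{
    $result = preg_replace_callback(
        '/<a\b([^>]*)>/i',
        function (array $m): string {
            // Drop any rel attribute the poster supplied, then append our own.
            $attrs = preg_replace('/\srel\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i', '', $m[1]);
            return '<a' . $attrs . ' rel="nofollow">';
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}
```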

Do all of that, and they'll have few reasons to return - all without worrying about IPs and proxies, which are completely unreliable.
jshpro2 wrote: Is there any way I can detect if a user is using a proxy
Let's ask that a different way: "Is there any reliable way I can detect if a user is using a proxy?"

The answer is absolutely NO.

I've answered it (with more detail) multiple times:
viewtopic.php?p=174852#174852
viewtopic.php?p=173755#173755
jshpro2 wrote:or any simple way I can stop this spam automatically instead of having to go through posts every night and ban people?
See my above suggestions. Moderation and link power killing are the two most powerful choices.
jshpro2 wrote:I have seen other web sites where, once you are banned, proxies won't work. How is it done?
No, you haven't. You've seen sites that set a cookie for the local browser. Or ban the ip of the browser. Or of the proxy if they can detect it.

There is no reliable method - the user can get around all of those.
jshpro2 wrote: The only possible way is some kind of client side code, but what would be easiest? Wouldn't this still be able to be bypassed?
Yes.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

Thanks for clearing that all up. I will definitely do the email validation and, in addition, set up keyword filtering. If absolutely necessary, I will require posts to be moderated before they are visible to the general public.

Thank you for the post, very helpful.
m3mn0n
PHP Evangelist
Posts: 3548
Joined: Tue Aug 13, 2002 3:35 pm
Location: Calgary, Canada

Post by m3mn0n »

On a related note: here is an article that explains a lot about proxies and identifying users...
Anonymity of Proxy

Information exchange on the Internet follows the "client - server" model. A client sends a request (which files it needs) and a server sends a reply (the required files). So that the two can cooperate, the client sends additional information about itself: the name and version of the operating system, the browser configuration (including its name and version), etc. The server may need this information to decide which web page to serve, since there can be different variants of a page for different browser configurations. However, as web pages usually do not depend on the browser, it makes sense to hide this information from the web server.

What your browser transmits to a web-server:

* a name and a version of an operating system
* a name and a version of a browser
* configuration of a browser (display resolution, color depth, java / javascript support, ...)
* IP-address of a client
* Other information

The most important part of such information (and absolutely needless for a web-server) is information about IP-address. Using your IP it is possible to know about you the following:

* a country where you are from
* a city
* your provider’s name and e-mail
* your physical address

Information transmitted by a client is available to the server as environment variables. Each piece of information is the value of some variable. If a piece of information is not transmitted, the corresponding variable will be empty (its value will be undefined).

These are some environment variables:

REMOTE_ADDR – IP address of a client

HTTP_VIA – if it is not empty, then a proxy is used. Value is an address (or several addresses) of a proxy server, this variable is added by a proxy server itself if you use one.

HTTP_X_FORWARDED_FOR – if it is not empty, then a proxy is used. Value is a real IP address of a client (your IP), this variable is also added by a proxy server if you use one.

HTTP_ACCEPT_LANGUAGE – what language is used in browser (what language a page should be displayed in)

HTTP_USER_AGENT – the so-called "user agent". For most browsers the string begins with Mozilla. The browser’s name and version (e.g. MSIE 5.5) and the operating system (e.g. Windows 98) are also given here.

HTTP_HOST – is a web server’s name

This is a small part of environment variables. In fact there are much more of them (DOCUMENT_ROOT, HTTP_ACCEPT_ENCODING, HTTP_CACHE_CONTROL, HTTP_CONNECTION, SERVER_ADDR, SERVER_SOFTWARE, SERVER_PROTOCOL, ...). Their quantity can depend on settings of both a server and a client.

These are examples of variable values:

REMOTE_ADDR = 194.85.1.1
HTTP_ACCEPT_LANGUAGE = ru
HTTP_USER_AGENT = Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)
HTTP_HOST = www.webserver.ru
HTTP_VIA = 194.85.1.1 (Squid/2.4.STABLE7)
HTTP_X_FORWARDED_FOR = 194.115.5.5

Anonymity on the Internet is determined by which environment variables are hidden from the web server.

If a proxy server is not used, then environment variables look in the following way:

REMOTE_ADDR = your IP
HTTP_VIA = not determined
HTTP_X_FORWARDED_FOR = not determined

Depending on how they hide environment variables, proxy servers fall into several types:

Transparent Proxies:

They do not hide information about your IP address:

REMOTE_ADDR = proxy IP
HTTP_VIA = proxy IP
HTTP_X_FORWARDED_FOR = your IP

The purpose of such proxy servers is not to improve your anonymity on the Internet. They exist for caching, for sharing one Internet connection among several computers, etc.

Anonymous Proxies

All proxy servers, that hide a client’s IP address in any way are called anonymous proxies

Simple Anonymous Proxies:

These proxy servers do not hide the fact that a proxy is used, but they replace your IP with their own:

REMOTE_ADDR = proxy IP
HTTP_VIA = proxy IP
HTTP_X_FORWARDED_FOR = proxy IP

These proxies are the most widespread among other anonymous proxy servers.

Distorting Proxies:

Like simple anonymous proxy servers, these proxies do not hide the fact that a proxy server is used. However, the client’s IP address (your IP address) is replaced with another (arbitrary, random) IP:

REMOTE_ADDR = proxy IP
HTTP_VIA = proxy IP
HTTP_X_FORWARDED_FOR = random IP address

High Anonymity Proxies

In contrast to the other types of anonymous proxy server, these hide the fact that a proxy is being used:

REMOTE_ADDR = proxy IP
HTTP_VIA = not determined
HTTP_X_FORWARDED_FOR = not determined

That means the variable values are the same as if no proxy were used, with one very important exception: the proxy IP appears instead of your IP address.

Summary:

Depending on their purpose, proxies are either transparent or anonymous. However, remember that by using a proxy server you hide only your IP from a web server; other information (such as browser configuration) remains accessible!
Source: http://www.stayinvisible.com/index.pl/a ... y_of_proxy
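The article's taxonomy can be turned into a small classifier, sketched here under assumptions: classifyProxy() and its return labels are hypothetical, and the input is an array shaped like $_SERVER. Note what the headers cannot tell you. A direct visitor and a high anonymity proxy look identical, and a transparent proxy cannot be told apart from a distorting one, because the forwarded address may be real or random.

```php
<?php
// Hypothetical classifier based on the header patterns listed in the article.
function classifyProxy(array $server): string
{
    $remote    = $server['REMOTE_ADDR'] ?? '';
    $via       = trim($server['HTTP_VIA'] ?? '');
    $forwarded = trim($server['HTTP_X_FORWARDED_FOR'] ?? '');

    if ($via === '' && $forwarded === '') {
        // Nothing announced: direct connection or high anonymity proxy.
        return 'none-or-high-anonymity';
    }

    if ($forwarded === '' || $forwarded === $remote) {
        // Proxy admits it exists but reports only its own address.
        // (Treating "Via set, no forwarded address" this way is an assumption.)
        return 'simple-anonymous';
    }

    // A different address is reported; it may be genuine or made up.
    return 'transparent-or-distorting';
}
```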
fresh
Forum Contributor
Posts: 259
Joined: Mon Jun 14, 2004 10:39 am
Location: Amerika

Post by fresh »

I have pondered this very notion for some time. In my experience, navigating to the IP address on port 80, 8080, and sometimes 3128 usually reveals whether the address belongs to a proxy or not: the response is an error message whose footer shows that the IP is attached to a proxy server.

So here is my solution: assume that if you can connect, it must be a proxy!

Test: http://hackinoutthebox.com/laboratory/noproxy.php

Note: It still needs a loop to check the different possible ports, enjoy!
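The missing loop could be sketched as follows. This is not the script behind the link above; firstOpenPort() is a hypothetical name, and the port list and timeout are just the values mentioned in this post plus a guess. A successful connection only shows that something is listening on that port, which may be an ordinary web server rather than a proxy.

```php
<?php
// Hypothetical helper: try each port in turn and return the first one that accepts a TCP connection.
function firstOpenPort(string $ip, array $ports = [80, 8080, 3128], float $timeout = 2.0): ?int
{
    foreach ($ports as $port) {
        $errno  = 0;
        $errstr = '';
        // @ suppresses the warning fsockopen() raises when the connection fails.
        $socket = @fsockopen($ip, $port, $errno, $errstr, $timeout);
        if ($socket !== false) {
            fclose($socket);
            return $port;
        }
    }
    // Nothing answered on any of the ports.
    return null;
}
```

Called as firstOpenPort($_SERVER['REMOTE_ADDR']), this can add up to the timeout for each closed or filtered port, so it is better run outside the page request.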
Last edited by fresh on Tue Aug 02, 2005 3:16 am, edited 2 times in total.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

So all you do is test if there is a process listening at the port. How does it handle all those people with a webserver running on their PC? :p
fresh
Forum Contributor
Posts: 259
Joined: Mon Jun 14, 2004 10:39 am
Location: Amerika

Post by fresh »

I considered that. However, I came to the conclusion that anyone browsing the Internet while running a webserver would be using it for development, and should either have a firewall rejecting the connection (which would mean: not a proxy), or we will connect and thus reveal quite a foolish person. It is impossible to be 100% correct. :)

For example, in Saudi Arabia thousands of users all use the same proxy server as their gateway to the Internet.
WHOIS results for 212.138.47.15
Generated by http://www.DNSstuff.com

Location: Saudi Arabia

ARIN says that this IP belongs to RIPE; I'm looking it up there.


% This is the RIPE Whois query server #2.
% The objects are in RPSL format.
%
% Note: the default output of the RIPE Whois server
% is changed. Your tools may need to be adjusted. See
% http://www.ripe.net/db/news/abuse-propo ... 50331.html
% for more details.
%
% Rights restricted by copyright.
% See http://www.ripe.net/db/copyright.html

% Information related to '212.138.47.0 - 212.138.47.255'

inetnum: 212.138.47.0 - 212.138.47.255
netname: ISU-5
descr: Internet Service Unit ISU
country: SA
admin-c: KR6046-RIPE
tech-c: KR6046-RIPE
status: ASSIGNED PA
mnt-by: KACST-ISU-MNT
mnt-routes: KACST-ISU-MNT
mnt-lower: KACST-ISU-MNT
remarks: ------------------------------------------------------
remarks: Part of this IP block has been used for proxy/cache
remarks: service at the National level in Saudi Arabia. All
remarks: Saudi Arabia web traffic will come from this IP block.
remarks:
remarks: If you experience high volume of traffic from
remarks: IP in this block it is because your site is very
remarks: popular/famous of Saudi Arabia community.
remarks:
remarks: For any abuse activities please contact us through
remarks: Email: *****@isu.net.sa
remarks: Phone: +96614813933 (24x7)
remarks: Fax: +96614813221
remarks: ------------------------------------------------------
changed: *****@saudinic.net.sa 19991005
changed: *****@saudinic.net.sa 19991212
changed: *****@saudinic.net.sa 20010707
changed: *****@saudinic.net.sa 20050413
source: RIPE

role: KACST ROLE
address: Saudi Network Information Center, ISU
address: King Abdulaziz City for Science and Technology,
address: P.O.Box 6086, Riyadh 11442, Saudi Arabia.
phone: +9661 481 3932
fax-no: +9661 481 3254
e-mail: *****@saudinic.net.sa
remarks: trouble: *****@isu.net.sa
admin-c: ZOM1-RIPE
tech-c: RA705-RIPE
tech-c: ANAS1-RIPE
nic-hdl: KR6046-RIPE
remarks: This Role object is for handling and maintaining all
remarks: IP Blocks registered by SaudiNIC(LIR) in Saudi Arabia.
mnt-by: KACST-ISU-MNT
changed: *****@saudinic.net.sa 20010701
source: RIPE
abuse-mailbox: *****@isu.net.sa

% Information related to 'KR6046-RIPE'

route: 212.138.47.0/24
descr: Saudi Arabia backbone and local registry address space
descr: WAEL-BT Line
origin: AS8895
notify: *****@isu.net.sa
mnt-by: ISU-NOC
changed: ********@isu.net.sa 20050514
source: RIPE


So, no matter what you do, even if you can identify the IP as belonging to a proxy server 100% of the time, it is still possible that you will deny legitimate users. The point is that we are preventing proxy connections, perhaps even all of them. ;)
m3mn0n
PHP Evangelist
Posts: 3548
Joined: Tue Aug 13, 2002 3:35 pm
Location: Calgary, Canada

Post by m3mn0n »

I agree. Isn't it always the case that some must sacrifice for the greater good of everyone? heh

In my case, I am very strict when it comes to proxy users because what I run is an online game that limits people to 1 account per person. The most popular way to bypass this limitation, and thus benefit yourself greatly in the game, is to use proxy servers and create new aliases.

To combat this, what I like to do is use a script similar to the one you posted and tag potential proxy accounts. This, coupled with automated scripts that detect if an IP accessed more than 1 account can make life so much easier for management.

So my point is, you don't necessarily have to sacrifice dozens and even hundreds of people, you could simply label them as potential proxy users and keep them on a watch list that staff could monitor.

And of course, without a user management system in place that is impossible and sacrifice is the only option. So with one, be sure to try the tag route if you can. :)
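The "IP accessed more than 1 account" check could be sketched like this, assuming the login log is available as a list of [ip, account] pairs. sharedIps() and the log shape are hypothetical. The output is a watch list for staff to review, since a shared address (an office, a school, a national proxy) is not proof of multiple accounts per person.

```php
<?php
// Hypothetical helper: map each IP to its accounts, keeping only IPs seen with more than one account.
function sharedIps(array $logins): array
{
    $accountsByIp = [];
    foreach ($logins as [$ip, $account]) {
        // Record each account once per IP.
        if (!in_array($account, $accountsByIp[$ip] ?? [], true)) {
            $accountsByIp[$ip][] = $account;
        }
    }

    // Keep only the addresses that were used by several accounts.
    return array_filter($accountsByIp, function (array $accounts): bool {
        return count($accounts) > 1;
    });
}
```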
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

So you are portscanning my computer? Isn't that illegal? :)
bokehman
Forum Regular
Posts: 509
Joined: Wed May 11, 2005 2:33 am
Location: Alicante (Spain)

Post by bokehman »

d11wtq wrote:I think generally if the $_SERVER['HTTP_X_FORWARDED_FOR'] header is sent then they have come via a proxy. In which case $_SERVER['REMOTE_ADDR'] is the address of the proxy :wink:
That is only true of legit clients whose ISP uses a proxy. In all other cases the proxy is invisible. If you use that method you will get rid of genuine clients.
HTTP_VIA – if it is not empty, then a proxy is used. Value is an address (or several addresses) of a proxy server, this variable is added by a proxy server itself if you use one.

HTTP_X_FORWARDED_FOR – if it is not empty, then a proxy is used. Value is a real IP address of a client (your IP), this variable is also added by a proxy server if you use one.
This is also not true. I live in Spain and my Mum lives in London. Some websites can only be read from certain countries, so I have a proxy on her machine that lets me read sites that reject this country. Only the proxy IP shows in the request.
bokehman
Forum Regular
Posts: 509
Joined: Wed May 11, 2005 2:33 am
Location: Alicante (Spain)

Post by bokehman »

fresh wrote:I considered that. However, I came to the conclusion that anyone browsing the Internet while running a webserver would be using it for development, and should either have a firewall rejecting the connection (which would mean: not a proxy), or we will connect and thus reveal quite a foolish person.
I'm sorry but the foolish person here is you.

I have a webserver, as do many other companies. The same IP is also used to surf the net and for all other purposes. I have a number of machines all using the same IP and one is a web server. My firewall is correctly set and ports 21, 25, 53, 80, 110 and 443 all accept connections on that IP. All your script will do is stop people who have done nothing wrong from visiting someone's site.

Just because someone runs a webserver it does not make them foolish. To me the foolish person is the one who pays for webhosting when the site could be run on a local machine for free.
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

Oh boy, yet more confusion.

I have to disagree with some of the information from Stayinvisible. Other than these three items, it's fairly solid, so I have to think it was simply a matter of oversimplifying to sell their product.

For our discussion, it's definitely inaccurate, to wit:

REMOTE_ADDR - Possibly the IP address of a client. It can also be forged, spoofed, redirected, or the address of a concentrator (webproxy) being used from a business. In short, unreliable.

HTTP_VIA - Rarely used by virtually any proxy software. Admin configurable, completely unreliable.

HTTP_X_FORWARDED_FOR - Often set by proxy software. However, a large number of prominent proxy packages (Webaccelerator, squid, etc) recommend NOT setting it for security purposes, and in fact default to not doing so. Anonymizers often set it to a farm of addresses, diluting what little value it might have had. Fairly unreliable.

I can browse at my former employer's site, and my REMOTE_ADDR will be a site in California (remember, I'm in Ohio), VIA won't be set, and X_FORWARDED_FOR won't be either. I've tested it dozens of times, and it's *extremely* common in large companies. If nothing else, that's ~50,000 users that violate the theory that any of the above are in any way "reliable".

My signature says it all.
fresh
Forum Contributor
Posts: 259
Joined: Mon Jun 14, 2004 10:39 am
Location: Amerika

Post by fresh »

bokehman wrote:
fresh wrote:I considered that. However, I came to the conclusion that anyone browsing the Internet while running a webserver would be using it for development, and should either have a firewall rejecting the connection (which would mean: not a proxy), or we will connect and thus reveal quite a foolish person.
I'm sorry but the foolish person here is you.
haha.. zinger! Well, I wasn't directing that at anyone. I am sorry your complex has convinced you otherwise.
I have a webserver, as do many other companies. The same IP is also used to surf the net and for all other purposes. I have a number of machines all using the same IP and one is a web server. My firewall is correctly set and ports 21, 25, 53, 80, 110 and 443 all accept connections on that IP. All your script will do is stop people who have done nothing wrong from visiting someone's site.
If they have a choice to come direct and choose to hide behind a proxy, that is suspicious enough to want to prevent them from coming to your site. Perhaps, if you want them, I will direct them all to you.
Just because someone runs a webserver it does not make them foolish. To me the foolish person is the one who pays for webhosting when the site could be run on a local machine for free.
I highly doubt your homemade, backwoods "webserver" could ever compete with my host Ipowerweb. Alright buddy? So, you keep using that machine to navigate the web and someone will hit you with a RPC sploit and you can kiss your beefed up atari good-bye!

@timvw:

It isn't illegal to port scan in the way that we are:
Now let us examine the legality of port scanning.

Under the Indian Information Technology Act, 2000, the act of port scanning does not amount to hacking, (5)

which is defined as:

"Whoever with the intent to cause or knowing that he is likely to cause wrongful loss or damage to the public or any person destroys or deletes or alters any information residing in a computer resource or diminishes its value or utility or affects it injuriously by any means, commits hacking."

The essential elements of hacking are

1. Intention or Knowledge
2. Wrongful Loss to Public or Person
3. Deletion / Alteration / Destruction or
4. Diminishes Value or Utility

of information residing in a computer resource

Port Scanning will satisfy the first requirement of Knowledge or Intention.

But the second essential is not met, as port scanning does not necessarily cause any wrongful loss. E.g. if a network administrator, scans his own network for security reasons, then he will not intend to create any wrongful loss.

Also, all the other elements of hacking are also not invoked as port scanning merely scans the crust of the network without affecting any information resource residing within it.

Thus Port Scanning definitely does not attract the offence of 'hacking', unless it is used by a cracker, with the intention to crack the system, and in conjunction with any other tool that actually changes any information that resides in the computer.

Under the US Computer Fraud and Abuse Act, as well as under cyber laws of other countries, the element of "unauthorized access" is generally found to sufficiently cover the act of port scanning. Specifically 18 USC Sec. 1030(a)(5)(B) of the American Act has been applied to the act of port scanning in a previous case.

This subsection essentially has six elements that the prosecution must prove.

1. The defendant intentionally accessed a protected computer,
2. The defendant did not have authorization to access the computer
3. As a result of the access, the defendant recklessly caused damage
4. The damage impaired the integrity or availability of data, a program, a system, or information
5. That caused a loss aggregating at least $5000 or
6. Threatened public health or safety
Resource: http://www.asianlaws.org/cyberlaw/libra ... anning.htm

thanks!