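<?php
// Reverse-resolve the visitor's IP; gethostbyaddr() returns the IP unchanged if there is no PTR record
$hostname = gethostbyaddr($_SERVER['REMOTE_ADDR']);
$ccode = '.inktomisearch.com';
// Case-insensitive "ends with $ccode" test, so crawler1.inktomisearch.com and
// a.b.inktomisearch.com both match (stristr() plus == only caught one-label names, case-sensitively)
if (strcasecmp(substr($hostname, -strlen($ccode)), $ccode) == 0) {
    //spoof 404: send the real status code too, not just a page that says 404
    header('HTTP/1.0 404 Not Found');
    echo "
<html><head>
<title>404 - Error</title>
<style>
body { font-family: verdana, arial, sans-serif; font-size: 12pt; color: #333;
background-color: #fff; margin: 0pt; padding: 0pt; }
</style>
</head>
<body>
<div align='center'>
<p><img src='/logo.gif'
width='200' height='118'
border='0'
alt='logo'></p>
<p>&nbsp;</p>
<table width='580'>
<tr><td>
<h3>404 not found</h3>
<p>The requested resource could not be found.</p>
</td></tr></table>
</div>
</body>
</html>";
} else {
    //let them in
    //code here
}
?>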
This is an example of how to keep crawlers such as inktomisearch from crawling your pages, specifically the ones you use to track downloads, reviews, etc. Banning may be better, but perhaps this, coupled with a ban, is the double layer you need to kill these nasty beasts.
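One caveat on the hostname test: a reverse-DNS (PTR) record is controlled by whoever owns the IP block, so anyone can make their address resolve to a name ending in `.inktomisearch.com`. A common hardening step is forward-confirmed reverse DNS: resolve the returned name again and accept it only if it maps back to the same IP. The Java sketch below illustrates the idea; the class and method names are hypothetical, and `isVerifiedHost` performs live DNS lookups, so it is only meaningful on a machine with working DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper: checks whether a visitor's IP really belongs to a domain.
class HostCheck {

    // Case-insensitive "does hostname end with suffix?", e.g. suffix ".inktomisearch.com".
    static boolean hasDomainSuffix(String hostname, String suffix) {
        return hostname.toLowerCase(Locale.ROOT).endsWith(suffix.toLowerCase(Locale.ROOT));
    }

    // Forward-confirmed reverse DNS: IP -> name (reverse), then name -> IPs (forward);
    // accept only if the name has the suffix AND one of its addresses is the original IP.
    static boolean isVerifiedHost(String ip, String suffix) {
        try {
            InetAddress addr = InetAddress.getByName(ip); // IP literal: format check only, no DNS
            String host = addr.getCanonicalHostName();    // reverse lookup; falls back to the IP text
            if (!hasDomainSuffix(host, suffix)) {
                return false;
            }
            for (InetAddress candidate : InetAddress.getAllByName(host)) { // forward lookup
                if (candidate.equals(addr)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false; // malformed IP, or the forward lookup found nothing
        }
    }
}
```

On the PHP side, `gethostbynamel($hostname)` returns the name's IPv4 addresses, so the same forward check can be done with `in_array()` against `$_SERVER['REMOTE_ADDR']`.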
It is meant as a working example of banning by way of a spoof. You could also use this snippet to search hostnames for country codes. Say you wish to keep people from .au (Australia) from viewing your web pages: you could search for .au and they would be shown the 404. However, if they bounce off proxies outside Australia, then of course neither this nor banning will work, but you know, we try.
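Checking for a country code amounts to looking at the last label of the reverse-resolved hostname. A rough Java sketch with hypothetical names follows. It can only work when the visitor's IP has a reverse-DNS name at all: many addresses resolve to generic `.net` or `.com` names, or to nothing, in which case `gethostbyaddr()` just hands back the IP.

```java
import java.util.Locale;

// Hypothetical helper: pulls a two-letter country-code TLD (e.g. "au") out of a hostname.
class CountryCode {

    // Returns the lowercase two-letter top-level label, or null for generic TLDs
    // (.com, .net, ...) and for bare IP addresses (whose last "label" is numeric).
    static String of(String hostname) {
        String h = hostname.toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1); // tolerate a fully qualified trailing dot
        }
        String tld = h.substring(h.lastIndexOf('.') + 1);
        return tld.matches("[a-z]{2}") ? tld : null;
    }

    // True if the hostname ends in the given country code, e.g. isFromCountry(h, "au").
    static boolean isFromCountry(String hostname, String code) {
        return code.equalsIgnoreCase(of(hostname)); // equalsIgnoreCase(null) is false
    }
}
```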
BTW: would this be considered a honeypot, or at least a form of one?
regards,
- fresh
Last edited by fresh on Sat Jan 15, 2005 10:25 pm, edited 1 time in total.
Sure, it works, but I don't see why you would want to stop crawlers. Yes, there may be certain sections of your site that you don't want crawled, but you can specify them in meta tags or in the robots.txt file.
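For reference, a robots.txt rule is just a group of User-agent lines followed by Disallow path prefixes. The Java sketch below shows how a well-behaved robot would read such a file under the original, prefix-only exclusion rules (no Allow lines or wildcards, which some crawlers support as extensions). The agent token `Slurp`, assumed here to be what Inktomi's crawler announced, and the `/downloads/` path are illustrative assumptions; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal reader for the original robots.txt rules: consecutive "User-agent" lines form a
// group, and each "Disallow" value is a plain path prefix (no Allow lines, no wildcards).
// Example file (agent token and path are illustrative assumptions):
//   User-agent: Slurp
//   Disallow: /downloads/
//
//   User-agent: *
//   Disallow:
class RobotsTxt {

    private static final class Group {
        final List<String> agents = new ArrayList<>();
        final List<String> disallows = new ArrayList<>();
    }

    private static List<Group> parse(String robotsTxt) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String raw : robotsTxt.split("\r?\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank, comment-only or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // A User-agent line that follows Disallow lines starts a new group
                if (current == null || !current.disallows.isEmpty()) {
                    current = new Group();
                    groups.add(current);
                }
                current.agents.add(value.toLowerCase(Locale.ROOT));
            } else if (field.equals("disallow") && current != null) {
                current.disallows.add(value); // an empty value disallows nothing
            }
        }
        return groups;
    }

    // True if a robot called robotName may fetch path: the first group naming the robot
    // (case-insensitive substring match) wins, otherwise the "*" group, otherwise allow.
    static boolean isAllowed(String robotsTxt, String robotName, String path) {
        String robot = robotName.toLowerCase(Locale.ROOT);
        Group named = null;
        Group wildcard = null;
        for (Group g : parse(robotsTxt)) {
            for (String agent : g.agents) {
                if (agent.equals("*")) {
                    if (wildcard == null) wildcard = g;
                } else if (named == null && !agent.isEmpty() && robot.contains(agent)) {
                    named = g;
                }
            }
        }
        Group rules = (named != null) ? named : wildcard;
        if (rules == null) {
            return true; // no group applies: everything is allowed
        }
        for (String prefix : rules.disallows) {
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

Only robots that choose to read and honour the file are affected by it.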
The code isn't just for bots/crawlers, though, and you might have got more of a response if you had generalized the script a bit. As you say, if you change it to .au then it will stop all Australians from visiting the site. You could take this further and ban user agents and such, which would be another good way to ban bots.
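Banning by user agent only needs a substring check on the `User-Agent` request header. A hypothetical Java sketch; the blocked tokens are whatever you choose (`slurp` and `wget` in the test are illustrative):

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: bans requests whose User-Agent header contains a blocked token.
class UserAgentBan {

    static boolean isBanned(String userAgent, List<String> blockedTokens) {
        if (userAgent == null) {
            return false; // header absent: nothing to match against
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : blockedTokens) {
            if (ua.contains(token.toLowerCase(Locale.ROOT))) { // case-insensitive substring match
                return true;
            }
        }
        return false;
    }
}
```

Since the client writes that header itself, this only stops robots that identify themselves honestly.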
Hey, that's a good idea, but I read about inktomisearch and it said it uses robots.txt to do its searching, so this one is a bit nasty. Plus, I read others saying they would like to stop it from crawling because it floods them with crawls, so I figured this would be useful to someone. That was a good idea you came up with about the user agents, but those can be changed easily enough.
This mission is quite a pain in the arse if you ask me.
You're right, if they are behind a firewall or NAT it will probably show something different. I'll have to play with it a bit. If I come up with something concrete, would you guys mind if I posted my source here? It will most likely be in Java, though.
Quick question. First of all, here is what I figure I will do:
I will write a Java socket to listen on port 80, and once someone (or a bot) requests the page, whether via a browser or telnet, I will snatch their IP that way. Otherwise, I could use JSP to strip it from the headers; I already wrote a script to do that, along with the SID, hostname, etc. But as you know, that is a completely useless way of tracking clients, so we use Java, which runs client side and can create sockets.
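The listening half of that plan is only a few lines of Java. This is a hypothetical sketch, not fresh's code. Note that a `ServerSocket` runs on the server, not in the visitor's browser, and `getInetAddress()` reports whatever is at the other end of the TCP connection: behind NAT that is the router's public address, and behind an HTTP proxy it is the proxy's, i.e. the same address the web server already sees.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;

// Hypothetical sketch: accept one TCP connection and record the peer's IP address.
class IpLogger {

    static String acceptOnceAndLog(ServerSocket server, List<String> log) {
        try (Socket client = server.accept()) {                   // blocks until someone connects
            String ip = client.getInetAddress().getHostAddress(); // address of the TCP peer
            log.add(ip);
            return ip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```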
Q: If I try to listen on port 80, will that cause a conflict with the HTTP server, which is also listening on port 80?
If so, what could be an alternative approach? Maybe I could couple the routine with PHP, which could send them to, let's say, port 1337 and then back to port 80 before they even know what happened, and I could throw in a routine that bans predefined IPs as well?
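The "ban predefined IPs" routine is the easy part. A minimal Java sketch with hypothetical names that accepts both single addresses and CIDR ranges such as `198.51.100.0/24`, assuming the list and the visitor's address are IP literals (a hostname passed in would trigger a DNS lookup):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

// Hypothetical helper: checks an IP literal against banned entries written as
// "a.b.c.d" (single address) or "a.b.c.d/n" (CIDR range); IPv6 literals work the same way.
class IpBanList {

    static boolean isBanned(String ip, List<String> banned) {
        for (String entry : banned) {
            if (matches(ip, entry)) {
                return true;
            }
        }
        return false;
    }

    static boolean matches(String ip, String entry) {
        try {
            String[] parts = entry.trim().split("/", 2);
            byte[] addr = InetAddress.getByName(ip).getAddress();
            byte[] net = InetAddress.getByName(parts[0]).getAddress();
            if (addr.length != net.length) {
                return false; // IPv4 address vs IPv6 entry, or the reverse
            }
            int bits = (parts.length == 2) ? Integer.parseInt(parts[1]) : addr.length * 8;
            if (bits < 0 || bits > addr.length * 8) {
                return false; // malformed prefix length
            }
            for (int i = 0; i < bits; i++) {
                int mask = 0x80 >>> (i % 8); // bit i of the address, most significant first
                if ((addr[i / 8] & mask) != (net[i / 8] & mask)) {
                    return false;
                }
            }
            return true;
        } catch (UnknownHostException | NumberFormatException e) {
            return false; // malformed IP or entry: treat as no match
        }
    }
}
```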
BTW: if anyone would like to see my JSP script, just ask. Also, since no one has objected, I will make sure to publish my Java source here with a link to download the class file, to use however you want.
I think there will be a conflict if you try to run two processes listening on the same port.
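This is easy to confirm: with default socket options, the operating system refuses a second listener on a port that is already bound, and Java reports that as a `BindException`. The sketch below uses a free port chosen by the OS (port 0) instead of 80, so it needs no special privileges; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical check: bind a port, then try to bind the same port again.
class PortConflict {

    // Returns true if the second listener is refused.
    static boolean secondBindFails() {
        try (ServerSocket first = new ServerSocket(0)) { // 0 = let the OS choose a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // the second listener was allowed (not expected with defaults)
            } catch (BindException e) {
                return true;  // "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```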
Say, if you can code in Java, then maybe you can write a program to listen on port 80, check whether the incoming request is valid, and then redirect it to the port the server listens on, or something like that.
I figured from my experience coding in C++ and VB that listening on the same port is foul, so I just assumed it would cause conflicts. I have already written the server in Java; I think I will make it listen on port 1337 and couple it with PHP, having the script send them to that port like:
Then I will run the Java against the client, retrieve the IP and send them back to port 80 before they even know what happened; at worst it will look like a meta redirect.
I still haven't finished coding everything on the server. Right now it accepts connections, retrieves remote input and echoes it back to the client, which I was using for testing purposes; I plan to remove that before the release. Maybe by the end of the week I will have something complete.
Hey! I didn't advise you to write a full-fledged server or something. I only said maybe write a loop or so which would accept all connections on port 80 and, if one is valid, redirect it to the real port (1337, according to you).
You don't have to put your whole week into that simple thing.
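The suggested front end boils down to one decision per connection. A hypothetical Java sketch of just that step: `isValid` stands in for whatever check is used (the thread leaves it open), and `host`, `realPort` and `path` are placeholders. It assumes the real HTTP server has moved off port 80, since only one process can listen there, and the 302 is visible to the browser, which then connects to the new port itself.

```java
import java.net.InetAddress;
import java.util.function.Predicate;

// Hypothetical front end for port 80: valid clients are redirected to the real
// server port, everyone else gets a 403. Only the response-building step is shown.
class FrontDoor {

    static String respond(InetAddress peer, Predicate<InetAddress> isValid,
                          String host, int realPort, String path) {
        if (isValid.test(peer)) {
            return "HTTP/1.0 302 Found\r\n"
                 + "Location: http://" + host + ":" + realPort + path + "\r\n"
                 + "Connection: close\r\n\r\n";
        }
        return "HTTP/1.0 403 Forbidden\r\n"
             + "Connection: close\r\n\r\n";
    }
}
```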
Well, validation will come after I have retrieved the true IP, even if the client is bouncing off proxies, which is the point of this project.
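What an HTTP server can learn about a proxied client is limited to what the proxy volunteers. Some proxies add an `X-Forwarded-For` header listing the addresses the request passed through, original client first; anonymising proxies simply omit it, and because the client can send the header itself, it is a hint, not proof. A hypothetical Java sketch of reading it (in PHP the raw value is `$_SERVER['HTTP_X_FORWARDED_FOR']`):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an X-Forwarded-For value ("client, proxy1, proxy2")
// into its entries. Every entry is client-supplied and may be forged.
class ForwardedFor {

    static List<String> hops(String headerValue) {
        List<String> result = new ArrayList<>();
        if (headerValue == null) {
            return result; // header absent: the proxy (if any) disclosed nothing
        }
        for (String part : headerValue.split(",")) {
            String ip = part.trim();
            if (!ip.isEmpty()) {
                result.add(ip);
            }
        }
        return result;
    }

    // The claimed original client, or the TCP peer when no header was sent.
    static String claimedClient(String headerValue, String tcpPeer) {
        List<String> h = hops(headerValue);
        return h.isEmpty() ? tcpPeer : h.get(0);
    }
}
```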
I assume that since the HTTP server is already listening on port 80, I would need to listen on a different port, such as 1337.
However flawed, my theory is this:
1. Listen on port 1337
2. User connects to port 80
3. PHP sends them to port 1337
4. Server accepts connection
5. Server queries the machine for IP
6. Server retrieves the IP and logs it
7. Server sends them back to port 80
8. done
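Steps 1 and 4 to 7 map onto a small Java loop. In this hypothetical sketch, `host` stands in for the site's hostname: the server logs the peer address and request line, then answers with a 302 that sends the browser back to the same path on port 80 (a `Location` with no port means 80 for `http://`). One caveat on step 5: the only address a socket server learns from an incoming connection is the TCP peer, which for a proxied visitor is the proxy, the same value the server on port 80 already reports.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of steps 1-8: a listener on a side port (1337 in the plan)
// that logs the connecting address and bounces the browser back to port 80.
class BounceServer {

    // Steps 6-7: log the peer and build a 302 back to the same path on port 80.
    static String handle(String peerIp, String requestLine, String host, List<String> log) {
        log.add(peerIp + " " + requestLine);
        String[] parts = requestLine.split(" ");         // e.g. "GET /download.php?id=3 HTTP/1.0"
        String path = (parts.length >= 2 && parts[1].startsWith("/")) ? parts[1] : "/";
        return "HTTP/1.0 302 Found\r\n"
             + "Location: http://" + host + path + "\r\n" // no port given: back to 80
             + "Connection: close\r\n\r\n";
    }

    // Steps 1 and 4-7: accept connections forever, one at a time.
    static void serve(int port, String host, List<String> log) {
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                try (Socket client = server.accept()) {
                    BufferedReader in = new BufferedReader(new InputStreamReader(
                        client.getInputStream(), StandardCharsets.US_ASCII));
                    String requestLine = in.readLine();      // null if the client sent nothing
                    String response = handle(client.getInetAddress().getHostAddress(),
                        requestLine == null ? "" : requestLine, host, log);
                    OutputStream out = client.getOutputStream();
                    out.write(response.getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                } catch (IOException e) {
                    // one broken connection should not stop the server; move on
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // could not bind the port (e.g. already in use)
        }
    }
}
```

Calling `BounceServer.serve(1337, "example.com", log)` would run it on the port from the plan; `example.com` is a placeholder.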
I haven't even begun testing this on any HTTP server yet; so far I have only gotten it to run on my PC via the command line: java file
And it works as expected. I still need to write the query chunk and the redirection chunk, and then it will be complete. The code is quite small, and I would have gotten this done sooner, except I had never written anything in Java before, so it took some time to learn the language.
The time spent is fine, because I can always recycle the code to make chat applets or something, and it was fun to learn, so I would say the time spent was, and will continue to be, well worth it to me.
Still, I do want to ask anyone who may know from experience whether my theory is indeed flawed and, if so, what alternative I can take to achieve the same results.