Prevent Nasty Crawlers from Crawling


fresh
Forum Contributor
Posts: 259
Joined: Mon Jun 14, 2004 10:39 am
Location: Amerika

Post by fresh »

Well, I have come to realise that this project seems to be a waste of time after all. Because of the security restrictions on applets and applications, I would need to exploit the client in order to retrieve the IP from the machine, or at least that's what I was told.

I understand I could use Perl, which could be my server-side way around the restrictions, but to be honest, after two days of dealing with absurd JVM troubles, compilation issues and lack of sleep, I am thoroughly sick of this project. :)

Perhaps I could make a MOD instead, which would scan each user's message for links and for certain words such as sex, porn, xxx, or even URLs you already know about. If any offending words or URLs are found, the script would *** those words out and kill the link. The spammer's link would be dead, and the moderator would be alerted to its existence via auto PM and would then have the opportunity to delete the post and ban the account. I surmise that killing the links in the posts is the most certain way to defeat spam. For example, let's say the spammer works for http://www.sex-porn.com; with the MOD in place, the post would show http://www.***-****.com. By doing that you eliminate the key element of the spam, the link! As spammers broaden their range, you keep track of their URLs and add them to the list, so that each URL is treated as if it were the word porn or sex and is starred out too.
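To make the idea concrete, here is a minimal sketch in PHP. The function name `censor_message` and the `$blocklist` contents are hypothetical, and the auto PM is only marked with a comment, since how it is sent depends on the board software.

```php
<?php
// Hypothetical blocklist; replace with the words and domains you track.
$blocklist = array('sex', 'porn', 'xxx');

// Replace every blocked term with asterisks of the same length,
// regardless of case. Returns array(cleaned text, was anything replaced?).
function censor_message($message, array $blocklist)
{
    $hits = 0;
    foreach ($blocklist as $term) {
        $mask = str_repeat('*', strlen($term));
        $message = str_ireplace($term, $mask, $message, $count);
        $hits += $count;
    }
    return array($message, $hits > 0);
}

list($clean, $flagged) = censor_message('Visit http://www.sex-porn.com now', $blocklist);
echo $clean, "\n";   // Visit http://www.***-****.com now
if ($flagged) {
    // This is where the MOD would send the auto PM to a moderator.
    echo "flagged for moderator review\n";
}
```

One thing to keep in mind: plain substring matching also masks innocent words that happen to contain a blocked term (for example "Essex"), so the list probably needs some care.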

You guys think this would work or no?
n00b Saibot
DevNet Resident
Posts: 1452
Joined: Fri Dec 24, 2004 2:59 am
Location: Lucknow, UP, India

Post by n00b Saibot »

Bright idea, man ;-)
Perl definitely has some advantages if you'd like to use it in your project.
The server code you wanted to write would be only about 10-15 lines, and faster than Java too :D

What do you say?

Post by fresh »

I have actually already looked into Perl, and a CGI script gets the IP through the REMOTE_ADDR entry of %ENV, which the web server sets from the address of the connecting peer (not from the request headers). I was thinking of using Perl to mediate between the client and server, but I think if the client is bouncing off proxies or is behind a router, you would need to exploit the client in order to get the true IP. If you can come up with something, though, I'd be interested in checking it out. :)
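The same limitation can be sketched in PHP, where the equivalent value is `$_SERVER['REMOTE_ADDR']`. The helper name `describe_client` is made up for illustration. `X-Forwarded-For` is a header that some proxies add; the client can forge it, so it is only a hint and not the "true" IP.

```php
<?php
// Returns the address the web server saw plus any proxy-supplied hint.
// $server is expected to look like PHP's $_SERVER array.
function describe_client(array $server)
{
    // Address of the machine that opened the connection. Behind a proxy
    // or NAT router this is the proxy's or router's address.
    $peer = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : null;

    // Optional, unverified list of addresses reported by proxies.
    $forwarded = array();
    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        foreach (explode(',', $server['HTTP_X_FORWARDED_FOR']) as $ip) {
            $forwarded[] = trim($ip);
        }
    }
    return array('peer' => $peer, 'forwarded_for' => $forwarded);
}

print_r(describe_client($_SERVER));
```

Neither value reveals an address the client or its proxy chooses not to send, which is the restriction described above.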

As far as the MOD I was talking about above, here it is:

Code:

<?php
// Check for and bleep out curse words and known spam URLs.
// $message is assumed to hold the post text, as it does in posting.php.
$curse   = array("bad word 1", "bad word 2", "bad url 1", "bad url 2");
$bleeped = array("*** **** *", "*** **** *", "*** *** *", "*** *** *");
// str_ireplace() matches regardless of case, so "Bad Word 1" is caught too.
$message = str_ireplace($curse, $bleeped, $message);
?>
Just place this chunk in 'posting.php', assign the appropriate keywords, and upload the edited file to your server, and it should work. Right now I have no way of testing this script, so if anyone actually uses it, let me know how it works. BTW: this script should be better than the original curse-word filter used by phpBB. One major difference is that this script looks for every occurrence of the string. In phpBB, if you filter the word smurf but the user types it as part of a longer string, it is not filtered; this script, however, will filter all occurrences and kill links with predefined keywords in them such as sex, porn, etc. Enjoy :)
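The difference between matching whole words and matching every occurrence of a string can be seen in a small comparison. The regex version below is only a stand-in for a whole-word filter, not phpBB's actual code, and "smurf" is just the placeholder word from this thread.

```php
<?php
$text = 'smurf, SMURF and smurfing';

// Whole-word, case-insensitive match: "smurfing" is left alone.
$whole = preg_replace('/\bsmurf\b/i', '*****', $text);

// Substring, case-insensitive match: every occurrence is replaced.
$sub = str_ireplace('smurf', '*****', $text);

echo $whole, "\n";  // *****, ***** and smurfing
echo $sub, "\n";    // *****, ***** and *****ing
```

Substring matching is what kills keywords buried inside a URL, at the cost of also hitting longer, harmless words that contain the keyword.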

regards

Post by n00b Saibot »

That one seems really effective as far as I can tell, but I haven't tried it out. Have you?

Post by fresh »

Yeah, I actually have used a variation of the script I posted above on my own web page which deals with reviews of our software. It works quite well and is highly customizable too.

BTW: I got the Java server done. It returns the true IP even if the user is on a proxy or behind a firewall (and that's probably because it is running on my local machine). I haven't tested it behind a NAT or remotely just yet, so I have no idea what it will do then. I have also yet to find an easy and universal way to run this on an HTTP server. I am sure you could if you actually owned the server, but if you have no authority to install and run executables, then my application would not be the answer for you, though I am sure I am overlooking an easy solution. Anyway, if anyone is interested in seeing my source, just ask and I will be glad to post it here.

regards