Re: Spam
Posted: Fri Dec 26, 2008 12:00 am
by Chris Corbyn
This is becoming a real pain I have to admit. We've noticed the recent influx ourselves. I've just had to delete 3 threads and ban a user who was spamming movie links. Definitely bots since they post way too fast to be humans.
There are a number of things we can do to help cut this down but they all take time. I guess we should look at that Akismet MOD and see how much forum code we'd have to change (this is the biggest reason we don't MOD heavily... it holds us back from doing upgrades). I hate phpBB so much and would drop it if we could all agree on an alternative that doesn't suck.
EDIT | Oh yes, and do click the report button if you see spam. It makes the thread show up with a big red exclamation mark next to it when we view the forum.
Re: Spam
Posted: Fri Dec 26, 2008 12:45 am
by volomike
Chris - FluxBB
Re: Spam
Posted: Fri Dec 26, 2008 9:38 am
by Chris Corbyn
I just browsed the source of that and it still looks like a complete mess to me.

SQL and HTML all interspersed, massive files named things like "functions" that are full of undocumented code. Looks like the version that supports plugins is still in beta... I wonder how flexible the plugins are though (i.e. do you still have to modify the source of the BB for some things that the plugins can't handle?).
It's tempting to write a nice MVC one in ZF, but I bet somebody is already onto it.
To be honest, we've looked at quite a few and nobody can really agree on what we should be using... some people just like phpBB because it's familiar. Extremely open to suggestions, so thanks.

Re: Spam
Posted: Fri Dec 26, 2008 3:31 pm
by alex.barylski
If I were to choose I would go with Vanilla Forum.
1. The source is relatively simple compared to others (punBB has SQL/HTML mixed too).
2. The author seems to be aiming for an object-oriented solution
3. Appears to have clean HTML/PHP separation
There are only about a dozen entry points in the base directory and each acts as a controller of sorts, sets up the environment, performs basic authorization checks, etc.
Cheers,
Alex
Re: Spam
Posted: Sun Dec 28, 2008 9:05 pm
by josh
volomike wrote:Just take the registration's password field tag, and change it up slightly.
They're way smarter than that; they can just match keywords near the form input, or use common sense since the input type will be "password". What about verifying new user signups, or implementing a captcha for each unverified user's first 50 posts or something like that?
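As a rough illustration of the second suggestion, here is a minimal Python sketch of the posting-time rule: a CAPTCHA on every post until the account has 50 posts behind it. The threshold comes from the post above; the function name and the `manually_verified` flag (standing in for a moderator or e-mail verification step) are hypothetical.

```python
# Hypothetical policy: every post needs a CAPTCHA until the account has 50 posts.
CAPTCHA_POST_THRESHOLD = 50


def needs_captcha(posts_so_far: int, manually_verified: bool = False) -> bool:
    """Return True if the user's next post should be gated by a CAPTCHA."""
    # A moderator- or e-mail-verified account skips the check (hypothetical flag).
    if manually_verified:
        return False
    # posts_so_far counts from zero, so this covers posts #1 through #50.
    return posts_so_far < CAPTCHA_POST_THRESHOLD
```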
Re: Spam
Posted: Mon Dec 29, 2008 1:19 pm
by califdon
jshpro2 wrote:volomike wrote:Just take the registration's password field tag, and change it up slightly.
They're way smarter than that; they can just match keywords near the form input, or use common sense since the input type will be "password". What about verifying new user signups, or implementing a captcha for each unverified user's first 50 posts or something like that?
Good thoughts. I've noticed a LOT of recent posts that follow a pattern of new user signup, then 3 fast posts, then nothing more from that username, suggesting that they are programmatic.
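The signup-then-rapid-posts pattern could be flagged with a simple timing heuristic. A minimal Python sketch, assuming the forum records each account's signup time and post timestamps; the three-post burst matches the pattern described above, while the five-minute window and the function name are hypothetical choices.

```python
from datetime import datetime, timedelta


def looks_scripted(signup: datetime, post_times: list[datetime],
                   burst_size: int = 3,
                   window: timedelta = timedelta(minutes=5)) -> bool:
    """Flag an account whose first `burst_size` posts all land within `window` of signup."""
    if len(post_times) < burst_size:
        return False  # not enough activity yet to judge
    first_posts = sorted(post_times)[:burst_size]
    # The last of the first few posts is the latest, so checking it covers all of them.
    return first_posts[-1] - signup <= window
```

A flagged account would be a candidate for the report queue rather than an automatic ban, since a fast human can trip the same rule.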
Re: Spam
Posted: Mon Dec 29, 2008 4:48 pm
by josh
Definitely. I've noticed "they" are also able to hit my contact forms, even if I obfuscate the HTML for the form. Very sneaky they are.

Re: Spam
Posted: Mon Dec 29, 2008 8:03 pm
by volomike
I hate captchas. I just feel I'm much more clever in defeating bots than to put legitimate end users through yet another hoop.
Here's a super tough solution:
1. Generate a random number of about 20 digits.
2. Store that number in a session var (non-persistent).
3. Take that same number and run an md5() on it.
4. Drop that number from step #3 into a session cookie and give it a longish hexadecimal name that is consistent.
5. Now we switch to the post receipt form (where we pick up the $_POST requests). On the post receipt form where someone has posted a user/pass, or posted a forum message, we first check whether this cookie exists. If it doesn't, we show the user the 500 Door with header(). This is better than 404 because it means you can track it in your error logs a lot better.
6. If the cookie exists, we then take our session var from step #2. We run an md5() on it and compare that to our cookie value. If they don't match, we show the user the 500 Door.
7. We check to ensure that the cookie value isn't empty and that they haven't figured out a way to mess with the header to force us to have a null session var in memory. If they have, we show the user the 500 Door.
8. Delete the session var to free up memory.
9. Delete the session cookie.
10. Okay, at this point, we let the processing continue as it normally did.
11. We scan the Apache error log for error ID 500 events. When we see a lot of these, we can track IP and then block IP if it gets out of hand.
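Steps 1 to 10 boil down to a secret number that lives in the session and a hash of it that lives in a cookie. Here is a minimal sketch of that logic in Python rather than PHP, with plain dicts standing in for the session store and the request's cookies; the cookie name and function names are hypothetical, and step 11 (log review) is left out.

```python
import hashlib
import hmac
import secrets

# Hypothetical "longish hexadecimal name that is consistent".
COOKIE_NAME = "c0ffee5e1f1eb00c4a1d"


def render_form(session: dict) -> dict:
    """Steps 1-4: remember a random 20-digit number, return the cookie to set."""
    number = str(secrets.randbelow(10**20)).zfill(20)   # step 1
    session["form_token"] = number                      # step 2
    digest = hashlib.md5(number.encode()).hexdigest()   # step 3
    return {COOKIE_NAME: digest}                        # step 4


def receive_post(session: dict, request_cookies: dict) -> int:
    """Steps 5-10: return the HTTP status the receiving page would send."""
    # Step 8, done up front: pop the session value so a token can only be used once.
    number = session.pop("form_token", None)
    cookie = request_cookies.get(COOKIE_NAME, "")
    if not cookie or not number:                        # steps 5 and 7
        return 500
    expected = hashlib.md5(number.encode()).hexdigest()
    # Step 6: constant-time comparison of the recomputed hash and the cookie.
    if not hmac.compare_digest(expected.encode(), cookie.encode()):
        return 500
    # Step 9 (expiring the cookie) would go into the real response headers.
    return 200                                          # step 10: continue as normal
```

In PHP the same pieces would map onto `$_SESSION`, `setcookie()` and `header()`. Popping the session value before the comparison makes each token single-use, which also covers step 8 when a check fails.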
Re: Spam
Posted: Mon Dec 29, 2008 8:14 pm
by alex.barylski
From a user perspective I hate CAPTCHA too...
The problem I have is that the noise in some CAPTCHAs is so distracting it's difficult to tell what the letters are: lowercase, uppercase, swirled, swashed and swooped...
So many times I've had to try and try again to sign up to a forum because the CAPTCHA was so skewed.
Really, the idea is to stop bots...relatively simple bots that just look for a form and POST. I doubt many (if any) of the spamming bots are sophisticated enough to perform any kind of OCR on CAPTCHA images.
Just another example of over-engineered solution(s).
The best CAPTCHA I have ever seen was one that used pictures from Hot or Not: based on users' average ratings, you had to select 1 hottie out of a group of 6 uglies. Ironically, I guessed right every time, and I tried at least a hundred times. Thankfully my picture isn't on Hot or Not; I'd hate to find myself on the ugly side of a vote. Haha.
Cheers,
Alex
Re: Spam
Posted: Mon Dec 29, 2008 9:03 pm
by josh
volomike wrote:I hate captchas. I just feel I'm much more clever in defeating bots than to put legitimate end users through yet another hoop.
Here's a super tough solution: [...]
Uh, that sounds like a lot of unnecessary work for something a programmer with cURL can defeat in 5 minutes. To bypass it, all I'd have to do is read the header that sets the cookie and then send the cookie value back with my next request. I don't see the problem with captcha, especially if we put it on just the signup, just to start...
Edit: nvm, I just checked and we already do. I guess the only solution is more mods or integrating a spam detection service. Earlier in this thread someone posted a phpBB plugin that did just that, no?
Re: Spam
Posted: Mon Dec 29, 2008 9:39 pm
by volomike
jshpro2 - actually, that won't work. You can use curl all day on a technique like that and you won't be getting access to the original, unique, 20-digit number in the session var that is stored in shared memory for each new session. All you will have is the cookie that the md5 was applied upon, and you can't reverse engineer that back into the 20-digit number except by running through all 20-digit numbers until you find a match with your cookie. And even then, if you do find that, it's changed with every session, and it's going to be generating 500 errors when you fail. Given enough 500 errors, any sysop reviewing logs would catch you and block your IP.
This technique is like a 3 liner in the form, and a 3 liner in the receiving page.
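For the log review in step 11, note that a status code sent from PHP with `header()` is recorded in Apache's access log (the status field of the common/combined format), not in the error log. A minimal Python sketch that counts 500 responses per client IP; the threshold of 20 is an arbitrary assumption, and the actual blocking is left to the firewall or server config.

```python
import re
from collections import Counter

# Apache common/combined log format: host ident user [time] "request" status bytes ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def ips_to_block(lines, threshold: int = 20) -> list[str]:
    """Return client IPs with at least `threshold` HTTP 500 responses, sorted."""
    failures = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "500":
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= threshold)
```

It accepts any iterable of lines, so an open file object for the access log can be passed directly (the log's path depends on the server setup).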
Re: Spam
Posted: Mon Dec 29, 2008 10:11 pm
by josh
You are setting a cookie, and md5ing it on the server side, setting a new cookie with each request?
With a valid user:
request 1, server sends cookie
request 2, server compares md5 value of cookie with value on server
With a scripted browser:
request 1, server sends cookie
request 2, scripted browser sends cookie with next request, just like the browser did. Server compares md5 value of cookie with value on server
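The replay described above can be demonstrated in a few lines. A minimal Python sketch with a stripped-down, hypothetical version of the session/cookie check and a "bot" that simply echoes back whatever cookie it received. The session id is passed explicitly here; in PHP it would be one more cookie (PHPSESSID by default) that the bot keeps in exactly the same way.

```python
import hashlib
import secrets


class Server:
    """Stripped-down session/cookie check (class and cookie names are hypothetical)."""

    COOKIE = "form_check"

    def __init__(self) -> None:
        self.sessions: dict[str, str] = {}  # session id -> secret number, server-side only

    def get_form(self, sid: str) -> dict:
        number = str(secrets.randbelow(10**20))
        self.sessions[sid] = number
        # Only the MD5 of the number is ever sent to the client.
        return {self.COOKIE: hashlib.md5(number.encode()).hexdigest()}

    def post(self, sid: str, cookies: dict) -> int:
        number = self.sessions.pop(sid, None)
        if number is None:
            return 500
        expected = hashlib.md5(number.encode()).hexdigest()
        return 200 if cookies.get(self.COOKIE) == expected else 500


def scripted_client(server: Server, sid: str) -> int:
    """What a cURL-style bot does: keep the cookies it was sent and send them back."""
    jar = server.get_form(sid)    # request 1: server sends the cookie
    return server.post(sid, jar)  # request 2: bot replays it untouched
```

The bot never needs the 20-digit number or to reverse the MD5; it only has to behave like a browser that keeps cookies.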
To put it in other words, your technique simply filters out the spam bots that don't support cookies; it has no ability to tell whether the browser is being driven by a human or a script. Keep in mind that, given the context of the posts and the sophisticated techniques the spammers are using to disguise the spam as actual content, it's probably a human posting the spam anyway. A Bayesian or neural-network-based spam detection suite will catch "features" in the text, instead of relying on stuff the spammer can bypass with little effort.
PS: even if you meant you were md5ing on the client side with JavaScript, a potential spammer wouldn't be hard pressed to realize that's what it's doing and simply md5 the value before it's passed with the next HTTP request. All you're essentially doing is potentially blocking out legitimate bot traffic like Googlebot, and cookie-less users.
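To make the "features in the text" idea concrete, here is a minimal multinomial naive Bayes classifier in Python with add-one smoothing. It is a generic textbook sketch, not what Akismet or any phpBB MOD actually does; a real filter would need far more labelled data and would look at more than word counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label) over the known words in text."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(self.LABELS, key=lambda label: self.log_score(text, label))
```

Train it on posts moderators have already marked, e.g. `nb.train(text, "spam")`, then call `nb.classify(new_post)`. Both classes need at least one training example, or the prior becomes log(0).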
Re: Spam
Posted: Mon Dec 29, 2008 10:15 pm
by alex.barylski
Mike: I'm missing something. Why does this work for a browser but not an emulated user agent, like cURL?
I don't understand the solution 100% but it sounds like you are trying to set a cookie on the client side and using that cookie detection on the server (which stores an ID generated on the server) and using that to validate human-ability of the user agent???
Re: Spam
Posted: Mon Dec 29, 2008 10:23 pm
by Syntac
I really hope you guys don't have to integrate Akismet. phpBB's source is an enormous mess.
Re: Spam
Posted: Mon Dec 29, 2008 10:24 pm
by josh
It's not that bad... Nothing xdebug and grep won't overcome.