Posted: Wed Apr 12, 2006 12:48 pm
by timvw
Maugrim_The_Reaper wrote: Generally, from filtering spam on my own blog without resorting to desperate measures like CAPTCHAs (unless the post is a certain age), a few filters watching URL counts (how many URLs per comment), author and body terms, etc. work well.
Some spambots simply post something like "Hey I like your site..." and then simply (ab)use the author URL. Which might make sense, since I don't expect search-engine crawlers to see a difference between a URL in my "post content" and a URL in the "author div".
Maugrim_The_Reaper wrote: So too does having some form of mechanism for forcing a delay between individual comments - spambots generally try posting dozens of comments per second if not more.
Some are smart enough to wait a while. But they do come back, day after day (well, until I redirect them to http://{$_SERVER['REMOTE_ADDR']} ;))
Maugrim_The_Reaper wrote: Relying on IPs is not going to be very reliable - a spammer can switch proxies as often as you ban IPs. Many will never even post from the same IP to the same site if they can help it.
In my experience they do re-use IPs from the same netblock. And too bad for open proxies, they're unwelcome ;)
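
For anyone who wants to try the netblock idea, here is a minimal sketch of a CIDR match in PHP. The function name ip_in_cidr() is made up for this example, the addresses are documentation ranges rather than real spammer netblocks, and it assumes IPv4 on a 64-bit build of PHP 7.1 or later.

```php
<?php
// Hypothetical helper: true when the IPv4 address $ip lies inside $cidr,
// where $cidr looks like "192.0.2.0/24".
function ip_in_cidr(string $ip, string $cidr): bool
{
    if (strpos($cidr, '/') === false) {
        return false; // not in "address/bits" form
    }
    [$subnet, $bits] = explode('/', $cidr, 2);
    if (!preg_match('/^\d{1,2}$/', $bits) || (int) $bits > 32) {
        return false; // prefix length must be 0..32
    }

    $ipLong     = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false) {
        return false; // one of the addresses is not valid IPv4
    }

    // Build a 32-bit mask with the top $bits bits set (assumes 64-bit integers).
    $mask = (-1 << (32 - (int) $bits)) & 0xFFFFFFFF;

    return ($ipLong & $mask) === ($subnetLong & $mask);
}
```

You would call it with the visitor's address and each netblock you have seen spam from, for example ip_in_cidr($_SERVER['REMOTE_ADDR'], '192.0.2.0/24'). As mentioned further down the thread, blocking whole ranges can also hit legitimate users, so it may be safer to send matches to a review queue than to reject them outright.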

at last detected crawler

Posted: Thu Apr 13, 2006 12:17 am
by deeppak
Hi all, at last I have identified this spammer. The hostname of the spammer is as follows:
sv-crawlfw4.looksmart.com

Now can anyone please tell me how to stop him from crawling my site? Shall I disallow him in robots.txt? Please also tell me some other method to stop him from spamming.

Be quick, please.
I am already worn out fighting this dreaded daemon. The fight is still on, and it will continue till I stop this stupid crawler from spamming my site. I thank all of you for contributing to this; I really appreciate all of your guidance.

Thanks in advance

Re: at last detected crawler

Posted: Thu Apr 13, 2006 1:51 am
by AKA Panama Jack
deeppak wrote: Hi all, at last I have identified this spammer. The hostname of the spammer is as follows:
sv-crawlfw4.looksmart.com

Now can anyone please tell me how to stop him from crawling my site? Shall I disallow him in robots.txt? Please also tell me some other method to stop him from spamming.

Be quick, please.
I am already worn out fighting this dreaded daemon. The fight is still on, and it will continue till I stop this stupid crawler from spamming my site. I thank all of you for contributing to this; I really appreciate all of your guidance.

Thanks in advance
Actually, most robot spammers ignore the robots.txt file.

hey then what is the solution

Posted: Thu Apr 13, 2006 2:01 am
by deeppak
Come on, is there not even a single way to stop him, even after getting his URL?


Thanks in advance

Posted: Thu Apr 13, 2006 2:43 am
by Maugrim_The_Reaper
Create a list of banned hosts, check for them on every comment post, refuse to post comment...
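
A minimal sketch of that banned-host check, assuming PHP 7 or later. The helper name is_banned_host() and the contents of the list are only illustrations; the looksmart.com entry is taken from the hostname reported earlier in this thread, not a recommendation to ban it.

```php
<?php
// Hypothetical helper: true when $hostname equals a banned entry
// or is a subdomain of one (e.g. "sv-crawlfw4.looksmart.com").
function is_banned_host(string $hostname, array $bannedHosts): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));

    foreach ($bannedHosts as $banned) {
        $banned = strtolower($banned);
        if ($hostname === $banned) {
            return true; // exact match
        }
        // Match on ".banned" so "notlooksmart.com" does not match "looksmart.com".
        $suffix = '.' . $banned;
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            return true;
        }
    }

    return false;
}
```

On a comment post you could resolve the poster with gethostbyaddr($_SERVER['REMOTE_ADDR']) and refuse the comment when is_banned_host() returns true. Note that gethostbyaddr() hands back the unmodified IP address when the reverse lookup fails, so unresolved hosts simply won't match the list.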
timvw wrote: Some spambots simply post something like "Hey I like your site..." and then simply (ab)use the author URL. Which might make sense, since I don't expect search-engine crawlers to see a difference between a URL in my "post content" and a URL in the "author div".
So check the author URL... it's already done on my own blog.
timvw wrote: Some are smart enough to wait a while. But they do come back, day after day
True, but the key word in both your quotes is "some". The majority don't care about such deliberate measures; they're out to get 1 in every 1000 or more comments actually onto a blog which either isn't fully filtered or fails to recognise their comment as spam.
timvw wrote: In my experience they do re-use IPs from the same netblock. And too bad for open proxies, they're unwelcome.
The problem here is that the last time I blocked open proxies I got 6 emails inside a day from legitimate users who thought comments had been disabled or were broken. I think of it like blocking IPs - you end up alienating legitimate users. Which is also why I refuse to ever use CAPTCHAs on anything, full stop - there are at least two blind people who read my blog that I know of.

Speaking from personal experience (and not even remotely saying it reflects anyone else's, or even reality for that matter ;)), spammers are unimaginative folk. They re-use the same or similar tactics and messages over and over again. Some inevitably find a way past filters, but usually it's a simple matter to adapt filters, or at least get the most suspicious comments listed for review. I get maybe 200-1500 spam attempts during a week - last weekend saw a massive torrent of 850 for example - and only had 3 potential spams make it through. 2 were listed for review, and the 3rd appeared to be nothing but an "I like your blog." linking to Google of all places...

Does Google spam blogs? ;)

at last wanted to implement CAPTCHA

Posted: Tue Apr 18, 2006 11:35 pm
by deeppak
I want to implement a CAPTCHA on my site. Please let me know where to start, how to check whether GD is installed on the server, and what I should look up to start from the very beginning. Let me know the best solution, since I want to look professional like Yahoo and all the other sites implementing CAPTCHAs, and keep in mind that I am a newbie.

Please be quick in answering, because I have already wasted so much time fighting this spammer.

Cheers,
Deeppak Gupta

Posted: Wed Apr 19, 2006 2:58 am
by Maugrim_The_Reaper
Check PEAR - I believe it has a CAPTCHA class.
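
On the question of whether GD is installed, a quick check could look like the sketch below. It only uses standard PHP functions (extension_loaded(), function_exists(), gd_info()); the wrapper name gd_is_available() is made up for this example, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical wrapper: true when the GD extension is loaded and the
// basic image-creation function a CAPTCHA needs is present.
function gd_is_available(): bool
{
    return extension_loaded('gd') && function_exists('imagecreatetruecolor');
}

if (gd_is_available()) {
    // gd_info() returns an array describing the installed GD library.
    $info = gd_info();
    echo 'GD is installed, version: ' . $info['GD Version'] . "\n";
} else {
    echo "GD is not installed or not enabled.\n";
}
```

Calling phpinfo() and looking for a "gd" section gives the same information if you prefer to check by eye.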

Re: at last wanted to implement CAPTCHA

Posted: Fri Jun 02, 2006 5:35 am
by aerodromoi
deeppak wrote: I want to implement a CAPTCHA on my site. Please let me know where to start, how to check whether GD is installed on the server, and what I should look up to start from the very beginning. Let me know the best solution, since I want to look professional like Yahoo and all the other sites implementing CAPTCHAs, and keep in mind that I am a newbie.

Please be quick in answering, because I have already wasted so much time fighting this spammer.

Cheers,
Deeppak Gupta
In case you're talking about those infamous link lists, why don't you use

substr_count(strtolower($string), "http://");
to decide whether an entry needs review or not.

Joe Blog won't post more than two or three links in his entry, so anything above that should be spam.
It's not 100% foolproof, but combined with a time-out of two or three minutes and a blacklist of undesired words it helps to keep spam at bay.
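
Putting the three checks together might look roughly like this. It is only a sketch assuming PHP 7 or later: the function names, the limit of three links, the 120-second time-out and the blacklist passed in are all placeholders to tune for your own site. It also counts https:// links and runs the blacklist over the author URL, as suggested earlier in the thread.

```php
<?php
// Count "http://" and "https://" occurrences, case-insensitively.
function count_links(string $text): int
{
    $n = preg_match_all('~https?://~i', $text);
    return $n === false ? 0 : $n;
}

// Hypothetical helper: true when a comment should be held for review.
// $secondsSinceLastComment is the time since the same visitor last posted.
function comment_needs_review(
    string $body,
    string $authorUrl,
    int $secondsSinceLastComment,
    array $blacklist
): bool {
    $maxLinks   = 3;   // assumed limit: more links than this looks like a link list
    $minSeconds = 120; // assumed time-out of two minutes between comments

    if (count_links($body) > $maxLinks) {
        return true;
    }
    if ($secondsSinceLastComment < $minSeconds) {
        return true;
    }

    // Check unwanted words in the body and in the author URL.
    $haystack = $body . ' ' . $authorUrl;
    foreach ($blacklist as $word) {
        if ($word !== '' && stripos($haystack, $word) !== false) {
            return true;
        }
    }

    return false;
}
```

Holding a comment for review rather than deleting it keeps the occasional false positive from a legitimate poster recoverable.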

aerodromoi

ps: Hope you could settle your spam problem.