Using HTACCESS to block variable IP bots...


Post by Wolf_22 »

I'm trying to cut down on spammers who keep making junk requests to my site, using a different IP address for each request. The access log entries for these requests follow this basic pattern:
<IP ADDRESS> - - [<DATE / TIMESTAMP>] "GET /?q=node/add HTTP/1.1" 403 5507 "<WEBSITE>" "Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
Nine times out of ten, these requests use the WebKit / Safari user agent shown above, and they almost always come from a different IP address, which makes them difficult to block by address.

What I tried to do is as follows:
RewriteCond %{HTTP_COOKIE} !cookievar
RewriteCond %{REQUEST_FILENAME} \.(gif|jpe?g|png|js|css|swf|php|ico|txt|pdf|xml)$ [NC]
RewriteRule .* - [L,co=cookievar:true:%{HTTP:Host}:86400]
RewriteCond %{HTTP_COOKIE} !cookievar
RewriteCond %{THE_REQUEST} (user\/register|node\/add)
RewriteRule .* - [F]
I'm not very good with .htaccess code (as you may be able to tell from the above), but my intention was to make any browser that loads my site's assets store a cookie, and then use that cookie to check whether the visitor is a real user. If the cookie is present, I let the request through to the website. Otherwise, I stop it before it can use any further server resources. It's my understanding that blocking a request at the .htaccess level stops it at the server level rather than inside the app itself. So the benefit here would be twofold: the requests no longer leech CPU / RAM from the server, and the spammers are stopped.

Unfortunately, my logs indicate that it's not working the way I hoped. Either the code above doesn't work, or the requests come from automated but otherwise legitimate browsers that do store cookies. I'm hoping someone here has suggestions or ideas. What I'd love to do is block every client that can't store my cookie, so that GET requests to the relative locations user/register and node/add are completely inaccessible without it. This won't block people who automate real browsers, but I would tackle that later.

Insights would be appreciated.

Re: Using HTACCESS to block variable IP bots...

Post by requinix »

It should be easy to check if they're hitting other pages/resources: look for other requests from the same IP address around that time.
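
One way to run that check is a small script over the access log. The following is only a rough sketch, assuming the log uses the Apache combined format shown in the first post; the function names (`parse`, `companions`) and the five-minute window are illustrative choices, not anything taken from the thread.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Combined Log Format: IP, identity, user, [timestamp], "request", status, size, ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)
# The two locations mentioned in the first post.
SUSPECT = re.compile(r"(user/register|node/add)")


def parse(lines):
    """Yield (ip, timestamp, request) for each line that looks like a log entry."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            yield m.group("ip"), ts, m.group("request")


def companions(lines, window=timedelta(minutes=5)):
    """Map each IP that made a suspect request to its other requests within `window`.

    The 5-minute default is an arbitrary assumption; adjust it to taste.
    """
    by_ip = defaultdict(list)
    for ip, ts, request in parse(lines):
        by_ip[ip].append((ts, request))

    result = {}
    for ip, hits in by_ip.items():
        suspect_times = [ts for ts, req in hits if SUSPECT.search(req)]
        if not suspect_times:
            continue  # this IP never touched user/register or node/add
        result[ip] = [
            req
            for ts, req in hits
            if not SUSPECT.search(req)
            and any(abs(ts - s) <= window for s in suspect_times)
        ]
    return result
```

Calling `companions(open("access.log"))` (the file name is a placeholder) returns a dict keyed by IP. An IP that maps to an empty list requested the form and nothing else in that window, so it never fetched the assets that would have set the cookie. An IP with CSS, JS, or image requests in its list is behaving more like a real browser.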