Protecting a site against bot programs

Posted: Sat May 04, 2013 9:12 pm
by califdon
After dealing with a hacker attack on one of my domains for more than a week (the attacks have ceased, so either I fixed it or he got tired of the game), and receiving some very useful comments and suggestions from others on this and another forum, I have had some thoughts of my own about protective measures that might be (should be?) taken to reduce the exposure of database entry forms to robot programs. I must begin by stating that I am not a web security expert. I have created quite a few web interfaces for databases and am an experienced programmer in PHP, JavaScript, and the web page protocols, but the approach I'm going to describe is just something that seems to me to offer some protection; it's not the result of any in-depth security experience. Indeed, I hope that some real security gurus will critique my suggestion and point out flaws in my reasoning or execution.

What I am trying to protect against is the kind of bot program that scans your HTML, recognizes a form element, obtains the action script path and filename, recognizes each input element and its name, and then selects some or all of your form fields and constructs an HTTP POST request, carrying its own malicious data, to send to the action script. I used to think that a decent CAPTCHA element would pretty well protect against this kind of attack, but I've now been advised that most CAPTCHA methods have been compromised by hackers, and that seems to be what happened in my recent case. There are certainly several useful techniques that will strengthen your security. You could check for IP addresses that are on the DNS blacklists (although if botnets are being used, the "innocent" host computers that send out these requests are generally not on any blacklist yet). You can validate data, both on the client side and especially on the server side, but this isn't always very effective, depending on what kind of data you expect from a legitimate human filling in the form. I'm sure there are others as well. But I came up with the following idea.

In the case of such a bot program, what if your form is submitted by a custom JavaScript submit function that manipulates your form data and field names before sending the HTTP POST request? Since we're protecting against a robot program, not a human hacker, it's not important that a human could read your JavaScript code and figure out what you are doing, because there is unlikely to be a human scanning your code. Here is an example of what you might do.

Basically, instead of having a "submit" type input element in your form, just have a "button" type that calls your own "submit" function in its onClick event. In that function you perform any data validation you need to do, then switch around some of the values in the form fields, including at least one "hidden" form field, before issuing the JavaScript command to submit the form. This means that your PHP action script (perhaps the same script that contains your form) can test for these manipulated values in the $_POST array. If they don't correspond to what you expect, you know that this input did NOT result from someone using your data entry script and its custom JavaScript submit function.

Code:

<!DOCTYPE HTML>
<html>
<head>
   <title>Test</title>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF8" />
   <style type='text/css'>
      .REQD    {
               border:1px solid red;
               }
   </style>
   <script type='text/javascript'>
      function myreset(field) {
         if(field.value!='') field.style.borderColor='green'
      }
      
      function mysubmit() {
         if(document.myform.nm.value=='' || document.myform.pw1.value=='' 
               || document.myform.pw2.value=='' || document.myform.year.value=='') {
            alert("You didn't complete all the fields")

         } else if(document.getElementById('pw2').value != document.getElementById('pw1').value) {
            alert("Your 2 passwords must match")

         } else if(isNaN(document.getElementById('year').value) || document.getElementById('year').value < 1900 || document.getElementById('year').value > 2000) {
            /* isNaN() rejects non-numeric input, which would otherwise slip past both comparisons */
            alert("Invalid year of birth")

         } else {
            /* Here is where the entered value for 'year' gets shifted to the hidden input 'val' */

            document.getElementById('val').value = document.getElementById('year').value
            /* and is replaced by 'OK' */
            document.getElementById('year').value = 'OK'

            /* Now submit the form  */
            document.forms['myform'].submit()

            /* You may want to redirect to another page here, so as not to confuse the user */
            alert("Form submitted")
         }
      }
   </script>
</head>

<body>

<?php
if(isset($_POST['nm'])) {  
   /* If any field is missing from the POST, set its value to "BAD"  */
   $name = isset($_POST['nm']) ? $_POST['nm'] : "BAD";
   $pwd = isset($_POST['pw1']) ? $_POST['pw1'] : "BAD";
   /* Note that the actual year data should NOT be in the 'year' field, but in the 'val' field  */
   $birth = isset($_POST['val']) ? $_POST['val'] : "BAD";
   $val = isset($_POST['year']) ? $_POST['year'] : "BAD";
   
   echo "<br />";
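   /* Placeholders keep user input out of the SQL text; INSERT ... VALUES does not accept a LIMIT clause */
   $sql = "INSERT INTO mydata (nm, pw, yob) VALUES (?, ?, ?)";
   echo "<br />$sql";
   /* connect to database, prepare $sql, bind $name, $pwd (hashed) and $birth, then execute */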
   echo "<br />Database updated.";
} else {
?>

   <form name='myform' id='myform' method='post' action=''>
      <div>Enter your name: <input type='text' class='REQD' name='nm' id='nm' onBlur='myreset(this);' onChange='myreset(this);' /></div>
      <div>Enter a password: <input type='password' class='REQD' name='pw1' id='pw1' onBlur='myreset(this);' onChange='myreset(this);' /></div>
      <div>Repeat the password: <input type='password' class='REQD' name='pw2' id='pw2' onBlur='myreset(this);' onChange='myreset(this);' /></div>
      <div>What year were you born? <input type='text' class='REQD' name='year' id='year' onBlur='myreset(this);' onChange='myreset(this);' /></div>
      <input type='hidden' name='val' id='val' value='BAD' />
      <div><input type='button' name='sub' id='sub' value='Submit' onClick='mysubmit();' /></div>
   </form>

<?php
}
?>

</body>
</html>
It's a fairly simple thing to do. What have I missed? I know the code works, but how effective is it?
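
As posted, the listing copies the marker field into $val but never tests it. Here is a minimal sketch of the server-side check the approach relies on, assuming the field names from the form ('year' carries the marker, 'val' carries the real year). The function name extract_swapped_year() is made up for illustration.

```php
<?php
// Sketch only: extract_swapped_year() is a hypothetical helper name.
// 'year' and 'val' are the field names used in the form above.

/**
 * Returns the year of birth if the POST data looks like it went through
 * mysubmit(), or null if the submission should be rejected.
 */
function extract_swapped_year(array $post): ?int
{
    $marker = isset($post['year']) ? $post['year'] : '';
    $value  = isset($post['val'])  ? $post['val']  : '';

    // A bot that posts the form as scraped leaves the year in 'year'
    // and the default 'BAD' in 'val', so the marker will not be 'OK'.
    if ($marker !== 'OK') {
        return null;
    }

    // Never rely on the client-side range check; repeat it here.
    if (!is_string($value) || preg_match('/\A\d{4}\z/', $value) !== 1) {
        return null;
    }
    $year = (int) $value;
    if ($year < 1900 || $year > 2000) {
        return null;
    }
    return $year;
}
```

In the action script this would be called as `$birth = extract_swapped_year($_POST);`, treating a null result as a rejected submission.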

Re: Protecting a site against bot programs

Posted: Sun May 05, 2013 12:16 am
by twinedev
First, I like using a honeypot. (For those who don't know, it is a form field that is styled so it isn't visible on the screen and that must still be blank when the form is submitted. A lot of bots fill in every field they find.) To help make sure it gets filled in by bots, I usually name it "URL".
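
A minimal sketch of the honeypot check, assuming the field is named "URL" as described. The helper name honeypot_tripped() and the choice to treat a missing field as suspicious are assumptions for illustration, not part of the original method.

```php
<?php
// Sketch only: honeypot_tripped() is a hypothetical helper name.
// The form would carry a field that people never see, for example:
//   <input type="text" name="URL" value="" style="display:none" />

/**
 * True when the honeypot field is missing or came back non-blank.
 * A browser submits an untouched hidden text input as an empty string,
 * so treating a missing field as suspicious is an assumption made here.
 */
function honeypot_tripped(array $post, string $field = 'URL'): bool
{
    if (!isset($post[$field]) || !is_string($post[$field])) {
        return true;
    }
    return trim($post[$field]) !== '';
}
```

A submission for which `honeypot_tripped($_POST)` returns true can be dropped quietly, so the bot gets no hint about which field gave it away.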

Second, after putting this method in place to stop a specific bot attack on a site, I have pretty much kept it in all my forms. I add a field like "form_hash" that contains the timestamp of when the form was displayed. Then, when the form is submitted, the script checks that the timestamp is within a certain time period (usually around 30 minutes, but it depends on how much info is on the page and how complex the form is). The idea is that if they scan the form to figure out how to just do POSTs back to it, what they scanned will only be valid for that amount of time. If the field held the raw timestamp, people might recognize that "form_hash" is a timestamp, so I usually use my int2key/key2int functions to mask it (see viewtopic.php?t=132062 for those functions).
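
A minimal sketch of the time-limited "form_hash" idea. The int2key()/key2int() functions are not shown in this thread, so this swaps in a different technique: the timestamp is written in base 36 and signed with an HMAC. Both function names and the secret are hypothetical.

```php
<?php
// Sketch only: make_form_hash() and form_hash_is_fresh() are hypothetical
// names, and the HMAC signature is a stand-in for int2key()/key2int().

/** Builds the value for the hidden "form_hash" field when the form is displayed. */
function make_form_hash(int $now, string $secret): string
{
    $stamp = base_convert((string) $now, 10, 36);   // hides the obvious digits
    return $stamp . '.' . hash_hmac('sha256', $stamp, $secret);
}

/** True if the token is authentic and was issued at most $maxAge seconds ago. */
function form_hash_is_fresh(string $token, int $now, string $secret, int $maxAge = 1800): bool
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2) {
        return false;
    }
    list($stamp, $sig) = $parts;

    // Constant-time comparison; fails if the stamp or signature was altered.
    if (!hash_equals(hash_hmac('sha256', $stamp, $secret), $sig)) {
        return false;
    }

    $age = $now - (int) base_convert($stamp, 36, 10);
    return $age >= 0 && $age <= $maxAge;
}
```

When displaying the form you would print `make_form_hash(time(), $secret)` into the hidden field, and on submit call `form_hash_is_fresh($_POST['form_hash'], time(), $secret)`. The 1800-second default matches the 30 minutes mentioned above.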

Third, something I tried on one site, which requires using a session, was to change the field names every time the form was displayed. There was an array of field names, numbered 0 to whatever. I would generate a random number higher than "whatever" and save it to the session. For each form field name, I would add that number to the field ID number and, again, run the result through int2key() (I never like just displaying "raw numbers", can you tell?). When the form is submitted, I grab the offset number from the session and use it to recover the correct field ID number, so I know which field to assign each value to.
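
A rough sketch of the rotating field names, under some assumptions: the field list, the "f" prefix and all three function names are invented for illustration, and the plain number stands in for what int2key() would produce.

```php
<?php
// Sketch only: every name here is hypothetical, and the bare number in the
// public field name is a stand-in for int2key() output.

/** Picks a fresh offset higher than the largest field id; keep it in the session. */
function new_field_offset(array $fields): int
{
    return random_int(count($fields), 99999);
}

/** Name to print in the form for the real field with index $id. */
function public_field_name(int $id, int $offset): string
{
    return 'f' . ($id + $offset);
}

/** Maps submitted data back to the real field names; null if any field is missing. */
function decode_fields(array $post, array $fields, int $offset): ?array
{
    $out = [];
    foreach ($fields as $id => $real) {
        $public = public_field_name($id, $offset);
        if (!isset($post[$public]) || !is_string($post[$public])) {
            return null;   // names came from a stale or scraped copy of the form
        }
        $out[$real] = $post[$public];
    }
    return $out;
}
```

On display, the script would store `new_field_offset($fields)` in `$_SESSION` and print each input with `public_field_name()`. On submit, `decode_fields($_POST, $fields, $_SESSION['offset'])` either returns the values keyed by their real names or null, in which case the post did not match the form that was just served.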

Lastly, for "contact us" type forms, where damn SEO "firms" send messages to advertise their services, I'm adding a required "country" field (most of my clients have no realistic contacts from non-US people). If they enter US, the IP needs to resolve back to the US and the phone number must be in a correct US format (most of the time theirs are way off). I also search the message for key terms like "rank", "seo", and some others (based upon the spam received). If something is tripped, then instead of sending the form info, it kicks them back to a second step, which adds a text field that says something along the lines of "I am not trying to solicit business from you. If I offer any services, I am doing them at 100% free of charge". It is in a locked input, and the back end checks whether they have changed it. I still get, maybe once a month, some dumb*** who will submit it anyhow, but it is no longer every other day. Oh, also, if a submission tripped the second step, the subject line of the email to the client indicates it, so they can tell at first glance.
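
A minimal sketch of two of those checks, the US phone format and the keyword scan. The phone pattern, the keyword list and the function names are assumptions for illustration. The IP-to-country lookup is left out because it needs an external geolocation source.

```php
<?php
// Sketch only: the pattern, the term list and the function names are
// assumptions; tune them to the spam actually received.

/** Loose check for a 10-digit US number, allowing punctuation and a leading 1. */
function looks_like_us_phone(string $phone): bool
{
    $digits = preg_replace('/\D+/', '', $phone);
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }
    // Area code and exchange both start with 2-9 in the North American plan.
    return preg_match('/\A[2-9]\d{2}[2-9]\d{6}\z/', $digits) === 1;
}

/** True if the message contains any trigger term as a whole word. */
function mentions_spam_terms(string $message, array $terms = ['seo', 'rank', 'ranking']): bool
{
    foreach ($terms as $term) {
        if (preg_match('/\b' . preg_quote($term, '/') . '\b/i', $message) === 1) {
            return true;
        }
    }
    return false;
}

/** Decides whether to send the visitor to the second, "no solicitation" step. */
function needs_second_step(string $country, string $phone, string $message): bool
{
    if (strtoupper(trim($country)) === 'US' && !looks_like_us_phone($phone)) {
        return true;
    }
    return mentions_spam_terms($message);
}
```

Matching whole words keeps a name like "Frank" from tripping the "rank" term, at the cost of missing run-together spellings.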

I do like your JS process, though. One thing to consider is what screen readers (for blind users) do with JS. ADA compliance is becoming bigger and bigger, so that is always on my mind with sites.

Re: Protecting a site against bot programs

Posted: Sun May 05, 2013 12:45 pm
by califdon
Thanks, Greg. Those are good techniques, too. The idea of dynamic field names, using a session variable, is especially interesting to me; I'm going to review that idea and might start using it (do I have to pay you royalties? lol). I hadn't considered access issues; I have mixed feelings about that, for my purposes, but it's sure worth giving some thought.

I also posted the above exact post on http://www.stopforumspam.com/forum/view ... 513#p36513 and got several very good responses, so far (it's a good special purpose forum, btw). User "zero-tolerance" had these pertinent words:
But this is both an arms race and a numbers game. The spam bots are not presently very clever because they don't have to be. They're running a business, which is always about extracting the maximum profit from the minimum investment. While the mass of forums have so little protection it's not going to be worth their while breaking into those that have more. As the average difficulty of spamming forums goes up - as surely it will - so too will the cleverness of the spambots. It's not difficult to see how they could be made to understand hidden fields completely and run full javascript - or adobe flash for that matter. There's probably also a lot more they could do to avoid hitting honey pots or being listed here.

Re: Protecting a site against bot programs

Posted: Sun May 05, 2013 1:15 pm
by califdon
@twinedev: Just read your int2key/key2int post. Zowie! That will be useful! I will have to study it more, but I think I have several potential uses for that kind of encryption. Thanks.

Re: Protecting a site against bot programs

Posted: Sun May 05, 2013 9:18 pm
by twinedev
Thanks. I know it is overkill for some aspects, but I like making it difficult to "guess" how to get back to things, and making it less obvious that something is a number, whether an ID or a timestamp, that could be changed to manipulate a site. What can I say, I know what I would do if I was trying to mess with a site, and I try to counter what I would do.

Re: Protecting a site against bot programs

Posted: Mon May 06, 2013 12:09 pm
by califdon
twinedev wrote:What can I say, I know what I would do if I was trying to mess with a site, and try to counter what I would do.
I better stay on your good side, Greg! Remember, I am your friend! :rofl:

Re: Protecting a site against bot programs

Posted: Tue May 07, 2013 12:10 pm
by twinedev
Lol, I have too guilty a conscience to try anything. Heck, for a while I was getting basically the same two spam messages every day, with spoofed links that actually took you to a hacked WordPress install. One day, it was a site that hadn't disabled directory browsing, so I was able to see and grab all the hack scripts they had put on the site (one of them was a copy of c99shell, so that let me download them). Anyhow, the main file linked to in the emails was main.url/path/track.php?some=parameter. This script kicks people off to other sites, usually to a page that was nothing more than a Java applet...

So anyhow, having the source to the track.php file, I found it had a "configuration" mode that let me change where it sent people. So, being the helpful geek I am, I would break it from going where they wanted, so that others who didn't know better and clicked on the link in their email would be "safe(r)".

If you ever see a link like it (most of them came in as spoofed LinkedIn e-mails), try changing the query string to ?mode=config&key=gfinberw8gjyu9djru47slbn47quf8oytuh7gdrs The hack usually uses track.php, wp-secure.php, wps.php, or wp-status.php ... That query string worked for them all. With later hacks, however, instead of properly rewriting the script with the "data" line, it would actually write the file wrong and thus make it crash with a PHP error (just as good for my purpose).

Re: Protecting a site against bot programs

Posted: Tue May 07, 2013 1:02 pm
by califdon
Cool! I'll be on the lookout for that signature. I usually just delete any obvious spam that I spot, but I will admit that if I thought I could do something to disrupt an operation like that, I would enjoy doing it!

I should mention Project Honey Pot, which is a distributed cooperative trap operation that I participate in. I have a one-line trap in one script in each domain I have. They report back when one of my honey pots has led to the blacklisting of another IP address; it happens every month or so. It doesn't take any maintenance, just a one-time insertion of a line of code in one script per domain name. You can also "donate" an MX record for a domain, which allows them to collect spam email and identify the spammers.