Page 1 of 2

SQL Injection prevention

Posted: Tue Sep 18, 2007 1:43 pm
by The Phoenix
I'm looking to contribute to an open source project, and improve their security processes for database calls.

Before I do, I want to be absolutely sure I understand all the potential gotchas before giving bad advice.

So, to the community at large I ask the question:

What php code needs to be run on query text prior to sending to a database to prevent only sql injections.

Notably, the issues of XSS, CSRF, etc are all handled in a very different space. I'm just interested in preventing sql injections.

So, on mysql, is it as simple as:

mysql_real_escape_string($query)

Or is there more that must be done?

Please be specific with examples of items that will not handle properly, and ideally, the correct code/method to handle them properly.

Thanks gang. :)

Posted: Tue Sep 18, 2007 1:45 pm
by s.dot
mysql_real_escape_string() should do it. :)

However, I'm all for validating data before it goes into the database.

Posted: Tue Sep 18, 2007 2:03 pm
by Christopher
I would at minimum filter the incoming data. You can also validate the input and reject requests that are not well formed.

Posted: Tue Sep 18, 2007 2:14 pm
by The Phoenix
arborint wrote:I would at minimum filter the incoming data.
Do you feel that mysql_real_escape_string accomplishes that effectively?
arborint wrote:You can also validate the input and reject requests that are not well formed.
This is for a database library, where the library doesn't have awareness of the input format, or the request types. Those activities would have to take place outside of the library itself, although I agree that they are important and need to be done.

Posted: Tue Sep 18, 2007 3:28 pm
by mrkite
Use PEAR:DB, PEAR:MDB2, or pdo and use prepared statements.

No need to worry about escaping.

Posted: Tue Sep 18, 2007 3:54 pm
by Mordred
Nope, mysql_real_escape_string() is not enough, here's why (with plenty of examples):
http://www.webappsec.org/projects/articles/091007.shtml

Generic filtering of input is good, but it works on another level, the DB-aware layers should concentrate on proper syntax and escaping.

Posted: Wed Sep 19, 2007 4:13 am
by stereofrog
Mordred wrote:Nope, mysql_real_escape_string() is not enough, here's why (with plenty of examples):
http://www.webappsec.org/projects/articles/091007.shtml
Nice stuff, thanks. Most people don't realize that escaping only protects the value parts of the queries.

Posted: Wed Sep 19, 2007 4:15 am
by matthijs
Indeed. That was an educational read.

Posted: Wed Sep 19, 2007 6:51 am
by superdezign
I don't believe that you should run in on the entire query, anyway... Then you'd escape everything instead of only the data.

Posted: Thu Sep 20, 2007 9:47 am
by The Phoenix
Mordred wrote:Nope, mysql_real_escape_string() is not enough, here's why (with plenty of examples):
http://www.webappsec.org/projects/articles/091007.shtml

Generic filtering of input is good, but it works on another level, the DB-aware layers should concentrate on proper syntax and escaping.
Thats an excellent article about the full lifecycle of validation and escaping, but I'm struggling with a few issues.

Starting from #4 down, #4 highlights an issue with validation - not escaping. Further, even you state:
The reader must realise that input validation (in our case making sure that what we expect to be an int is really int) and escaping the parameter before giving it to the query are two different security steps. In this particular case either one will suffice, but in general, for in-depth security, you must always do both. Also, the two tasks will most probably be carried by two different subsystems in your real-life code, so the validating and escaping code will not be adjacent as displayed in this simplified case.
I don't think a database abstraction layer has sufficient awareness to be able to determine whether an item is intended to be a string or an int, and as a result, can't reliably perform the validation. That has to be performed in a different subsystem, correct?

#5 is a variation on the same issue, albeit with a much more harsh bite. Nevertheless, it still puts the requirement on validation, which the db abstraction layer isn't sufficiently informed to do.

#6 is validation prevented as well, this time with hex instead of ints.

#8 is covered, because the db abstraction layer in question prevents multiple queries in a single call.

So that really leaves #7 as an issue that the db abstraction layer can/should effectively work to prevent, is that correct?

#7 is easy for me to understand, but I'm stumbling on understanding the proper prevention. I guess I'm lost because of the line:
Test 3 demonstrates that in our case we don't need to escape the LIKE escape character, as mysql_real_escape_string() already escaped it.
Even though above that it showed mysql_real_escape_string did NOT escape it. I'm sure I'm just not understanding effectively.

So, with validation not something the db abstraction layer can effectively provide, is there more that needs to be done to escape the query than just mysql_real_escape_string (at the db abstraction layer only)?

I suspect from #7 that there is. Perhaps if we detect a % sign (or another wildcard for LIKE), we run addcslashes?

Thanks for the informative post!

Posted: Thu Sep 20, 2007 3:44 pm
by Mordred
The Phoenix wrote:
Mordred wrote:Nope, mysql_real_escape_string() is not enough, here's why (with plenty of examples):
http://www.webappsec.org/projects/articles/091007.shtml

Generic filtering of input is good, but it works on another level, the DB-aware layers should concentrate on proper syntax and escaping.
Thats an excellent article about the full lifecycle of validation and escaping, but I'm struggling with a few issues.

Starting from #4 down, #4 highlights an issue with validation - not escaping. Further, even you state:
The reader must realise that input validation (in our case making sure that what we expect to be an int is really int) and escaping the parameter before giving it to the query are two different security steps. In this particular case either one will suffice, but in general, for in-depth security, you must always do both. Also, the two tasks will most probably be carried by two different subsystems in your real-life code, so the validating and escaping code will not be adjacent as displayed in this simplified case.
I don't think a database abstraction layer has sufficient awareness to be able to determine whether an item is intended to be a string or an int, and as a result, can't reliably perform the validation. That has to be performed in a different subsystem, correct?
In example 4 the issue is twofold, that's why it can be solved in two different ways. Either the application layer forces the value to be integer (i.e. we rely on validation), and we leave the field unquoted (which is a valid SQL syntax), OR (much better) we leave this to the database layer, by adding quotes (which is also valid SQL syntax, although maybe only for some DB systems, MySQL for sure) and doing escaping. The best ( I called it robust in the article, because it will continue to work even after source modifications) makes use of both techniques at the same time.

So yes, the validation is done on a different level (to emphasize this, I put an elipsis between the validation and escaping). As a side note, though, in the cases where we're talking about ints, the database layer is aware enough of the type of data it needs - after all the column type in the table is known.
The Phoenix wrote: #5 is a variation on the same issue, albeit with a much more harsh bite. Nevertheless, it still puts the requirement on validation, which the db abstraction layer isn't sufficiently informed to do.
On the contrary, see the above comment. The database layer knows where the variable will be put in the query, and it knows that it wants an integer. In a sense, casting to int is equivalent to escaping - all invalid characters (non-digits in our case) are "escaped" by being removed.
The Phoenix wrote: #6 is validation prevented as well, this time with hex instead of ints.
Oh, #6 means chapter 6 ... I thought is it Example 6. Ok.

No, it is not a validation issue, frankly it is the developer being an idiot ;) Alas, I've seen it happen in the wild both with column names and sort directions... Anyway, as in the case of ints above, this is not validation. The systems that handle input shouldn't know such intimate details about the database structure, it is the DB-layer's job to ensure that what it is being given is not invalid names or sorting keywords. Ideally, this shouldn't happen at all. I personally implement ordering instructions by encoding them with a number - $n/2 gives the column number, $n%2 gives ASC/DESC. In that way only valid column names and order keywords go in the dynamic query.

Also the hex encoding is just a way to use the vulnerability without quotes, it goes around the escaping mechanism, not the validation mechanism. An attacker could also use MySQL string functions (CHAR/CONCAT) to construct string values,

Again, the fault is not in the lack of validation! I very much dislike the idea that secure code need validation, but that's a topic for another rant.
The Phoenix wrote: #8 is covered, because the db abstraction layer in question prevents multiple queries in a single call.

So that really leaves #7 as an issue that the db abstraction layer can/should effectively work to prevent, is that correct?

#7 is easy for me to understand, but I'm stumbling on understanding the proper prevention. I guess I'm lost because of the line:
Test 3 demonstrates that in our case we don't need to escape the LIKE escape character, as mysql_real_escape_string() already escaped it.
Even though above that it showed mysql_real_escape_string did NOT escape it. I'm sure I'm just not understanding effectively.

So, with validation not something the db abstraction layer can effectively provide, is there more that needs to be done to escape the query than just mysql_real_escape_string (at the db abstraction layer only)?

I suspect from #7 that there is. Perhaps if we detect a % sign (or another wildcard for LIKE), we run addcslashes?

Thanks for the informative post!
I admit that the wording of that paragraph could be clearer. It describes the highly theoretical scenario of using another escape character for the LIKE statement. The "LIKE escape character" is backslash (\), and mysql_real_escape_string() does escape it. The reason I mention it, is because if we choose another escape character, lets say '*', and we use addcslashes, the attacker could give us a query containing the new escape character:

Code: Select all

$search = '*'; //attack
$search = mysql_real_escape_string($search);
$search = addcslashes($search, "%_");
RunQuery("SELECT * FROM `table` WHERE `column` LIKE '$search*%' ESCAPE '*' "); //we want to search for values ending in '%'.
What would happen is that the injected '*' will escape the constant '*', and the '%' sign will lose its escaping, and will start to act as a wildcard. As I said this is highly theoretical (haven't seen it done in the real world) and it is much milder than being able to inject wildcards in the beginning of a LIKE parameter.


Edit: To reemphasize my answer to that question:
So, with validation not something the db abstraction layer can effectively provide, is there more that needs to be done to escape the query than just mysql_real_escape_string (at the db abstraction layer only)?
All prevention steps are done entirely in the database layer. It knows all information it needs to do so. No fancy-shmancy validation is needed to have secure database code.

just keep the foe...

Posted: Tue Oct 02, 2007 1:49 am
by junix
just learning here !

Posted: Wed Oct 10, 2007 7:27 pm
by Josh1billion
Very good read, Mondred, it helped me out. :)

Posted: Wed Oct 17, 2007 4:37 pm
by arpowers
my mysql_prep function...

please let me know if I need to improve it:P

Code: Select all

function mysql_prep( $value ) {
		$magic_quotes_active = get_magic_quotes_gpc();
		$new_enough_php = function_exists( "mysql_real_escape_string" ); // i.e. PHP >= v4.3.0
		if( $new_enough_php ) { // PHP v4.3.0 or higher
			// undo any magic quote effects so mysql_real_escape_string can do the work
			if( $magic_quotes_active ) { $value = stripslashes( $value ); }
			$value = mysql_real_escape_string( $value );
		} else { // before PHP v4.3.0
			// if magic quotes aren't already on then add slashes manually
			if( !$magic_quotes_active ) { $value = addslashes( $value ); }
			// if magic quotes are active, then the slashes already exist
		}
		return $value;
	}

Posted: Wed Oct 17, 2007 5:31 pm
by John Cartwright

Code: Select all

function mysql_prep( $value ) {
      if (get_magic_quotes_gpc()) {
         $value = stripslashes($value);
      }

      //post php 4.3
      if (function_exists('mysql_real_escape_string')) {
         return mysql_real_escape_string($value);
      }
      //pre php 4.3
      else {
         return mysql_escape_string($value);
      }
   } 
}
You can change the addslashes() to mysql_escape_string(), generally you want to avoid addslash()'ing data into your db.