Page 1 of 1

preg_replace unescapes escaped content

Posted: Fri Sep 10, 2010 6:02 am
by MichaelR
I'm not sure if this is known at all, but it seems that preg_replace unescapes escaped back-slashes. See below:

Code: Select all

  // PHP 5.3.3

  $one = 'This is escaped: ?';
  $two = '\"';

  // Returns (expected): This is escaped: \\\"
  str_replace('?', mysql_real_escape_string($two), $one);

  // Returns (unexpected): This is escaped: \\"
  preg_replace('/\?/', mysql_real_escape_string($two), $one);

Re: preg_replace unescapes escaped content

Posted: Fri Sep 10, 2010 11:20 am
by John Cartwright
I'm not sure why you are mixing mysql_real_escape_string() with preg functions. See preg_quote() for proper usage.

Re: preg_replace unescapes escaped content

Posted: Fri Sep 10, 2010 11:39 am
by MichaelR
The reason behind it was so that I could call functions like this:

Code: Select all

  escapeData('username = ? AND password = MD5(?) AND email = ?', $_POST['username'], $_POST['password'], $_POST['email']);

  // Returns: username = 'O\'Connor' AND password = MD5('pA55w0Rd?') AND email = '\"michael\".rushton@example.com'
And thanks for the preg_quote() reference. It's exactly what I needed to solve the issue.

Note: I'd use str_replace except it cannot be limited. I need it to stop after the first replace.

Edit: Except that now escapes: + * ? [ ^ ] $ ( ) { } = ! < > | : -. :banghead:

2nd edit: used strpos() and substr_replace() to emulate a limited str_replace().

Re: preg_replace unescapes escaped content

Posted: Sun Sep 12, 2010 5:49 am
by kaisellgren
Looks very insecure. You are doing this for fun, right?
MichaelR wrote:Note: I'd use str_replace except it cannot be limited.
No? So, the fourth parameter is some kind of bogus doc param?

Re: preg_replace unescapes escaped content

Posted: Sun Sep 12, 2010 12:07 pm
by MichaelR
It's very secure now that I've managed to resolve the issue. And the fourth parameter of str_replace() simply defines the variable which will reference the number of replacements. It doesn't limit them. It's the fourth parameter of preg_replace() that can be used to stop the replacement after the nth time (but preg_replace() needed preg_quote() to function which escaped characters I didn't want escaping. The function (or, rather, method(s)) is (/are) now:

Code: Select all

    final private function _charEncode($string) {
      return str_replace(array('&', '?'), array('&', '&#63;'), $string);
    }

    final private function _charDecode($string) {
      return str_replace(array('&#63;', '&'), array('?', '&'), $string);
    }

    final public function escapeData($value, $expression = '?') {

      $escape = !is_array($value) ? array($value) : $value;

      foreach ($escape as $value) {

        if (!$value = addcslashes(mysql_real_escape_string($value, $this->_connection), '%_')) {
          throw new Exception('Unable to escape the data.');
        }

        if (($start = strpos($expression, '?')) !== false) {
          $expression = substr_replace($expression, "'" . $this->_charEncode($value) . "'", $start, 1);
        }

      }

      return $this->_charDecode($expression);

    }
If there are any security issues/problems that may arise from this, I'd appreciate the criticism.
kaisellgren wrote:You are doing this for fun, right?
Partly for fun, partly to develop my knowledge and understanding of PHP, and partly because I want to write a very good MySQL Database class (yes, I know of MySQLi and PDO, but I want to do a MySQL Database class (partly for fun, partly to develop my knowledge and understanding of PHP)).

Re: preg_replace unescapes escaped content

Posted: Sun Sep 12, 2010 1:43 pm
by kaisellgren
MichaelR wrote:And the fourth parameter of str_replace() simply defines the variable which will reference the number of replacements.
Oh, you're right. I recalled str_replace() being capable of limiting replacements. Anyway, if criticism is okay to you, I have formed my thoughts.

I wouldn't call your class secure. It's also buggy. For example, it throws backslashes in front of % and _ characters. That easily corrupts data. You probably thought about the meta characters that MySQL has for LIKE expression (and maybe something else too). You should realize though that this affects negatively everything else. Besides, it only escapes _ and % not \. Worse yet, it does that after escaping values, and without any given character set (so it uses ISO-8859-1 in PHP 5, and Unicode in PHP 6) which often luckily works though.

And then, to emulate prepared statements, you keep replacing contents of expressions iteratively which easily leads to vulnerabilities and errors. The whole thing should be tokenized and then parsed per token. It's also not good to alter data after escaping it, because it also easily leads to vulnerabilities. And in the end, you transform your data again which is bad, too.

And as always, a system cannot be called secure if there are no written tests for it and if it has not been thoroughly evaluated by other security conscious people.

Anyway, security is an area fun to play with, so have fun!

Re: preg_replace unescapes escaped content

Posted: Sun Sep 12, 2010 3:29 pm
by MichaelR
kaisellgren wrote:Besides, it only escapes _ and % not \.
It does escape \. I used mysql_real_escape_string() as well as addcslashes().
For example, it throws backslashes in front of % and _ characters. That easily corrupts data. You probably thought about the meta characters that MySQL has for LIKE expression (and maybe something else too). You should realize though that this affects negatively everything else.
In what way? Surely escaping characters that are to be input into an SQL query is a good thing?
And then, to emulate prepared statements, you keep replacing contents of expressions iteratively which easily leads to vulnerabilities and errors. The whole thing should be tokenized and then parsed per token. It's also not good to alter data after escaping it, because it also easily leads to vulnerabilities. And in the end, you transform your data again which is bad, too.
Could you give an example of when it would cause a problem? Because as far as I can tell, using addclashes() on top of mysql_query() isn't going to run into conflict (unless I use addcslashes() to also esacape \ (which I haven't)). And _charEncode() and _charDecode() are each other's reverse, so the data put in will be exactly the same as the data put out.

But I will take your advice and tokenize the string (which should actually allow me to not need the *Encode and *Decode methods).
And as always, a system cannot be called secure if there are no written tests for it
True. I was simply referring to it being able to pass the examples I gave it (but, then again, I'm no security expert, so I can't be sure I've thought of every eventuality).

Re: preg_replace unescapes escaped content

Posted: Sun Sep 12, 2010 5:03 pm
by MichaelR
Okay, here's the modified version:

Code: Select all

    final private function _escapeData($value, $expression = '?') {

      $token = strtok($expression, '?');

      $count = substr_count($expression, '?');

      $expression = '';

      foreach ($value as $key => $escape) {

        if (!$escape = addcslashes(mysql_real_escape_string($escape, $this->_connection), '%_')) {
          throw new Exception('Unable to escape the data.');
        }

        if ($key < $count) {
          $token .= "'" . $escape . "'";
        }

        $expression .= $token;

        $token = (isset($value[$key + 1])) ? strtok('?') : strtok('');

      }

      return $expression .= $token;

    }

Re: preg_replace unescapes escaped content

Posted: Mon Sep 13, 2010 2:15 am
by VladSun
I'd prefer using the {N} style over the ? one.
I've posted it here:
viewtopic.php?f=50&t=108910

I think the mysql_real_escape_string() is just enough for escaping values - no need to addslashes().

Re: preg_replace unescapes escaped content

Posted: Mon Sep 13, 2010 2:22 am
by MichaelR
VladSun wrote:I think the mysql_real_escape_string() is just enough for escaping values - no need to addslashes().
I was taking into account the possibility of users (un)intentionally inputting wildcards (% and _) to data that is to be used in a LIKE query.

I like the idea of using {N} instead of/as well as using ?. Thanks for the suggestion.

Re: preg_replace unescapes escaped content

Posted: Mon Sep 13, 2010 2:54 am
by VladSun
MichaelR wrote:
VladSun wrote:I think the mysql_real_escape_string() is just enough for escaping values - no need to addslashes().
I was taking into account the possibility of users (un)intentionally inputting wildcards (% and _) to data that is to be used in a LIKE query.
Oh ... OK :) But what will happen if such data (containing _ % ) is absolutely legal for other inputs?

Re: preg_replace unescapes escaped content

Posted: Mon Sep 13, 2010 11:35 am
by MichaelR
VladSun wrote:But what will happen if such data (containing _ % ) is absolutely legal for other inputs?
I don't understand. I'm not removing the characters or making them illegal; just escaping them to prevent them from being used as wildcards. They'll just be treated as normal. I suppose the only problem is if I want to allow the user to have control over whether or not there's a wildcard. But that can be resolved in other ways.

Re: preg_replace unescapes escaped content

Posted: Mon Sep 13, 2010 11:54 am
by kaisellgren
VladSun wrote:Oh ... OK :) But what will happen if such data (containing _ % ) is absolutely legal for other inputs?
It corrupts the data..
MichaelR wrote:I don't understand. I'm not removing the characters or making them illegal; just escaping them to prevent them from being used as wildcards. They'll just be treated as normal. I suppose the only problem is if I want to allow the user to have control over whether or not there's a wildcard. But that can be resolved in other ways.
% and _ serve a special behavior only in a few cases. In all other cases they are just normal bytes that are not to be escaped.

It's almost like trying to prevent issues rising from placing user input in a REGEXP expression.
MichaelR wrote:addclashes() on top of mysql_query() isn't going to run into conflict (unless I use addcslashes() to also esacape \ (which I haven't)).
There is a difference between your "escaping" mechanism and the one MySQL has. mysql_real_escape_string() makes a function call to the C mysql_real_escape_string() that in turn behaves based on the character encoding. Your scheme always uses ISO-8859-1 or Unicode depending on the PHP version. When the encoding between yours and the database differs there will be a door open for potential vulnerabilities.
MichaelR wrote:And _charEncode() and _charDecode() are each other's reverse, so the data put in will be exactly the same as the data put out.
Encode and decode cancel each other out, great. But what is wrong is that they perform after escaping (the encoder part).

One thing to remember is to never alter data after escaping it with native functions like the mysql_real_escape_string(). Otherwise you can easily break what the native function did unless you clearly understand what the native function does and you are aware of what you are doing.

By the way, the tokenizing part seems better now -- did not test anything, never tested anything, but just saying it looks better to the bare eye.