Page 1 of 2

Holes in preg_match filters

Posted: Sat Apr 07, 2007 5:08 am
by matthijs
Did you guys read Stefan Essers post Holes in preg_match filters?

Don't know about you, but this is new for me and I'll have to do some research to see how this affects my own scripts. Many regexes are affected by this (even examples in Shifletts security book).

Posted: Sat Apr 07, 2007 8:04 am
by feyd
Heh, I usually trim() (or rtrim()) all incoming lines anyways. :)

Posted: Sat Apr 07, 2007 8:49 am
by Oren
I don't see why you call it "Holes in preg_match filters" :?
It's not a hole at all, this is how PHP parse such regex. Althought I knew that from the day I learnt regex's, I've never used the D modifier :P

Posted: Sat Apr 07, 2007 10:33 am
by matthijs
It's not my words. Stefan Esser calls it that in his post.

I think what Stefan means is that there are holes in the filters, built using preg_match. Of course it's a human fault when a regex is not done correctly, but if almost every tutorial out there and even an example in Chris Shifletts book explains it wrong (as far as I can see/understand), then many people learn and apply it wrong.

Security is not easy ...

Posted: Sat Apr 07, 2007 10:41 am
by Oren
matthijs wrote:Security is not easy ...
Certainly not.

Posted: Sat Apr 07, 2007 12:01 pm
by Ambush Commander
The vulnerability described would certainly be tough to exploit. While Esser correctly points out that in things like headers, newlines are important, you would need something that looked like:

Code: Select all

// php 4 example, I know this was fixed in php5
$location = validate($location, 'url');
$extra_variable = validate($extra_variable, 'something');
header("Location: $location $extra_variable");
Maybe it'd be more likely in mail.

Regardless, it does raise some concerns with regards to data consistency. If your application expects input to have only those set characters, newlines could cause problems. I think Feyd's method of trim()'ing data is the best.

There are worse things out there, however, that result from lack of user education. Example:

Code: Select all

$data = false;
if (isset($_GET['data']) && $_GET['data'] == 2) {
  $data = $_GET['data'];
}
Which would be vulnerable to XSS via ?data=2<script>alert('xss');</script> due to PHP's typecasting.

(Yes, I know, you should be casting $data to int).

Posted: Sat Apr 07, 2007 4:12 pm
by veridicus
The issue is with an understanding of security issues overall. With a solid security education it's more obvious what's risky in regex and any other code. I don't think this regex "hole" is a huge problem.

Posted: Sun Apr 08, 2007 3:39 pm
by shiflett
Hi matthijs,

If you're aware of any errors in my book that are not listed in the errata, please let me know. I want to make sure it remains current. (To date, I'm only aware of typographical errors.) Not only do I strive to provide accurate information in the book, I want to aggressively highlight any errors that are discovered. Also, I want to immediately correct any errors in the code repository, if they exist.

My examples tend to use alternatives to regular expressions in order to focus on the practice and not the pattern. This is because there are people who can better explain regular expressions, and there are entire books on the subject.

I must admit that this particular "hole" isn't very interesting to me. The concern is the ability to include a single newline after the subject string. Nothing else. Ambush Commander points out a much more dangerous and realistic problem. Stuff like that scares me, because it's easy to see how someone (myself included) could make the same mistake.

Posted: Sun Apr 08, 2007 5:49 pm
by dreamscape
shiflett wrote:I must admit that this particular "hole" isn't very interesting to me. The concern is the ability to include a single newline after the subject string. Nothing else.
I could not agree more. Sometimes I think Stefan Essers likes to complain about things just for the sake of complaining, especially since his falling out with the PHP dev team.

Even in the exploits he cites where a new line character *can be* dangerous, an extraneous new line at the end of the subject cannot be dangerous unless you are doing something really off the wall, because anything after the newline, which would be required to carry out exploits like that, would cause the PCRE expression to fail.

Posted: Sun Apr 08, 2007 6:38 pm
by Ollie Saunders
Which would be vulnerable to XSS via ?data=2<script>alert('xss');</script> due to PHP's typecasting.
omg why did that never occur to me before?!
tuff like that scares me, because it's easy to see how someone (myself included) could make the same mistake.
$numScared++;

Posted: Mon Apr 09, 2007 3:08 am
by matthijs
Chris: thanks for your response. My wording was probably a bit off. I was just assuming that what Stefan pointed out was indeed a real and maybe even a big vulnerability.

After reading Ambush and your comments and looking at it a bit closer, it seems indeed to be blown out of proportion a bit. I didn't do a thorough test myself, I just wanted to know other peoples opinion about his post.

feyds suggestion of using trim() in situations like this seems like a good one. Or adding the /D modifier of course.

The fact that it's "only" a single new line character which could be injected didn't get to me immediately. I had visions of long strings of javascript being injected :wink:
So if I test it like this:

Code: Select all

$clean = array();
   if (preg_match("/^[0-9a-zA-Z]+$/", $_GET['var'])) {
      $clean['var'] = $_GET['var'];
   }
Then if someone GETs

Code: Select all

// pregmatchtest.php?var=TEST%0a
$clean['var'] is now 'TEST ' (newline at the end)
Then if I would somehow concatenate the clean['var'] with another var, I'd end up with:

Code: Select all

$new_var = $clean['var'] . other_var;
// now $new_var is:
TEST
other_var
Which could in some situations be unwanted. (basically the same example as the one from Ambush.)

Posted: Mon Apr 09, 2007 4:54 am
by Ollie Saunders
not quite matthijs. Anything appearing after the linefeed will get through.

Posted: Mon Apr 09, 2007 6:14 am
by shiflett
ole wrote:Anything appearing after the linefeed will get through.
That's not true. Did you try it?

Posted: Mon Apr 09, 2007 8:08 am
by Chris Corbyn
shiflett wrote:
ole wrote:Anything appearing after the linefeed will get through.
That's not true. Did you try it?
We appear to have lost your avatar Chris :? :oops:

Posted: Mon Apr 09, 2007 8:31 am
by Ollie Saunders
shiflett wrote:
ole wrote:Anything appearing after the linefeed will get through.
That's not true. Did you try it?
Have now. I stand corrected