ctype_print for email injection prevention

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

User avatar
shiflett
Forum Contributor
Posts: 124
Joined: Sun Feb 06, 2005 11:22 am

Post by shiflett »

matthijs wrote:The examples are from http://securephp.damonkohler.com/index. ... _Injection.
Glancing through this article, it looks like the encoded values are used as an example of injecting GET data. The actual data is not encoded.

In other words, someone might visit your site with a URL similar to the following:

Code: Select all

http://host/email.php?headers=From:+sender@example.org%0ABCC:+victim@example.org
In this case, $_GET['headers'] does not contain %0A. That's just the encoded representation of a newline in the context of a URL. Your code does not need to check for %0A. Doing so is useless.
matthijs wrote:My guess is that it has something to do with the php settings on my server, because your example doesn't either produce the results I expect.
I meant to point out that it doesn't work in order to help clarify the issue. Sorry if I just made things worse. :-)

The following should be a working example:

Code: Select all

<?php

mail('anyone@example.org',
     'My Evil Test',
     'My Evil Message',
     "From: sender@example.org\nBCC: you@example.org");

?>
I hope this was more helpful. :-)
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Hi Chris,

Yes, your post was helpful!
Glancing through this article, it looks like the encoded values are used as an example of injecting GET data. The actual data is not encoded.
I suspected it was something like this. Thanks for clarifying that.
Sorry if I just made things worse. :-)
No need to be sorry. Even though I didn't understand everything right away after your other response, that doesn't mean it wasn't helping me.

I will go and experiment some more with your latest example and reread this thread, then it'll be very clear.


It's incredible how much help you and others are offering here. I hope you know how much that is appreciated!

Matthijs
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Well, I did some experiments and will summerize it a bit.

What I did was make a simple testscript:

Code: Select all

<?php

$headers = '';

if (isset($_GET['headers']))
{

  $headers = $_GET['headers'];
  echo 'Headers: ' . $headers . '<br> ';
  
  if (!ctype_print($headers) )
  {
      echo '<p>Sorry, thats no good input!</p>';
  } 

  if (!check_email_address($headers))
  {
    echo '<p>Sorry, thats no valid emailaddress!</p>';
  }
}
?>
(in which check_email_address is the function from ilovejackdaniels)

I tested all the examples from the article http://securephp.damonkohler.com/index. ... _Injection. Result: with every example both tests return negative (so they do catch the bad input).
In this case, $_GET['headers'] does not contain %0A. That's just the encoded representation of a newline in the context of a URL. Your code does not need to check for %0A. Doing so is useless.
Chris was right, if I look at the outputted source code from one of the examples, I get this:

Code: Select all

haxor@attack.com
Content-Type:multipart/mixed; boundary=frog;
--frog
Content-Type:text/html
So, it seems that I can confirm what Chris already said, that using ctype_print() can be used to check for an emailinjection attempt. Of course, it can be said that using a good validation function, like the ones from ilovejackdaniels or Roja will be good enough. But an extra check can't be bad I think, certainly as this PHP function is so easy to use.

Also, I can imagine that one could take different actions depending on what kind of invalid data is received. As a 'user' posts 'only' an invalid emailaddress, one could return a nice friendly message telling that the emailaddress is not valid. On the other hand, if automatic, repetitive emailinjection attempts are made, the ctype_print() will catch it and one could return a 404, notify the admin of an attempt, etc.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Rather than ctype_print(() I prefer to be explicit about exactly which characters I accept for untrusted data from $_*. I usually use preg_replace() but unlike the first post I use the "not in set" syntax. I like that it clearly documents what is being allowed (unlike simple "printable").

Code: Select all

<?php

$headers = '';

if (isset($_GET['headers']))
{
  // remove all characters except those allowed in email addresses
  $headers = preg_replace('/[^a-zA-Z0-9\@\.\_\-]/', '', $_GET['headers']);
  echo 'Headers: ' . $headers . '<br> ';
  
  if (!check_email_address($headers))
  {
    echo '<p>Sorry, thats no valid emailaddress!</p>';
  }
}
?>
Using the fairly standard Filter and Validator classes that can be found around simplifies this process even more.
User avatar
shiflett
Forum Contributor
Posts: 124
Joined: Sun Feb 06, 2005 11:22 am

Post by shiflett »

arborint wrote:Rather than ctype_print(() I prefer to be explicit about exactly which characters I accept for untrusted data from $_*. I usually use preg_replace() but unlike the first post I use the "not in set" syntax. I like that it clearly documents what is being allowed (unlike simple "printable").
I think matthijs is suggesting ctype_print() as a Defense in Depth measure, not a primary safeguard.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

I think matthijs is suggesting ctype_print() as a Defense in Depth measure, not a primary safeguard
That's correct.

I have learned that one of the basic rules of validation is that it is better to use a whitelist approach then a blacklist approach. So that's the first thing I will do. If input should be an email format, I use a email-validation function, if it should be alphanumeric I use ctype_alnum(), etc etc.

In this case using ctype_print() was indeed meant as an extra layer of defense. If some day, I or someone else makes a change in a script and replaces the emailvalidation regex with some other, or changes what ends up in the headers of the mail() function, then I know there's still the ctype_print() function to prevent the email injection possibility.

Of course there are other possibilities, like the one arborint mentions (using preg_replace()). If someone knows, I do. My head is still spinning from reading hundreds of posts on multiple forums about the email injection problem :( I lost count on the amount of solutions being brought forward....

That's the whole reason I started this thread. To find out if using ctype_print() could be another good option. And while I agree there are or could be better functions or regular expressions, as a second layer of defense this function appeals to me because it's so simple to use.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

matthijs wrote:I have learned that one of the basic rules of validation is that it is better to use a whitelist approach then a blacklist approach. So that's the first thing I will do. If input should be an email format, I use a email-validation function, if it should be alphanumeric I use ctype_alnum(), etc etc.
I agree and am wondering about the definitions of whitelist and blacklist approaches. From my point of view, using preg_replace() as I suggested is a whitelist approach because I am specifically saying: only allow the characters that I want through. Conversely I would think of ctype_alnum() check as a blacklist approach because it is looking for characters you don't want.

As for "defense in depth," that is a security industry concept (/buzzword ;)) that I guess is being used here to cover what I consider general good programming practices. I would think that a broader discussion would include Apache/PHP server configuration, initial untainting of Request data, data size and bounds checks, using database specific escaping functions, etc. etc. etc. etc.
(#10850)
User avatar
shiflett
Forum Contributor
Posts: 124
Joined: Sun Feb 06, 2005 11:22 am

Post by shiflett »

arborint wrote:I agree and am wondering about the definitions of whitelist and blacklist approaches.
A whitelist approach is one in which you try to identify valid data, assuming everything else to be invalid. (You err on the side of caution.) A blacklist approach is the opposite.
arborint wrote:From my point of view, using preg_replace() as I suggested is a whitelist approach because I am specifically saying: only allow the characters that I want through.
I agree. You have a whitelist of characters. There are other approaches, but I don't think anyone would argue that this isn't a whitelist approach.
arborint wrote:Conversely I would think of ctype_alnum() check as a blacklist approach because it is looking for characters you don't want.
I think you're misunderstanding how this function works. The ctype_alnum() function returns TRUE only when every character in a string is either alphabetic or numeric. This is a whitelist. In fact, you could use preg_replace(), but you would lose the locale awareness.
arborint wrote:As for "defense in depth," that is a security industry concept
It is a security principle, so yes.
arborint wrote:I would think that a broader discussion would include Apache/PHP server configuration, initial untainting of Request data, data size and bounds checks, using database specific escaping functions, etc. etc. etc. etc.
Sure, broader discussions always cover more topics. :-)
Post Reply