
validating text input

Posted: Sun Feb 25, 2007 10:47 am
by matthijs
So how do people validate text input (like input from a textarea in a feedback form)?

I see several possibilities, some stricter than others. The first is the native function is_string(). But from the documentation it's not entirely clear what gets through and what doesn't.

Another possibility is to create some regex. Like:

Code:

return preg_match('/^[a-zA-Z0-9_]+$/', $request );
or

Code:

return preg_match('/^[-a-z0-9?!()#@\.\'"\s_]*$/i', $request );
But a regex like that can get quite long if you want to allow the most commonly used characters. And when someone enters a name containing a special character (à, for example), this regex rejects it.
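A hedged sketch of how the second pattern could be made to accept accented names: with the `u` modifier PCRE reads the subject as UTF-8, and `\p{L}` matches a letter in any script. The function name `looksLikePlainText` and the choice of allowed character classes are assumptions for illustration, not something proposed in the thread.

```php
<?php
// Hypothetical helper: accepts letters, digits, punctuation and whitespace
// in any script; everything else (including symbols like "<") is rejected.
function looksLikePlainText($request)
{
    // Arrays (e.g. from "field[]=x") are rejected before the regex runs.
    if (!is_string($request)) {
        return false;
    }
    // \p{L} letters, \p{N} numbers, \p{P} punctuation, \s whitespace.
    // With the "u" modifier preg_match() returns false on invalid UTF-8,
    // so the result is compared with 1 rather than used as a boolean.
    return preg_match('/^[\p{L}\p{N}\p{P}\s]*$/u', $request) === 1;
}

// "\xC3\xA0" is the letter à encoded as UTF-8 bytes.
var_dump(looksLikePlainText("Voil\xC3\xA0, it works!"));  // bool(true)
var_dump(looksLikePlainText("<script>"));                 // bool(false)
```

Whether symbols such as `<`, `+` or `$` should be allowed depends, as the post says, on what the data is used for; adding `\p{S}` to the class would admit them.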

Of course what you would consider valid data depends on what you want to do with it. But let's say the data is mailed (in the body of the mail). Is is_string() a good candidate?

Posted: Sun Feb 25, 2007 11:23 am
by Mordred
Of course what you would consider valid data depends on what you want to do with it.
That about sums it up, yes ;)

is_string() does almost nothing. Well, it checks if the data wasn't an array, which isn't bad really.
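To illustrate the point about arrays: PHP turns a request parameter written as `comment[]=x` into an array, while every ordinary form value arrives as a string, even one that looks like a number. The sketch below simulates both cases with a hand-built array standing in for `$_POST` (the name `$fakePost` is made up for the example).

```php
<?php
// Stand-in for $_POST: PHP builds an array for parameters sent as "comment[]=...".
$fakePost = array(
    'name'    => '123',            // numeric-looking input is still a string
    'comment' => array('a', 'b'),  // what "comment[]=a&comment[]=b" produces
);

var_dump(is_string($fakePost['name']));     // bool(true)
var_dump(is_string($fakePost['comment']));  // bool(false)
var_dump(is_string(''));                    // bool(true): empty input passes too
```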

I have a question for you though, why is this in the "Security" forum?

Posted: Sun Feb 25, 2007 11:28 am
by matthijs
Well, because it has something to do with input validation? If any mod feels it belongs somewhere else, that's fine with me.

Maybe I should specify my question a bit: let's say the data is used in the body of an email. How would you validate that?

Posted: Sun Feb 25, 2007 11:47 am
by feyd
There's rarely anything to really validate for email body text as far as I know. The only thing I can think of is if they pass mime encoded information. Whether you want to allow such a thing or not is entirely up to you.

Posted: Sun Feb 25, 2007 11:52 am
by Mordred
No, I don't object to it being in "security". I just prepare you for the revelation that validation is not (or shouldn't be) a part of your security concerns. Validation is about data integrity, it is a part of the business logic of your application.

How would I validate an email? I'd check if it is shorter than 500 characters, I hate reading long mails ;) Otherwise, as far as validation is concerned, anything goes. Is your email client forbidding you to write any particular thing in your emails?

Now, when it comes to actually using a piece of user-supplied data, there are security considerations to be made. In the case of email bodies the manual advises using \n for line endings and limiting lines to 70 characters (which is actually a standards compliance issue, not a security issue). In the case of email headers you shouldn't allow ANY newlines; here's some info about that: http://www.securephpwiki.com/index.php/Email_Injection
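A minimal sketch of the two points above, assuming the message is eventually handed to `mail()`. The helper names `cleanHeaderValue` and `prepareBody` are hypothetical; rejecting (rather than stripping) newlines in header values is just one possible policy, and the 70-column wrap follows the manual advice mentioned in the post.

```php
<?php
// Hypothetical helper: refuse any header value containing a CR or LF,
// since a newline would let the sender start a new header line.
function cleanHeaderValue($value)
{
    if (!is_string($value) || preg_match('/[\r\n]/', $value)) {
        return false;
    }
    return trim($value);
}

// Hypothetical helper: normalise line endings to \n and wrap at 70 columns.
function prepareBody($body)
{
    $body = str_replace(array("\r\n", "\r"), "\n", $body);
    return wordwrap($body, 70, "\n", true);
}

var_dump(cleanHeaderValue("Feedback"));                       // string(8) "Feedback"
var_dump(cleanHeaderValue("Hi\r\nBcc: victim@example.com"));  // bool(false)
```

The body needs no character filtering here; only values that end up in headers (subject, from, reply-to and so on) go through `cleanHeaderValue`.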

Posted: Sun Feb 25, 2007 1:28 pm
by matthijs
Thanks for your replies. I also don't like long emails so limiting the length seems wise :)
Mordred wrote: I just prepare you for the revelation that validation is not (or shouldn't be) a part of your security concerns. Validation is about data integrity, it is a part of the business logic of your application.
Funny how often the discussion about the definition of validating comes up. I haven't yet found two people who agree on it ...

I am aware of the email injection issue. That's a good link though.
feyd wrote: The only thing I can think of is if they pass mime encoded information. Whether you want to allow such a thing or not is entirely up to you.
Don't know what you mean exactly here. I'll start doing some research there, thanks for pointing it out.

Posted: Fri Mar 02, 2007 10:53 am
by Z3RO21
Data validation is a security topic. Invalid data can be used to exploit systems, thus data validation is a security issue. Just my 2 cents :)

Posted: Fri Mar 02, 2007 11:03 am
by infolock
Mordred wrote: No, I don't object to it being in "security". I just prepare you for the revelation that validation is not (or shouldn't be) a part of your security concerns. Validation is about data integrity, it is a part of the business logic of your application.

Wow... that's absolutely incorrect. Data validation should absolutely be part of your security concerns. If not, then prepare yourself to be hacked. You should never, ever, ever trust that the input from the user is going to be 100% legit. Always validate it, and verify that the data the user is sending you is correct and what you are expecting.

Posted: Fri Mar 02, 2007 11:08 am
by Mordred
Z3RO21 wrote: Data validation is a security topic. Invalid data can be used to exploit systems, thus data validation is a security issue. Just my 2 cents :)
Wrong. Unescaped data can be used to exploit systems. Invalid data can be used to introduce logic errors on the application level, which, while definitely a problem, is not a security one.

Disclaimer: There are cases where what we do with the user-supplied data is neither validation nor escaping though, so maybe it's all a terminology problem, and I am just being a lame grammar cop ;)

Posted: Fri Mar 02, 2007 11:11 am
by Mordred
@infolock and Z3RO21: examples please?

Posted: Fri Mar 02, 2007 1:23 pm
by infolock
Mordred: if you want to allow a user to post escape characters and execute whatever they want, by all means, don't validate your user input. I'm not going to argue with you. There is a reason that addslashes and stripslashes are used in basic entry-level novice security tutorials. Again, it's up to you to accept that. If you don't, visit Google and do a search for "php validate data addslashes" sometime. Then grab some coffee and read for a week or two. Then, finally, come back and we can have an intelligent discussion on this.

Posted: Fri Mar 02, 2007 5:29 pm
by Mordred
@infolock: you seem to be mixing validating and escaping, at least as a matter of terms (I don't know if you understand them as concepts until I see some code of yours, that's why I asked for an example). Otherwise, I've done my homework, thankyouverymuch, and let me tell you, it doesn't include addslashes (you try google for "shiflett addslashes") since a while back.

As for stripslashes, hmm, unless it's there to undo "misconfigured" (as in "enabled") magic_quotes, it doesn't belong in any "basic entry-level novice security tutorials" one should care to read.

I should maybe start with the examples in order to get the ball rolling. Here's a piece of code that does no validation, just escaping. You tell me how it's insecure:

Code:

$sName = mysql_real_escape_string($_POST['name']);
$sComment = mysql_real_escape_string($_POST['comment']);
mysql_query("INSERT INTO `comments` SET `name`='$sName', `comment`='$sComment'");
Edit: I can point out several ways in which this could break at the application level. None of them affects security.
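The kind of application-level breakage hinted at here can be illustrated with checks that would run before the escaping. This is only a sketch: the function name `validateComment` and the limits (up to 50 bytes for the name, up to 2000 for the comment) are invented for the example, and none of them is needed to keep the query itself safe.

```php
<?php
// Hypothetical validation layer; returns a list of problems (empty = valid).
function validateComment($post)
{
    $errors = array();
    foreach (array('name' => 50, 'comment' => 2000) as $field => $maxBytes) {
        // Missing keys and array values would otherwise surface as notices
        // or warnings further down, when the value is escaped.
        if (!isset($post[$field]) || !is_string($post[$field])) {
            $errors[] = "$field is missing or not a string";
            continue;
        }
        $length = strlen(trim($post[$field]));
        if ($length === 0) {
            $errors[] = "$field is empty";
        } elseif ($length > $maxBytes) {
            $errors[] = "$field is longer than $maxBytes bytes";
        }
    }
    return $errors;
}

print_r(validateComment(array('name' => '', 'comment' => array('x'))));
```

An empty name or an oversized comment leaves the database safe but the data useless, which is the integrity concern as opposed to the security one.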

Posted: Fri Mar 02, 2007 6:39 pm
by Christopher
Mordred wrote: Wrong. Unescaped data can be used to exploit systems. Invalid data can be used to introduce logic errors on the application level, which, while definitely a problem, is not a security one.
That is for data that is to be put in quotes, such as data inserted into a database. But there are many other security problems. User input that is to be displayed needs to have HTML entities converted, or perhaps certain elements stripped. Other exploits could target XML or JSON, etc., etc., etc. So unescaped data is not the only thing that can be used to exploit systems.
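For the display case mentioned here, a short sketch using `htmlspecialchars()`. The wrapper name `e()` is an assumption for illustration; `ENT_QUOTES` and an explicit charset are passed so that both quote styles are converted.

```php
<?php
// Hypothetical shorthand for escaping a value before echoing it into HTML.
function e($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo e('<b onclick="x()">Tom & Jerry</b>'), "\n";
// &lt;b onclick=&quot;x()&quot;&gt;Tom &amp; Jerry&lt;/b&gt;
```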

fixed Z3RO21

Posted: Fri Mar 02, 2007 9:21 pm
by Z3RO21
So what if a novice programmer made a mistake and loaded a file named by a URL parameter? We all know this is a big security no-no, but how many times do you see users doing it? When I first started programming with PHP I made this mistake. What I first did to improve security in my applications was to validate the passed information. I would check that it was a string with is_string() (I didn't want it to be an int). Then I checked it for exploitable characters that could lead to directory surfing. Yes, this may be a bad example because it is poor programming (I by no means still practice this), but it does show how validation can be part of a security procedure.
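A hedged sketch of the kind of check described here, using a whitelist instead of hunting for dangerous characters. The function name `resolvePage`, the page names, and the `pages/` directory are all hypothetical.

```php
<?php
// Hypothetical lookup: only names listed in $allowed ever become a path.
function resolvePage($requested, $allowed)
{
    // Non-strings (arrays, missing values) are rejected outright.
    if (!is_string($requested)) {
        return false;
    }
    // Strict comparison, so "../../etc/passwd" or a remote URL cannot match.
    if (!in_array($requested, $allowed, true)) {
        return false;
    }
    return 'pages/' . $requested . '.php';
}

$allowed = array('home', 'about', 'contact');
var_dump(resolvePage('about', $allowed));             // string(15) "pages/about.php"
var_dump(resolvePage('../../etc/passwd', $allowed));  // bool(false)
```

Because the user input only selects from known values and never becomes part of the path unchecked, there are no "exploitable characters" left to filter.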

And arborint there is a mistake in your post, I did not post that.

Posted: Sat Mar 03, 2007 3:24 am
by Mordred
Z3RO21, arborint, yes, there are many security concerns which arise from bad design decisions, like dynamic includes. Other examples would be dangerous eval-like functions, register_globals-like behaviour (extract, parse_str, variable variables) etc. Also administrative interfaces not protected by authorisation checks, file uploads etc.

My example was with mysql, but I didn't mean escape = mysql_real_escape_string (in fact mysql_real_escape_string may not be enough security for certain SQL cases). Escaping is also cleaning data before outputting it to HTML or XML so that it does not contain HTML or XML syntax. It is escaping characters when dynamically creating regexps. It is escaping arguments before calling exec-like functions. Every such function has its own method for protecting syntactic characters from interfering with it (no, not addslashes ;) ), which I generalise as "escaping".
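Two of the context-specific escaping functions alluded to here, as a sketch. `preg_quote()` and `escapeshellarg()` are standard PHP functions; the variable names and the `grep` command line are made up, and the command is only built, not executed.

```php
<?php
$userInput = 'a.b*c';

// Regex context: preg_quote() backslash-escapes regex metacharacters,
// plus the delimiter passed as the second argument.
$pattern = '/^' . preg_quote($userInput, '/') . '$/';
var_dump(preg_match($pattern, 'a.b*c'));  // int(1): matches literally
var_dump(preg_match($pattern, 'axbbc'));  // int(0): "." and "*" lost their special meaning

// Shell context: escapeshellarg() quotes the value as a single argument.
$command = 'grep -F -- ' . escapeshellarg($userInput) . ' notes.txt';
echo $command, "\n";
```

Each context has its own function, so a value escaped for one context (SQL, say) is not thereby safe in another (HTML, regex, shell).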

As I said, maybe it's a matter of what we call "escaping". For the last few weeks I have been thinking about how dangerous situations arise in everyday programming, and I'm more and more coming to the conclusion that the paradigm we use when imagining syntactic-level security threats is somewhat flawed. The APIs that allow such syntactic insertions also seem flawed. It is a bold claim, I know. I will try to construct an elaborate example one of these days to scientifically check my theory :)

Another thing I should state is that I am in no way advocating that data should not be validated. On the contrary, I personally am almost anal about validating (and type-converting) all data in my code. In my view though, I don't do it out of security necessity, but to keep data and logical integrity in my apps. I agree that it adds a layer of security and I vote with two hands and a leg for doing it. (So, Z3RO21, we may actually be in agreement after all)