validating text input

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm


Post by matthijs »

So how do people validate text input (like input from a textarea in a feedback form)?

I see several possibilities, some stricter than others. The first one is the native function is_string(). But from the documentation it's not entirely clear what gets through and what doesn't.

Another possibility is to create some regex. Like:

Code:

return preg_match('/^[a-zA-Z0-9_]+$/', $request );
or

Code:

return preg_match('/^[-a-z0-9?!()#@\.\'"\s_]*$/i', $request );
But a regex like that can get quite long if you want to add the most commonly used characters. And when someone enters a name containing an accented character (à, for example), this regex returns false.
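A minimal sketch of one way around the accented-character problem, assuming UTF-8 input: match Unicode letter classes with the /u modifier instead of listing characters one by one. The function name and the exact set of allowed punctuation are illustrative assumptions, not a rule for every form. It also anchors with \z instead of $, because in PCRE $ also matches just before a trailing newline.

```php
<?php
// Sketch: accept names written in any script. Assumes UTF-8 input.
// \p{L} = any Unicode letter, \p{M} = combining marks (e.g. a + combining grave),
// plus digits, space, period, apostrophe and hyphen (an illustrative choice).
// \z anchors at the very end; $ would also accept a trailing "\n".
// With /u, invalid UTF-8 makes preg_match() fail instead of matching bytes.
function isValidName(string $input): bool
{
    return preg_match('/^[\p{L}\p{M}0-9 .\'-]+\z/u', $input) === 1;
}

var_dump(isValidName('Agnès'));      // bool(true)
var_dump(isValidName("Bob\n"));      // bool(false): trailing newline rejected
var_dump(isValidName('<b>Bob</b>')); // bool(false)
```

Whether this is "valid" still depends on what the data is used for; it only widens the whitelist idea from the regexes above.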

Of course, what you would consider valid data depends on what you want to do with it. But let's say the data is mailed (in the body of the mail). Is is_string() a good candidate?
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

matthijs wrote:Of course what you would consider valid data depends on what you want to do with it.
That about sums it up, yes ;)

is_string() does almost nothing. Well, it checks if the data wasn't an array, which isn't bad really.
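To illustrate that array check: in PHP a request such as ?name[]=x turns the parameter into an array, so a small helper that only accepts real strings is still useful before any further processing. A sketch; the helper name and the simulated $post array are hypothetical stand-ins for $_POST.

```php
<?php
// Sketch: a query string like ?comment[]=x makes the value an array, and
// string functions then emit warnings (PHP 7) or throw TypeError (PHP 8).
function getStringParam(array $source, string $key): ?string
{
    if (!isset($source[$key]) || !is_string($source[$key])) {
        return null; // missing, or an array instead of a plain string
    }
    return $source[$key];
}

// Simulated request data (stand-in for $_POST):
$post = ['comment' => 'Nice site!', 'name' => ['Bob', 'Alice']];
var_dump(getStringParam($post, 'comment')); // string(10) "Nice site!"
var_dump(getStringParam($post, 'name'));    // NULL
```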

I have a question for you though, why is this in the "Security" forum?
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Well, because it has something to do with input validation? If any mod feels it belongs somewhere else, that's fine with me.

Maybe I should specify my question a bit: let's say the data is used in the body of an email. How would you validate that?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

There's rarely anything to really validate in email body text, as far as I know. The only thing I can think of is if they pass MIME-encoded information. Whether you want to allow such a thing or not is entirely up to you.
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

No, I don't object to it being in "security". I'm just preparing you for the revelation that validation is not (or shouldn't be) a part of your security concerns. Validation is about data integrity; it is a part of the business logic of your application.

How would I validate an email? I'd check that it is shorter than 500 characters; I hate reading long mails ;) Otherwise, as far as validation is concerned, anything goes. Does your email client forbid you from writing any particular thing in your emails?
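The 500-character rule could be written like this. A sketch assuming UTF-8 input and the mbstring extension: mb_strlen() counts characters, whereas strlen() counts bytes, so multi-byte characters such as é would otherwise count more than once. The constant and function names are hypothetical.

```php
<?php
// Sketch of a length cap on a mail body (500 characters, as in the post).
// mb_strlen() counts characters, not bytes; assumes UTF-8 and ext/mbstring.
const MAX_BODY_LENGTH = 500;

function isAcceptableBody(string $body): bool
{
    return mb_strlen($body, 'UTF-8') <= MAX_BODY_LENGTH;
}

var_dump(isAcceptableBody(str_repeat('é', 500))); // bool(true): 500 chars, 1000 bytes
var_dump(isAcceptableBody(str_repeat('a', 501))); // bool(false)
```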

Now, when it comes to actually using a piece of user-supplied data, there are security considerations to be made. In the case of email bodies, the manual advises using \n for line endings and limiting lines to 70 characters (which is actually a standards-compliance issue, not a security issue). In email headers, on the other hand, you shouldn't allow ANY newlines; here's some info about that: http://www.securephpwiki.com/index.php/Email_Injection
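A rough sketch of both points, assuming the LF line endings and 70-character limit described above: the body is normalised and wrapped, while header values containing CR or LF are rejected outright rather than cleaned. The function names are hypothetical; check the mail() documentation for the line-ending convention your PHP version expects.

```php
<?php
// Sketch, assuming LF line endings and the 70-character limit described above.
// Body: normalise line endings, then wrap long lines (a standards issue).
function prepareMailBody(string $body): string
{
    $body = str_replace(["\r\n", "\r"], "\n", $body);
    // true = also break single words longer than 70 characters
    return wordwrap($body, 70, "\n", true);
}

// Header values (Subject, From, Reply-To, ...): reject any CR or LF, since a
// newline would let the sender inject extra headers such as "Bcc:".
function isSafeHeaderValue(string $value): bool
{
    return strpbrk($value, "\r\n") === false;
}

var_dump(isSafeHeaderValue('Feedback'));                       // bool(true)
var_dump(isSafeHeaderValue("Hi\r\nBcc: someone@example.com")); // bool(false)
```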
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Thanks for your replies. I also don't like long emails so limiting the length seems wise :)
Mordred wrote:I'm just preparing you for the revelation that validation is not (or shouldn't be) a part of your security concerns. Validation is about data integrity; it is a part of the business logic of your application.
Funny how often the discussion about the definition of validation comes up. I haven't yet found two people who agree on it...

I am aware of the email injection issue. That's a good link though.
feyd wrote:The only thing I can think of is if they pass mime encoded information. Whether you want to allow such a thing or not is entirely up to you.
I don't know exactly what you mean here. I'll start doing some research on it; thanks for pointing it out.
Z3RO21
Forum Contributor
Posts: 130
Joined: Thu Aug 17, 2006 8:59 am

Post by Z3RO21 »

Data validation is a security topic. Invalid data can be used to exploit systems, thus data validation is a security issue. Just my 2 cents :)
infolock
DevNet Resident
Posts: 1708
Joined: Wed Sep 25, 2002 7:47 pm

Post by infolock »

Mordred wrote:No, I don't object to it being in "security". I'm just preparing you for the revelation that validation is not (or shouldn't be) a part of your security concerns. Validation is about data integrity; it is a part of the business logic of your application.

Wow... that's absolutely incorrect. Data validation should absolutely be part of your security concerns. If not, then prepare yourself to be hacked. You should never, ever, ever trust that the input from the user is going to be 100% legit. Always validate it, and verify that the data the user is sending you is correct and what you are expecting.
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

Z3RO21 wrote:Data validation is a security topic. Invalid data can be used to exploit systems, thus data validation is a security issue. Just my 2 cents :)
Wrong. Unescaped data can be used to exploit systems. Invalid data can be used to introduce logic errors on the application level, which, while definitely a problem, is not a security one.

Disclaimer: There are cases where what we do with the user-supplied data is neither validation nor escaping though, so maybe it's all a terminology problem, and I am just being a lame grammar cop ;)
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

@infolock and Z3RO21: examples please?
infolock
DevNet Resident
Posts: 1708
Joined: Wed Sep 25, 2002 7:47 pm

Post by infolock »

Mordred: if you want to allow a user to post escape characters and execute whatever they want, by all means, don't validate your user input. I'm not going to argue with you. There is a reason that addslashes and stripslashes are used in basic entry-level novice security tutorials. Again, it's up to you to accept that. If you don't, visit Google and search for "php validate data addslashes" sometime. Then grab some coffee and read for a week or two. Then, finally, come back and we can have an intelligent discussion on this.
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

@infolock: you seem to be mixing up validating and escaping, at least as terms (I can't tell whether you understand them as concepts until I see some code of yours; that's why I asked for an example). Otherwise, I've done my homework, thankyouverymuch, and let me tell you, it hasn't included addslashes for a while now (try googling "shiflett addslashes").

As for stripslashes, hmm, unless it's for "misconfigured" (as in "enabled") magic_quotes, it doesn't belong in any "basic entry-level novice security tutorials" one should care to read.

I should maybe start with the examples in order to get the ball rolling. Here's a piece of code that does no validation, just escaping. You tell me how it's insecure:

Code:

// Assumes an open ext/mysql connection (mysql_connect()); ext/mysql was removed in PHP 7.
$sName = mysql_real_escape_string($_POST['name']);
$sComment = mysql_real_escape_string($_POST['comment']);
mysql_query("INSERT INTO `comments` SET `name`='$sName', `comment`='$sComment'");
Edit: I can point out several ways in which this could break at the application level. None of them affect security.
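ext/mysql (mysql_query(), mysql_real_escape_string()) was deprecated in PHP 5.5 and removed in PHP 7.0. Below is a sketch of the same insert with PDO and bound parameters, using an in-memory SQLite database purely as a stand-in for MySQL (it needs the pdo_sqlite extension, and the table layout is assumed). SQLite has no INSERT ... SET form, so the VALUES form is used; PDO::quote() is shown for cases where a query really has to be built as a string.

```php
<?php
// Sketch: the same INSERT with PDO and bound parameters.
// SQLite in memory is only a stand-in for MySQL (needs pdo_sqlite);
// the table layout is assumed.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (name TEXT, comment TEXT)');

function saveComment(PDO $pdo, string $name, string $comment): void
{
    // With placeholders the driver, not string concatenation, handles quoting,
    // so quotes inside the values cannot change the statement's syntax.
    $stmt = $pdo->prepare('INSERT INTO comments (name, comment) VALUES (:name, :comment)');
    $stmt->execute([':name' => $name, ':comment' => $comment]);
}

// Input that would break naive string concatenation:
saveComment($pdo, "O'Brien", "'); DROP TABLE comments; --");

// When SQL must be assembled as a string, PDO::quote() escapes and quotes a value.
$quoted = $pdo->quote("O'Brien");
```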
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Mordred wrote:Wrong. Unescaped data can be used to exploit systems. Invalid data can be used to introduce logic errors on the application level, which, while definitely a problem, is not a security one.
That is for data that is to be put in quotes, such as data inserted into a database. But there are many other security problems. User input that is to be displayed needs to have HTML entities converted, or perhaps certain elements stripped. Other exploits could involve XML or JSON, etc., etc. So unescaped data is not the only thing that can be used to exploit systems.
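A sketch of the output side mentioned above, assuming UTF-8 pages: htmlspecialchars() for text placed in HTML, and json_encode() with JSON_HEX_TAG for values embedded in a script block. The helper name is hypothetical.

```php
<?php
// Sketch: escaping for two other output contexts. Assumes UTF-8 pages.
// ENT_QUOTES also converts single quotes, which matters inside
// single-quoted HTML attributes.
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("hi")</script>';

echo escapeHtml($comment), "\n";
// &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;

// For a value embedded in a <script> block: JSON_HEX_TAG encodes < and >
// as \u003C and \u003E, so the data cannot close the element early.
echo '<script>var comment = ', json_encode($comment, JSON_HEX_TAG), ';</script>', "\n";
```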

fixed Z3RO21
Last edited by Christopher on Fri Mar 02, 2007 10:07 pm, edited 1 time in total.
(#10850)
Z3RO21
Forum Contributor
Posts: 130
Joined: Thu Aug 17, 2006 8:59 am

Post by Z3RO21 »

So what if a novice programmer made the mistake of picking which file to include based on a URL parameter? We all know this is a big security no-no, but how often do you see users doing it? When I first started programming in PHP, I made this mistake. The first thing I did to improve the security of my applications was to validate the passed information. I checked that it was a string (is_string()), since I didn't want it to be an int. Then I checked it for exploitable characters that could lead to directory traversal. Yes, this may be a bad example because it is poor programming (I no longer do this), but it does show how validation can be part of a security procedure.
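That scenario can also be handled with validation alone, but a whitelist is usually sturdier than scanning the input for exploitable characters. A sketch of that swapped-in technique; the page names and paths are hypothetical.

```php
<?php
// Sketch: instead of hunting for "bad" characters (../, null bytes, URLs),
// map the parameter onto a fixed whitelist of files. Names are hypothetical.
const PAGES = [
    'home'    => 'pages/home.php',
    'about'   => 'pages/about.php',
    'contact' => 'pages/contact.php',
];

function resolvePage($requested): ?string
{
    // is_string() rejects arrays such as ?page[]=x; the lookup rejects everything else.
    if (!is_string($requested) || !array_key_exists($requested, PAGES)) {
        return null;
    }
    return PAGES[$requested];
}

// Typical use (not executed here):
// $file = resolvePage($_GET['page'] ?? 'home');
// if ($file !== null) { include $file; }
```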

And arborint, there is a mistake in your post: I did not post that.
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

Z3RO21, arborint, yes, there are many security concerns that arise from bad design decisions, like dynamic includes. Other examples would be dangerous eval-like functions, register_globals-like behaviour (extract, parse_str, variable variables), etc. Also administrative interfaces not protected by authorisation checks, file uploads, etc.
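To illustrate the register_globals-like behaviour of extract(): request keys become local variables and can overwrite ones the application already set. A sketch with a hypothetical $isAdmin flag; EXTR_SKIP keeps existing variables, though reading the expected keys explicitly avoids the problem altogether.

```php
<?php
// Sketch: extract() turns array keys into variables in the current scope.
// $isAdmin and the input arrays are hypothetical.
function checkAccess(array $request, int $flags = EXTR_OVERWRITE): bool
{
    $isAdmin = false;          // decided by the application...
    extract($request, $flags); // ...then clobbered by ?isAdmin=1 with the default flag
    return (bool) $isAdmin;
}

var_dump(checkAccess(['isAdmin' => '1']));            // bool(true): overwritten
var_dump(checkAccess(['isAdmin' => '1'], EXTR_SKIP)); // bool(false): existing vars kept
```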

My example was with mysql, but I didn't mean escape = mysql_real_escape_string (in fact, mysql_real_escape_string may not be enough for certain SQL cases). Escaping also means cleaning data before outputting it to HTML or XML so that it doesn't contain HTML or XML syntax. It means escaping characters when dynamically creating regexps. It means escaping arguments before calling exec-like functions. Every such function has its own method for protecting syntactic characters from interfering with it (no, not addslashes ;) ), which I generalise as "escaping".
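A sketch of sink-specific escaping for the cases listed above; all inputs are hypothetical. It also shows one situation where mysql_real_escape_string() alone falls short: % and _ in a LIKE pattern are wildcards it does not touch. (addcslashes() is a different function from addslashes(); here it escapes only the two listed characters.)

```php
<?php
// Sketch: each sink has its own escaping function. Inputs are hypothetical.

// 1. Building a regex from user input: preg_quote() escapes PCRE
//    metacharacters; passing the delimiter escapes '/' as well.
$userText = 'C++ (1.5)';
$pattern  = '/' . preg_quote($userText, '/') . '/';
var_dump(preg_match($pattern, 'I like C++ (1.5) a lot')); // int(1)

// 2. Passing an argument to a shell command: escapeshellarg() wraps it in
//    single quotes on Unix-like systems (Windows uses different rules).
$file    = 'notes; rm -rf ~';
$command = 'wc -l ' . escapeshellarg($file); // wc -l 'notes; rm -rf ~'

// 3. A LIKE pattern: % and _ are wildcards that mysql_real_escape_string()
//    leaves alone, so escape them first (backslash is MySQL's default LIKE
//    escape), then still escape or bind the result as a normal string value.
$search = addcslashes('50%_off', '%_'); // 50\%\_off
```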

As I said, maybe it's a matter of what we call "escaping". For the last few weeks I have been thinking about how dangerous situations arise in everyday programming, and I'm increasingly coming to the conclusion that the paradigm we use when imagining syntactic-level security threats is somewhat flawed. The APIs that allow such syntactic insertions also seem flawed. It is a bold claim, I know. I will try to construct an elaborate example in the next few days to scientifically check my theory :)

Another thing I should state is that I am in no way advocating that data should not be validated. On the contrary, I personally am almost anal about validating (and type-converting) all data in my code. In my view, though, I don't do it out of security necessity, but to keep data and logical integrity in my apps. I agree that it adds a layer of security, and I vote with two hands and a leg for doing it. (So, Z3RO21, we may actually be in agreement after all.)
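Validation plus type conversion in the spirit of that last paragraph can be done with filter_var() and FILTER_VALIDATE_INT. A sketch; the "quantity between 1 and 99" rule is a hypothetical business rule, which is the kind of integrity check described above.

```php
<?php
// Sketch: validate and type-convert in one step. The 1..99 range is hypothetical.
function parseQuantity($raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 99],
    ]);
    return $value === false ? null : $value; // an int on success
}

var_dump(parseQuantity('42')); // int(42)
var_dump(parseQuantity('4x')); // NULL
```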