Input filtering -- really necessary
Posted: Wed Jun 25, 2008 1:27 pm
by alex.barylski
I'm going over my framework and code, trying to figure out what needs improvement. One thing that certainly needs a facelift is my validation, and maybe filtering, of input data.
So I sit down and reconsider everything I have learned or studied in the past, and how I can easily add something new to the framework to simplify coding on my next project.
I'm sitting here thinking: is filtering really necessary?
- Stripping whitespace
- Converting uppercase to lowercase
- Removing letters from phone numbers, etc.
These are some of the filtering examples I borrowed from Zend. Honestly, the only one that makes sense to me is whitespace stripping, and here is why.
My validation routines are very strict; they usually check both format and characters.
If someone enters a phone number with letters in it, validation throws an exception and the user is notified that the phone number is invalid. Letters in a phone number are quite obvious, and thus easily corrected.
However, if someone leaves trailing whitespace after an email address or phone number and validation fails, the user may be stumped by an otherwise perfectly formatted phone number, simply not noticing the trailing whitespace.
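A minimal sketch of this approach in PHP, assuming a single accepted format like the 1-203-234-4354 example further down the thread. The function name validate_phone and the choice of InvalidArgumentException are illustrative assumptions, not the framework's actual API:

```php
<?php
// Hypothetical validator (not the framework's real API): trims surrounding
// whitespace, the only filtering kept, then enforces one strict format.
function validate_phone(string $input): string
{
    // Trimming means a stray trailing space can't cause a confusing failure.
    $phone = trim($input);

    // Single accepted format, e.g. 1-203-234-4354 (assumed North American).
    if (preg_match('/\A\d-\d{3}-\d{3}-\d{4}\z/', $phone) !== 1) {
        // Reject explicitly so the user is told which format is expected.
        throw new InvalidArgumentException(
            'The phone number entered is improperly formatted (e.g. 1-203-234-4354)'
        );
    }
    return $phone;
}

try {
    echo validate_phone("1-203-234-4354 "), "\n"; // trailing space is tolerated
    validate_phone('1-203-CALL-NOW');             // letters are rejected
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

The letters case fails loudly with a message the user can act on, while the whitespace case is quietly fixed, which is exactly the split argued for above.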
I should note that I once filtered all incoming data in the controller, so that if any of it ended up in the templates before being pulled from the model, it was already secure against XSS, etc.
However, I have since removed filtering completely (except for whitespace) and simply pass any data I echo in my templates through htmlspecialchars or HTML_Purifier.
Given my situation or approach to design, can anyone see any issues with my lack of filtering?
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 1:51 pm
by Eran
What about strip_tags and more complete filtering solutions such as HTML Purifier? You have to protect against XSS.
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 2:50 pm
by alex.barylski
I call a filtering function inside my templates.
Code:
<input type="text" value="<?php echo htmlspecialchars($subject, ENT_QUOTES, 'UTF-8'); ?>" />
What am I missing in what you have said?
If I am outputting to a WYSIWYG component, I call either strip_tags or HTMLPurifier on the data before echoing it to the screen.
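A rough sketch of escaping on output per context, as described above. The helper names escape_text() and preview_html() and the tag whitelist are hypothetical; htmlspecialchars() and strip_tags() are standard PHP functions, while HTML Purifier would be the stronger option for untrusted rich text:

```php
<?php
// Plain text placed in HTML text or a quoted attribute value.
function escape_text(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Rich text for a WYSIWYG preview: keep a small whitelist of tags.
// Caveat: strip_tags() keeps attributes (e.g. onclick) on allowed tags,
// so it is not a complete XSS defense for untrusted HTML.
function preview_html(string $html): string
{
    return strip_tags($html, '<p><b><i><em><strong>');
}

$subject = 'Tom & Jerry "quoted" <b>';
echo '<input type="text" value="' . escape_text($subject) . '" />', "\n";
```

Passing ENT_QUOTES and an explicit charset keeps the attribute safe even if a template uses single quotes around the value.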
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 3:00 pm
by Eran
What you are missing is that you are filtering on every view of the templates instead of just once, when the data enters the database (what's called out filtering instead of in filtering).
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 3:52 pm
by alex.barylski
I think I see what you're saying, but I'm not sure you see what I'm saying, so I'll try to explain a little better. My bad.
In filtering and out filtering are terms I have never used.
Input filtering (in my book) is the process of taking input data, such as a phone number:
And removing all non-numeric characters, so this would be the result:
Output filtering I would actually call formatting. That is, given a phone number as an integer, I would pass that raw data to a formatter, which would then output the phone number nicely formatted according to locale.
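One way to sketch the distinction drawn here between input filtering and output formatting. The function names are hypothetical, the output layout is hard-coded rather than locale-driven, and the digits are kept as a string rather than the integer mentioned above, since an integer would drop leading zeros and can overflow on 32-bit builds:

```php
<?php
// Input filtering in the sense described above: drop every non-digit.
function filter_phone(string $input): string
{
    return preg_replace('/\D+/', '', $input);
}

// Output formatting: rebuild the 1-203-234-4354 layout from 11 digits.
// The fixed layout is an assumption; a locale-aware formatter would choose it.
function format_phone(string $digits): string
{
    if (strlen($digits) !== 11) {
        return $digits; // unexpected length: show the digits unchanged
    }
    return sprintf(
        '%s-%s-%s-%s',
        substr($digits, 0, 1),
        substr($digits, 1, 3),
        substr($digits, 4, 3),
        substr($digits, 7, 4)
    );
}

$stored = filter_phone('1 (203) 234-4354'); // "12032344354"
echo format_phone($stored), "\n";            // 1-203-234-4354
```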
The reason I see input filtering as redundant is that my validation routines would catch an improperly formatted phone number and notify the user of the correct format. So in essence, validation does the same job as input filtering, only explicitly (which is a good thing, especially when dealing with clients) rather than implicitly.
For example, if a user typed in a phone number like:
My validation routine would throw an exception and the user would see an error message along the lines of:
The phone number entered is improperly formatted (e.g. 1-203-234-4354)
Whereas if input filtering were enabled and validation skipped, you might just end up with a faulty phone number, stored as:
If you use both filtering and validation, filtering becomes redundant, except for dealing with whitespace, which might not be obvious to an end user who submitted a phone number like so:
Assuming the trailing '_' is a whitespace character, the phone number appears properly formatted but fails validation because of the trailing whitespace, so in this case I can see the use of some basic filtering.
I don't really need to filter results outside of the model, because the templates take care of neutralizing malicious code, etc.
Given that my templates escape the data themselves, that my models validate input to reject malicious data, and that I use PDO (which prevents SQLi), I don't see an overwhelming reason to implement any kind of filtering, except for whitespace.
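A minimal sketch of why PDO helps against SQL injection: the protection comes from prepared statements with bound parameters, not from PDO itself (building SQL by string concatenation through PDO is just as injectable). This assumes the pdo_sqlite extension is available for the in-memory database; the table and values are made up:

```php
<?php
// Assumes the pdo_sqlite extension; the table and values are made up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE contacts (name TEXT, phone TEXT)');

// Hostile-looking input that would break a concatenated query.
$name  = "O'Brien'); DROP TABLE contacts; --";
$phone = '1-203-234-4354';

// Placeholders send the values separately from the SQL text.
$stmt = $db->prepare('INSERT INTO contacts (name, phone) VALUES (:name, :phone)');
$stmt->execute([':name' => $name, ':phone' => $phone]);

// The value is stored verbatim; making it safe for HTML is still the
// template's job (e.g. htmlspecialchars on output).
$row = $db->query('SELECT name, phone FROM contacts')->fetch(PDO::FETCH_ASSOC);
echo $row['name'], "\n";
```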
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 3:56 pm
by Eran
What I said was mostly relevant for user-submitted HTML. You said you use htmlspecialchars in your templates, which means it runs on every render. It would be more efficient to apply it before inserting the data into the database (just once per submission) and then freely echo it in the templates.
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 4:04 pm
by alex.barylski
I'm confused. LOL
I call htmlspecialchars in each template, yes. Are you suggesting that I call htmlspecialchars before inserting data into the DB and then drop the additional calls to htmlspecialchars in the template?
What happens if someone modifies the table in phpMyAdmin or a third-party tool?
I don't like filtering data that way before storing it, because some of that data might need to exist unfiltered in storage even though it is still potentially dangerous to display.
Consider a CMS where you might want to keep <script> from being shown in your WYSIWYG editor or preview pane, but would still want it executed in the generated web page. If you filter data before inserting it into the database, you would lose all <script> tags or whatever.
For that reason, I prefer to just filter data in the templates on an as-needed basis.
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 4:09 pm
by Eran
If someone gains access to your database, you have much greater concerns than escaping data in the templates... like somebody stealing all your passwords. So I don't consider that a threat to plan against in PHP.
Regarding what you said afterwards about losing some tags: how do you show the content stripped in the WYSIWYG editor and then get the tags back? If the two versions are independent of each other, why shouldn't they be in different columns in the table?
Also, more advanced filtering such as HTML Purifier lets you define which tags are stripped.
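A sketch of the two-column idea, assuming a hypothetical posts table with body_raw and body_html columns. The sanitizer is passed in as a callable so it could be HTML Purifier in practice; the strip_tags() closure used here is only a weak stand-in:

```php
<?php
// Hypothetical save step for a CMS post: keep the author's raw markup for
// re-editing, plus a display-ready copy computed once per submission.
function prepare_post_row(string $raw, callable $sanitize): array
{
    return [
        'body_raw'  => $raw,             // reloaded into the WYSIWYG editor
        'body_html' => $sanitize($raw),  // echoed as-is by the templates
    ];
}

// Stand-in sanitizer: keep only paragraph and bold tags. strip_tags()
// keeps attributes on allowed tags, so it is weaker than a real sanitizer.
$row = prepare_post_row(
    '<p>Hello <b>world</b></p><div>footer</div>',
    fn(string $html): string => strip_tags($html, '<p><b>')
);
```

Nothing is lost at save time, and the sanitizer still runs once per submission rather than once per page view.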
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 4:23 pm
by Christopher
There is certainly a security issue, but there is also a user-friendliness issue. All of the following are valid inputs that people might type -- not trying to hack your site:
Code:
1-204-123-4567
12041234567
1 (204) 123-4567
I would even allow things like:
Code:
1 (204) 123-4567
1(204) 123-4567_
Because in the end, you want this stored in the database:
There is also the question of how to validate all of the above, whereas the digits-only version can be checked with simple rules. So filtering is not just a security measure; it is also a data normalization measure.
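A sketch of normalization as described, assuming North American numbers that should end up as 11 digits starting with 1. Only formatting characters are stripped, so letters survive the filter and are caught by the simple digits-only check (which also addresses the earlier concern about filtering silently producing a faulty number). The function name is hypothetical:

```php
<?php
// Normalize loose phone input to digits, then validate with one simple rule.
// Returns null when the input is not an acceptable phone number.
function normalize_phone(string $input): ?string
{
    // Strip only the characters people commonly type: whitespace, dashes,
    // dots and parentheses. Letters are deliberately left in place.
    $digits = preg_replace('/[\s\-.()]+/', '', $input);

    // Assumed rule: leading 1 followed by exactly 10 digits.
    if (preg_match('/\A1\d{10}\z/', $digits) !== 1) {
        return null;
    }
    return $digits;
}

foreach (['1-204-123-4567', '12041234567', '1 (204) 123-4567', '1(204) 123-4567 ', '1-204-CALL-NOW'] as $input) {
    var_export(normalize_phone($input));
    echo "\n";
}
```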
But what about:

Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 4:31 pm
by alex.barylski
I essentially used filtering as a normalization process in the past, but it still seems redundant.
I only accept data in one format, not many formats. This format is based on locale.
Where is the security issue? All data is escaped before insertion into a database, filtered or not.
I don't typically support multiple input methods; it's just too much work for too little payoff.
If I validate a phone number as:
1-204-334-1234
or as four separate parts (maybe an extension), I see little reason for filtering.
I've checked the format and the data is escaped appropriately for each destination, so filtering seems redundant.
While I agree it's good for normalizing data, if you only accept one format, then it's not really needed.
Cheers
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 4:35 pm
by allspiritseve
pytrin wrote: What I said was mostly relevant for user-submitted HTML. You said you use htmlspecialchars in your templates, which means it runs on every render. It would be more efficient to apply it before inserting the data into the database (just once per submission) and then freely echo it in the templates.
I'm pretty sure htmlspecialchars is only used for outputting data from the database. You should be escaping data that you put into the database, not turning it into HTML.
Re: Input filtering -- really necessary
Posted: Wed Jun 25, 2008 4:48 pm
by Eran
Using htmlspecialchars on data just before it enters the database and using it on unfiltered data after it's retrieved from the database produce exactly the same output. However, in the former case htmlspecialchars runs only once instead of on every page view.
This might not be a big deal with htmlspecialchars (which I personally don't use; it came up in Hockey's example), but with more complete filtering packages there is a substantial difference in performance.
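A small check of the claim that both orders give the same output, plus the pitfall of mixing them: data escaped before storage and escaped again in a template gets double-encoded. The sample string is made up:

```php
<?php
$input = 'Fish & <b>Chips</b>';

// Strategy 1: store raw, escape on output every time the template renders.
$stored1   = $input;
$rendered1 = htmlspecialchars($stored1, ENT_QUOTES, 'UTF-8');

// Strategy 2: escape once before storage, echo as-is in the template.
$stored2   = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$rendered2 = $stored2;

// Mixing the two (escaping already-escaped data) double-encodes,
// so the page would literally show "&amp;" and "&lt;b&gt;".
$doubled = htmlspecialchars($stored2, ENT_QUOTES, 'UTF-8');

echo $rendered1, "\n", $rendered2, "\n", $doubled, "\n";
```

The trade-off is that with strategy 2 the stored value is HTML-specific, so any non-HTML consumer (email, CSV export, a JSON API) has to decode it first.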
Re: Input filtering -- really necessary
Posted: Thu Jun 26, 2008 12:56 am
by matthijs
First, as often happens in discussions like these, confusion about the exact terminology leads to a lot of misunderstanding. Maybe if we first agree on which terms to use, talking about them will be easier.
The main terms are input filtering, input validating and output escaping.
- Validating input is the process of comparing input to a set of rules and asking the question: is the data correct according to those rules? For example, is the username that has been entered alphanumeric? Most often, the answer is a yes or no.
- Filtering input is the process of taking input data and making sure it is passed along in the correct format. Say, take an input and strip the whitespace before and after it before moving the data further on (see Hockey's telephone number example).
- Escaping output is the process of making sure the data can be output (safely) to a certain format: mysql_real_escape_string for output of data to a db, htmlspecialchars for output to HTML (just some possible examples).
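The three terms above, sketched on a simple username field. The function names and the alphanumeric rule are illustrative assumptions:

```php
<?php
// Filtering: adjust the data into the expected shape (here: trim whitespace).
function filter_username(string $input): string
{
    return trim($input);
}

// Validating: a yes/no answer against a rule (here: ASCII letters and digits only).
function is_valid_username(string $username): bool
{
    return preg_match('/\A[A-Za-z0-9]+\z/', $username) === 1;
}

// Escaping: make the value safe for one specific output context (HTML here).
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$username = filter_username("  alice42 ");
if (is_valid_username($username)) {
    echo 'Welcome, ' . escape_html($username), "\n";
} else {
    echo "Username must be alphanumeric.\n";
}
```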
Second, your question is difficult to answer, because you're basically asking "is using a hammer really necessary?" A hammer is just a tool, one of many in your toolbox, and depending on the situation you may or may not want to use it. If you have a web form with a very strict set of validation rules for an input field, you may not need to also filter the data. You give the example of only allowing numbers for the telephone input. In another case, as Arborint mentioned, you may want to allow users to enter the telephone number in different formats, with and without brackets, whitespace and slashes. In that case you would first validate the input to check that it is acceptable (including the possible use of brackets, whitespace and slashes), and then also filter it to make sure it's in the format you want to use in your app.
What to use (validation and/or filtering) depends on the situation. If you're coding a simple login form, you might only need a few validation rules (username is alphanumeric). But if you're coding a checkout form for a webshop, you might want to allow people to fill in data in more ways, to make it easier for them to complete the form quickly. Otherwise, if you have a long form of 20 fields and for each one people have a 50% chance of accidentally using the wrong format (897-654-666 instead of 897654666), it will take most users ages to correct their "mistakes" and complete your form. And you will lose a lot of potential customers during checkout in your webshop!
hockey wrote:Given my situation or approach to design, can anyone see any issues with my lack of filtering?
So my answer would be: if your specific situations don't need much filtering, then as long as your code is kept safe by proper input validation and output escaping, there's nothing wrong with that, I guess.
Re: Input filtering -- really necessary
Posted: Thu Jun 26, 2008 3:08 am
by alex.barylski
Good points. arborint also answered with what I was looking for.
Obviously, personally, I feel just using validation is enough, but I was also curious to know of situations where it might not be, and multiple formats were certainly something I overlooked, just because I wouldn't offer functionality like that (unless it was demanded of me).
I wasn't suggesting that filtering is redundant in all circumstances, so much as I was curious to see whether there were any potential flaws in relying on validation alone, especially given my situation.
There was no right or wrong answer to this question, as with most of my questions. I know I was on one side of the fence, and when I run out of arguments with myself (I can start a fight in an empty room, BTW) I turn to the forums for another perspective.
Thanks for the feedback.
Cheers
Re: Input filtering -- really necessary
Posted: Thu Jun 26, 2008 5:29 am
by Christopher
The main reason I filter is simple Defense in Depth. I typically filter out any characters not necessary for a field when accepting the data, then I validate. For most fields, the characters that htmlspecialchars would escape are simply not allowed through, which eliminates many XSS attacks right there. I still always validate and escape anyway, but for me it is one more level of defense in case I forget something.
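A sketch of that layering for a hypothetical name field. The allowed character set (ASCII letters, space, apostrophe, hyphen) and the 50-character limit are assumptions and would exclude many real names:

```php
<?php
// Defense in depth for a hypothetical "name" field:
//   1. filter:   remove every character the field never needs (whitelist)
//   2. validate: still check the filtered value against the field's rules
//   3. escape:   still escape on output, whatever steps 1 and 2 did
function filter_name(string $input): string
{
    // Assumed field rules: ASCII letters, space, apostrophe and hyphen only.
    return preg_replace('/[^A-Za-z \'-]/', '', $input);
}

function is_valid_name(string $name): bool
{
    // Starts with a letter, at most 50 characters in total.
    return preg_match('/\A[A-Za-z][A-Za-z \'-]{0,49}\z/', $name) === 1;
}

$filtered = filter_name('<script>alert(1)</script>Mary-Jane');
// The angle brackets, slash, parentheses and digit are gone, so no tag survives.
if (is_valid_name($filtered)) {
    echo htmlspecialchars($filtered, ENT_QUOTES, 'UTF-8'), "\n";
}
```

Even though the filter already removed anything that could form a tag, the output is still escaped; the apostrophe, for one, is allowed through and only becomes safe in a single-quoted attribute because of ENT_QUOTES.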