Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 6:51 am
by Technical
VladSun wrote:Why would you need this? PHP is "typeless" language. Elaborate please

And you know, it's very very bad.
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 7:56 am
by matthijs
Technical wrote:Okay, okay, seems that discussion went wrong way.
I'm not asking how to filter/validate, I'm asking what data should I filter/validate. Look at the first post, I wrote a list of options.
My second question was about forced typecasting. I meant, should I use intval(), floatval(), strval() on variables passed as function arguments, database received rows and etc.? Does it hurt performance much?
Well, what data you need to filter/validate depends on:
1. What exactly you mean by those terms
2. From which context to which context the data goes.
It's not as simple as you make it seem, as if there were a fixed list of cases where you need to "filter" (whatever anyone means by that).
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 8:03 am
by Technical
I assume that:
1) Filtering means removing or escaping of dangerous data, like HTML tags
2) Validating means removing/replacing unfitting data, like letters in numeric parameter
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 8:16 am
by VladSun
Technical wrote:I assume that:
1) Filtering means removing or escaping of dangerous data, like HTML tags
2) Validating means removing/replacing unfitting data, like letters in numeric parameter
We had a discussion here:
viewtopic.php?f=34&t=102752&start=15
I can't agree that validating should change data - it should only return true/false (and error messages) when passed a valid/invalid value.
So, I think you mean:
1) === Escaping
2) === Filtering
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 8:35 am
by josh
Validating and filtering should always be done when data first enters the system. For example, "555-555-5555" may pass validation, but the dashes may be filtered out.
Escaping should be done just before the data is used in MySQL, shell, HTML, or any other "special" kind of output. Different mediums call for different levels of filtering. When outputting HTML I usually escape entities, but I also run a word-wrapping script so that a user entering "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" doesn't cause the page to scroll horizontally. If, for example, I were outputting a JavaScript string, I'd use a regex to strip anything but [a-zA-Z0-9].
Your strongest safeguard is to validate and escape. Filtering is mainly for data quality (validating also helps data quality).
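To make the phone-number example concrete, here is a minimal sketch of filter, then validate, then escape. It assumes PHP 7 or later; filterPhone() and isValidPhone() are hypothetical helper names, not functions anyone in the thread posted, and the 10-digit rule is just an illustrative assumption.

```php
<?php
// Hypothetical helpers; the names and the 10-digit rule are illustrative only.

// Filter: normalise the value without judging it (strip spaces, dots, dashes, parentheses).
function filterPhone(string $raw): string
{
    return preg_replace('/[\s().-]+/', '', $raw);
}

// Validate: only answer yes/no, never change the value.
function isValidPhone(string $digits): bool
{
    return preg_match('/^\d{10}$/', $digits) === 1;
}

$input = '555-555-5555';
$phone = filterPhone($input);   // dashes removed
$ok    = isValidPhone($phone);  // checked after filtering

// Escape: applied only at output time, for the target context (HTML here).
echo htmlspecialchars($phone, ENT_QUOTES, 'UTF-8'), "\n";
```

Filtering before validating is what lets the validation rule stay strict while the input format stays lenient.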
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 9:41 am
by Technical
josh wrote:Validating and filtering should always be done when data first enters the system. For example, "555-555-5555" may pass validation, but the dashes may be filtered out.
Escaping should be done just before the data is used in MySQL, shell, HTML, or any other "special" kind of output. Different mediums call for different levels of filtering. When outputting HTML I usually escape entities, but I also run a word-wrapping script so that a user entering "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" doesn't cause the page to scroll horizontally. If, for example, I were outputting a JavaScript string, I'd use a regex to strip anything but [a-zA-Z0-9].
Your strongest safeguard is to validate and escape. Filtering is mainly for data quality (validating also helps data quality).
Thank you and VladSun. This is what I've been asking for.
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 11:39 am
by Technical
By the way, I have a kinda (bad?) habit of typecasting function arguments, like:
Code:
function Some($Integer, $Array)
{
    // Force the expected types regardless of what the caller passed in.
    $Integer = intval($Integer);
    $Array = (array) $Array;
}
Does it slow the system down much, or is it fine?
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 2:33 pm
by Weirdan
Technical wrote:Does it slow system down much? Or it's fine?
It may slow down, it may not. It depends on many factors. Profile your specific case.
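One rough way to profile this is a micro-benchmark along the following lines. This is only a sketch: timeIt() is a hypothetical helper, the iteration count is arbitrary, and the numbers will vary by machine and PHP version, so treat the output as a hint rather than a measurement of a real application.

```php
<?php
// Rough micro-benchmark sketch: times a callable over many iterations.
// timeIt() is a hypothetical helper, not something from the thread.
function timeIt(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn('42', array(1, 2, 3));
    }
    return microtime(true) - $start;
}

// Same work with and without the defensive casts from the post above.
$withCasts = function ($integer, $array) {
    $integer = intval($integer);
    $array = (array) $array;
    return $integer + count($array);
};

$withoutCasts = function ($integer, $array) {
    return $integer + count($array);
};

printf("with casts:    %.4f s\n", timeIt($withCasts));
printf("without casts: %.4f s\n", timeIt($withoutCasts));
```

For anything beyond a quick check, a real profiler run against the actual application is the more reliable route.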
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 4:00 pm
by Christopher
Technical wrote:Okay, okay, seems that discussion went wrong way.
I'm not asking how to filter/validate, I'm asking what data should I filter/validate. Look at the first post, I wrote a list of options.
I answered this above. I do none of the things on your list. I filter/validate (see josh's post) input and escape output.
Technical wrote:My second question was about forced typecasting. I meant, should I use intval(), floatval(), strval() on variables passed as function arguments, database received rows and etc.? Does it hurt performance much?
Yes, use intval($var) or (int)$var. That is a valid way to filter. I use that style or preg_replace().
And honestly, I don't think I have spent even a millisecond thinking about whether using intval() or preg_replace() to filter input had any effect on performance!!! Filtering and validation are mandatory, and those functions are insignificant performance-wise. Where did you get that idea?!?
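As a small illustration of the two filtering styles mentioned here, casting and preg_replace(), consider the sketch below. The variable names and the allowed-character set are assumptions made up for the example.

```php
<?php
// Casting as a filter: the result is always an integer, whatever came in.
$page = (int) '12abc';           // leading digits are kept
$id   = intval('not a number');  // no leading digits at all

// preg_replace() as a filter: keep only the characters you allow.
// The allowed set (lowercase letters, digits, underscore, dash) is illustrative.
$slug = preg_replace('/[^a-z0-9_-]/', '', 'my-post!<script>');

var_dump($page, $id, $slug);
```

Note that both styles silently change the value, which is why they count as filtering rather than validation.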
Re: Filtering/typecasting data: when to do this?
Posted: Wed Jan 12, 2011 11:31 pm
by Technical
Isn't this too much filtering and typecasting?
Code:
class BlocksItem extends Item
{
    public $Place;
    public $Title;
    public $HTML;
    public $File;
    public $Code;
    public $Active;

    function __construct($Id = null)
    {
        $this->Exists = FALSE;
        if(!empty($Id) && $Result = Database::Query('SELECT * FROM blocks WHERE "id"=\''.intval($Id).'\''))
        {
            // Every column read back from the database is filtered or cast again.
            $this->Id = intval($Result[0]['id']);
            $this->Exists = TRUE;
            $this->Title = Filters::Basic($Result[0]['title']);
            $this->Place = Filters::Range($Result[0]['place'], array('sidebar', 'top', 'bottom'));
            $this->HTML = (bool) $Result[0]['HTML'];
            if($this->HTML)
            {
                $this->File = Filters::Path($Result[0]['file']);
            } else {
                $this->Code = stripslashes($Result[0]['code']);
            }
            $this->Active = (bool) $Result[0]['active'];
        }
    }

    function Save()
    {
        // The same properties are filtered or cast once more before saving.
        $Title = Filters::Basic($this->Title);
        $HTML = intval((bool) $this->HTML);
        $Active = intval((bool) $this->Active);
        $Place = 'sidebar';
        if(in_array(Filters::Strict($this->Place), array('sidebar', 'top', 'bottom')))
        {
            $Place = Filters::Strict($this->Place);
        }
        $File = null;
        $Code = null;
        if($this->HTML)
        {
            $File = Filters::Path($this->File);
        } else {
            $Code = stripslashes($this->Code);
        }
        if($this->Exists)
        {
            //UPDATE query
        } elseif(TRUE /* QUERY */) {
            //INSERT query
        }
    }
}
Re: Filtering/typecasting data: when to do this?
Posted: Thu Jan 13, 2011 12:30 am
by matthijs
You shouldn't have to filter and/or validate every line of code in which you use a variable. The general rule is:
- when something comes in, you filter and/or validate
- when something goes out (to a different context), you escape
So in web apps, that normally means:
- when receiving input (GET, POST, COOKIE, etc), you validate and filter that data
- when outputting data you escape. So output to a db is escaped with db specific escape functions like for example mysql_real_escape_string. Output to HTML is escaped with htmlentities(). Etc
Internally, when data is handled inside the same context, inside a specific class for example, you don't have to filter or validate it each time
The thing to watch out for is what exactly is input and output. Sometimes you might think that you use a "safe" "internal" variable, while in reality it's data that can be messed with by outside input. The $_SERVER HTTP variables are a good example. You could think they are "safe" php variables, but they can contain user (outside) input. So using those you also have to validate/filter them when they are used/received by your app.
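A short sketch of the $_SERVER point: request headers are validated like any other input and escaped like any other output. It assumes PHP 7 or later; pickHost() and the host whitelist are hypothetical, and the $server array simulates request data so the example runs from the command line.

```php
<?php
// Hypothetical whitelist; in a real app this would come from configuration.
$allowedHosts = array('example.com', 'www.example.com');

// Validate the Host header against the whitelist, falling back to a known value.
function pickHost(array $server, array $allowed, string $fallback): string
{
    $host = isset($server['HTTP_HOST']) ? strtolower($server['HTTP_HOST']) : '';
    return in_array($host, $allowed, true) ? $host : $fallback;
}

// Simulated request data: both values are fully under the client's control.
$server = array(
    'HTTP_HOST'       => 'evil.example"><script>',
    'HTTP_USER_AGENT' => '<b>spoofed</b>',
);

$host  = pickHost($server, $allowedHosts, 'example.com');
$agent = htmlspecialchars($server['HTTP_USER_AGENT'], ENT_QUOTES, 'UTF-8');

echo "Host: $host, agent: $agent\n";
```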
Re: Filtering/typecasting data: when to do this?
Posted: Thu Jan 13, 2011 3:44 am
by josh
Just to elaborate...
I used to have (int) syndrome. What really made it obvious to me how needless it was is that someone worded it this way: "it's basically like you can't trust yourself". You'll also run into problems when customers want to use something like "0283" for a value and expect it to be treated as a string (and not lose the leading 0).
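The "0283" problem is easy to reproduce. In this sketch the variable names are made up, and the digits-only rule is just an assumed example of a validation that leaves the value untouched.

```php
<?php
$code = '0283';

// Casting changes the value: the leading zero does not survive.
$asInt     = (int) $code;
$roundTrip = (string) $asInt;

// Validating instead of casting keeps the original string intact.
$isDigits = preg_match('/^\d+$/', $code) === 1;

var_dump($asInt, $roundTrip, $isDigits, $code);
```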
The bottom line is: validate incoming data (for example, require a minimum length, require a maximum length, disallow anything but a-z). That basically applies to *every* field in a production application. Whether it comes over an API, an HTML form, or a command line shouldn't matter. Always validate.
Filtering is optional, and allows you to have lenient validation (no need to mark "555-555-5555" as invalid because it had dashes, just strip [filter] the dashes *before* validation)
Escaping is when changing "medium" (language). For example shell commands get escaped one way, SQL gets escaped another way, HTML gets escaped another way. You want to store the raw value, so you can escape it differently for different mediums. Escaping is for when data *leaves* your system.
Escaping has its purpose in preventing your user from typing an SQL, shell, or HTML command and having it actually run as an SQL, shell or HTML command.
Filtering has its purpose in loosening the need for validations and keeping things consistently formatted.
Validation has its purpose in making sure the user doesn't lose data due to maximum field lengths; it also helps to ensure that users enter valid data.
The areas do overlap, for example a really good validation may eliminate the need for escaping, but you still always escape just in case (but only where it is needed, such as when user inputted data is going into an SQL, shell or HTML command).
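The idea of storing the raw value and escaping it differently per medium can be sketched as follows. HTML, a JavaScript string literal, and a URL query value stand in as the mediums here; SQL and shell follow the same pattern with their own escaping functions. The sample value is made up.

```php
<?php
// The raw value is stored once and never modified.
$raw = "O'Reilly & <Sons>";

// HTML context.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// JavaScript string literal inside a script block.
$forJs = json_encode($raw, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// URL query-string value.
$forUrl = rawurlencode($raw);

echo $forHtml, "\n", $forJs, "\n", $forUrl, "\n";
```

Because each escaped copy is derived from $raw at output time, none of them ever needs to be un-escaped or stored.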
When I say "user input" don't take it too literally. If the programmer types "&" in a string it should still be escaped to "&amp;" as part of an HTML document, so always escape when data leaves the system!
There's no harm in validating data that moves from one function to another, but typically that is pointless. However, escaping data more than once destroys the original data. '&' should be output as '&amp;' (escaped once) and not '&amp;amp;' (escaped twice).
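Double escaping is easy to demonstrate with htmlspecialchars(), which by default re-encodes entities that are already present. The sample string is made up for illustration.

```php
<?php
$raw = 'Tom & Jerry';

// Escaped once: correct for HTML output.
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Escaped twice: the browser would now display a literal "&amp;".
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

echo $once, "\n", $twice, "\n";
```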
Some things don't fit in one category, for example does the nl2br() function filter or escape? One could debate that for hours.
Re: Filtering/typecasting data: when to do this?
Posted: Thu Jan 13, 2011 3:58 am
by Technical
I used to think that variable type freedom is good. Now I realize how wrong I was.
Re: Filtering/typecasting data: when to do this?
Posted: Thu Jan 13, 2011 4:10 am
by josh
I don't see what's wrong with dynamic typing, other than the fact you're trying to force it to be strict typing.