VladSun wrote: Why would you need this? PHP is "typeless" language. Elaborate please

And you know, it's very very bad.
Filtering/typecasting data: when to do this?
Re: Filtering/typecasting data: when to do this?
Technical wrote: Okay, okay, it seems the discussion went the wrong way.
I'm not asking how to filter/validate, I'm asking what data I should filter/validate. Look at the first post, I wrote a list of options.
My second question was about forced typecasting. I meant, should I use intval(), floatval(), strval() on variables passed as function arguments, rows received from the database, etc.? Does it hurt performance much?

Well, what data you need to filter/validate depends on:
1. What exactly you mean by those terms
2. From which context to which context the data goes.
It's not as simple as you make it seem, as if there were a fixed list of when you need to "filter" (whatever anyone means by that).
Re: Filtering/typecasting data: when to do this?
I assume that:
1) Filtering means removing or escaping dangerous data, like HTML tags
2) Validating means removing/replacing unfitting data, like letters in a numeric parameter
Re: Filtering/typecasting data: when to do this?
Technical wrote: I assume that:
1) Filtering means removing or escaping of dangerous data, like HTML tags
2) Validating means removing/replacing unfitting data, like letters in numeric parameter

We had a discussion here:
viewtopic.php?f=34&t=102752&start=15
I can't agree that validating should change data - it should only return true/false (and error messages) when passed a valid/invalid value.
So, I think you mean:
1) === Escaping
2) === Filtering
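To make the "validation only reports" point concrete, here is a minimal sketch. The function name validateQuantity() and the 1 to 100 range are made up for illustration and are not from this thread; the only point is that the value goes in and a verdict plus messages comes out, while the value itself is never altered.

```php
<?php
// Hypothetical helper (not from the thread): validation only reports,
// it never modifies the value it was given.
function validateQuantity($value)
{
    $errors = array();
    if (!is_string($value) || preg_match('/\A[0-9]+\z/', $value) !== 1) {
        $errors[] = 'Quantity must contain digits only.';
    } elseif ((int) $value < 1 || (int) $value > 100) {
        // The 1..100 range is an assumption chosen for the example.
        $errors[] = 'Quantity must be between 1 and 100.';
    }
    return array('valid' => count($errors) === 0, 'errors' => $errors);
}

$good = validateQuantity('12');
$bad = validateQuantity('12a');
```

The caller decides what to do with an invalid value (reject, re-prompt, log); the validator does not try to repair it.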
Last edited by VladSun on Wed Jan 12, 2011 9:34 am, edited 1 time in total.
There are 10 types of people in this world, those who understand binary and those who don't
Re: Filtering/typecasting data: when to do this?
Validating and filtering should always be done when data first enters the system. For example, "555-555-5555" may pass validation, but the dashes may be filtered out.
Escaping should be done before using data in MySQL, shell, HTML, or any other "special" kind of output. Different mediums call for different levels of filtering. When outputting HTML I usually escape entities, but I also run a word-wrapping script so that a user entering a very long unbroken string ("aaaaaaaaaaaaaaaaaaaaaaaa" repeated for hundreds of characters) doesn't cause the page to scroll. If, for example, I were outputting a JavaScript string, I'd use a regex to strip anything but [a-zA-Z0-9].
Your strongest safeguard is to validate and escape. Filtering is mainly for data quality (validating also helps data quality).
Re: Filtering/typecasting data: when to do this?
josh wrote: Validating and filtering should always be done when data first enters the system. For example, "555-555-5555" may pass validation, but the dashes may be filtered out.
Escaping should be done before using data in MySQL, shell, HTML, or any other "special" kind of output. Different mediums call for different levels of filtering. When outputting HTML I usually escape entities, but I also run a word-wrapping script so that a user entering a very long unbroken string ("aaaaaaaaaaaaaaaaaaaaaaaa" repeated for hundreds of characters) doesn't cause the page to scroll. If, for example, I were outputting a JavaScript string, I'd use a regex to strip anything but [a-zA-Z0-9].
Your strongest safeguard is to validate and escape. Filtering is mainly for data quality (validating also helps data quality).

Thank you, and VladSun too. This is what I've been asking for.
Re: Filtering/typecasting data: when to do this?
By the way, I have a kind of (bad?) habit of typecasting function arguments like:
Code: Select all
function Some($Integer, $Array)
{
    $Integer = intval($Integer);
    $Array = (array) $Array;
}
Does it slow the system down much? Or is it fine?
Re: Filtering/typecasting data: when to do this?
Technical wrote: Does it slow system down much? Or it's fine?

It may slow things down, it may not. It depends on many factors. Profile your specific case.
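One rough way to "profile your specific case" is a micro-benchmark like the sketch below. The helper name timeIt(), the input '12345' and the iteration count are arbitrary assumptions; a real profiler (for example Xdebug) gives a more trustworthy picture than a loop like this.

```php
<?php
// Rough micro-benchmark sketch; timeIt() is a made-up helper and the
// iteration count and input value are arbitrary.
function timeIt(callable $fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn('12345');
    }
    return microtime(true) - $start; // elapsed seconds as a float
}

$withCast = timeIt(function ($v) { return intval($v); }, 100000);
$withoutCast = timeIt(function ($v) { return $v; }, 100000);

printf("with intval(): %.4fs, without: %.4fs\n", $withCast, $withoutCast);
```

The absolute numbers vary by machine and PHP version, so compare them against the cost of the rest of the request (database queries, template rendering) before drawing conclusions.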
- Christopher
Re: Filtering/typecasting data: when to do this?
Technical wrote: Okay, okay, it seems the discussion went the wrong way.
I'm not asking how to filter/validate, I'm asking what data I should filter/validate. Look at the first post, I wrote a list of options.

I answered this above. I do none of the things on your list. I filter/validate(josh

Technical wrote: My second question was about forced typecasting. I meant, should I use intval(), floatval(), strval() on variables passed as function arguments, rows received from the database, etc.? Does it hurt performance much?

Yes, use intval($var) or (int)$var. That is a valid way to filter. I use that style or preg_replace().
And honestly I don't think I have spent even a millisecond thinking about whether using intval() or preg_replace() to filter input had any effect on performance!!! Filtering and validation are mandatory, and those functions are insignificant performance-wise. Where did you get that idea?!?
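For reference, a small sketch of the two filtering styles mentioned here. The input values are invented and stand in for something like $_GET values; the whitelist in the regex is an assumption for the example.

```php
<?php
// Hypothetical raw inputs, standing in for $_GET or $_POST values.
$rawId = '42; DROP TABLE blocks';
$rawName = 'jo<sh>_99!';

$id = intval($rawId);   // leading digits are kept, everything after is dropped
$sameId = (int) $rawId; // cast form, same result as intval()

// Keep only characters from a whitelist; everything else is removed.
$name = preg_replace('/[^a-zA-Z0-9_]/', '', $rawName);
```

Here $id and $sameId are both the integer 42 and $name is 'josh_99'.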
(#10850)
Re: Filtering/typecasting data: when to do this?
Aren't there too many filters and typecasting?
Code: Select all
// Item, Database and Filters are defined elsewhere in the poster's code base.
class BlocksItem extends Item
{
    public $Place;
    public $Title;
    public $HTML;
    public $File;
    public $Code;
    public $Active;
    function __construct($Id = null)
    {
        $this->Exists = FALSE;
        // Load the row and filter/typecast every column read from the database.
        if(!empty($Id) && ($Result = Database::Query('SELECT * FROM blocks WHERE "id"=\''.intval($Id).'\'')))
        {
            $this->Id = intval($Result[0]['id']);
            $this->Exists = TRUE;
            $this->Title = Filters::Basic($Result[0]['title']);
            $this->Place = Filters::Range($Result[0]['place'], array('sidebar', 'top', 'bottom'));
            $this->HTML = (bool) $Result[0]['HTML'];
            if($this->HTML)
            {
                $this->File = Filters::Path($Result[0]['file']);
            } else {
                $this->Code = stripslashes($Result[0]['code']);
            }
            $this->Active = (bool) $Result[0]['active'];
        }
    }
    function Save()
    {
        // The same filters are applied again before writing back.
        $Title = Filters::Basic($this->Title);
        $HTML = intval((bool) $this->HTML);
        $Active = intval((bool) $this->Active);
        $Place = 'sidebar';
        if(in_array(Filters::Strict($this->Place), array('sidebar', 'top', 'bottom')))
        {
            $Place = Filters::Strict($this->Place);
        }
        $File = null;
        $Code = null;
        if($this->HTML)
        {
            $File = Filters::Path($this->File);
        } else {
            $Code = stripslashes($this->Code);
        }
        if($this->Exists)
        {
            //UPDATE query
        } elseif(TRUE /* QUERY: the condition was elided in the original post */) {
            //INSERT query
        }
    }
}
Re: Filtering/typecasting data: when to do this?
You shouldn't have to filter and/or validate every line of code in which you use a variable. The general rule is:
- when something comes in, you filter and/or validate
- when something goes out (to a different context), you escape
So in web apps, that normally means:
- when receiving input (GET, POST, COOKIE, etc), you validate and filter that data
- when outputting data, you escape. Output to a database is escaped with database-specific escape functions, for example mysql_real_escape_string. Output to HTML is escaped with htmlentities(), etc.
Internally, when data is handled inside the same context, inside a specific class for example, you don't have to filter or validate it each time.
The thing to watch out for is what exactly is input and output. Sometimes you might think that you use a "safe" "internal" variable, while in reality it's data that can be messed with by outside input. The $_SERVER HTTP variables are a good example. You could think they are "safe" php variables, but they can contain user (outside) input. So using those you also have to validate/filter them when they are used/received by your app.
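A minimal sketch of that rule, assuming a made-up $request array in place of $_GET and $_SERVER. The specific limits (up to 4 digits for the page, 255 characters for the user agent) are arbitrary choices for the example, and htmlspecialchars() is used here where the post mentions htmlentities().

```php
<?php
// Sketch only: $request stands in for $_GET / $_SERVER, values are made up.
$request = array(
    'page'       => '3',
    'user_agent' => '<script>alert(1)</script>', // header content is user-controlled
);

// Input boundary: validate/filter once, when the data comes in.
$page = preg_match('/\A[0-9]{1,4}\z/', $request['page']) === 1 ? (int) $request['page'] : 1;
$userAgent = substr($request['user_agent'], 0, 255); // filter: cap the length

// Inside the application the values are used as-is, without re-checking.
$message = 'Page ' . $page . ' requested by ' . $userAgent;

// Output boundary: escape for the target context (HTML in this case).
$html = '<p>' . htmlspecialchars($message, ENT_QUOTES, 'UTF-8') . '</p>';
```

Note that the user agent is stored and passed around raw; it only becomes harmless at the moment it is escaped for HTML.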
Re: Filtering/typecasting data: when to do this?
Just to elaborate...
I used to have (int) syndrome. What really made it obvious to me how needless it was is that someone worded it this way: "it's basically like you can't trust yourself". You'll also run into problems when customers want to use something like "0283" for a value and expect it to be treated as a string (and not lose the leading 0).
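The leading-zero problem is easy to reproduce. The value below is the one from the example above; the variable names are just for illustration.

```php
<?php
$code = '0283';               // e.g. a customer-supplied code
$asInt = (int) $code;         // integer 283
$roundTrip = (string) $asInt; // '283', the leading zero is gone for good
```

Once the cast has happened there is no way to tell whether the original was '283', '0283' or '00283'.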
The bottom line is: validate incoming data (for example, require a minimum length, require a maximum length, disallow anything but a-z). That basically applies to *every* field in a production application. Whether it comes over an API, an HTML form, or a command line shouldn't matter. Always validate.
Filtering is optional, and allows you to have lenient validation (no need to mark "555-555-5555" as invalid because it had dashes, just strip [filter] the dashes *before* validation).
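Filter-then-validate for the phone number example could look roughly like this. The function names and the "exactly 10 digits" rule are assumptions made for the sketch, not something stated in the thread.

```php
<?php
// Hypothetical helpers; the 10-digit rule is an assumption for the example.
function filterPhone($input)
{
    // Strip common formatting characters only; letters are left in place
    // so that validation can still reject them.
    return str_replace(array('-', ' ', '(', ')', '.'), '', $input);
}

function isValidPhone($digits)
{
    return preg_match('/\A[0-9]{10}\z/', $digits) === 1;
}

$phone = filterPhone('555-555-5555');
$ok = isValidPhone($phone);
$rejected = isValidPhone(filterPhone('555-call-now'));
```

The filter makes the validation lenient about formatting without making it lenient about content.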
Escaping is for when data changes "medium" (language). For example, shell commands get escaped one way, SQL gets escaped another way, and HTML gets escaped yet another way. You want to store the raw value, so you can escape it differently for different mediums. Escaping is for when data *leaves* your system.
Escaping has its purpose in preventing your user from typing an SQL, shell, or HTML command and having it actually run as an SQL, shell or HTML command.
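As a sketch of "store raw, escape per medium": the same value is escaped three different ways below. The sample string is made up, and json_encode() with the JSON_HEX_* flags is one assumed way to embed a string in JavaScript, not the only one. The exact result of escapeshellarg() depends on the platform.

```php
<?php
$raw = 'Tom & Jerry\'s "<b>"'; // the raw value, stored unescaped

// HTML context
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Shell context (quoting rules differ between platforms)
$forShell = escapeshellarg($raw);

// JavaScript string context
$forJs = json_encode($raw, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
```

$raw itself is never overwritten, so it can still be escaped correctly for any other medium later.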
Filtering has its purpose in loosening the need for validations, and keeping things consistently formatted.
Validation has its purpose in making sure the user doesn't lose data due to maximum field lengths; it also helps to ensure that users enter valid data.
The areas do overlap. For example, a really good validation may eliminate the need for escaping, but you still always escape just in case (but only where it is needed, such as when user-supplied data is going into an SQL, shell or HTML command).
When I say "user input", don't take it too literally. If the programmer types "&" in a string, it should still be escaped to "&amp;" as part of an HTML document, so always escape when data leaves the system!
There's no harm in validating data that moves from one function to another, but typically that is pointless. However, escaping data more than once destroys the original data. '&' should be output as '&amp;' (escaped once) and not '&amp;amp;' (escaped twice).
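The double-escaping problem in a few lines, using htmlspecialchars() as an example escape function (the sample string is made up):

```php
<?php
$raw = 'Tom & Jerry';
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');   // Tom &amp; Jerry
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8'); // Tom &amp;amp; Jerry
```

A browser renders $once as "Tom & Jerry" but renders $twice as the literal text "Tom &amp; Jerry", which is why escaping belongs at the output boundary only.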
Some things don't fit in one category, for example does the nl2br() function filter or escape? One could debate that for hours.
Re: Filtering/typecasting data: when to do this?
I used to think that variable type freedom is good. Now I realize how wrong I was.
Re: Filtering/typecasting data: when to do this?
I don't see what's wrong with dynamic typing, other than the fact you're trying to force it to be strict typing.