Filtering/typecasting data: when to do this?

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Filtering/typecasting data: when to do this?

Post by Technical »

VladSun wrote:Why would you need this? PHP is "typeless" language. Elaborate please :)
And you know, it's very very bad.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Re: Filtering/typecasting data: when to do this?

Post by matthijs »

Technical wrote: Okay, okay, it seems the discussion went the wrong way.
I'm not asking how to filter/validate; I'm asking what data I should filter/validate. Look at the first post, where I wrote a list of options.

My second question was about forced typecasting. I meant: should I use intval(), floatval(), strval() on variables passed as function arguments, rows received from the database, etc.? Does it hurt performance much?
Well, what data you need to filter/validate depends on:

1. What exactly you mean by those terms
2. Which context the data comes from and which context it goes to

It's not as simple as you make it seem, as if there were a fixed list of places where you need to "filter" (whatever anyone means by that).
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Filtering/typecasting data: when to do this?

Post by Technical »

I assume that:

1) Filtering means removing or escaping dangerous data, like HTML tags
2) Validating means removing/replacing unfitting data, like letters in a numeric parameter
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Filtering/typecasting data: when to do this?

Post by VladSun »

Technical wrote: I assume that:

1) Filtering means removing or escaping dangerous data, like HTML tags
2) Validating means removing/replacing unfitting data, like letters in a numeric parameter
We had a discussion here:
viewtopic.php?f=34&t=102752&start=15

I can't agree that validating should change data - it should only return true/false (and error messages) when passed a valid/invalid value.

So, I think you mean:

1) === Escaping
2) === Filtering
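
To make the three terms concrete, here is a minimal sketch of how I read the distinction. The function names are hypothetical and chosen only for illustration: the validator only answers true/false, the filter returns a changed copy, and the escaper prepares a value for one specific output context (HTML in this case).

```php
<?php
// Hypothetical helpers, named for illustration only.

// Validation: inspects the value and reports true/false; it never modifies it.
function isValidAge(string $value): bool
{
    return ctype_digit($value) && (int) $value >= 1 && (int) $value <= 130;
}

// Filtering: returns a changed copy with the unwanted characters removed.
function filterDigits(string $value): string
{
    return preg_replace('/[^0-9]/', '', $value);
}

// Escaping: makes the value safe for one specific output context (HTML here).
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

var_dump(isValidAge('42'));           // bool(true)
var_dump(isValidAge('42abc'));        // bool(false)
echo filterDigits('42abc'), "\n";     // 42
echo escapeHtml('<b>42</b>'), "\n";   // &lt;b&gt;42&lt;/b&gt;
```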
Last edited by VladSun on Wed Jan 12, 2011 9:34 am, edited 1 time in total.
There are 10 types of people in this world, those who understand binary and those who don't
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Filtering/typecasting data: when to do this?

Post by josh »

Validating and filtering should always be done when data first enters the system. For example, "555-555-5555" may pass validation, but the dashes may then be filtered out.

Escaping should be done just before data is used in MySQL, shell, HTML, or any other "special" kind of output. Different media call for different levels of filtering. When outputting HTML I usually escape entities, but I also run a word-wrapping script so that a user entering a single unbroken string of several hundred characters ("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" and so on) doesn't cause the page to scroll. If, for example, I were outputting a JavaScript string, I'd use a regex to strip anything but [a-zA-Z0-9].

Your strongest safeguard is to validate and escape. Filtering is mainly for data quality (validating also helps data quality).
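
The phone number case could look roughly like this. It is a sketch under assumptions: the helper names are made up, "valid" is taken to mean exactly 10 digits, and the filter runs first so that the validation can stay simple (one possible order, not the only one).

```php
<?php
// Hypothetical helper: strip everything except digits from a phone number.
function filterPhone(string $raw): string
{
    return preg_replace('/\D+/', '', $raw);
}

// Hypothetical rule: a valid number here is exactly 10 digits.
function isValidPhone(string $digits): bool
{
    return preg_match('/^\d{10}\z/', $digits) === 1;
}

$phone = filterPhone('555-555-5555');
var_dump($phone);                                 // string(10) "5555555555"
var_dump(isValidPhone($phone));                   // bool(true)
var_dump(isValidPhone(filterPhone('555-1234')));  // bool(false)
```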
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Filtering/typecasting data: when to do this?

Post by Technical »

josh wrote: Validating and filtering should always be done when data first enters the system. For example, "555-555-5555" may pass validation, but the dashes may then be filtered out.

Escaping should be done just before data is used in MySQL, shell, HTML, or any other "special" kind of output. Different media call for different levels of filtering. When outputting HTML I usually escape entities, but I also run a word-wrapping script so that a user entering a single unbroken string of several hundred characters ("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" and so on) doesn't cause the page to scroll. If, for example, I were outputting a JavaScript string, I'd use a regex to strip anything but [a-zA-Z0-9].

Your strongest safeguard is to validate and escape. Filtering is mainly for data quality (validating also helps data quality).
Thank you, and VladSun too. This is what I was asking for.
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Filtering/typecasting data: when to do this?

Post by Technical »

By the way, I have a kinda (bad?) habit of typecasting function arguments, like this:

Code:

function Some($Integer, $Array)
{
    $Integer = intval($Integer);
    $Array = (array) $Array;
}
Does it slow the system down much, or is it fine?
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Filtering/typecasting data: when to do this?

Post by Weirdan »

Technical wrote: Does it slow the system down much, or is it fine?
It may slow things down, it may not. It depends on many factors. Profile your specific case.
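
For a first rough look, something like the following could be timed before reaching for a real profiler. It is only a sketch: the iteration count and the test value are arbitrary, and a micro-benchmark like this says nothing about how the casts behave inside a real application.

```php
<?php
// Rough micro-benchmark; the iteration count is an arbitrary choice.
$iterations = 1000000;
$value = '12345';

// Time intval() as a function call.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $n = intval($value);
}
$intvalSeconds = microtime(true) - $start;

// Time the (int) cast on the same value.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $n = (int) $value;
}
$castSeconds = microtime(true) - $start;

printf("intval(): %.4f s for %d calls\n", $intvalSeconds, $iterations);
printf("(int):    %.4f s for %d calls\n", $castSeconds, $iterations);
```

Whatever numbers come out, they only matter if the casts sit on a hot path in your own code.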
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Filtering/typecasting data: when to do this?

Post by Christopher »

Technical wrote: Okay, okay, it seems the discussion went the wrong way.
I'm not asking how to filter/validate; I'm asking what data I should filter/validate. Look at the first post, where I wrote a list of options.
I answered this above. I do none of the things on your list. I filter/validate(josh ;)) input and escape output.
Technical wrote: My second question was about forced typecasting. I meant: should I use intval(), floatval(), strval() on variables passed as function arguments, rows received from the database, etc.? Does it hurt performance much?
Yes, use intval($var) or (int)$var. That is a valid way to filter. I use that style or preg_replace().

And honestly, I don't think I have spent even a millisecond thinking about whether using intval() or preg_replace() to filter input has any effect on performance!!! Filtering and validation are mandatory, and those functions are insignificant performance-wise. Where did you get that idea?!?
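
A quick illustration of what those three styles do to the same value. The `$raw` string is a stand-in I made up for something like a request parameter. Note that the casts silently turn non-numeric input into 0, so if 0 is a meaningful value you probably want a validation step as well.

```php
<?php
$raw = '42abc';   // stand-in for something like a request parameter

$viaIntval = intval($raw);
$viaCast   = (int) $raw;
$viaRegex  = preg_replace('/[^0-9]/', '', $raw);
$garbage   = (int) 'abc';

var_dump($viaIntval);  // int(42)
var_dump($viaCast);    // int(42)
var_dump($viaRegex);   // string(2) "42"
var_dump($garbage);    // int(0)
```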
(#10850)
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Filtering/typecasting data: when to do this?

Post by Technical »

Aren't there too many filters and typecasts here?

Code:

class BlocksItem extends Item
{
	public $Place;
	public $Title;
	public $HTML;
	public $File;
	public $Code;
	public $Active;
	function __construct($Id = null)
	{
		$this->Exists = FALSE;
		if(!empty($Id) && $Result = Database::Query('SELECT * FROM blocks WHERE "id"=\''.intval($Id).'\''))
		{
			$this->Id = intval($Result[0]['id']);
			$this->Exists = TRUE;
			$this->Title = Filters::Basic($Result[0]['title']);
			$this->Place = Filters::Range($Result[0]['place'], array('sidebar', 'top', 'bottom'));
			$this->HTML = (bool) $Result[0]['HTML'];
			if($this->HTML)
			{
				$this->File = Filters::Path($Result[0]['file']);
			} else {
				$this->Code = stripslashes($Result[0]['code']);
			}
			$this->Active = (bool) $Result[0]['active'];
		}
	}
	function Save()
	{
		$Title = Filters::Basic($this->Title);
		$HTML = intval((bool) $this->HTML);
		$Active = intval((bool) $this->Active);
		$Place = 'sidebar';
		if(in_array(Filters::Strict($this->Place), array('sidebar', 'top', 'bottom')))
		{
			$Place = Filters::Strict($this->Place);
		}
		$File = null;
		$Code = null;
		if($this->HTML)
		{
			$File = Filters::Path($this->File);
		} else {
			$Code = stripslashes($this->Code);
		}
		if($this->Exists)
		{
			// UPDATE query goes here
		} elseif(true /* condition elided in the original post */) {
			// INSERT query goes here
		}
	}
}
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Re: Filtering/typecasting data: when to do this?

Post by matthijs »

You shouldn't have to filter and/or validate every line of code in which you use a variable. The general rule is:
- when something comes in, you filter and/or validate
- when something goes out (to a different context), you escape

So in web apps, that normally means:
- when receiving input (GET, POST, COOKIE, etc), you validate and filter that data
- when outputting data, you escape. Output to a db is escaped with db-specific escape functions, for example mysql_real_escape_string. Output to HTML is escaped with htmlentities(). Etc.
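
A minimal sketch of that rule for a web request follows. The `$input` array simulates request data, and the fallback page number is a hypothetical choice. Only the HTML side is shown, using htmlspecialchars() as a close relative of htmlentities(). The database side is left out because mysql_real_escape_string belongs to the old mysql extension, which was removed in PHP 7.0.

```php
<?php
// Simulated request data; in a real script this would be $_GET or $_POST.
$input = ['page' => '3', 'name' => '<script>alert(1)</script>'];

// Input: validate. FILTER_VALIDATE_INT returns the integer, or false on failure.
$page = filter_var($input['page'], FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
if ($page === false) {
    $page = 1; // hypothetical fallback for invalid input
}

// Output to HTML: escape at the moment of output; the raw value stays untouched.
$safeName = htmlspecialchars($input['name'], ENT_QUOTES, 'UTF-8');

echo "Page $page for $safeName\n";
// Page 3 for &lt;script&gt;alert(1)&lt;/script&gt;
```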

Internally, when data is handled inside the same context, inside a specific class for example, you don't have to filter or validate it each time

The thing to watch out for is what exactly counts as input and output. Sometimes you might think you are using a "safe" "internal" variable, while in reality it's data that can be tampered with from outside. The $_SERVER HTTP variables are a good example. You could think they are "safe" PHP variables, but they can contain user (outside) input. So you also have to validate/filter those when your app receives or uses them.
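
As an illustration of the $_SERVER point, the Host header arrives from the client just like a form field does. The sketch below uses a made-up helper name and a made-up list of allowed hosts, and it takes the array as a parameter so that it can be tried without a web server.

```php
<?php
// Hypothetical helper: accept a Host header only if it is on a known list.
function trustedHost(array $server, array $allowed, string $default): string
{
    $host = strtolower($server['HTTP_HOST'] ?? '');
    return in_array($host, $allowed, true) ? $host : $default;
}

// Simulated $_SERVER arrays; a client controls the Host header it sends.
$allowed = ['example.com', 'www.example.com'];

echo trustedHost(['HTTP_HOST' => 'www.example.com'], $allowed, 'example.com'), "\n";
// www.example.com
echo trustedHost(['HTTP_HOST' => 'evil.test"><script>'], $allowed, 'example.com'), "\n";
// example.com
```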
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Filtering/typecasting data: when to do this?

Post by josh »

Just to elaborate...

I used to have (int) syndrome. What really made it obvious to me how needless it was is that someone worded it this way: "it's basically like you can't trust yourself". You'll also run into problems when customers want to use something like "0283" as a value and expect it to be treated as a string (and not lose the leading 0).
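
The "0283" problem in a few lines, as a small sketch. Using ctype_digit() as the validation step is my assumption, not something from the thread.

```php
<?php
$code = '0283';

$asInt   = (int) $code;         // the cast drops the leading zero
$isValid = ctype_digit($code);  // validation leaves the string untouched

var_dump($asInt);    // int(283)
var_dump($isValid);  // bool(true)
var_dump($code);     // string(4) "0283"
```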

The bottom line is: validate incoming data (for example, require a minimum length, require a maximum length, disallow anything but a-z). That basically applies to *every* field in a production application. Whether the data comes in over an API, an HTML form, or a command line shouldn't matter. Always validate.

Filtering is optional, and allows you to have lenient validation (no need to mark "555-555-5555" as invalid because it had dashes, just strip [filter] the dashes *before* validation)

Escaping is for when data changes "medium" (language). For example, shell commands get escaped one way, SQL gets escaped another way, and HTML gets escaped yet another way. You want to store the raw value, so that you can escape it differently for different mediums. Escaping is for when data *leaves* your system.
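
Here is a sketch of one raw value escaped three different ways. The sample string and the `ls -l` command are made up for illustration, and using json_encode() for the JavaScript case is my own substitution rather than anything prescribed in this thread. SQL is left out because it needs a live connection; parameterized queries are the usual route there.

```php
<?php
// The raw value is stored once and escaped separately for each destination.
$raw = "Tom & Jerry's <notes>.txt";

// HTML: special characters become entities.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Shell: the whole value is quoted as a single argument.
$forShell = 'ls -l ' . escapeshellarg($raw);

// JavaScript string literal: json_encode() with the HEX flags, so that
// <, >, &, ' and " are emitted as \uXXXX sequences.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$forJs = 'var name = ' . json_encode($raw, $flags) . ';';

echo $forHtml, "\n";   // Tom &amp; Jerry&#039;s &lt;notes&gt;.txt
echo $forShell, "\n";  // exact quoting depends on the platform
echo $forJs, "\n";
```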

Escaping has its purpose in preventing your user from typing an SQL, shell, or HTML command and having it actually run as an SQL, shell or HTML command.
Filtering has its purpose in loosening the need for validations, and keeping things consistently formatted.
Validation has its purpose in making sure the user doesn't lose data due to maximum field lengths; it also helps to ensure that users enter valid data.

The areas do overlap, for example a really good validation may eliminate the need for escaping, but you still always escape just in case (but only where it is needed, such as when user inputted data is going into an SQL, shell or HTML command).

When I say "user input" don't take it too literally. If the programmer types "&" in a string it should still be escaped to "&amp;" as part of an HTML document, so always escape when data leaves the system!

There's no harm in validating data that moves from one function to another, but typically that is pointless. However, escaping data more than once destroys the original data. '&' should be output as '&amp;' (escaped once) and not '&amp;amp;' (escaped twice).
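
The double-escaping effect is easy to reproduce. This small sketch uses htmlspecialchars() and a made-up sample string.

```php
<?php
$raw = 'Fish & Chips';

$once  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

echo $once, "\n";   // Fish &amp; Chips
echo $twice, "\n";  // Fish &amp;amp; Chips
```

A browser shows the first as "Fish & Chips" and the second as "Fish &amp; Chips", which is why it helps to keep the raw value and escape exactly once, at output time.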

Some things don't fit in one category, for example does the nl2br() function filter or escape? One could debate that for hours.
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Filtering/typecasting data: when to do this?

Post by Technical »

I used to think that variable type freedom is good. Now I realize how wrong I was.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Filtering/typecasting data: when to do this?

Post by josh »

I don't see what's wrong with dynamic typing, other than the fact you're trying to force it to be strict typing.