Page 1 of 2

Making data secure for display

Posted: Mon Dec 03, 2007 10:53 am
by batfastad
Hi everyone

I'm gradually building a very small intranet application with PHP and MySQL.
There will only be 10 users initially but I'm hoping to eventually replace our company FileMaker database for managing company and contact information.

I've made sure that every variable from external sources is running through the following function:

Code: Select all

function mysql_clean($value) {
	// STRIPSLASHES
	if ( get_magic_quotes_gpc()) {
		$value = stripslashes($value);
	}

	return trim(mysql_real_escape_string($value));
}
So that's SESSION, GET, POST, COOKIE and any PHP internal vars.

That makes sure all data that users could enter is secured, and then PHP can deal with it safely.
But I'm worried about the potential of XSS or malicious HTML code being displayed on the script pages.

Is it enough to run the htmlspecialchars() function on any variables that will be output on the HTML page?

Or should I also run htmlspecialchars() on variables even if they're not going to be in the HTML output?

Thanks
Ben

Posted: Mon Dec 03, 2007 11:23 am
by feyd
There's generally no reason to run htmlspecialchars() on internal data, but it should be run on any data being displayed to the user which was not generated internally or should not be trusted, explicitly.

Posted: Tue Dec 04, 2007 4:57 am
by Mordred
The function is wrong, here's the long story why: http://mordred.niama.net/blog/?p=122

Variables should be escaped right before being passed on (to mysql, user output, whatever), not when they come in. There should no differentiation between "internal" and "external" variables (unless it's somehow a feature, like allowing html in the title, specified in a config file)

Posted: Tue Dec 04, 2007 6:09 am
by jmut

Posted: Tue Dec 04, 2007 6:13 am
by batfastad
I do run all variables through my mysql_clean() function I posted above, at the very start of scripts.
Is it my function that is wrong? What needs to change?


What I'm getting at with my original question is the pulling my data out of the database and outputting it on an HTML page.
I'm developing a small intranet company/contact information app and there are many companies / address fields etc that have & and quote marks " ' in for legitimate reasons

Now this can cause problems with the HTML code, especially quote marks " as they just break HTML tags.

So should I also be running htmlspecialchars() on any variables that I will be displaying on the HTML page?
To ensure the resulting HTML code remains valid and doesn't break say the <.input tag because a variable has a legitimate quote mark in it?

Posted: Tue Dec 04, 2007 6:26 am
by s.dot
Yes. Always escape your output using htmlspecialchars() or htmlentities(). You don't need to escape anything that you're not outputting to the page.

Posted: Tue Dec 04, 2007 7:21 am
by batfastad
Ok great.
So if I just run htmlspecialchars() or htmlentities() on any variables I will be displaying in the HTML page, I should be all set!

One thing I have noticed, is if there are variables that I plan on displaying within a <.textarea> element, I should use htmlentities().
As entities get rendered in <.textarea> elements, but not all of them get rendered in <.input> elements.

Is that correct?

That can cause problems if making a blog/CMS where you type &#xxx; to get an entity, but when you come to edit the record and save over it, the entity gets rendered in the <.textarea> and then gets overwritten in the DB with it's actual character.

Posted: Tue Dec 04, 2007 7:41 am
by s.dot
No, they act the same. However htmlentities() will entity-ize (is that a word?) all characters that have an entity, whereas htmlspecialchars() will only do < > ' " & (i think).

If you use htmlentities() to display data in an input or textarea field, upon submission of that form it will convert the entities such as & back to their character &, which makes it good for entry into your database. Then when you escape it to display again, it will be back in it's entitized form.

Posted: Tue Dec 04, 2007 8:16 am
by Mordred
batfastad wrote: Is it my function that is wrong? What needs to change?
Have you read the article I posted?

batfastad wrote: So should I also be running htmlspecialchars() on any variables that I will be displaying on the HTML page?
To ensure the resulting HTML code remains valid and doesn't break say the <.input tag because a variable has a legitimate quote mark in it?
htmlspecialchars() with ENT_QUOTES and the proper character set. Details:
viewtopic.php?t=75098

Posted: Tue Dec 04, 2007 8:20 am
by batfastad
Yeah that sounds right.

htmlspecialchars() for [s]most[/s] all vars
[s]Then htmlentities() when I'm outputting into a <.textarea> to avoid the entities being rendered to the chars then overwriting the entity code in the DB... pretty hard to explain what I'm getting at there, but I think you get what I mean[/s] :D

Thanks
Ben

EDIT: strikethrough the above bit which is wrong!

Posted: Tue Dec 04, 2007 9:37 am
by batfastad
Mordred wrote:Have you read the article I posted?
Yes, but I don't know how to deal with magic_quotes globally as you said in the article.
Maybe I'm just not seeing the problem fully though :(
My function is modified from the one in the article from the manual... it doesn't do the !is_numeric check, and always runs the variable through mysql_real_escape_string()

I use my mysql_clean() function when I'm pulling any data from POST, GET, REQUEST, SERVER, COOKIE, SESSION
I don't use it on data that I've just extracted from my database, because I'm assuming that to be safe.
Mordred wrote:htmlspecialchars() with ENT_QUOTES and the proper character set. Details:
viewtopic.php?t=75098
Ok perfect. So that's for outputting vars into HTML code - so that quotes and stuff from data in DB doesn't go and break all your HTML tags.
Thanks for the link to that thread - very interesting and a much-needed blast of info :D

So the reason you don't need htmlentities() is because htmlspecialchars() also converts and ampersand, which stops the browser rendering the char within <.textarea> tags
So in the HTML source code the entity for a Euro € actually reads &#8364;... which basically double-escapes the ampersand and stops the entity from being rendered as a char by the browser.
Excellent!!

I think my confusion was I didn't realise that all that needed to be done was to change the ampersand. I thought there was something more that htmlentities did that wasn't covered by htmlspecialchars.
But since all you need is to sort that ampersand... htmlspecialchars will work perfectly!

Thanks
Ben

Posted: Thu Dec 06, 2007 4:49 am
by Mordred
batfastad wrote: Yes, but I don't know how to deal with magic_quotes globally as you said in the article.
Basically, run stripslashes on all input arrays. Search this forum for magic_quotes and stripslashes, I hope you'll find enough examples.
batfastad wrote: I use my mysql_clean() function when I'm pulling any data from POST, GET, REQUEST, SERVER, COOKIE, SESSION
I don't use it on data that I've just extracted from my database, because I'm assuming that to be safe.
It's wrong.

Clean ALL variables BEFORE sending them to mysql.
It is wrong to do so for variables that will not go to mysql query.
It is wrong not to do so for variables will go to the query, even if they were previously pulled from the database.

Posted: Thu Dec 06, 2007 6:49 am
by batfastad
Ah ok :D

@Mordred - Thanks so much for your help so far :wink:... I really want to make sure I'm getting this right in my code :D

I do clean all variables before sending to MySQL... using my mysql_clean() function above.
But in some scripts I think I also run mysql_clean() on variables that aren't going to MySQL. I will change that!

Certainly any vars involved in a MySQL query, are escaped.


Going back to handling stripslashes and magic_quotes globally.
I have a config.php file which is included at the top of every PHP script in the application.
It has the mysql config details and any extra functions I've written that I can re-use frequently.

Do you mean it would be best to separate this into 2 different functions?

1) One function which I run over variables before they go to MySQL... eg:

Code: Select all

// MYSQL CLEAN
function mysql_clean($value) {
    return trim(mysql_real_escape_string($value));
}
2) Then something like this function which runs on every script page from my included config.php file:

Code: Select all

// INPUT MAKE-SAFE
function input_clean(&$value) {
    $value = stripslashes($value);
}

if ( ini_get('magic_quotes_gpc')) {
    array_walk_recursive($_GET, 'input_clean');
    array_walk_recursive($_POST, 'input_clean');
    array_walk_recursive($_COOKIE, 'input_clean');
}
Function from this page... http://www.phpguru.org/article.php/24

So the 2 issues are dealt with separately. THE G/P/C super-global input arrays get passed through that input_clean() function, automatically in my global config.php file on every single page.
Then additionally before I go into MySQL queries, I run any vars through mysql_clean()

Is that a better solution?
Are those functions sufficient?

Thanks
Ben

Posted: Mon Dec 10, 2007 4:13 am
by batfastad
Hi guys

Any opinions on my revised functions for making input data safe?
Separating it into 2 functions - one which runs stripslashes on all the input, automatically on the input arrays on each page load. And another which I use just before passing any data to MySQL.

Thanks
Ben

Posted: Mon Dec 10, 2007 4:44 am
by Mordred
More or less correct, bar a few details.
I'm worried about the trim() in the mysql_clean() function, it shouldn't be there. This is a minor issue really, but may be a sign of a more serious problem. Could it be that you don't put quotes around value tokens in your MySQL queries?