Making data secure for display

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

Hi Mordred.

I do enclose all vars in quotes in my MySQL queries.

The actual reason for the trim() is a final check to remove any whitespace characters from the start/end of variables before they get entered into the database.

But so long as my function #2 is correct for escaping the input on every page load, I'm happy!

Thanks for all your help!
Ben

Post by batfastad »

Ok I have one final query on this... relating more to the original question of outputting HTML to the page.

After Mordred's advice and this note at the PHP manual (http://uk3.php.net/manual/en/function.h ... .php#78509), I started doing the following on any variables that will be output to HTML:

Code:

htmlspecialchars($var, ENT_QUOTES, 'UTF-8')
I changed our MySQL / PHP installation and all our scripts to use UTF-8 now, rather than ISO whatever it was.
So in theory we can handle any strange characters that come our way.
Now when I output data from our database, it displays correctly, even with unusual Eastern European characters.
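For reference, here is a minimal sketch of how that call might be wrapped so it is applied the same way everywhere. The helper name e() and the sample string are made up for illustration; the assumption is that the page itself is served as UTF-8, so non-ASCII characters can pass through untouched.

```php
<?php
// Hypothetical helper (not from this thread): escape a value for use in
// HTML text or inside a quoted attribute.
function e(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes.
    // 'UTF-8' is assumed to match the charset the page is served with.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Sample data containing markup, quotes, an ampersand and a non-ASCII letter.
$name = "Fran\u{00E7}ois <b>\"quoted\"</b> & co";

// Only < > & " ' are rewritten; the c-cedilla is left as a literal character.
echo e($name), "\n";
```

Only the five HTML metacharacters are touched, which is why the accented characters from the database come out unchanged.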

But I thought that when outputting strange characters for valid HTML, you had to use the entity code?
In that case surely htmlspecialchars() is not enough?

I also thought that it was preferred to use the numeric entity code rather than the text one... meaning the output of htmlentities() is incorrect.

Is that right?
Or have those days gone?

If your variables are to be used in an <input> or <textarea> tag on say an 'edit' page, then htmlspecialchars() is the one you need.
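A rough sketch of the 'edit' page case, assuming two made-up values ($title and $body) that stand in for data loaded from the database. The point is that the value is escaped before it goes into the attribute or between the textarea tags, so it cannot close them early.

```php
<?php
// Hypothetical values, e.g. loaded from the database for an 'edit' page.
$title = 'He said "hi" & left <early>';
$body  = "Line one\n</textarea><script>alert(1)</script>";

// Escape for HTML context; ENT_QUOTES covers both quote styles in attributes.
$safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
$safeBody  = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

// The markup is ours; only the user-supplied values are escaped.
echo '<input type="text" name="title" value="' . $safeTitle . '">' . "\n";
echo '<textarea name="body">' . $safeBody . '</textarea>' . "\n";
```

The browser decodes the entities again when it displays the field, so the user sees and edits the original text.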

But I thought for valid HTML that you should always use the numeric entity codes. Mind you, I did learn HTML over 10 years ago now, and many things have changed.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

htmlspecialchars/htmlentities does convert the character to its "numerical entity code". Click View->Page Source to see what htmlentities actually returned.
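To make the View Source check concrete, here is a small sketch that runs both functions over the same made-up string. As far as I know, with these flags both functions mostly emit named entities rather than numeric ones (the single quote under ENT_QUOTES being the usual exception), so it is worth checking the actual output rather than assuming.

```php
<?php
// Sample string: e-acute, copyright sign, and a tag.
$s = "caf\u{00E9} \u{00A9} <b>";

// htmlspecialchars() only rewrites the HTML metacharacters.
$special = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

// htmlentities() also rewrites every character that has a named entity.
$entities = htmlentities($s, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // café © &lt;b&gt;
echo $entities, "\n"; // caf&eacute; &copy; &lt;b&gt;
```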

Post by batfastad »

Jcart wrote: htmlspecialchars/htmlentities does convert the character to its "numerical entity code". Click View->Page Source to see what htmlentities actually returned.
I know that one.

But my question was:

1) Am I correct in thinking / remembering that to have valid HTML code, all special chars must be encoded into their entities?
I'm sure that was the case when I learnt HTML... maybe some time ago now though.

If that is true... surely for output on a valid HTML page you need to do htmlentities()?
Apart from within an <input> or <textarea> where htmlspecialchars() will work well enough.

This contradicts this note... http://uk3.php.net/manual/en/function.h ... .php#78509 and what was said earlier.

Or is it valid HTML nowadays to just leave special chars un-entity-ised in your HTML code?
e.g. the copyright symbol, accented chars, Eastern European chars... just leaving them as is, without replacing them with entity codes?

Going one more step... I thought W3C recommendations were that numerical entity codes should be used wherever possible.
Not the text codes (even though they work just fine).


2) Is there a PHP function that returns the numeric entity codes for all the entities, rather than text ones?
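One candidate for this is mb_encode_numericentity() from the mbstring extension. The sketch below is only an illustration under assumptions: the wrapper name numeric_entities() is made up, the input is taken to be UTF-8, and the conversion map covers everything above ASCII.

```php
<?php
// Hypothetical wrapper: escape HTML metacharacters first, then turn every
// non-ASCII character into a decimal numeric entity.
// Requires the mbstring extension.
function numeric_entities(string $value): string
{
    // Step 1: escape < > & " ' so user data cannot inject markup.
    $escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    // Step 2: conversion map is [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF selects every character outside plain ASCII.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

echo numeric_entities("caf\u{00E9} \u{00A9} <b>"), "\n"; // caf&#233; &#169; &lt;b&gt;
```

Running htmlspecialchars() first matters: if the order were reversed, the ampersands of the numeric entities would themselves be escaped. On a page that is already served as UTF-8 this extra step should not be needed for correctness; it only changes how the characters are spelled in the source.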


Thanks
Ben

Post by John Cartwright »

I've never heard that you should always use the entities; that just doesn't make sense. Text passed through htmlentities() will not render as HTML: an escaped <table> tag, for example, shows up as literal text instead of a table.

I think what you mean is that the content (i.e. anything that is not supposed to be rendered as html) should always be htmlspecialchars()'d. Then yes.
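A short sketch of that split between markup and content, using made-up rows in place of a real database result. The table tags are written by the script and left alone; only the cell values go through htmlspecialchars().

```php
<?php
// Hypothetical rows, e.g. fetched from the database.
$rows = [
    ['name' => 'Ben <admin>', 'city' => 'London'],
    ['name' => 'Tom & Jerry', 'city' => "K\u{00F8}benhavn"],
];

$html = "<table>\n";
foreach ($rows as $row) {
    $html .= '  <tr>';
    foreach ($row as $cell) {
        // Only the cell content is escaped; the <td> tags are our own markup.
        $html .= '<td>' . htmlspecialchars($cell, ENT_QUOTES, 'UTF-8') . '</td>';
    }
    $html .= "</tr>\n";
}
$html .= "</table>\n";

echo $html;
```

If the whole $html string were escaped instead, the browser would show the table markup as text, which is the situation described above.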