addslashes

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Ambush Commander wrote:This is a point of contention.
More a point of design. If you do your escaping at the very lowest level, using prepared statements for example, then you cannot make a mistake, and the only things you need to deal with are the cases where you explicitly do not want to escape.
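
To illustrate the "lowest level" idea, here is a minimal sketch of a prepared statement. It assumes PDO with an in-memory SQLite database (so the pdo_sqlite extension must be available); the table and column names are made up for the example.

```php
<?php
// Assumption: PDO with the pdo_sqlite driver; the schema is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// Hostile-looking input; note that addslashes() is never called.
$name = "O'Reilly'; DROP TABLE users; --";

// The value is bound as data, so it is never parsed as SQL.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$select = $pdo->prepare('SELECT name FROM users WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn();

// The string should round-trip unchanged.
var_dump($stored === $name);
```

Because the escaping decision lives in the database layer, the calling code has no place where it could forget to escape.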

If you never make mistakes and never miss anything then Defense in Depth is a big waste of time. ;)


Final question for our panel of experts: Should you use htmlspecialchars() or htmlentities() ?
(#10850)
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

arborint wrote:Final question for our panel of experts: Should you use htmlspecialchars() or htmlentities() ?
ole wrote:htmlspecialchars()?

Edit: Don't use htmlentities(), use htmlspecialchars() instead. Why? http://www.phpwact.org/php/i18n/utf-8
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Ok, well this could very well change my perspective on many levels...

You essentially only need to:

htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
on data which will be rendered in HTML as attribute data...

If you were, say, getting HTML for display in a FORM TEXTAREA, then htmlspecialchars() is not such a big deal, but if it's being output somewhere the HTML is rendered, this is when things get tricky and where one might consider using HTML_Purifier.

I don't typically have the attributes change dynamically, except maybe <a href=""> for things like pagers, etc...

As a hack fix, I wonder... would it be sufficient to simply regex the output buffer and run htmlspecialchars() on each href? That would at least prevent that "type" of XSS, wouldn't it?
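
For reference, a rough sketch of escaping a dynamic attribute value at the point of output rather than by post-processing a buffer. The helper name escape_attr() and the pager URL are invented for illustration; only htmlspecialchars() and ENT_QUOTES come from the post above.

```php
<?php
// Hypothetical helper: htmlspecialchars() with explicit flags and charset.
function escape_attr(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical pager URL carrying a payload that tries to break out of the quotes.
$url = "page.php?p=2' onmouseover='alert(1)";

// The single quotes in $url are encoded, so the attribute cannot be closed early.
echo "<a href='" . escape_attr($url) . "'>Next</a>\n";
```

Note that this only stops the value from breaking out of the attribute; it does nothing about a URL that starts with javascript:, which needs a separate check.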
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

Okay now, let's be sensible and scientific here. This is complete bollocks.

@Jcart (and by extension ole):

The given link is to some wiki (which I personally find in no way authoritative), and the last change to it was made in Dec 2006. Moreover, the changes regarding the html* functions were made in Dec 2005. Furthermore, the text refers to some "rumours" that they may not work, but doesn't provide any details on what is broken. This is, to say the least, silly. Their code examples regarding the use of html*() on another page are insecure. In short, this resource does not meet even the basic requirements for being taken seriously.

@Hockey: I understand almost nothing of what you said, but since it doesn't say that you need all three params to the function and do it on all data instead of hacking some check with regexps, then ... err ... it must be wrong :)

----

As for htmlentities() vs htmlspecialchars() -- it doesn't matter from a security point of view which one you use; both will handle the HTML syntax characters. The former will also work when some things in your input string cannot be represented correctly, while the latter will produce garbage. Since that is a design choice, and not a security choice, it is up to the coder to decide if he wants a bug or not. At least he won't have a security hole ;)
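
A small side-by-side sketch of the point about syntax characters, with all three parameters passed explicitly. The sample string is made up, and the accented character is written as raw UTF-8 bytes so the example does not depend on the file's encoding.

```php
<?php
// "\xC3\xA9" is the UTF-8 encoding of e-acute (U+00E9).
$input = "<b>caf\xC3\xA9</b> & 'quotes'";

$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// Both encode <, >, & and quotes the same way.
// Only htmlentities() additionally turns the accented letter into &eacute;.
echo $special, "\n";
echo $entities, "\n";
```

The markup-significant characters come out identical in both results; the only difference is how the non-ASCII letter is represented.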
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Mordred, your point of view is completely correct in a non-UTF-8 context. With UTF-8, things change a bit.

PHP's built-in htmlentities() function is a relic from the bygone era when UTF-8 was a vague new thing that no one really knew how to do properly. Over the years it has been retrofitted to serve new modes of thinking (for example in PHP 4.1 they added the charset parameter, and in PHP 4.3 they added support for UTF-8), but the basic concept behind the function is for an 8-bit environment. With 8-bit character encodings, you are unable to directly output characters that are not in the character set. That's fewer than 256 characters (because some of them are control characters). The professed benefit of htmlentities() is that those characters which have named entities defined for them can now be "protected" and output in any context, whether or not the character set supports them.

This does nothing for, say, Chinese or Japanese glyphs. Usually, what happens is the browser converts them into numeric entities on POST, and then people attempt to create htmlentities() alternatives that don't double-escape the ampersand. STOP! This is wrong!

The true and only solution for the problem of supporting all characters is using UTF-8, which has the ability to express every glyph in almost any human language today. With this capacity, the benefits of htmlentities() disappear: there is no need to armor characters as named entities since UTF-8, by definition, will support them! This is what the PHPWACT wiki is getting at.

Practical solution: If you're not using UTF-8, keep using htmlentities() and deal with broken international input. Or, migrate to UTF-8, check your strings for UTF-8 well-formedness, then pass them through htmlspecialchars() and go your merry way.
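
One possible way to sketch the "check well-formedness, then escape" route. The helper name escape_utf8() is invented, and using preg_match() with the /u modifier as the validity check is just one option (mb_check_encoding() is another if the mbstring extension is loaded).

```php
<?php
// Hypothetical helper: returns escaped text, or null when the input is not valid UTF-8.
function escape_utf8(string $value): ?string
{
    // With the /u modifier, preg_match() returns false on a malformed UTF-8 subject.
    if (preg_match('//u', $value) !== 1) {
        return null;
    }
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Well-formed UTF-8 (two Japanese characters plus markup): escaped string.
var_dump(escape_utf8("\xE6\x97\xA5\xE6\x9C\xAC <b>"));

// Malformed sequence (lead byte without a continuation byte): null.
var_dump(escape_utf8("\xC3\x28"));
```

Rejecting malformed input outright is a design choice made for this sketch; an application might prefer to log it or substitute a replacement character instead.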

I hope I cleared things up for people.