Posted: Tue Oct 23, 2007 10:03 pm
by Christopher
Ambush Commander wrote: This is a point of contention.
More a point of design. If you do your escaping at the very lowest level, using prepared statements for example, then you cannot make a mistake, and the only things you will need to deal with are the cases where you explicitly do not want to escape.
If you never make mistakes and never miss anything then Defense in Depth is a big waste of time.
Final question for our panel of experts: Should you use htmlspecialchars() or htmlentities()?
Posted: Tue Oct 23, 2007 11:07 pm
by John Cartwright
arborint wrote: Final question for our panel of experts: Should you use htmlspecialchars() or htmlentities()?
Posted: Fri Oct 26, 2007 12:19 am
by alex.barylski
Ok, well this could very well change my perspective on many levels...
You essentially only need to apply:
Code:
htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
to data which will be rendered in HTML as attribute data...
If you were, say, getting HTML for display in a FORM TEXTAREA, then htmlspecialchars() is not such a big deal, but if it's being output somewhere the HTML is actually rendered, this is when things get tricky and where one might consider using HTML_Purifier.
I don't typically have the attributes change dynamically, except maybe <a href=""> for things like pagers, etc...
As a hack fix, I wonder... would it be sufficient to simply regex the output buffer and run htmlspecialchars() on each href? That would at least prevent that "type" of XSS, wouldn't it?
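A minimal sketch of the alternative to regexing the output buffer: escape the dynamic href at the point where it is written. The helper name `pager_link()` and the `?page=` URL layout are hypothetical, chosen only for illustration. Note that htmlspecialchars() only stops the value from breaking out of the attribute; it does not check what kind of URL the value is (a `javascript:` URL would pass through unchanged).

```php
<?php
// Hypothetical helper: builds one pager link. Name and URL layout are illustrative only.
function pager_link(string $baseUrl, int $page, string $label): string
{
    $href = $baseUrl . '?page=' . $page;

    // ENT_QUOTES escapes both " and ', so the value cannot close the attribute
    // no matter which quote style the template uses.
    return "<a href='" . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . "'>"
         . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

// A hostile base URL that tries to inject an extra attribute.
$hostile = "list.php' onmouseover='alert(1)";
echo pager_link($hostile, 2, 'Next >'), "\n";
// <a href='list.php&#039; onmouseover=&#039;alert(1)?page=2'>Next &gt;</a>
```

Because the escaping happens where the attribute is built, there is nothing left for a later pass over the output buffer to find.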
Posted: Fri Oct 26, 2007 4:34 pm
by Mordred
Okay now, let's be sensible and scientific here. This is complete bollocks.
@Jcart (and by extension ole):
The given link is on some wiki (which I personally find in no way authoritative), and the last change to it was made in Dec 2006. Moreover, the changes regarding the html* functions were made in Dec 2005. What's more, the text refers to some "rumours" that they may not work, but doesn't provide any details on what is broken. This is, to say the least, silly. Their code examples regarding the use of html*() on another page are insecure. In short, this resource does not meet even the basic requirements for being taken seriously.
@Hockey: I understand almost nothing of what you said, but since it doesn't say that you need to pass all three params to the function and apply it to all data instead of hacking together some check with regexps, then ... err ... it must be wrong.
----
As for htmlentities() vs htmlspecialchars() -- it doesn't matter from a security point of view which one you use; both will handle the HTML syntax characters. The former will also work when some things in your input string cannot be represented correctly in the output character set, while the latter will produce garbage. Since that is a design choice, and not a security choice, it is up to the coder to decide if he wants a bug or not. At least he won't have a security hole.
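A small sketch of that comparison, with all three parameters passed explicitly. The sample string is made up for illustration; the é is written as raw UTF-8 bytes so the result does not depend on how the script file is saved.

```php
<?php
// Sample input: markup characters, a quote, an ampersand and one accented letter (é as UTF-8).
$input = '<b title="caf' . "\xC3\xA9" . '">R&D</b>';

$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // &lt;b title=&quot;café&quot;&gt;R&amp;D&lt;/b&gt;
echo $entities, "\n";  // &lt;b title=&quot;caf&eacute;&quot;&gt;R&amp;D&lt;/b&gt;
```

Both calls neutralise `<`, `>`, `&` and the quotes identically. The only difference is that htmlentities() also rewrites the é as a named entity, which matters for display in a limited character set, not for security.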

Posted: Fri Oct 26, 2007 7:28 pm
by Ambush Commander
Mordred, your point of view is completely correct in a non-UTF-8 context. With UTF-8, things change a bit.
PHP's built-in htmlentities() function is a relic from the bygone era when UTF-8 was a vague new thing that no one really knew how to do properly. Over the years it has been retrofitted to serve new modes of thinking (for example, in PHP 4.1 they added the charset parameter, and in PHP 4.3 they added support for UTF-8), but the basic concept behind the function is for an 8-bit environment. With 8-bit character encodings, you are unable to directly output characters that are not in the character set. That's fewer than 256 characters (because some of them are control characters). The professed benefit of htmlentities() is that those characters which have named entities defined for them can now be "protected" and output in any context, whether or not the character set supports them.
This does nothing for, say, Chinese or Japanese glyphs. Usually, what happens is that the browser converts them into numeric entities on POST, and then people attempt to create htmlentities() alternatives that don't double-escape the ampersand. STOP! This is wrong!
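A short sketch of both halves of that problem, using one Chinese glyph (U+4E2D) as a made-up sample. The assumption here is that a non-UTF-8 form delivered the glyph as the numeric entity `&#20013;`.

```php
<?php
$glyph       = "\xE4\xB8\xAD";  // U+4E2D as raw UTF-8 bytes
$asSubmitted = '&#20013;';      // the same glyph as a numeric entity (assumed browser behaviour on POST)

// There is no named entity for this glyph, so htmlentities() leaves it untouched.
var_dump(htmlentities($glyph, ENT_QUOTES, 'UTF-8') === $glyph);   // bool(true)

// Escaping the entity form escapes its ampersand, so the page shows the
// literal text "&#20013;" instead of the glyph.
echo htmlspecialchars($asSubmitted, ENT_QUOTES, 'UTF-8'), "\n";   // &amp;#20013;

// In a UTF-8 pipeline the glyph is just data and passes through unchanged.
var_dump(htmlspecialchars($glyph, ENT_QUOTES, 'UTF-8') === $glyph);   // bool(true)
```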
The true and only solution for the problem of supporting all characters is using UTF-8, which has the ability to express every glyph in almost any human language today. With this capacity, the benefits of htmlentities() disappear: there is no need to armor characters as named entities since UTF-8, by definition, will support them! This is what the PHPWACT wiki is getting at.
Practical solution: If you're not using UTF-8, keep using htmlentities() and deal with broken international input. Or, migrate to UTF-8, check your strings for UTF-8 well-formedness, then pass them through htmlspecialchars() and go your merry way.
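A minimal sketch of the UTF-8 route, assuming you would rather reject malformed input than repair it. The helper name `escape_utf8_html()` is hypothetical. The well-formedness check relies on PCRE's `/u` modifier, which validates the subject string as UTF-8 before matching.

```php
<?php
// Hypothetical helper name; a sketch, not a complete input-handling layer.
function escape_utf8_html(string $text): string
{
    // With /u, preg_match() returns false (not 0) when the subject is not
    // well-formed UTF-8; the empty pattern matches any valid string.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Input is not well-formed UTF-8');
    }

    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escape_utf8_html("\xE4\xB8\xAD <b>"), "\n";   // glyph unchanged, then " &lt;b&gt;"

try {
    escape_utf8_html("\xC3\x28");                  // truncated two-byte sequence
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
```

Whether to throw, strip, or substitute on bad input is a design choice; the point is only that the check happens before the string reaches htmlspecialchars().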
I hope I cleared things up for people.