Problems with Variable Width Encodings
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
http://ha.ckers.org/blog/20060817/varia ... -encoding/
I haven't had time to read it in depth, but it sounds pretty scary...
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
Here's an earlier blog post I made about this issue:
http://shiflett.org/archive/178
I think it's a simple, clear example. Hope it helps.
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
htmlentities() is fundamentally flawed, though for a different reason: it doesn't handle control characters, such as a null byte:
Code: Select all
<?php echo strlen(htmlentities("\0")); ?>
My policy is to run things through a specialized escape() function that first calls an encoding checker to make the string well-formed and remove non-SGML code points before calling htmlspecialchars() (with the proper character encoding passed, of course).
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
The condensed version (only for UTF-8, and it requires iconv to be installed) is thus:
Code: Select all
// Returns the UTF-8 encoding of a Unicode code point, or '' if invalid.
function unichr($code) {
    if ($code > 1114111 or $code < 0 or
        ($code >= 55296 and $code <= 57343)) {
        // bits are set outside the "valid" range as defined
        // by UNICODE 4.1.0 (above U+10FFFF, negative, or a surrogate)
        return '';
    }
    $x = $y = $z = $w = 0;
    if ($code < 128) {
        // regular ASCII character
        $x = $code;
    } else {
        // set up bits for UTF-8
        $x = ($code & 63) | 128;
        if ($code < 2048) {
            $y = (($code & 2047) >> 6) | 192;
        } else {
            $y = (($code & 4032) >> 6) | 128;
            if ($code < 65536) {
                $z = (($code >> 12) & 15) | 224;
            } else {
                $z = (($code >> 12) & 63) | 128;
                $w = (($code >> 18) & 7) | 240;
            }
        }
    }
    // set up the actual character
    $ret = '';
    if ($w) $ret .= chr($w);
    if ($z) $ret .= chr($z);
    if ($y) $ret .= chr($y);
    $ret .= chr($x);
    return $ret;
}
function escape($str) {
    // lookup table of non-SGML characters, built once and cached
    static $non_sgml_chars = array();
    if (empty($non_sgml_chars)) {
        for ($i = 0; $i <= 31; $i++) {
            // non-SGML ASCII chars
            // save \r, \t and \n
            if ($i == 9 || $i == 13 || $i == 10) continue;
            $non_sgml_chars[chr($i)] = '';
        }
        for ($i = 127; $i <= 159; $i++) {
            $non_sgml_chars[unichr($i)] = '';
        }
    }
    // drop byte sequences that are not valid UTF-8
    $str = @iconv('UTF-8', 'UTF-8//IGNORE', $str);
    // strip the non-SGML characters
    $str = strtr($str, $non_sgml_chars);
    return htmlspecialchars($str, ENT_COMPAT, 'UTF-8');
}
I've also got an implementation that works when iconv is not installed. And no, htmlentities just doesn't work. Period.
Thanks for showing, Ambush. Will study that.
Ambush Commander wrote: And no, htmlentities just doesn't work. Period
That's quite a bold statement, isn't it? Why does every piece of security advice say to use it, then? I do remember reading (for example, in a book by Chris) that htmlentities is used to prevent XSS etc. Are you saying that each site which uses htmlentities to escape output to HTML is still vulnerable to XSS or other kinds of attacks?
That's quite something ...
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
It is quite bold. Although I am not saying that any site that uses htmlentities() is suddenly vulnerable to XSS, there are two ramifications of using a bare htmlentities():
1. User input can very easily break the validation of pages. No matter how well-constructed the rest of your layout is, null bytes aren't treated very kindly. Even worse is if the string is malformed: the validator may refuse to check your page at all. While browsers are quite forgiving, this is not the case with, say, XML readers.
2. Under certain conditions (for example, those in the post linked above), especially when user input is put into the attributes of HTML tags, XSS is enabled.
Since htmlentities() claims to "Convert all applicable characters to HTML entities", with the implicit assumption that anything passed through htmlentities() is safe to output, I would say yes: htmlentities() is fundamentally broken.
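The first ramification is easy to check for yourself. What follows is only a minimal sketch, assuming PHP 7 or later: strip_controls() is a hypothetical helper name, not a PHP built-in and not the escape() function posted earlier. It covers only the ASCII control characters; it does not repair malformed UTF-8 and does not touch the 128-159 range.

```php
<?php
// Ramification 1: a null byte passes through htmlentities() unchanged.
$raw = "a\0b";
$encoded = htmlentities($raw, ENT_COMPAT, 'UTF-8');
var_dump(strpos($encoded, "\0") !== false); // is the null byte still in the output?

// Hypothetical helper (an assumption for illustration): drop ASCII control
// characters except \t, \n and \r, then escape with htmlspecialchars().
function strip_controls(string $str): string
{
    // Bytes 0x00-0x1F and 0x7F never appear inside a multi-byte UTF-8
    // sequence, so a byte-wise pattern is safe for them.
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $str);
    return htmlspecialchars($clean ?? '', ENT_COMPAT, 'UTF-8');
}

var_dump(strip_controls("a\0b\t<c>"));
```

The order matters here: the control characters are removed first and the escaping happens last, so nothing unescaped can be reintroduced after htmlspecialchars() has run.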