Posted: Sun Aug 13, 2006 11:06 pm
What do you mean by "test for security"?
A community of PHP developers offering assistance, advice, discussion, and friendship.
http://forums.devnetwork.net/
<img src="
Quote: does not output (or accept?) Russian characters properly:
That's extremely strange! I tested it with Chinese characters a while back and it worked properly (that's not working now either). But you're correct. I'll have to see where the encoding goes wrong.
Quote: I think it should either 'textify' as is or eradicate disallowed tags altogether. At the moment it looks like it first purifies it and then textifies. Take for example <iframe>
Yep. This is due to the design of HTMLPurifier, where parsing the HTML happens first and can cause information to be lost. Eradication would be the simplest way; smart textification would require more coding.
Quote: chokes on incomplete attributes:
That's interesting, but I know precisely why it's happening. I don't think I'm going to bother fixing it (besides a trivial check or two). This is because when running PHP 5, the extension uses DOM to parse the text, which means wrapping the text in <html> and <body>. So then your code looks like: "<html><body><div><img src="</div></body></html>" Maybe I can do without those.
Code: Select all
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <title>xyz</title><form method="post" action="whatever1">
<input type="text" name="username" /><input type="text" name="password" /><input type="submit" />
</form> <form method="post" action="whatever2">
<input type="text" name="username" /><input type="text" name="password" /><input type="submit" />
</form>
Code: Select all
xyz
<div>
<input type="text" name="username">
<input type="text" name="password">
<input type="submit">
</div>
<div>
<input type="text" name="username">
<input type="text" name="password">
<input type="submit">
</div>
Quote: Volka's posted HTML code here generates some interesting output
This is a symptom of a core problem/missing feature, namely the ability to recognize that input is a well-formed document rather than a fragment and then parse it accordingly. I'll put this on high priority.
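To illustrate the document-versus-fragment check, here is a rough sketch. It is written in Python rather than PHP and is not how HTMLPurifier does it; the function names and the regex heuristic are my own assumptions. The idea is to look for a DOCTYPE or an opening <html> tag at the start of the input and, if found, hand only the <body> contents to the filter. That would also keep <title> text (which appears to be the stray "xyz" in the output above) from leaking into the result.

```python
import re

# Hypothetical heuristic, not HTMLPurifier's actual logic: treat the input as a
# full document if, after optional whitespace, an XML declaration, or comments,
# it starts with a DOCTYPE or an opening <html> tag.
_DOCUMENT_START = re.compile(
    r"""\A\s*
        (?:<\?xml[^>]*\?>\s*)?        # optional XML declaration
        (?:<!--.*?-->\s*)*            # optional leading comments
        (?:<!DOCTYPE\b|<html[\s>])    # doctype or opening <html>
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)

_BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def looks_like_document(markup: str) -> bool:
    """Return True if the markup appears to be a whole document."""
    return _DOCUMENT_START.match(markup) is not None


def extract_body(markup: str) -> str:
    """Return the <body> contents of a full document, else the input unchanged."""
    if not looks_like_document(markup):
        return markup
    match = _BODY.search(markup)
    return match.group(1) if match else markup
```

A regex is a crude tool for this and would miss documents that omit both the DOCTYPE and the <html> tag, so a real implementation would more likely make the decision inside the parser.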
Quote: My own basic purifier (that was only concerned with removing attributes and tags that weren't wanted) might be of interest too.
While I'd rather not criticize Feyd, this is precisely the type of fundamentally flawed filter I wanted to replace with this library. Blacklists simply do not work (in terms of protecting against XSS). However, the behavior seems quite intuitive, so I'll try to model the default behavior after that.
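A small sketch of why a blacklist falls short, again in Python and purely illustrative: neither function is Feyd's filter or HTMLPurifier's code, and the allowed tags, attributes, and URL schemes are assumptions picked for the example. The blacklist only knows about <script>, so an event-handler attribute passes straight through. The whitelist emits only what it explicitly recognizes.

```python
import re
from html import escape
from html.parser import HTMLParser


def blacklist_filter(markup):
    """Naive blacklist: strips <script> elements and nothing else."""
    return re.sub(r"<script\b.*?</script\s*>", "", markup,
                  flags=re.IGNORECASE | re.DOTALL)


# Hypothetical whitelist: tag name -> set of attributes allowed on it.
ALLOWED = {"b": set(), "i": set(), "p": set(), "a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistFilter(HTMLParser):
    """Re-emits only whitelisted tags and attributes; text is escaped."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # unknown tag: drop it, keep any text inside
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def whitelist_filter(markup):
    parser = WhitelistFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

With an input such as `<p>hi</p><img src=x onerror=alert(1)>`, the blacklist returns the string unchanged, while the whitelist keeps only the paragraph. This sketch does not balance tags or validate nesting, which a real purifier would also need to do.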
Quote: It does filter out several XSS attempts however.
Yep. The most common ones are caught.
feyd wrote: for most things, I think removing them altogether is preferred over escaping. As long as it's easily switched, it's all good.
Agreed.
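For what "easily switched" could look like, here is a minimal Python sketch. The `escape_disallowed` flag and the tag list are hypothetical and not an HTMLPurifier option. With the flag off, disallowed tags are removed; with it on, they are written out as escaped text instead.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "p"}  # hypothetical whitelist for the example


class TagFilter(HTMLParser):
    """Keeps allowed tags; removes or escapes the rest depending on a flag."""

    def __init__(self, escape_disallowed=False):
        super().__init__()
        self.escape_disallowed = escape_disallowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes dropped for brevity
        elif self.escape_disallowed:
            # Show the original tag text literally instead of removing it.
            self.out.append(escape(self.get_starttag_text(), quote=False))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        elif self.escape_disallowed:
            self.out.append(escape(f"</{tag}>", quote=False))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def filter_html(markup, escape_disallowed=False):
    parser = TagFilter(escape_disallowed)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Calling `filter_html('<b>hi</b><iframe src="x"></iframe>')` drops the iframe, while passing `escape_disallowed=True` leaves it visible as text.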
Quote: OK excuse my ignorance but what is the intended audience, purpose and expected application of HTMLPurifier? It does seem amazing, and it's astonishing what you have achieved in such a short time, but why do I need HTMLPurifier?
To ensure user-submitted HTML is safe for output, both in terms of XSS and validation. Heck, you could even send trusted content through it just to make sure the page validates.
Quote: Oh it would be nice if it indented the HTML properly for you and removed all other whitespace.
That's a feature that I've been thinking about. I'll probably have it done before the stable release.