Page 1 of 3

Posted: Sun Aug 13, 2006 11:06 pm
by feyd
What do you mean by "test for security"?

Posted: Sun Aug 13, 2006 11:09 pm
by Ambush Commander
In terms of XSS. In the live demo, there really is barely any security threat because the HTML you post is not shown to anyone else. But the code that powers the demo can be used for other things that do involve other viewers, so any Javascript (or other baddies) that get through are security problems.

Posted: Mon Aug 14, 2006 12:38 am
by Weirdan
does not output (or accept?) russian characters properly:
Image

Posted: Mon Aug 14, 2006 12:43 am
by Weirdan
I think it should either 'textify' as is or eradicate altogether not allowed tags. At the moment it looks like it first purifies it and then textifies. Take for example <iframe>

Posted: Mon Aug 14, 2006 12:47 am
by Weirdan
chokes on incomplete attributes:
<img src="

Posted: Mon Aug 14, 2006 7:57 am
by Ambush Commander
does not output (or accept?) russian characters properly:
That's extremely strange! Because I tested it with Chinese characters a while back and it worked properly (that's not working now either). But you're correct. I'll have to see where the encoding goes wrong.

Edit - After further investigation, UTF-8 works in PHP 4 but not in PHP 5. Hrmm...

Edit 2 - I know why: the DOM extension is not UTF-8 safe without more configuration! This'll be easy to fix.
I think it should either 'textify' as is or eradicate altogether not allowed tags. At the moment it looks like it first purifies it and then textifies. Take for example <iframe>
Yep. This is due to the design of HTMLPurifier where parsing the HTML happens first and could cause information to be lost. Eradication would be the simplest way, smart textification would require more coding.
chokes on incomplete attributes:
That's interesting, but I know precisely why it's happening. I don't think I'm going to bother fixing it (besides a trivial check or two). This is because when running PHP 5, the extension uses DOM to parse the text, which means wrapping the text in <html> and <body>. So then your code looks like: "<html><body><div><img src="</div></body></html>" Maybe I can do without those.

Posted: Mon Aug 14, 2006 8:27 am
by Ambush Commander
International characters now work in PHP 5.

Posted: Mon Aug 14, 2006 9:57 am
by feyd
Volka's posted html code here generates some interesting output

Code: Select all

xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">  <title>xyz</title><form method="post" action="whatever1">
<input type="text" name="username" /><input type="text" name="password" /><input type="submit" />
</form> <form method="post" action="whatever2">
<input type="text" name="username" /><input type="text" name="password" /><input type="submit" />
</form>
My own basic purifier (that was only concerned with removing attributes and tags that weren't wanted) might be of interest too. This was built some time ago though, without any updates since.

http://code.tatzu.net/cleantags/

With default settings, cleantags outputs

Code: Select all



   
      xyz
   
   
      
         <div>
            <input type="text" name="username">
            <input type="text" name="password">
            <input type="submit">
         </div>
      
      
         <div>
            <input type="text" name="username">
            <input type="text" name="password">
            <input type="submit">
         </div>
      
   

Posted: Mon Aug 14, 2006 12:11 pm
by Ambush Commander
Volka's posted html code here generates some interesting output
This is a symptom of a core problem/missing feature, namely, the ability to recognize that input is a well-formed document rather than a fragment and then parse it accordingly. I'll put this on high priority.
My own basic purifier (that was only concerned with removing attributes and tags that weren't wanted) might be of interest too.
While I'd rather not criticize Feyd, this is precisely the type of fundamentally flawed filter I wanted to replace with this library. Blacklist just simply does not work (in terms of protecting against XSS). However, the behavior seems quite intuitive, so I'll try to model default behavior after that.

Posted: Mon Aug 14, 2006 12:17 pm
by feyd
I didn't write it to protect against XSS. Just remove unwanted tags, attributes and attribute values. So no worries about hurting my feelings.

It does filter out several XSS attempts however. :)

Posted: Mon Aug 14, 2006 12:40 pm
by Ambush Commander
It does filter out several XSS attempts however.
Yep. The most common ones are caughtt.

Hmm... in terms of removing versus escaping bad tags, which should be default?

Posted: Mon Aug 14, 2006 12:42 pm
by feyd
for most things, I think removing them altogether is preferred over escaping. As long as it's easily switched, it's all good. :)

Posted: Mon Aug 14, 2006 12:53 pm
by s.dot
feyd wrote:for most things, I think removing them altogether is preferred over escaping. As long as it's easily switched, it's all good. :)
Agreed.

Posted: Mon Aug 14, 2006 6:35 pm
by Ollie Saunders
OK excuse my ignorance but what is the intended audience, purpose and expected application of HTMLPurifier?
It does seem amazing, and its astonishing what you have achieved in such a short time, but why do I need HTMLPurifier?

Edit: Oh it would be nice if it indented the HTML properly for you and removed all other whitespace.

Posted: Mon Aug 14, 2006 6:54 pm
by Ambush Commander
OK excuse my ignorance but what is the intended audience, purpose and expected application of HTMLPurifier?
It does seem amazing, and its astonishing what you have achieved in such a short time, but why do I need HTMLPurifier?
In order to insure user-submitted HTML is safe for output, both in terms of XSS and Validation. Heck, you could even send trusted content through it just to make sure the page validates.
Oh it would be nice if it indented the HTML properly for you and removed all other whitespace.
That's a feature that I've been thinking about. I'll probably have it done before the stable release.