HTMLPurifier PHP library homepage
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA
I snagged the view-source HTML from my blog:
http://www.everah.com/news/
I also tried a couple of other HTML pages, and in each case some of the output came back in ASCII form while the rest came back in character form.
PS: I think I used the HTML from the DevNet posting.php page and the portal page as well.
- Ambush Commander
- RobertGonzalez
From the output on the demo, here is the source code of the purified HTML:
Code:
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Common Ground</title><meta name="generator" content="WordPress 2.0.2" /><link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://www.everah.com/news/feed/" /><link rel="alternate" type="text/xml" title="RSS .92" href="http://www.everah.com/news/feed/rss/" /><link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://www.everah.com/news/feed/atom/" /><link rel="pingback" href="http://www.everah.com/news/xmlrpc.php" /><link rel="archives" title="July 2006" href="http://www.everah.com/news/2006/07/" /><link rel="archives" title="June 2006" href="http://www.everah.com/news/2006/06/" /><link rel="archives" title="May 2006" href="http://www.everah.com/news/2006/05/" /><link rel="archives" title="April 2006" href="http://www.everah.com/news/2006/04/" /><link rel="archives" title="February 2006" href="http://www.everah.com/news/2006/02/" /><link rel="archives" title="December 2005" href="http://www.everah.com/news/2005/12/" /><link rel="archives" title="November 2005" href="http://www.everah.com/news/2005/11/" /><link rel="archives" title="October 2005" href="http://www.everah.com/news/2005/10/" /><link rel="archives" title="September 2005" href="http://www.everah.com/news/2005/09/" /><link rel="archives" title="July 2005" href="http://www.everah.com/news/2005/07/" /><link rel="archives" title="June 2005" href="http://www.everah.com/news/2005/06/" /><link rel="archives" title="May 2005" href="http://www.everah.com/news/2005/05/" /><link rel="archives" title="April 2005" href="http://www.everah.com/news/2005/04/" /><link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.everah.com/news/xmlrpc.php?rsd" /><meta name="author" content="Everah Media Services Company" /><link rel="stylesheet" type="text/css" href="http://www.everah.com/news/wp-content/themes/everahhh/style.css" title="style" /><div id="headerLogo"> <h1><img src="http://www.everah.com/images/everah_logo_new.jpg" alt="Everah Media Services Company" /></h1> </div> <div id="headerMenu"> <ul><li class="first"><a href="http://www.everah.com/" title="About Us">About Us</a></li> <li><a href="http://www.everah.com/" title="About Us">Products</a></li> <li><a href="http://www.everah.com/" title="About Us">Services</a></li> <li><a href="http://www.everah.com/news/" title="News and Announcements">News</a></li> <li><a href="http://www.everah.com/" title="About Us">Contact Us</a></li> </ul></div>
It seemed to drop this bit:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html
and then treated the other '<' and '>' signs as ASCII chars.
- Ambush Commander
- RobertGonzalez
- Ambush Commander
Quote:
Is the object of the application to take what you have entered and turn it into something cleaner, but still usable? If so, any manipulation of code should make it so that the code that is output can be popped into an editor, saved and run. Or am I being too silly in this thought?
First question, yes. Second question, no (but I could hack it to get that working). For instance, if you tried to send a form through the app, it would get minced up beyond recognition. The objective is for a user to be able to write a snippet of HTML, not unlike what one would do for a forum (except with BBCode), and then have the validator fix it up so that it can be shown to the world without fear of XSS. However, I would also like it to be able to read plain old HTML documents, discard parts not in <body>, and then nicely output the stuff in between.
- RobertGonzalez
Ambush Commander wrote:
The objective is for a user to be able to write a snippet of HTML, not unlike what one would do for a forum (except with BBCode), and then have the validator fix it up so that it can be shown to the world without fear of XSS. However, I would also like it to be able to read plain old HTML documents, discard parts not in <body>, and then nicely output the stuff in between.
I was kinda thinking that. So what you want is essentially to take the contents between <body> and </body> and validate as well as sanitize them, correct? That seems logical. I would only make one suggestion: let users know that your app will not do anything to the <head> and </head> content. But I love the app. It is a tool that has been needed far too long.
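The "keep only what sits between <body> and </body>" step discussed above can be sketched roughly as follows. This is an illustration in Python using only the standard library, not how HTMLPurifier (a PHP library) does it; the names `BodyExtractor` and `extract_body` are made up for this example. It only extracts the body markup and performs no validation or sanitization, and it drops HTML comments.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the markup found between <body> and </body>."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied through verbatim.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")


def extract_body(document: str) -> str:
    """Return the body contents of a full HTML document; <head> is discarded."""
    parser = BodyExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts).strip()
```

The result of `extract_body` would then be handed to whatever validates and sanitizes the fragment; everything in <head> (title, meta, link elements) never reaches that stage.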
- Ambush Commander
Well, the suggestion that you copy-paste random webpages into the thing was to see how well it would deal with elements that it didn't allow. If you really want to fix up an entire document, you should probably use Tidy. This is specifically for segments of code that are not entire documents.
RobertGonzalez wrote:
But I love the app. It is a tool that has been needed far too long.
Thanks! But it needs more extensive real-world testing.
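To make "elements that it didn't allow" concrete, here is a toy allowlist filter for HTML fragments. It is a Python sketch under loud assumptions: the allowlists, `AllowlistFilter`, and `clean_fragment` are hypothetical, this is not HTMLPurifier's algorithm, and it does not repair tag nesting, so it should not be relied on for real XSS protection.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; a real one would be much larger.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_SCHEMES = ("http://", "https://")
# Elements whose text content is discarded along with the tags themselves.
DROP_CONTENT = {"script", "style"}


class AllowlistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tag: dropped, but its text content is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is re-escaped so stray '<' and '>' cannot become markup.
            self.out.append(escape(data, quote=False))


def clean_fragment(fragment: str) -> str:
    parser = AllowlistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

In this sketch a pasted <form> loses its tags while its inner text survives, which is the kind of "minced up" result described earlier in the thread for input the tool was never meant to handle.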
- RobertGonzalez
- Ambush Commander
Hmm... I need to tackle some code quality issues (the unit test case is currently broken due to legitimate behavior changes).
In the meantime, http://ha.ckers.org/xss.html would be a good place to start. But that's a lot of things to test... maybe I'll build a smoketest that reads the XML file and outputs all the outputs.
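A smoketest along those lines might look like the sketch below. It is written in Python for illustration, and the XML layout (an `attack` element with `name` and `code` children) is an assumption about the file, not something confirmed in this thread. `run_smoketest` is a hypothetical helper, and `html.escape` merely stands in for the real purifier call.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Callable, List, Tuple

# Assumed layout: <xss><attack><name>..</name><code>..</code></attack>..</xss>
SAMPLE_XML = """<xss>
  <attack>
    <name>Plain script tag</name>
    <code>&lt;script&gt;alert(1)&lt;/script&gt;</code>
  </attack>
  <attack>
    <name>Image onerror</name>
    <code>&lt;img src=x onerror=alert(1)&gt;</code>
  </attack>
</xss>"""


def run_smoketest(
    xml_text: str, purify: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Run every attack vector through `purify`; return (name, input, output)."""
    root = ET.fromstring(xml_text)
    results = []
    for attack in root.iter("attack"):
        name = attack.findtext("name", default="(unnamed)")
        code = attack.findtext("code", default="")
        results.append((name, code, purify(code)))
    return results


if __name__ == "__main__":
    # html.escape is only a placeholder for the sanitizer under test.
    for name, before, after in run_smoketest(SAMPLE_XML, escape):
        print(f"{name}\n  in:  {before}\n  out: {after}")
```

Printing input and output side by side makes it easy to eyeball which vectors survive, which is all a smoketest needs to do; pass/fail assertions can come later once the expected outputs are settled.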
- Ambush Commander
- RobertGonzalez
- Ambush Commander