HTMLPurifier PHP library homepage

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Never seen that. What kind of input did you give it?
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

I snagged the view source HTML from my blog

http://www.everah.com/news/

I tried a couple of other HTML samples, and still some of the output came out in ASCII format while the rest was in character form.

PS I think I used the HTML from the Devnet posting.php page and the portal page as well.

Post by Ambush Commander »

Still not seeing it. Snagged a different bug though (forgot to define allowed children for caption!)

Post by RobertGonzalez »

From the output on the demo...
Here is the source code of the purified HTML:

Code:

 xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Common Ground</title><meta name="generator" content="WordPress 2.0.2" /><link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://www.everah.com/news/feed/" /><link rel="alternate" type="text/xml" title="RSS .92" href="http://www.everah.com/news/feed/rss/" /><link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://www.everah.com/news/feed/atom/" /><link rel="pingback" href="http://www.everah.com/news/xmlrpc.php" /><link rel="archives" title="July 2006" href="http://www.everah.com/news/2006/07/" /><link rel="archives" title="June 2006" href="http://www.everah.com/news/2006/06/" /><link rel="archives" title="May 2006" href="http://www.everah.com/news/2006/05/" /><link rel="archives" title="April 2006" href="http://www.everah.com/news/2006/04/" /><link rel="archives" title="February 2006" href="http://www.everah.com/news/2006/02/" /><link rel="archives" title="December 2005" href="http://www.everah.com/news/2005/12/" /><link rel="archives" title="November 2005" href="http://www.everah.com/news/2005/11/" /><link rel="archives" title="October 2005" href="http://www.everah.com/news/2005/10/" /><link rel="archives" title="September 2005" href="http://www.everah.com/news/2005/09/" /><link rel="archives" title="July 2005" href="http://www.everah.com/news/2005/07/" /><link rel="archives" title="June 2005" href="http://www.everah.com/news/2005/06/" /><link rel="archives" title="May 2005" href="http://www.everah.com/news/2005/05/" /><link rel="archives" title="April 2005" href="http://www.everah.com/news/2005/04/" /><link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.everah.com/news/xmlrpc.php?rsd" /><meta name="author" content="Everah Media Services Company" /><link rel="stylesheet" type="text/css" href="http://www.everah.com/news/wp-content/themes/everahhh/style.css" title="style" /><div id="headerLogo">
	<h1><img src="http://www.everah.com/images/everah_logo_new.jpg" alt="Everah Media Services Company" /></h1>
</div>
<div id="headerMenu">
	<ul><li class="first"><a href="http://www.everah.com/" title="About Us">About Us</a></li>
		<li><a href="http://www.everah.com/" title="About Us">Products</a></li>
		<li><a href="http://www.everah.com/" title="About Us">Services</a></li>

		<li><a href="http://www.everah.com/news/" title="News and Announcements">News</a></li>
		<li><a href="http://www.everah.com/" title="About Us">Contact Us</a></li>
	</ul></div>
It seemed to drop this bit:

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html 
and then treated the other '<' and '>' signs as ASCII characters.

Post by Ambush Commander »

Ohhh, that's what you're talking about. It's not smart enough to tell doctypes apart from comments, so it drops them. However, it text-ifies certain types of invalid tags (outputs them as plain text). I wasn't precisely sure what the correct behavior should be: maybe I should just silently drop invalid tags?
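
To make the two options concrete, here is a minimal sketch in Python (not the library's own code or API) of a filter that drops doctypes and comments and then either text-ifies or silently drops tags it does not allow. The whitelist, the class name `TagFilter`, and the function `filter_tags` are all made up for illustration, and the sketch discards attributes on allowed tags rather than validating them.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only.
ALLOWED_TAGS = {"a", "b", "div", "em", "h1", "i", "li", "p", "strong", "ul"}


class TagFilter(HTMLParser):
    """Drops doctypes and comments; text-ifies or drops unknown tags."""

    def __init__(self, textify_invalid=True):
        super().__init__(convert_charrefs=False)
        self.textify_invalid = textify_invalid
        self.out = []

    def _invalid(self, raw):
        # Either show the rejected tag as literal text or drop it silently.
        if self.textify_invalid:
            self.out.append(escape(raw))

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are discarded here, not validated.
            self.out.append(f"<{tag}>")
        else:
            self._invalid(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag} />")
        else:
            self._invalid(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self._invalid(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    # handle_comment and handle_decl are left as the base-class no-ops,
    # so comments and doctypes simply disappear from the output.


def filter_tags(html, textify_invalid=True):
    parser = TagFilter(textify_invalid)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `filter_tags(page)` shows rejected tags as escaped text, while `filter_tags(page, textify_invalid=False)` gives the silent-drop behavior, so the two outputs can be compared side by side on the same input.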

Post by RobertGonzalez »

Is the object of the application to take what you have entered and turn it into something cleaner, but still usable? If so, any manipulation of code should make it so that the code that is output can be popped into an editor, saved and run. Or am I being too silly in this thought?

Post by Ambush Commander »

RobertGonzalez wrote:Is the object of the application to take what you have entered and turn it into something cleaner, but still usable? If so, any manipulation of code should make it so that the code that is output can be popped into an editor, saved and run. Or am I being too silly in this thought?
First question, yes. Second question, no (but I could hack it to get that working). For instance, if you tried to send a form through the app, it would get minced up beyond recognition. The objective is for a user to be able to write a snippet of HTML, not unlike what one would do for a forum (except with BBCode), and then have the validator fix it up so that it can be shown to the world without fear of XSS. However, I would also like it to be able to read plain old HTML documents, discard the parts not in <body>, and then nicely output the stuff in between.

Post by RobertGonzalez »

Ambush Commander wrote:The objective is for a user to be able to write a snippet of HTML, not unlike what one would do for a forum (except with BBCode), and then have the validator fix it up so that it can be shown to the world without fear of XSS. However, I would also like it to be able to read plain old HTML documents, discard the parts not in <body>, and then nicely output the stuff in between.
I was kinda thinking that. So what you want is to essentially take the contents between <body> and </body> and validate it as well as sanitize it, correct? That seems logical. I would only make one suggestion... let users know that your app will not do anything to the <head> and </head> content. But I love the app. It is a tool that has been needed far too long.
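
As a rough illustration of the "take what is between <body> and </body>" step, here is a small Python sketch. It is only one possible approach, not how the library does it; `BodyExtractor` and `extract_body` are hypothetical names. It assumes that input without any <body> tag is already a snippet and returns it unchanged, and it does no sanitizing at all, so the result would still need to go through the validator.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collects the markup found between <body> and </body>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.seen_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.seen_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")


def extract_body(html):
    """Return the body contents, or the input itself if it has no <body>."""
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.seen_body else html
```

Everything in <head> (title, meta, link) is discarded by this sketch, which is the behavior users would need to be told about.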

Post by Ambush Commander »

Well, the suggestion that you copy-paste random webpages into the thing was to see how well it would deal with elements that it didn't allow. If you really want to fix up an entire document, you probably should use Tidy. This specifically is for segments of code that are not entire documents.
RobertGonzalez wrote:But I love the app. It is a tool that has been needed far too long.
Thanks! But it needs more extensive real world testing.

Post by RobertGonzalez »

That's why we're here... ;)

Post by Ambush Commander »

Hmm... I need to tackle some code quality issues (the unit test case is currently broken due to legit behavior changes).

In the meantime, http://ha.ckers.org/xss.html would be a good place to start. But that's a lot of things to test... maybe I'll build a smoketest that reads the XML file and prints the output for every entry.
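
A smoketest along those lines might look something like the Python sketch below. The XML layout (`<attack>` entries with `<name>` and `<code>` children) is an assumption made for the example, not the verified format of the real file, and the inline sample stands in for it. The `escape_everything` function is a placeholder for whatever purifier is under test.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-in for the XML file of attack vectors; the element
# names (<attack>, <name>, <code>) are assumed for this example.
SAMPLE_XML = """<xss>
  <attack>
    <name>Plain script tag</name>
    <code>&lt;script&gt;alert(1)&lt;/script&gt;</code>
  </attack>
  <attack>
    <name>Image with javascript URL</name>
    <code>&lt;img src="javascript:alert(1)"&gt;</code>
  </attack>
</xss>"""


def run_smoketest(xml_text, purify):
    """Run every attack vector through purify; return (name, input, output)."""
    results = []
    for attack in ET.fromstring(xml_text).iter("attack"):
        name = attack.findtext("name", default="(unnamed)")
        code = attack.findtext("code", default="")
        results.append((name, code, purify(code)))
    return results


def escape_everything(html):
    # Placeholder purifier: escapes all markup instead of filtering it.
    return escape(html)


if __name__ == "__main__":
    for name, code, output in run_smoketest(SAMPLE_XML, escape_everything):
        print(f"{name}\n  in:  {code}\n  out: {output}\n")
```

Printing input and output next to each other keeps this a smoketest rather than a unit test: a human still has to eyeball each result and decide whether it is safe.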

Post by Ambush Commander »

Okay, I'm going to (sorta) close this thread and open a new one in Security, since this really isn't the right place anymore.

Post by RobertGonzalez »

I can split this thread into a new topic. Want me to do that or do you want to start a new thread on your own?

Post by Ambush Commander »

I'm typing up the new one as I speak.