Posted: Sun Aug 13, 2006 8:17 pm
by Ambush Commander
Never seen that. What kind of input did you give it?

Posted: Sun Aug 13, 2006 8:20 pm
by RobertGonzalez
I snagged the view source HTML from my blog

http://www.everah.com/news/

I tried a couple of other HTML samples as well, and still some of the output was in ASCII format while the rest was in character form.

PS: I think I used the HTML from the Devnet posting.php page and the portal page as well.

Posted: Sun Aug 13, 2006 8:27 pm
by Ambush Commander
Still not seeing it. Snagged a different bug though (forgot to define allowed children for caption!)

Posted: Sun Aug 13, 2006 8:32 pm
by RobertGonzalez
From the output on the demo...
Here is the source code of the purified HTML:

Code:

 xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Common Ground</title><meta name="generator" content="WordPress 2.0.2" /><link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://www.everah.com/news/feed/" /><link rel="alternate" type="text/xml" title="RSS .92" href="http://www.everah.com/news/feed/rss/" /><link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://www.everah.com/news/feed/atom/" /><link rel="pingback" href="http://www.everah.com/news/xmlrpc.php" /><link rel="archives" title="July 2006" href="http://www.everah.com/news/2006/07/" /><link rel="archives" title="June 2006" href="http://www.everah.com/news/2006/06/" /><link rel="archives" title="May 2006" href="http://www.everah.com/news/2006/05/" /><link rel="archives" title="April 2006" href="http://www.everah.com/news/2006/04/" /><link rel="archives" title="February 2006" href="http://www.everah.com/news/2006/02/" /><link rel="archives" title="December 2005" href="http://www.everah.com/news/2005/12/" /><link rel="archives" title="November 2005" href="http://www.everah.com/news/2005/11/" /><link rel="archives" title="October 2005" href="http://www.everah.com/news/2005/10/" /><link rel="archives" title="September 2005" href="http://www.everah.com/news/2005/09/" /><link rel="archives" title="July 2005" href="http://www.everah.com/news/2005/07/" /><link rel="archives" title="June 2005" href="http://www.everah.com/news/2005/06/" /><link rel="archives" title="May 2005" href="http://www.everah.com/news/2005/05/" /><link rel="archives" title="April 2005" href="http://www.everah.com/news/2005/04/" /><link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.everah.com/news/xmlrpc.php?rsd" /><meta name="author" content="Everah Media Services Company" /><link rel="stylesheet" type="text/css" href="http://www.everah.com/news/wp-content/themes/everahhh/style.css" title="style" /><div id="headerLogo">
	<h1><img src="http://www.everah.com/images/everah_logo_new.jpg" alt="Everah Media Services Company" /></h1>
</div>
<div id="headerMenu">
	<ul><li class="first"><a href="http://www.everah.com/" title="About Us">About Us</a></li>
		<li><a href="http://www.everah.com/" title="About Us">Products</a></li>
		<li><a href="http://www.everah.com/" title="About Us">Services</a></li>

		<li><a href="http://www.everah.com/news/" title="News and Announcements">News</a></li>
		<li><a href="http://www.everah.com/" title="About Us">Contact Us</a></li>
	</ul></div>
It seemed to drop this bit:

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html 
and then treated the other '<' and '>' signs as ASCII characters rather than as markup.

Posted: Sun Aug 13, 2006 8:37 pm
by Ambush Commander
Ohhh, that's what you're talking about. It's not smart enough to tell doctypes apart from comments, so it drops them. However, it text-ifies certain types of invalid tags. I wasn't precisely sure what the correct behavior should be: maybe I should just silently drop invalid tags?
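
To make the two options concrete, here is a minimal sketch in Python (not the app's actual code) of a filter that drops doctypes and comments and then either text-ifies or silently drops tags that are not on a whitelist. The `ALLOWED` set, the `TagFilter` class and the `purify` function are hypothetical names made up for this illustration, and attributes are stripped entirely to keep the sketch short.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for this illustration.
ALLOWED = {"div", "ul", "li", "a", "h1", "p", "b", "i"}


class TagFilter(HTMLParser):
    """Keeps whitelisted tags; escapes or drops everything else."""

    def __init__(self, drop_invalid):
        super().__init__(convert_charrefs=True)
        self.drop_invalid = drop_invalid
        self.out = []

    def handle_decl(self, decl):
        pass  # doctypes are dropped

    def handle_comment(self, data):
        pass  # comments are dropped

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded for allowed tags (no attribute checks here).
        self._emit(tag, self.get_starttag_text(), "<%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, self.get_starttag_text(), "<%s />" % tag)

    def handle_endtag(self, tag):
        self._emit(tag, "</%s>" % tag, "</%s>" % tag)

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on output.
        self.out.append(escape(data, quote=False))

    def _emit(self, tag, raw, clean):
        if tag in ALLOWED:
            self.out.append(clean)
        elif not self.drop_invalid:
            # "Text-ify": show the invalid tag as literal text.
            self.out.append(escape(raw, quote=False))
        # Otherwise the invalid tag is silently dropped.


def purify(html, drop_invalid):
    parser = TagFilter(drop_invalid)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<!DOCTYPE html><html lang="en"><div>hi</div></html>'
print(purify(sample, drop_invalid=False))
print(purify(sample, drop_invalid=True))
```

With `drop_invalid=False` the `<html>` tags come back as escaped text around the `<div>`; with `drop_invalid=True` only the `<div>` survives. Which of the two is friendlier probably depends on whether the user should see what was rejected.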

Posted: Sun Aug 13, 2006 8:40 pm
by RobertGonzalez
Is the object of the application to take what you have entered and turn it into something cleaner, but still usable? If so, any manipulation of code should make it so that the code that is output can be popped into an editor, saved and run. Or am I being too silly in this thought?

Posted: Sun Aug 13, 2006 8:44 pm
by Ambush Commander
RobertGonzalez wrote:Is the object of the application to take what you have entered and turn it into something cleaner, but still usable? If so, any manipulation of code should make it so that the code that is output can be popped into an editor, saved and run. Or am I being too silly in this thought?
First question, yes. Second question, no (but I could hack it to get that working). For instance, if you tried to send a form through the app, it would get minced up beyond recognition. The objective is for a user to be able to write a snippet of HTML, not unlike what one would do for a forum (except with BBCode), and then have the validator fix it up so that it can be shown to the world without fear of XSS. However, I would also like it to be able to read plain old HTML documents, discard parts not in <body>, and then nicely output the stuff in between.
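
The "discard parts not in <body>" step could look roughly like the following Python sketch, which is only an illustration and not how the app does it. `BodyExtractor` and `body_fragment` are hypothetical names; the fragment they return would still have to go through the actual validation and XSS filtering afterwards.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collects only the markup found between <body> and </body>."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.saw_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.saw_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append("&#%s;" % name)


def body_fragment(document):
    """Return the content of <body>, or the input unchanged if there is none."""
    parser = BodyExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts) if parser.saw_body else document


page = "<html><head><title>T</title></head><body><p>Hello &amp; bye</p></body></html>"
print(body_fragment(page))
```

A plain snippet with no `<body>` tag is passed through untouched, so the same entry point would work for forum-style input and for whole documents.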

Posted: Sun Aug 13, 2006 8:49 pm
by RobertGonzalez
Ambush Commander wrote:The objective is for a user to be able to write a snippet of HTML, not unlike what one would do for a forum (except with BBCode), and then have the validator fix it up so that it can be shown to the world without fear of XSS. However, I would also like it to be able to read plain old HTML documents, discard parts not in <body>, and then nicely output the stuff in between.
I was kinda thinking that. So what you want is to essentially take the contents between <body> and </body> and validate it as well as sanitize it, correct? That seems logical. I would only make one suggestion... let users know that your app will not do anything to the <head> and </head> content. But I love the app. It is a tool that has been needed far too long.

Posted: Sun Aug 13, 2006 8:53 pm
by Ambush Commander
Well, the suggestion that you copy-paste random webpages into the thing was to see how well it would deal with elements that it didn't allow. If you really want to fix up an entire document, you probably should use Tidy. This specifically is for segments of code that are not entire documents.
RobertGonzalez wrote:But I love the app. It is a tool that has been needed far too long.
Thanks! But it needs more extensive real world testing.

Posted: Sun Aug 13, 2006 9:16 pm
by RobertGonzalez
That's why we're here... ;)

Posted: Sun Aug 13, 2006 9:20 pm
by Ambush Commander
Hmm... I need to tackle some code quality issues (the unit test case is currently broken because of legitimate behavior changes).

In the meantime, http://ha.ckers.org/xss.html would be a good place to start. But that's a lot of things to test... maybe I'll build a smoketest that reads the XML file and prints the result for every entry.
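
A minimal sketch of such a smoketest in Python, under loud assumptions: the real layout of the XML file is not shown in this thread, so the `<attack>`, `<name>` and `<code>` element names below are guesses, and `purify` is a placeholder that merely escapes its input and stands in for the real filter.

```python
import xml.etree.ElementTree as ET
from html import escape


def purify(html):
    # Placeholder for the filter under test (assumption): escapes everything.
    return escape(html)


def run_smoketest(xml_text, purify_fn=purify):
    """Run every attack vector in the XML through purify_fn and collect results."""
    root = ET.fromstring(xml_text)
    results = []
    # Assumed layout: <xss><attack><name>..</name><code>..</code></attack></xss>
    for attack in root.iter("attack"):
        name = attack.findtext("name", default="(unnamed)")
        code = attack.findtext("code", default="")
        results.append((name, code, purify_fn(code)))
    return results


# Tiny inline sample in the assumed layout, so the sketch runs without a download.
SAMPLE = """<xss>
  <attack>
    <name>Plain script tag</name>
    <code>&lt;script&gt;alert(1)&lt;/script&gt;</code>
  </attack>
</xss>"""

for name, raw, cleaned in run_smoketest(SAMPLE):
    print(name)
    print("  in: ", raw)
    print("  out:", cleaned)
```

Printing input and output side by side keeps it a smoketest rather than a unit test: someone still has to eyeball each result, but nothing has to be asserted up front.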

Posted: Sun Aug 13, 2006 10:42 pm
by Ambush Commander
Okay, I'm going to (sorta) close this thread and open a new one in Security, since this really isn't the right place anymore.

Posted: Sun Aug 13, 2006 10:46 pm
by RobertGonzalez
I can split this thread into a new topic. Want me to do that or do you want to start a new thread on your own?

Posted: Sun Aug 13, 2006 10:47 pm
by Ambush Commander
I'm typing up the new one as we speak.