How much of CSS to implement?
Posted: Sat Aug 05, 2006 3:45 pm
by Ambush Commander
For a project I've been working on, I've avoided defining an overarching philosophy about which HTML tags are allowed, besides "Don't allow XSS!" As such, I've tended towards keeping even the more obscure HTML elements (q, bdo, tfoot). So I guess that means the library is headed towards the "Allow as much stuff as possible" camp.
Well, it's come back to bite me in the can.
The CSS 2.1 specification defines over one hundred properties, ranging from well used (color, border) to unbelievably obscure (azimuth, richness, table-layout). There's nothing XSS about azimuth, so, by this reasoning, I'll need to implement validation checks for the property. What about mainly layout oriented CSS: widows, page-break-after, cursor?
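To give a sense of the per-property cost, here is a minimal Python sketch of allowlist-style validation for an inline style attribute. The VALIDATORS table, the three properties shown, and the filter_style name are illustrative assumptions, not the library's actual design, and a real implementation would need a proper CSS tokenizer rather than splitting on semicolons.

```python
import re

# Hypothetical allowlist: property name -> validator for its value.
# Only three properties are shown; CSS 2.1 defines over a hundred.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")

VALIDATORS = {
    "color": lambda v: HEX_COLOR.fullmatch(v) is not None,
    "text-align": lambda v: v.lower() in {"left", "right", "center", "justify"},
    "widows": lambda v: re.fullmatch(r"[0-9]+", v) is not None,
}


def filter_style(style):
    """Keep only declarations whose property is known and whose value validates."""
    kept = []
    # Naive split: fine for a sketch, wrong for values containing ';'.
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip()
        validator = VALIDATORS.get(name)
        if sep and validator is not None and validator(value):
            kept.append(f"{name}: {value}")
    return "; ".join(kept)


cleaned = filter_style("color: #f00; behavior: url(evil.htc); text-align: center")
```

Anything without an entry in the table is dropped, so every property that should survive needs its own validator. That is where the workload comes from.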
Not even HTML attributes are as bad as CSS (and attributes are pretty bad: a little less than 200 possible pairs, though I can nuke quite a few because they're only used in FORMs and whatnot).
Combine this with my propensity for well-written code (I could have just made sure the language attribute only had hyphens, letters and numbers, but instead, I read the RFC several times and then implemented all the syntactic constraints, and was mad at myself because I couldn't also package allowed language codes with it), it's starting to look like the release date will need to be pushed to next year.
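The difference between the two approaches to the language attribute can be sketched in Python as follows. This assumes the RFC in question is RFC 3066, whose syntax is a primary subtag of 1 to 8 letters followed by hyphen-separated subtags of 1 to 8 letters or digits; the function names are made up for illustration.

```python
import re

# The quick check: any mix of letters, digits and hyphens.
LOOSE = re.compile(r"[A-Za-z0-9-]+")

# RFC 3066 syntax (assumed): primary subtag of 1-8 letters, then any number
# of hyphen-separated subtags of 1-8 letters or digits.
STRICT = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def loose_check(tag):
    return LOOSE.fullmatch(tag) is not None


def is_wellformed_language_tag(tag):
    return STRICT.fullmatch(tag) is not None
```

The loose check accepts strings such as "en--US" or "-en", which the syntactic check rejects. Neither one knows whether "en" is a registered language code, which is the part that could not be packaged.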
What do I do? I know there's a software maxim that you must define what you will not implement, but in this case, it's not very clear: I can see how all of these might be useful at one point or another, and one big selling point of the application is that it requires no configuration.
Posted: Sat Aug 05, 2006 5:58 pm
by Chris Corbyn
Oh wow! I've been wondering what all the lexing and stuff you've been doing was for. This looks awesome.
Can't wait for it to be complete.
Given how much of a perfectionist you seem to be at a code-level I think you already know what you want to do... you just wish there was a quicker/easier way

Posted: Sat Aug 05, 2006 6:06 pm
by sweatje
visibone has a nice set of charts. One of them is a CSS chart with browser compatibility noted using colors. You could start with the properties that are actually implemented by the majority of browsers rather than just specified by the W3C.
Posted: Sat Aug 05, 2006 9:58 pm
by Ambush Commander
Oh wow! I've been wondering what all the lexing and stuff you've been doing was for. This looks awesome. Can't wait for it to be complete.
Thanks for the encouragement. I guess that helps too!
visibone has a nice set of charts. One of them is a CSS chart with browser compatibility noted using colors.
I don't think I'm actually going to purchase one, but that's a very good point. Sometimes, I get hung up on some W3C definition that doesn't even work in most major browsers! (Like col.char... and it's a cool feature, so it's a little hard to let it die. I'll try to code an attribute transformation that fixes it after everything else is done.)
I'd rather not have to strain my eyes on the chart, but I think a few good Googles will lead me to some useful reference materials.
Posted: Sun Aug 06, 2006 9:28 pm
by Ollie Saunders
Ambush Commander gets my respect. This sounds like a great project but a big undertaking. Where are you getting the man power from?
I am an advocate of web standards myself and this is something I am building into my project.
HTML Purifier takes a different approach, one that doesn't use specification-ignorant regexes or narrow blacklists.
Does that mean you are going to parse DTDs?
Oh, and to answer your question: implement all of CSS 1 and then all of CSS 2, etc. Don't leave things out just because they seem obscure; if they are in the standards, they can be used and therefore abused.
Posted: Sun Aug 06, 2006 9:37 pm
by Ambush Commander
This sounds like a great project but a big undertaking. Where are you getting the man power from?
It's a one-person project for now. It takes too long to get new developers up to speed and inculcate them with the philosophies of the project. Still, I write profuse documentation about stuff like naming conventions in anticipation that some day I'll have to pass this on to someone else (if it ever gets finished).
I was actually quite surprised to find out that I had finished all the attributes. Creating progress tables helped a lot, and I'll be publishing those soon.
Does that mean you are going to parse DTDs?
Nope, because the DTDs 1) allow evil stuff (so I'd have to change them anyway) and 2) don't get the standards right! :-O
Here's an example: HTML was originally built off SGML, which allows tag exclusions for all descendant elements. You cannot have an A tag nested in an A tag.
XML does not allow similar constraints in its DTDs, so the DTD writers were forced to create specialized allowed-children definitions for the A tag. The only problem is that "This element is disallowed in all descendant elements" is different from "This element is disallowed in all children elements."
As such, this code is "theoretically" valid XHTML 1.0 Strict:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<title>test</title>
</head>
<body>
<div>
<a href="about:blank">asdf
<span><a href="about:blank">Test</a></span>
</a>
</div>
</body></html>
Even though we have nested A tags. Test it yourself:
http://validator.w3.org/
I base my custom HTML definition off of the DTD, but after that, all bets are off.
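The child-versus-descendant distinction can be sketched in Python with the standard library's ElementTree. The helper names are invented for illustration, and the markup is a trimmed version of the example above without the XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def find_direct_children(root, tag):
    """Elements named `tag` whose parent is also named `tag` (DTD-style check)."""
    qualified = XHTML_NS + tag
    return [child
            for parent in root.iter(qualified)
            for child in parent
            if child.tag == qualified]


def find_nested(root, tag):
    """Elements named `tag` with any ancestor named `tag` (SGML-style exclusion)."""
    qualified = XHTML_NS + tag
    nested = []

    def walk(element, inside):
        is_match = element.tag == qualified
        if is_match and inside:
            nested.append(element)
        for child in element:
            walk(child, inside or is_match)

    walk(root, False)
    return nested


DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"><body><div>
<a href="about:blank">asdf
<span><a href="about:blank">Test</a></span>
</a>
</div></body></html>"""

root = ET.fromstring(DOCUMENT)
as_children = find_direct_children(root, "a")
as_descendants = find_nested(root, "a")
```

The children-only check finds nothing wrong because the inner A is wrapped in a SPAN, while the descendant check catches it.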
Oh, and to answer your question: implement all of CSS 1 and then all of CSS 2, etc. Don't leave things out just because they seem obscure; if they are in the standards, they can be used and therefore abused.
Never thought of that, although it's a little too late... I've already gone ahead and categorized all of the CSS properties according to usage and dangerousness.
Posted: Sun Aug 06, 2006 9:59 pm
by Ollie Saunders
It's a one-person project for now. It takes too long to get new developers up to speed and inculcate them with the philosophies of the project. Still, I write profuse documentation about stuff like naming conventions in anticipation that some day I'll have to pass this on to someone else (if it ever gets finished).
Best of luck with it then.
Nope, because the DTDs 1) allow evil stuff (so I'd have to change them anyway) and 2) don't get the standards right! :-O
:-O indeed, I'm learning today.
Never thought of that, although it's a little too late... I've already gone ahead and categorized all of the CSS properties according to usage and dangerousness.
Well, that is still useful for deciding the order within each standard. Work from the most dangerous/used properties in CSS1 down to the least dangerous/used, and then move on to CSS2 and do the same.
Posted: Sun Aug 06, 2006 10:04 pm
by Ambush Commander
True dat. I'll update the document.
Posted: Sun Aug 06, 2006 10:25 pm
by RobertGonzalez
AC, you are a freaking stud. I hope someday I can create something cool that developers will use. So many regulars have made so many cool things here that I feel like a useless soul sometimes.
Your project rips, dude. Best of luck to you.
Posted: Sun Aug 06, 2006 10:41 pm
by bg
Looks cool, man. A lot of this functionality can be found in HTML Tidy, which is written in C and available as an extension for PHP. You may consider creating a wrapper class and letting Tidy do the grinding stuff like fixing tables and bringing HTML up to spec given the DTD, and then implement whatever other functions are needed to prevent XSS and other exploits. Of course, adding the tidy extension as a requirement of your script may not be something you want. At the same time, with some abstraction you could allow it to take advantage of tidy if it is available.
I'm doing something similar with an AJAX framework I'm writing. I have an abstracted class for JSON serialization, which can use either the php_json extension or a json serialization class written in PHP. The code runs perfectly without the json extension, but with it, serialization sees a 2800% performance increase.
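The "use the fast extension if it is there, otherwise fall back" pattern can be sketched in Python like this. Here orjson, an optional third-party package, stands in for the compiled extension and the standard json module for the pure implementation; both stand-ins and the serialize name are assumptions for illustration, not the framework's code.

```python
import json

try:
    import orjson  # optional compiled backend; may not be installed
except ImportError:
    orjson = None


def serialize(value):
    """Return a compact JSON string, using the fast backend when available."""
    if orjson is not None:
        # orjson.dumps returns bytes, so decode to match the fallback.
        return orjson.dumps(value).decode("utf-8")
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


payload = serialize({"user": "bg", "ids": [1, 2, 3]})
```

Callers only ever see serialize, so the code runs the same with or without the optional package and simply gets faster when it is present.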
Hey, and when you take your SVN repos public, I suggest Google's code hosting. I've got a project up there.
Posted: Sun Aug 06, 2006 11:03 pm
by Ollie Saunders
Hey, and when you take your SVN repos public, I suggest Google's code hosting. I've got a project up there.
That looks cool, I could use that for my project.
AC, you are a freaking stud. I hope someday I can create something cool that developers will use. So many regulars have made so many cool things here that I feel like a useless soul sometimes.
You could help me with mine Everah, I know you'd be good : D
A lot of this functionality can be found in HTML Tidy
Something tells me AC won't be satisfied with Tidy because it's not 100% accurate, in fact probably not even 90%.
Posted: Mon Aug 07, 2006 11:15 am
by Ambush Commander
Looks cool, man. A lot of this functionality can be found in HTML Tidy, which is written in C and available as an extension for PHP. You may consider creating a wrapper class and letting Tidy do the grinding stuff like fixing tables and bringing HTML up to spec given the DTD, and then implement whatever other functions are needed to prevent XSS and other exploits. Of course, adding the tidy extension as a requirement of your script may not be something you want. At the same time, with some abstraction you could allow it to take advantage of tidy if it is available.
I've thought about it, and you're right that Tidy is faster. However, it also has its quirks (unacceptable behavior) as well as an important difference: Tidy is meant for repairing entire HTML documents, HTMLPurifier for HTML sections. Although that may be changing later... (There are already some hacks in place to turn a snippet into a fully-fledged document). Part of the reason I shun Tidy is because it's always been regarded by MediaWiki developers as a stop-gap fix, a bandaid for the hideous complexity of their parser.
I've done something similar with PHP 5's DOM extension, which can parse HTML, and very quickly too: use DOMLex when PHP 5 is present, or fall back to DirectLex, a PHP implementation.
Performance is going to be a problem, but the heavy optimization will have to happen after everything is written.
Hey, and when you take your SVN repos public, I suggest Google's code hosting. I've got a project up there.
Hmm... I think I'm going to have to open another thread about this.
@Everah: Thanks! Remember, you've got a company, so I wouldn't complain. (Not sure how this would be economically viable... I'll sell consulting/customization services or something)
Posted: Wed Aug 09, 2006 8:05 pm
by Ambush Commander
The Progress table has been published:
http://hp.jpsband.org/live/docs/progress.html
Scroll down for CSS.
Posted: Wed Aug 09, 2006 8:25 pm
by Ollie Saunders
Looks cool. I love a big check list full of small things I can check off quickly

Good luck with it all.
No blink (argh my eyes)
hehe
