
Posted: Tue Nov 21, 2006 8:18 pm
by neophyte
Nice work AC! Does it support multiple DTDs (Transitional or Strict)?

I was playing with your "test it" box.

I tried this: an opening b tag, but someone forgot the b in the closing tag...

Code:

<b><font>Whatever</font></>
It gave me this as the source code output:

Code:

<b><span>Whatever</span>></b>

Posted: Tue Nov 21, 2006 8:24 pm
by Ambush Commander
It only supports Transitional currently; Strict is coming soon. There's already quite a bit of code for getting Strict to work. For instance, you saw font -> span: that's element transformation code that turns font tags into spans with CSS styling.

Other than that, everything worked as expected (the HTML is well-formed) except for that stray greater-than sign, which comes from the parser. We are filtering, after all, not trying to guess what the user meant.
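To make the font -> span idea concrete, here is a rough sketch in Python. It is not HTML Purifier's actual code (the library is PHP and does far more validation); the class name, the helper function and the attribute-to-CSS mapping are all made up for illustration, and it ignores self-closing tags and comments.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from <font> attributes to CSS properties (illustration only).
FONT_ATTR_TO_CSS = {"color": "color", "face": "font-family"}


class FontToSpan(HTMLParser):
    """Re-emit markup, renaming <font> to <span> and moving known attributes into style."""

    def __init__(self):
        # Keep entities as written instead of decoding them into text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            # Build a style declaration from the attributes we know how to map.
            css = "".join(
                f"{FONT_ATTR_TO_CSS[name]}:{value};"
                for name, value in attrs
                if name in FONT_ATTR_TO_CSS and value
            )
            attrs = [("style", css)] if css else []
            tag = "span"
        rendered = "".join(f' {n}="{escape(v or "")}"' for n, v in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{'span' if tag == 'font' else tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def font_to_span(html):
    """Return html with every <font> element rewritten as a <span>."""
    parser = FontToSpan()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With this sketch, a bare font tag simply becomes a bare span, while a font tag carrying a color attribute ends up as a span with the equivalent style attribute. The values are not validated here, which a real filter would have to do.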

Posted: Wed Nov 22, 2006 9:41 pm
by Ambush Commander
Okay, the trunk version supports (X)HTML Strict, test it out here: HTML Purifier live demo.

Posted: Sun Nov 26, 2006 6:16 pm
by Ambush Commander
1.3.0 released. Lots of goodies:

* (X)HTML Strict now supported
* You can arbitrarily define which elements and attributes to allow by using %HTML.AllowedElements and %HTML.AllowedAttributes.
* Invalid images are now removed, rather than replaced with a dud <img src="" alt="Invalid image" /> image (which still results in an extra HTTP request). Revert to the previous behavior by setting %Core.RemoveInvalidImg to false.
* Rudimentary URI host blacklisting implemented with %URI.HostBlacklist.
* New directive %URI.Munge munges URIs so you can route them through some sort of redirector service to avoid PageRank leaks or warn users that they are exiting your site.
* <li value="4"> and <ol start="2"> now allowed in loose mode.
* New configuration directives: %HTML.BlockWrapper, %HTML.Parent, %URI.DisableExternalResources, %URI.DisableResources and %Attr.DisableURI. Find out about these options and more in the configuration documentation.
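For anyone wondering what host blacklisting and URI munging amount to, here is a small Python sketch of the idea. It is only an illustration, not the library's PHP implementation: the function names and the redirector URL are hypothetical, and it assumes the blacklist matches substrings of the host and that the munge template carries a %s placeholder for the URL-encoded original URI.

```python
from urllib.parse import quote, urlsplit


def is_blacklisted(uri, blacklist):
    """Return True if any blacklist entry appears in the URI's host (assumed substring match)."""
    host = (urlsplit(uri).hostname or "").lower()
    return any(fragment.lower() in host for fragment in blacklist)


def munge(uri, template):
    """Substitute the fully percent-encoded URI into the template's %s placeholder."""
    return template.replace("%s", quote(uri, safe=""))


# Hypothetical redirector; swap in whatever service you actually use.
REDIRECTOR = "http://redirect.example/?url=%s"

if __name__ == "__main__":
    link = "http://example.com/a?b=c"
    if not is_blacklisted(link, ["spam.example"]):
        print(munge(link, REDIRECTOR))
```

The whole URI is percent-encoded so that its own query string cannot be confused with the redirector's parameters.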

d11wtq, since you use HTML Purifier for cleaning up emails, you may be especially interested in %URI.DisableResources, which lets you block things like external images.