XHTML 1.1 Modularization for HTML 4.01 parsing
Posted: Fri Feb 02, 2007 10:23 pm
For a while now, I've been informing my parser about HTML's syntax using a PHP file that simulates the XHTML 1.0 DTD. The approach works, but it's not very extensible: a user who would like to add a custom element is totally out of luck, and even officially sanctioned flex points like Strict vs. Transitional require a spaghetti mess of conditionals splattered all over the place. There is simply no way to give the user fine-grained control over the elements.
So I was ruminating on how to fix this problem, and I realized that the W3C had already done this in XHTML 1.1, the modularization of XHTML. Every element and its related attributes/content sets are neatly packaged into modules, and you can then select which modules you'd like to allow. With this, I'd be able to factor out a lot of the spaghetti code and instead ask the user which modules they want ("Oh, I want to support text and lists, but nothing else."). Users who desperately needed certain types of functionality would be able to implement it themselves without having to go mucking around in the actual code.
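To make the idea concrete, here is a minimal sketch of module-based selection. It is written in Python rather than the parser's PHP, and everything in it is an assumption for illustration: `MODULES`, `register_module` and `allowed_elements` are hypothetical names. The module names follow XHTML 1.1 Modularization, but the element lists are abbreviated, and attributes and content models are left out entirely.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical module table: names follow XHTML 1.1 Modularization, but the
# element lists are abbreviated and attributes/content models are omitted.
MODULES: Dict[str, FrozenSet[str]] = {
    "Text": frozenset({
        "abbr", "acronym", "address", "blockquote", "br", "cite", "code",
        "dfn", "div", "em", "h1", "h2", "h3", "h4", "h5", "h6", "kbd",
        "p", "pre", "q", "samp", "span", "strong", "var",
    }),
    "Hypertext": frozenset({"a"}),
    "List": frozenset({"dl", "dt", "dd", "ol", "ul", "li"}),
    "Presentation": frozenset({"b", "big", "hr", "i", "small", "sub", "sup", "tt"}),
    # Legacy holds elements that HTML 4.01 allows only in Transitional.
    "Legacy": frozenset({
        "basefont", "center", "dir", "font", "isindex", "menu", "s", "strike", "u",
    }),
}


def register_module(name: str, elements: Iterable[str]) -> None:
    """Add a user-defined module without touching the built-in entries."""
    if name in MODULES:
        raise ValueError(f"module {name!r} is already defined")
    MODULES[name] = frozenset(elements)


def allowed_elements(selected: Iterable[str]) -> FrozenSet[str]:
    """Return the union of the elements provided by the selected modules."""
    result: Set[str] = set()
    for name in selected:
        result |= MODULES[name]  # unknown module names raise KeyError
    return frozenset(result)


# "I want to support text and lists, but nothing else."
text_and_lists = allowed_elements(["Text", "List"])
print("a" in text_and_lists)  # prints False: Hypertext was not selected

# A hypothetical custom element, added without editing the built-in entries.
register_module("Marquee", ["marquee"])
print("marquee" in allowed_elements(["Text", "Marquee"]))  # prints True
```

Because a custom module is just another entry in the table, the core definitions never need to change; the `Marquee` module here is purely an example.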
However, I am slightly concerned by allegations that XHTML 1.1 breaks backwards compatibility with the earlier HTML 4.01 and XHTML 1.0 specifications. According to this page, the changes from Strict aren't too bad, and it appears that the Legacy module should let me support Transitional elements too, but I am still a little leery. While I suppose my suspicions will only be dispelled once I actually try it out, has anyone had experience with XHTML 1.1? Is there anything the Legacy module doesn't cover?
(It also strikes me that, if the W3C thought the modularization through well enough, all one would have to do to get "safe" HTML is disable the Structure, Applet, Forms (all of them), Object, Frames, Target, Iframe, Metainformation, Scripting, Link and Base modules in an XHTML 1.1 compliant implementation.)
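That parenthetical can be checked mechanically: enable every module except the listed ones and see which elements survive. Again this is a Python sketch under assumptions: the module table is hypothetical and abbreviated, "Forms (all of them)" is read as both Basic Forms and Forms, and attribute-only modules such as Target map to empty sets, so the sketch says nothing about which attributes survive.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical, abbreviated element tables keyed by XHTML 1.1 module name.
# Attribute-only modules (e.g. Target) map to an empty set because this
# sketch models elements only, not attributes.
MODULES: Dict[str, FrozenSet[str]] = {
    "Structure": frozenset({"html", "head", "title", "body"}),
    "Text": frozenset({"p", "div", "span", "em", "strong", "br", "blockquote", "pre"}),
    "Hypertext": frozenset({"a"}),
    "List": frozenset({"ul", "ol", "li", "dl", "dt", "dd"}),
    "Image": frozenset({"img"}),
    "Applet": frozenset({"applet", "param"}),
    "Basic Forms": frozenset({"form", "input", "label", "select", "option", "textarea"}),
    "Forms": frozenset({"form", "input", "label", "select", "option", "optgroup",
                        "textarea", "button", "fieldset", "legend"}),
    "Object": frozenset({"object", "param"}),
    "Frames": frozenset({"frameset", "frame", "noframes"}),
    "Target": frozenset(),
    "Iframe": frozenset({"iframe"}),
    "Metainformation": frozenset({"meta"}),
    "Scripting": frozenset({"script", "noscript"}),
    "Link": frozenset({"link"}),
    "Base": frozenset({"base"}),
}

# The modules the post proposes to disable.
DISABLED: FrozenSet[str] = frozenset({
    "Structure", "Applet", "Basic Forms", "Forms", "Object", "Frames",
    "Target", "Iframe", "Metainformation", "Scripting", "Link", "Base",
})


def enabled_elements(modules: Dict[str, FrozenSet[str]],
                     disabled: Iterable[str]) -> FrozenSet[str]:
    """Union of the elements of every module not in `disabled`."""
    off = set(disabled)
    unknown = off - modules.keys()
    if unknown:  # catch typos in module names early
        raise KeyError(f"unknown modules: {sorted(unknown)}")
    return frozenset().union(*(els for name, els in modules.items() if name not in off))


safe = enabled_elements(MODULES, DISABLED)
print(sorted(safe))
```

Note that `param` belongs to both Applet and Object, so it disappears only because both modules are disabled; treating each module as a set makes overlaps like this easy to spot.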