Factoring data out of a document
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
When you write web documents, you often need to include structured data in the form of tables and lists. The normal way of handling this highly semantic data is to embed it directly in the page with a gaggle of <td>s and <tr>s. However, if this data needs to appear in other documents, or even appear multiple times in the same document but in different forms, this approach doesn't scale.
So, I'm experimenting with methods of storing this data in other places. The way I look at it, there are two primary places to put it: in a database, or in an XML file. For this particular instance, we'll be using XML files. I'm trying to stay away from databases for the time being.
You also need a method to transform pure data into accessible HTML. XSLT seems to me a highly natural choice for that transformation, and it doesn't require a user to write a PHP subroutine each time they want to perform the transformation.
This is as far as I've gotten. Implementing this process in XSLT poses two problems:
1. What syntax should be used for including the XML file? I would naturally gravitate towards XInclude, but it doesn't appear that PHP will automatically XSL-process the XML file when it's loaded in, which makes XInclude of minimal use. One would probably end up having to come up with a proprietary XML schema.
2. How would one slice the data in different ways? XSLT is not known for having external parameters, which is a pity, since it means that given an XML file and an XSLT file, the result will invariably be the same. This makes XSLT quite verbose for reformatting data in different forms: how does the calling document tell the XSLT stylesheet to sort the elements differently, or to take only one column or row of data? These fairly simple operations should not require another stylesheet, but it looks like one may be necessary.
I wonder what a suitable method of solving this problem would be. Perhaps runtime DOM modification of the XSLT file?
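As a point of comparison for problem 2, here is a minimal sketch in Python, with the standard-library `xml.etree.ElementTree` standing in for an XSLT stylesheet (ElementTree has no XSLT engine). The `<libraries>` data layout and the `render_table` helper are hypothetical. Sorting, picking columns, and picking a single row are plain arguments to one renderer rather than separate stylesheets. XSLT 1.0 itself can take caller-supplied values through top-level `xsl:param` elements, which PHP's `XSLTProcessor::setParameter()` sets, so a stylesheet can receive something like a row id the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file: one <library> record per table row.
DATA = """
<libraries>
  <library id="library-1"><name>Library 1</name><version>1.0</version><rating>Fair</rating></library>
  <library id="library-2"><name>Library 2</name><version>.92</version><rating>Poor</rating></library>
  <library id="library-3"><name>Library 3</name><version>4.01</version><rating>Great</rating></library>
</libraries>
"""

def render_table(root, columns, sort_by=None, only_id=None):
    """Build an XHTML <table> from <library> records.

    columns -- child element names to emit, in order (picks columns)
    sort_by -- optional child element name to sort rows by (string order)
    only_id -- optional id attribute value that keeps a single row
    """
    rows = root.findall("library")
    if only_id is not None:
        rows = [r for r in rows if r.get("id") == only_id]
    if sort_by is not None:
        rows = sorted(rows, key=lambda r: r.findtext(sort_by, default=""))
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for col in columns:
        ET.SubElement(header, "th").text = col
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for col in columns:
            ET.SubElement(tr, "td").text = row.findtext(col, default="")
    return table

root = ET.fromstring(DATA.strip())
# One data file, two slices: all rows sorted by rating, and one cell for one library.
print(ET.tostring(render_table(root, ["name", "rating"], sort_by="rating"), encoding="unicode"))
print(ET.tostring(render_table(root, ["rating"], only_id="library-2"), encoding="unicode"))
```

Sorting here is plain string order, so a numeric column such as the version would need a numeric sort key.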
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
- Ambush Commander
Alright. If arborint doesn't understand me, something's gone terribly wrong. Let's try again.
For my library HTML Purifier, I have a comparison table of a bunch of other different libraries. The table includes version info, last updated status, and various check marks on major functionality. This info is then duplicated further down in the document, since I recapitulate the data whenever I discuss a library. It looks like:
== Table ==
Library 1 - 1.0 - Fair
Library 2 - .92 - Poor
Library 3 - 4.01 - Great
== Library 1 ==
1.0 / Fair
More description
== Library 2 ==
...
As a programmer, I see a big problem with this: duplication! Because the version number is hard-coded into both the table and the full description, whenever a library is updated I have to change the document in two spots. I put up with this for a while. But...
Nevermore!
So, what I was trying to do was put this version number and the extra property data in an XML file, and then stuff it back into the document. The mechanism for doing this without embedding PHP in my page is what's stumping me.
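To make the "one source, two renderings" goal concrete, here is a hedged Python sketch that produces both the comparison-table line and the per-library summary from the same record, following the plain-text layout of the example above. ElementTree stands in for XSLT, and the data layout and the `table_line`, `summary_line`, and `find_library` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file: the version and rating live here and nowhere else.
DATA = """<libraries>
  <library id="library-1"><name>Library 1</name><version>1.0</version><rating>Fair</rating></library>
  <library id="library-2"><name>Library 2</name><version>.92</version><rating>Poor</rating></library>
</libraries>"""

def table_line(lib):
    # Row format used in the comparison table: "Name - version - rating"
    return f"{lib.findtext('name')} - {lib.findtext('version')} - {lib.findtext('rating')}"

def summary_line(lib):
    # Format used in each library's own section: "version / rating"
    return f"{lib.findtext('version')} / {lib.findtext('rating')}"

def find_library(root, lib_id):
    # ElementTree's limited XPath supports attribute predicates like [@id='...']
    return root.find(f"library[@id='{lib_id}']")

root = ET.fromstring(DATA)
print("== Table ==")
for lib in root.findall("library"):
    print(table_line(lib))
lib1 = find_library(root, "library-1")
print("== Library 1 ==")
print(summary_line(lib1))
```

Bumping a version then means editing one `<version>` element, and both renderings pick up the change.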
XSLT presents itself as a very attractive solution, because it is designed precisely for transforming XML documents into HTML documents. However, it's not really equipped for creating HTML fragments, so I'm being a little creative in my usage of it. Here's the problem: when I output the information for Library 1, I want the XSLT file to grab only that info and format it accordingly. When I want Library 2, I want only that info, formatted accordingly. Translated into XSLT, this would be something equivalent to <xsl:apply-templates select="id('library-1')/*" /> and <xsl:apply-templates select="id('library-2')/*" />. Therein lies the rub: while nearly the same, these two cases are subtly different and need different XSLT files.
The DOM modification is my way of simulating separate XSLT files. Each time we run it, the DOM is slightly different (the selector has changed), so we can get the "effect" of multiple XSLT files. Using variables and includes, this could be a viable solution. I hope this made more sense.
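A rough sketch of the runtime DOM-modification idea: the stylesheet is loaded as an ordinary XML tree, and the `select` attribute of its `xsl:apply-templates` element is rewritten before each run. It uses Python's ElementTree rather than PHP's DOM extension and only shows the modification step, not the transform; the stylesheet body and the `set_selector` helper are hypothetical. Note that `id()` only finds attributes the parser knows are IDs, typically ones declared in a DTD.

```python
import copy
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)  # keep the familiar xsl: prefix when serializing

# Hypothetical base stylesheet; PLACEHOLDER is swapped out per run.
STYLESHEET = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:template match="/">
    <div class="library-info">
      <xsl:apply-templates select="PLACEHOLDER"/>
    </div>
  </xsl:template>
</xsl:stylesheet>"""

def set_selector(base, lib_id):
    """Return a copy of the stylesheet tree whose apply-templates selects one library."""
    tree = copy.deepcopy(base)  # leave the base stylesheet untouched for the next run
    node = tree.find(f".//{{{XSL_NS}}}apply-templates")
    node.set("select", f"id('{lib_id}')/*")
    return tree

base = ET.fromstring(STYLESHEET)
for lib_id in ("library-1", "library-2"):
    variant = set_selector(base, lib_id)
    # The PHP equivalent would edit a DOMDocument and pass it to
    # XSLTProcessor::importStylesheet(); here the variant is just printed.
    print(ET.tostring(variant, encoding="unicode"))
```

Binding the id through a top-level `xsl:param` and using it inside the expression (for example `select="library[@id = $lib]/*"`) would make the rewrite unnecessary.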
Thinking about it, though, the true XML way would be to have the entire source document in XML, and then create an XSLT stylesheet to convert it to HTML. The monster table causing this duplication would simply be a custom apply-templates that grabs data from other parts of the XML document. I don't want to do this, though, because it moves away from a "document-first" mentality.
> Dynamic CSS generation. You're using XSLT only to place your data within markup, then you use CSS to style that markup.

If only CSS were that powerful! CSS, in its current state, cannot re-sort data or selectively pick out the data you want. This all needs to be handled server-side.
- Kieran Huggins
- DevNet Master
- Posts: 3635
- Joined: Wed Dec 06, 2006 4:14 pm
- Location: Toronto, Canada
What I've been doing (with varying levels of success) is to include all the relevant XML "data" nodes, then generate a "document" node that describes the data layout of the XML:
```xml
<xml>
  <!-- included-if-relevant -->
  <person id="kieran">
    <name>Kieran Huggins</name>
    <language>PHP</language>
    <language>XSLT</language>
    <language>CSS</language>
    <language>jQuery!</language>
    <status>Tired</status>
  </person>
  <person id="ambush">
    <name>Ambush Commander</name>
    <language>PHP</language>
    <language>XSLT</language>
    <language>CSS</language>
    <status>Worried</status>
  </person>
  <!-- /included-if-relevant -->
  <!-- generated-structure -->
  <document pageType="coderList">
    <coders>
      <coder>kieran</coder>
      <coder>ambush</coder>
    </coders>
  </document>
  <!-- /generated-structure -->
</xml>
```
Then I use XSLT to reformat the "document" section, referencing the "data" nodes for the raw details. The "pageType" attribute can be used to select which XSLT templates apply.
Also, PHP allows you to call PHP functions (even user-defined ones!) as XSLT functions, which could come in useful!
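Here is a small Python sketch of the lookup described above, using ElementTree in place of XSLT: the `<document>` node lists ids, and the renderer resolves each one against the `<person>` data nodes. The `render_coder_list` function and the `RENDERERS` table keyed by `pageType` are hypothetical stand-ins for XSLT template selection. On the PHP side, `XSLTProcessor::registerPHPFunctions()` is the call that lets a stylesheet use `php:function()`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure above: data nodes plus a generated <document> node.
SOURCE = """<xml>
  <person id="kieran"><name>Kieran Huggins</name><language>PHP</language><language>jQuery!</language><status>Tired</status></person>
  <person id="ambush"><name>Ambush Commander</name><language>PHP</language><language>CSS</language><status>Worried</status></person>
  <document pageType="coderList">
    <coders><coder>kieran</coder><coder>ambush</coder></coders>
  </document>
</xml>"""

def render_coder_list(root, document):
    """Render the coders named in <document> by looking up their <person> data nodes."""
    ul = ET.Element("ul")
    for ref in document.iterfind("coders/coder"):
        person = root.find(f"person[@id='{ref.text}']")
        if person is None:
            continue  # a reference with no matching data node is skipped
        langs = ", ".join(lang.text for lang in person.findall("language"))
        li = ET.SubElement(ul, "li")
        li.text = f"{person.findtext('name')} ({person.findtext('status')}): {langs}"
    return ul

# pageType -> renderer, roughly what template selection on @pageType would do in XSLT.
RENDERERS = {"coderList": render_coder_list}

root = ET.fromstring(SOURCE)
document = root.find("document")
html = RENDERERS[document.get("pageType")](root, document)
print(ET.tostring(html, encoding="unicode"))
```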
- Ambush Commander
- Kieran Huggins
I'm suggesting that all controller actions output their data to an intermediate format (XML/DOM), and then the page controller itself determines the "document" section of that XML.
XSLT can then transform the data from that raw, vanilla format into XHTML, JSON, PDF, etc., using PHP's XSL extension: http://ca3.php.net/manual/en/ref.xsl.php
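The pipeline described here (controller data into an intermediate XML tree, then one transform per output format) might look roughly like this in Python. ElementTree and the `json` module stand in for PHP's DOM and XSL extensions, and every function and element name is hypothetical; the sketch only shows XHTML and JSON both being produced from the same intermediate tree.

```python
import json
import xml.etree.ElementTree as ET

def controller_action():
    """Hypothetical controller action: returns raw data as XML, with no presentation."""
    data = ET.Element("data")
    lib = ET.SubElement(data, "library", id="library-1")
    ET.SubElement(lib, "version").text = "1.0"
    ET.SubElement(lib, "rating").text = "Fair"
    return data

def page_controller(data, page_type, item_ids):
    """Wrap the action's data and add a <document> node describing the page layout."""
    page = ET.Element("page")
    page.append(data)
    document = ET.SubElement(page, "document", pageType=page_type)
    for item_id in item_ids:
        ET.SubElement(document, "item").text = item_id
    return page

def to_xhtml(page):
    """One output format: an XHTML list built by following <document> references."""
    ul = ET.Element("ul")
    for ref in page.iterfind("document/item"):
        lib = page.find(f"data/library[@id='{ref.text}']")
        li = ET.SubElement(ul, "li")
        li.text = f"{ref.text}: {lib.findtext('version')} / {lib.findtext('rating')}"
    return ET.tostring(ul, encoding="unicode")

def to_json(page):
    """Another output format from the same intermediate tree."""
    libs = {lib.get("id"): {child.tag: child.text for child in lib}
            for lib in page.iterfind("data/library")}
    return json.dumps(libs, sort_keys=True)

page = page_controller(controller_action(), "libraryList", ["library-1"])
print(to_xhtml(page))
print(to_json(page))
```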
Any XHTML bits you need to keep (like paragraphs) should be wrapped in a CDATA section.
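A quick illustration of what CDATA wrapping does to embedded XHTML, using Python's ElementTree (the `<entry>` document is hypothetical): the markup is parsed as one opaque string rather than as child elements, and re-serializing it escapes the angle brackets. In XSLT 1.0 that is why emitting such content as live markup usually needs `disable-output-escaping="yes"` on `xsl:value-of`, where the processor supports it.

```python
import xml.etree.ElementTree as ET

# Hypothetical data node whose body is XHTML that should pass through untouched.
SOURCE = "<entry><title>Intro</title><body><![CDATA[<p>Some <em>rich</em> text</p>]]></body></entry>"

entry = ET.fromstring(SOURCE)
body = entry.find("body")

# Inside CDATA the markup is plain character data, not child elements.
print(len(body))   # number of child elements under <body>
print(body.text)   # the XHTML arrives as one string

# Re-serializing escapes the string, so it would not render as markup as-is.
print(ET.tostring(body, encoding="unicode"))
```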
- Ambush Commander
With the architecture I'm working with, there is no data to create a DOM from: just a bunch of XHTML documents.
I think I understand what you're talking about, though. You want me to get rid of the intervening XHTML document and go with just XML and XSLT, storing what was originally in the XHTML document inside the XML. I'm not so wild about this idea, because the majority of the page is not made up of highly structured data, just your regular old paragraphs and sections.
It occurred to me, though, that the XHTML document could itself be the XML document we're processing, which would be pretty interesting. In that case, whenever XSLT encounters a generated section, it references the appropriate XML document (possibly an external one) to perform the expansion. Instead of having to run XSLT ten times, once for each library, you only run it once. That would probably work! I'll give it a whirl.
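A rough sketch of that single-pass idea: the XHTML page is parsed as XML, and each placeholder element (a hypothetical `gen:library` element in a made-up `urn:example:generated` namespace) is replaced with markup built from the external data file. Python's ElementTree does the work here; in XSLT the same shape is usually an identity template plus a template matching the placeholder, with `document()` pulling in the external file. Treat it as an illustration of the flow, not the stylesheet itself.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
GEN = "urn:example:generated"          # hypothetical namespace for placeholders
ET.register_namespace("", XHTML)       # serialize XHTML elements without a prefix

PAGE = f"""<html xmlns="{XHTML}" xmlns:gen="{GEN}"><body>
<p>Hand-written prose is left alone.</p>
<gen:library ref="library-1"/>
<p>More prose.</p>
<gen:library ref="library-2"/>
</body></html>"""

# Hypothetical external data file.
DATA = """<libraries>
<library id="library-1"><version>1.0</version><rating>Fair</rating></library>
<library id="library-2"><version>.92</version><rating>Poor</rating></library>
</libraries>"""

def expand(page_root, data_root):
    """Replace every <gen:library ref="..."/> with markup built from the data file."""
    placeholder = f"{{{GEN}}}library"
    # Collect first, then replace, so the tree is not mutated while iterating it.
    targets = [(parent, i, child)
               for parent in page_root.iter()
               for i, child in enumerate(parent)
               if child.tag == placeholder]
    for parent, i, child in targets:
        ref = child.get("ref")
        lib = data_root.find(f"library[@id='{ref}']")
        if lib is None:
            continue  # unknown id: leave the placeholder in place
        p = ET.Element(f"{{{XHTML}}}p", {"class": "library-info"})
        p.text = f"{lib.findtext('version')} / {lib.findtext('rating')}"
        p.tail = child.tail  # keep the whitespace that followed the placeholder
        parent[i] = p
    return page_root

page = expand(ET.fromstring(PAGE), ET.fromstring(DATA))
print(ET.tostring(page, encoding="unicode"))
```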
- Kieran Huggins
I only use XSLT once, at the end, after the DOM is complete.
You could transform the existing XHTML into a different form, but be careful, as it needs to be EXTREMELY valid! Also, in the case of wikis and forums, the text is marked up as BBCode and/or wikitext,
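The minimum bar here is well-formedness: an XML parser refuses HTML-style markup outright. A small Python check with ElementTree shows the kind of input that fails; the `is_well_formed` helper is hypothetical. This does not validate against the XHTML DTD (ElementTree is not a validating parser; in PHP, `DOMDocument::validate()` does DTD validation).

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return (True, None) if markup parses as XML, else (False, error message)."""
    try:
        ET.fromstring(markup)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

print(is_well_formed("<div><p>Fine.<br/></p></div>"))
print(is_well_formed("<div><p>Unclosed <br> tag</p></div>"))  # HTML-style <br> is never closed
print(is_well_formed("<p>AT&T</p>"))                          # a bare & must be written &amp;
```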
Incidentally, have you ever heard of HAML? Looks like something you'd be interested in.
- Ambush Commander
> You could transform the existing XHTML into a different form, but be careful, as it needs to be EXTREMELY valid!

Don't worry. I'm already DOM-izing my XHTML before processing it. In my opinion, this is the only way to really get standards-compliant markup.

> Also, in the case of wikis and forums, the text is marked up as BBCode and/or wikitext,

Hm??

> Incidentally, have you ever heard of HAML? Looks like something you'd be interested in.

Ruby only. Maybe I'll build a PHP parser for it.