Factoring data out of a document

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

When you write web documents, you often need to include structured data in the form of tables and lists. The normal way of handling this highly semantic data is to embed it straight into the page with a gaggle of <td>s and <tr>s. However, if this data needs to appear in other documents, or even appear multiple times in the same document but in different forms, this approach doesn't scale.

So, I'm experimenting with methods of storing this data in other places. The way I look at it, there are two primary places to put it: in a database, or in an XML file. For this particular instance, we'll be using XML files. I'm trying to stay away from databases for the time being.

You also need a method to transform pure data into accessible HTML. It would seem to me that XSLT is a highly natural choice for performing that transformation, and doesn't require a user to write a PHP subroutine each time they wish to perform the transformation.

This is as far as I've gotten so far. Implementing this process in XSLT poses two problems:

1. What syntax should be used for including the XML file? I would naturally gravitate towards XInclude, but it doesn't appear that PHP will run the included XML file through XSLT automatically when it's loaded in, which makes it of minimal usefulness. One would probably end up having to come up with a proprietary XML schema.

2. How would one slice the data in different manners? XSLT is not known for having external parameters, which is a pity, since it means that given an XML file and an XSLT file, the result will invariably be the same. This makes XSLT quite verbose for purposes of reformatting data in different forms: how does the calling document tell the XSLT stylesheet to sort the elements differently, or take only one column or row of data? These fairly simple operations should not require another stylesheet, but it looks like such a thing may be necessary.
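
A note on the second point: XSLT 1.0 does define top-level xsl:param elements, and PHP's XSL extension can set them through XSLTProcessor::setParameter(), so one stylesheet can be steered from outside. A minimal sketch, assuming a hypothetical <libraries> data file (the element names, the `lib` and `order` parameter names, and the render() helper are all made up for illustration):

```php
<?php
// Hypothetical data file contents; element and attribute names are assumptions.
$xml = <<<'XML'
<libraries>
  <library id="library-1"><name>Library 1</name><version>1.0</version></library>
  <library id="library-2"><name>Library 2</name><version>.92</version></library>
</libraries>
XML;

// One stylesheet, two external knobs: which library to show and the sort order.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:param name="lib" select="''"/>
  <xsl:param name="order" select="'ascending'"/>
  <xsl:template match="/libraries">
    <ul>
      <xsl:for-each select="library[$lib = '' or @id = $lib]">
        <xsl:sort select="name" order="{$order}"/>
        <li><xsl:value-of select="name"/> - <xsl:value-of select="version"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: run one transformation with the given parameters.
function render(string $xml, string $xsl, array $params = []): string
{
    $data = new DOMDocument();
    $data->loadXML($xml);
    $style = new DOMDocument();
    $style->loadXML($xsl);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($style);
    foreach ($params as $name => $value) {
        // Empty namespace URI: these are plain, unqualified parameters.
        $proc->setParameter('', $name, $value);
    }
    return (string) $proc->transformToXml($data);
}

echo render($xml, $xsl, ['lib' => 'library-2']);    // only Library 2
echo render($xml, $xsl, ['order' => 'descending']); // both, Library 2 first
```

Parameters cover filtering and sort direction well. XSLT 1.0 still cannot evaluate an arbitrary XPath expression passed in as a string, so a completely free-form selector would need some other mechanism.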

I wonder what a suitable method of solving this problem would be. Perhaps runtime DOM modification of the XSLT file?
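
On the XInclude question, PHP's DOM extension leaves xi:include elements alone on a default load, but DOMDocument::xinclude() expands them on request, after which an XSLTProcessor would see the merged tree. A rough sketch, assuming a Unix-style temp path and a made-up <page> host element and data layout:

```php
<?php
// Write a throwaway data file so the sketch is self-contained.
$dataFile = tempnam(sys_get_temp_dir(), 'libs');
file_put_contents(
    $dataFile,
    '<libraries><library id="library-1"><version>1.0</version></library></libraries>'
);

// A host document (hypothetical <page> element) that pulls the data in
// with a standard xi:include element.
$doc = new DOMDocument();
$doc->loadXML(
    '<page xmlns:xi="http://www.w3.org/2001/XInclude">'
    . '<xi:include href="' . htmlspecialchars($dataFile) . '"/>'
    . '</page>'
);

// Nothing is expanded by default on load; the substitution is requested here.
$doc->xinclude();

// The included elements are now ordinary nodes of the host tree.
$xpath = new DOMXPath($doc);
$version = $xpath->evaluate("string(/page/libraries/library[@id='library-1']/version)");
echo $version, "\n";

unlink($dataFile);
```

This only merges the raw data into the document. Turning it into table rows or summaries would still be a separate XSLT step on the expanded tree.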
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

I fell asleep right after "this highly semantic data...". Back awake but not wanting to resubmit my brain to Ambushian torture again ... help ... help ... can anyone hear me!

He doesn't know where to put it so he's getting into DOM modification?!?
(#10850)
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

Dynamic CSS generation. You're using XSLT only to place your data within markup, then you use CSS to style that markup.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Alright. If arborint doesn't understand me, something's gone terribly wrong. Let's try again.

For my library, HTML Purifier, I have a comparison table covering a bunch of other libraries. The table includes version info, last-updated status, and check marks for major functionality. This info is then duplicated further down in the document, since I recapitulate the data whenever I discuss a library. It looks like:

== Table ==
Library 1 - 1.0 - Fair
Library 2 - .92 - Poor
Library 3 - 4.01 - Great

== Library 1 ==
1.0 / Fair
More description

== Library 2 ==
...

As a programmer, I see a big problem with this: duplication! Because the version number is hard-coded into both the table and the full description, I have to change the document in two spots whenever a library is updated. I put up with this for a while. But...

Nevermore!

So what I was trying to do was put the version number and the other property data in an XML file, and then pull it back into the document. The mechanism for doing this without embedding PHP in my page is what is stumping me.

XSLT presents itself as a very attractive solution, because it is precisely designed for transforming XML documents into HTML documents. However, it's not really equipped for creating HTML fragments, so I'm being a little creative in my usage of it. Here's the problem: when I output the information for Library 1, I only want the XSLT file to grab that info and format it accordingly. When I want Library 2, I want only that info, formatted accordingly. Translated into XSLT, this would be something equivalent to <xsl:apply-templates select="id('library-1')/*" /> and <xsl:apply-templates select="id('library-2')/*" />. Therein lies the rub: while nearly the same, these two cases are subtly different, and need different XSLT files.

The DOM modification is my way of simulating separate XSLT files. Each time we run it, the DOM is slightly different (the selector has changed), so we can get the "effect" of multiple XSLT files. Using variables and includes, this could be a viable solution. I hope this made more sense.
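
The DOM-modification idea could look roughly like the sketch below: the stylesheet is loaded as an ordinary DOMDocument, the select attribute of its xsl:apply-templates is rewritten, and only then is it handed to XSLTProcessor. The data layout, the <rating> element, and the renderLibrary() helper are assumptions, and an attribute predicate stands in for id(), which only works when a DTD declares the attribute as type ID.

```php
<?php
const XSL_NS = 'http://www.w3.org/1999/XSL/Transform';

// Hypothetical data; element and attribute names are assumptions.
$xml = <<<'XML'
<libraries>
  <library id="library-1"><version>1.0</version><rating>Fair</rating></library>
  <library id="library-2"><version>.92</version><rating>Poor</rating></library>
</libraries>
XML;

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/libraries">
    <xsl:apply-templates select="library"/>
  </xsl:template>
  <xsl:template match="library">
    <p><xsl:value-of select="version"/> / <xsl:value-of select="rating"/></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: patch the stylesheet, then transform.
function renderLibrary(string $xml, string $xsl, string $id): string
{
    $style = new DOMDocument();
    $style->loadXML($xsl);

    // Rewrite the selector on the first xsl:apply-templates before compiling.
    $apply = $style->getElementsByTagNameNS(XSL_NS, 'apply-templates')->item(0);
    // No escaping is done here, so $id must not contain quote characters.
    $apply->setAttribute('select', "library[@id='" . $id . "']");

    $data = new DOMDocument();
    $data->loadXML($xml);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($style);
    return (string) $proc->transformToXml($data);
}

echo renderLibrary($xml, $xsl, 'library-1'); // fragment for Library 1 only
```

Each call recompiles the stylesheet, which is the price of treating it as data. That is probably acceptable for a handful of fragments per page.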

Thinking about it, though, the true XML way would be to have the entire source document in XML, and then create an XSLT stylesheet to convert it to HTML. The monster-table that's causing this duplication would simply be a custom apply-templates that grabs data from other parts of the XML document. I don't want to do this, though, because it moves away from a "document-first" mentality.

Jenk wrote: Dynamic CSS generation. You're using XSLT only to place your data within markup, then you use CSS to style that markup.

If only CSS were that powerful! CSS, in its current state, cannot re-sort data or selectively pick out the data you want. This all needs to be handled server-side.
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

What I've been doing (with varying levels of success) is to include all the XML "data" nodes that are relevant, then generate a "document" node that describes the data layout of the XML:

Code:

<xml>
<!-- included-if-relevant -->
  <person id="kieran">
    <name>Kieran Huggins</name>
    <language>PHP</language>
    <language>XSLT</language>
    <language>CSS</language>
    <language>jQuery!</language>
    <status>Tired</status>
  </person>
  <person id="ambush">
    <name>Ambush Commander</name>
    <language>PHP</language>
    <language>XSLT</language>
    <language>CSS</language>
    <status>Worried</status>
  </person>
<!-- /included-if-relevant -->
<!-- generated-structure -->
  <document pageType="coderList">
    <coders>
      <coder>kieran</coder>
      <coder>ambush</coder>
    </coders>
  </document>
<!-- /generated-structure -->
</xml>
Then I use XSLT to re-format the "document" section, referencing the "data" nodes for raw details. The "pageType" attribute can be used for XSLT block selection.
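
One way this could be wired up, as a sketch only: an xsl:key indexes the "data" nodes by id, and the template for the "document" section looks each <coder> up through key(). The XML is a trimmed copy of the example above. The stylesheet, the key name `person`, and the <ul> output are assumptions about how the lookup might be done, not necessarily how it is done in practice.

```php
<?php
// Trimmed copy of the example document above.
$xml = <<<'XML'
<xml>
  <person id="kieran"><name>Kieran Huggins</name><status>Tired</status></person>
  <person id="ambush"><name>Ambush Commander</name><status>Worried</status></person>
  <document pageType="coderList">
    <coders>
      <coder>kieran</coder>
      <coder>ambush</coder>
    </coders>
  </document>
</xml>
XML;

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:key name="person" match="person" use="@id"/>
  <xsl:template match="/">
    <xsl:apply-templates select="*/document"/>
  </xsl:template>
  <xsl:template match="document[@pageType='coderList']">
    <ul>
      <xsl:for-each select="coders/coder">
        <li><xsl:value-of select="key('person', .)/name"/> (<xsl:value-of select="key('person', .)/status"/>)</li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL;

$data = new DOMDocument();
$data->loadXML($xml);
$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);
// Only the "document" section drives the output; "person" nodes are looked up.
$html = (string) $proc->transformToXml($data);
echo $html;
```

A second page type would just be another template matching a different pageType value, reusing the same key.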

Also, PHP lets you call PHP functions (even user-defined functions!) from a stylesheet as if they were XSLT functions, which could come in useful!
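
For reference, the hook for this is XSLTProcessor::registerPHPFunctions() together with the http://php.net/xsl namespace in the stylesheet. A small sketch, where normalize_version() is a made-up helper and the one-element data document is an assumption:

```php
<?php
// Hypothetical helper; the name and behaviour are assumptions for illustration.
function normalize_version(string $version): string
{
    // Turn ".92" into "0.92"; leave everything else untouched.
    return strncmp($version, '.', 1) === 0 ? '0' . $version : $version;
}

$xml = '<library id="library-2"><version>.92</version></library>';

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl"
    exclude-result-prefixes="php">
  <xsl:output method="html"/>
  <xsl:template match="/library">
    <span><xsl:value-of select="php:function('normalize_version', string(version))"/></span>
  </xsl:template>
</xsl:stylesheet>
XSL;

$data = new DOMDocument();
$data->loadXML($xml);
$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
// Expose only the one function the stylesheet needs, not every PHP function.
$proc->registerPHPFunctions('normalize_version');
$proc->importStylesheet($style);

$html = (string) $proc->transformToXml($data);
echo $html;
```

The stylesheet then only works under PHP, so this trades portability for convenience.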
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Are you suggesting processing the entire document through XSLT, including proprietary XML bits that tell XSLT which data to grab? What I did notice, however, was that you used a plain XML document. Preferably, I'd like the source document to be XHTML.
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

I'm suggesting that all controller actions output data to a middle format: XML / DOM, then the page controller itself determines the "document" section of that XML.

XSLT can then transform the data from that raw, vanilla format to xhtml, JSON, PDF, etc... using http://ca3.php.net/manual/en/ref.xsl.php

Any xhtml bits you need to keep (like paragraphs, etc...) should be wrapped in a CDATA section.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

With the architecture I'm working with, there is no data to create a DOM from: just a bunch of XHTML documents.

I think I understand what you're talking about though. You want me to get rid of the intervening XHTML document and go just for XML with XSLT, storing what was originally in the XHTML document inside the XML. I'm not so wild about this idea, though, because the majority of the page is not made up of highly structured data but just your regular old paragraphs and sections.

It occurred to me, though: the XHTML document could itself be the XML document we're processing, which would be pretty interesting. In that case, whenever XSLT encounters a generated section, it references the appropriate XML document (possibly external) to perform the expansion. Instead of having to run XSLT ten times, once for each library, you only run it once. That probably would work! I'll give it a whirl.
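
A sketch of how that single pass might look, assuming a hypothetical placeholder element <lib:summary ref="..."/> in a made-up namespace, an external data file passed in through a `dataUri` parameter, and a Unix-style temp path. An identity template copies the XHTML through untouched, and only the placeholder is expanded via document():

```php
<?php
// Throwaway external data file; the layout is an assumption.
$dataFile = tempnam(sys_get_temp_dir(), 'libs');
file_put_contents($dataFile, <<<'XML'
<libraries>
  <library id="library-1"><version>1.0</version><rating>Fair</rating></library>
  <library id="library-2"><version>.92</version><rating>Poor</rating></library>
</libraries>
XML
);

// The XHTML source document, with one hypothetical placeholder element.
$xhtml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:lib="urn:example:libdata">
  <body>
    <h2>Library 1</h2>
    <lib:summary ref="library-1"/>
    <p>More description</p>
  </body>
</html>
XML;

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:lib="urn:example:libdata"
    exclude-result-prefixes="lib">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:param name="dataUri"/>
  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- Expand the placeholder from the external data file. -->
  <xsl:template match="lib:summary">
    <xsl:variable name="entry"
        select="document($dataUri)/libraries/library[@id = current()/@ref]"/>
    <p><xsl:value-of select="$entry/version"/> / <xsl:value-of select="$entry/rating"/></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($xhtml);
$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);
$proc->setParameter('', 'dataUri', $dataFile);

$out = (string) $proc->transformToXml($source);
echo $out;

unlink($dataFile);
```

The ordinary paragraphs and headings stay plain XHTML, and any number of placeholders are resolved in the same run.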
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

I only use xslt once at the end, once the DOM is complete.

You could transform the existing xhtml to a different form, but be careful, as it needs to be EXTREMELY valid! Also, in the case of wikis and forums, the text is marked up as bbcode and/or wikitext.

Incidentally, have you ever heard of HAML? Looks like something you'd be interested in.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Kieran Huggins wrote: You could transform the existing xhtml to a different form, but be careful, as it needs to be EXTREMELY valid!

Don't worry. I'm already DOM-izing my XHTML before processing it. In my opinion, this is the only way to really have standards-compliant markup.

Kieran Huggins wrote: Also, in the case of wikis and forums, the text is marked up as bbcode and/or wikitext.

Hm??

Kieran Huggins wrote: Incidentally, have you ever heard of HAML? Looks like something you'd be interested in.

Ruby only. Maybe I'll build a PHP parser for it.