Big nasty old-fashioned XML file needs parsing

Muffie
Forum Newbie
Posts: 1
Joined: Mon Jan 24, 2011 2:18 pm

Post by Muffie »

I have a badly-formatted 12.2MB XML file that I want to parse for 2 elements. The problem is that every method I've tried is designed for XML files created by people who know how XML is supposed to work. The structure of my XML is:

Code: Select all

<thing-list>
  <thing type="Item">
    <field name="id"></field>
    <field name="flags"></field>
    <field name="stack-size"></field>
    <field name="type"></field>
    <field name="resource-id"></field>
    <field name="valid-targets"></field>
    <field name="name"></field>
    <field name="description"></field>
    <field name="log-name-singular"></field>
    <field name="log-name-plural"></field>
    <field name="icon">
      <thing type="Graphic">
        <field name="format"></field>
        <field name="flag"></field>
        <field name="category"></field>
        <field name="id"></field>
        <field name="width"></field>
        <field name="height"></field>
        <field name="planes"></field>
        <field name="bits"></field>
        <field name="compression"></field>
        <field name="size"></field>
        <field name="horizontal-resolution"></field>
        <field name="vertical-resolution"></field>
        <field name="used-colors"></field>
        <field name="important-colors"></field>
        <field name="image" format="image/png" encoding="base64"></field>
      </thing>
    </field>
    <field name="unknown-2"></field>
    <field name="unknown-3"></field>
  </thing>
</thing-list>
I'm sure you can see my problem... I want to output a list with the ID and Name of each "thing", but I haven't found a way to read the element value based on the attribute value (if name="id", echo the field value). Any suggestions?
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Re: Big nasty old-fashioned XML file needs parsing

Post by John Cartwright »

I may be a little off, but something like

Code: Select all

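$xml = simplexml_load_string($your_massive_xml_string);

// Walk each top-level <thing>, then only its direct <field> children,
// so the "id" field of the nested Graphic thing is not picked up by mistake.
foreach ($xml->thing as $thing) {
   foreach ($thing->field as $field) {
      if ($field['name'] == 'id') {
         $id = (string) $field; // do something with the value
      }
   }
}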
Regardless, simplexml is your friend, and there are plenty of docs on it.

If performance is truly an issue, then you may want to consider PHP's XML Parser extension (xml_parser_create() and xml_parse()), which can read your XML file in chunks (but is far more complicated to use).
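
Since Python is fair game in this forum, here is a rough sketch of the same "read it in chunks" idea using the standard library's xml.etree.ElementTree.iterparse. It assumes the structure from the first post; iter_things is a made-up helper name, not something from any library, and I haven't run it against the real 12.2MB file.

```python
import xml.etree.ElementTree as ET

def iter_things(source):
    """Yield (id, name) for each top-level <thing>, parsing incrementally.

    `source` is a filename or a binary file object (hypothetical helper).
    """
    depth = 0  # how many <thing> elements are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "thing":
            continue
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0:
            # Only direct <field> children are inspected, so the nested
            # Graphic thing's own "id" field is ignored.
            fields = {f.get("name"): (f.text or "") for f in elem.findall("field")}
            yield fields.get("id", ""), fields.get("name", "")
            elem.clear()  # drop the finished subtree to keep memory use down
```

The depth counter is there because a "thing" can sit inside another "thing" (the icon's Graphic); only a thing that closes at depth zero is reported.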
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Big nasty old-fashioned XML file needs parsing

Post by VladSun »

Muffie wrote: I want to output a list with the ID and Name of each "thing", but I haven't found a way to read the element value based on the attribute value (if name="id" echo field value). Any suggestions?
Just apply an XSLT?

Code: Select all

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/">
	<thing-list>
		<xsl:apply-templates />
	</thing-list>
</xsl:template>

<xsl:template match="/thing-list/thing">
	<thing>
		<xsl:attribute name="name">
			<xsl:value-of select="field[@name='name']"/>
		</xsl:attribute>
		<xsl:attribute name="id">
			<xsl:value-of select="field[@name='id']"/>
		</xsl:attribute>
	</thing>
</xsl:template>

</xsl:stylesheet>
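
The field[@name='id'] predicate in the stylesheet above is the key trick, and it is not tied to XSLT. As a hedged illustration, Python's standard xml.etree.ElementTree understands the same kind of attribute predicate in its limited XPath subset. The function name list_things is made up for this sketch, and it loads the whole document into memory, which should be tolerable for a file of roughly 12MB.

```python
import xml.etree.ElementTree as ET

def list_things(xml_text):
    """Return (id, name) pairs for each <thing> directly under <thing-list>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for thing in root.findall("thing"):
        # Same predicate as the stylesheet: pick the direct <field> child
        # whose name attribute has the wanted value.
        thing_id = thing.findtext("field[@name='id']", default="")
        name = thing.findtext("field[@name='name']", default="")
        pairs.append((thing_id, name))
    return pairs
```

Because the paths are relative to each top-level thing and only look at direct children, the "id" field inside the nested Graphic thing is left alone, just as in the XSLT.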