PHP XML processing weirdness
Posted: Sat Oct 23, 2004 4:19 pm
Heya,
PHP 4.3.9 question. I am working on processing a very large XML file (35 MB - it's a list of every screen of every movie theatre in North America) and, for obvious reasons, I can't just load the entire doc into a string and parse the sucker. But the PHP docs clearly state that I can read in an XML doc a few bytes at a time and use the event-based XML parser to parse the document.
For example:
Code:
    // $file holds the path to the XML document (defined elsewhere)
    $xml_parser = xml_parser_create("UTF-8");
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");

    if (!($fp = fopen($file, "r"))) {
        die("could not open XML input");
    }

    // Feed the parser 4 KB at a time; the third argument flags the final chunk
    while ($data = fread($fp, 4096)) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));
        }
    }

    fclose($fp);
    xml_parser_free($xml_parser);
However, when I parse the document this way, I notice weird things happening with the data: it seems as if the character data between tags gets truncated. Yet the docs for the "xml_parse" function clearly state that this should work just fine.
Does anyone have any explanation for what's happening and/or a better way to parse very large XML docs? The solution needs to assume less than 2 MB of working RAM and a very standard PHP install with no extra packages (such as PEAR) installed.
Thanks for the insight.
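One thing that might explain the truncation, assuming the characterData handler (not shown in the post) stores each piece of text with a plain assignment: the parser is allowed to deliver the text of a single element in several calls, and a read boundary can fall in the middle of a text node. Below is a minimal sketch of handlers that append instead of overwrite. The handler bodies, the $buffer and $values variables, and the sample XML are all made up for illustration, and the document is fed seven bytes at a time just to force the splits.

```php
<?php
// Hypothetical handlers: the post does not show startElement(), endElement()
// or characterData(), so these bodies are illustrative assumptions only.
$buffer = '';       // text collected for the element currently open
$values = array();  // finished "NAME=text" pairs, in document order

function startElement($parser, $name, $attrs) {
    global $buffer;
    $buffer = '';   // start collecting afresh for each element
}

function characterData($parser, $data) {
    global $buffer;
    $buffer .= $data;   // append: one text node may arrive in several calls
}

function endElement($parser, $name) {
    global $buffer, $values;
    if (trim($buffer) !== '') {
        $values[] = $name . '=' . trim($buffer);   // text is complete only here
    }
    $buffer = '';
}

// Made-up sample data standing in for the real 35 MB file
$xml = '<theatres><theatre><name>Bijou Cinema</name>'
     . '<screens>12</screens></theatre></theatres>';

$xml_parser = xml_parser_create("UTF-8");
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Feed tiny pieces to imitate fread() boundaries landing inside text and tags
$chunk = 7;
$len = strlen($xml);
for ($pos = 0; $pos < $len; $pos += $chunk) {
    $is_final = ($pos + $chunk >= $len);
    if (!xml_parse($xml_parser, substr($xml, $pos, $chunk), $is_final)) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

print_r($values);
```

Because only one element's text is buffered at a time, memory use stays small no matter how large the file is. If the real handlers already append like this, the cause lies elsewhere and this sketch does not apply.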