
Need help dealing with a large (18MB) XML file...

Posted: Fri Dec 26, 2003 9:47 pm
by fahrvergnuugen
I need to get some information out of a Microsoft Project database (Project files can be stored as XML). One problem: Project doesn't put line breaks or anything in its XML file, so all of the data ends up on one line :roll:

The other problem: I've never really had trouble loading a whole XML file into memory before. Normally I just load the entire file, parse it into an array or whatever I need, and everything is fine...

But in this case, running xml_parse_into_struct takes over 10 minutes to process 8O This might be related to the fact that the file has no line breaks, but I'm not sure.

Is there a way to process an XML file little by little instead of loading the whole thing into memory? If so, can you give an example showing how to do it?

Posted: Fri Jan 02, 2004 11:08 am
by JAM
Never dealt with this kind of issue, so these are just ideas...

Depending on the amount of resources it would consume, perhaps rewriting the file to be a bit more readable (i.e., adding line breaks) and THEN parsing it could be something worth looking into?

If speed is still bugging you, combining that with xml_get_current_line_number() might also be helpful for tracking down where the parser is spending its time or failing.
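A minimal sketch of that error-reporting idea, run against a deliberately broken document (the tag names here are made up for illustration, not Project's actual schema):

```php
<?php
// Create an expat-style parser and feed it a document with a
// mismatched tag so the error-location functions have something
// to report.
$parser = xml_parser_create();
$broken = "<Project>\n<Task>\n</Project>";  // <Task> is never closed

if (!xml_parse($parser, $broken, true)) {
    // xml_get_error_code() / xml_error_string() describe the failure;
    // xml_get_current_line_number() says where it happened.
    printf("XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser));
}
xml_parser_free($parser);
?>
```

Note that on Project's everything-on-one-line output the reported line number would always be 1, which is exactly why pairing it with the add-line-breaks idea above makes it useful.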

Posted: Fri Jan 02, 2004 12:00 pm
by fahrvergnuugen
I solved the problem by reading 4096 bytes of the file at a time like this:

Code:

<?php
// Read the file in 4096-byte chunks and feed each chunk to the parser.
while ($data = fread($fp, 4096)) {
    // The third argument tells the parser when the final chunk arrives.
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        break;
    }
}
?>
Once my character data handler finds what it's looking for, I fclose($fp), which causes everything to bail out as soon as possible, saving time (especially when the search result is found near the top of the XML file).