Need help dealing with a large (18MB) XML file...

fahrvergnuugen
Forum Newbie
Posts: 2
Joined: Fri Dec 26, 2003 9:47 pm

Post by fahrvergnuugen »

I need to get some information out of a Microsoft Project database (Project files can be stored as XML). One problem: Project doesn't put line breaks or any other whitespace in its XML file, so all of the data ends up on one line :roll:

The other problem: I've never had trouble loading a whole XML file into memory before. Normally I just load the whole file, parse it into an array or whatever I need, and everything is fine...

But in this case, running xml_parse_into_struct() on the file takes over 10 minutes. 8O This might be related to the fact that the file has no line breaks, but I'm not sure.

Is there a way to process an XML file little by little instead of loading the whole thing into memory? If so, can you give an example showing how to do it?
JAM
DevNet Resident
Posts: 2101
Joined: Fri Aug 08, 2003 6:53 pm
Location: Sweden

Post by JAM »

I've never dealt with this kind of issue, so these are just ideas...

Depending on how many resources it would consume, rewriting the file to be a bit more readable (i.e., adding line breaks) and THEN parsing it could be worth looking into.

If speed is still bugging you, combining that with xml_get_current_line_number() might also be helpful.
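
The line-break idea could be sketched roughly like this. It is only an illustration, not something tested against a real Project file: add_line_breaks() is a hypothetical helper name, it assumes PHP 7.1 or later, and it naively rewrites every "><" it sees, including any inside CDATA sections or comments. Whether it actually speeds up parsing is an open question.

```php
<?php
// Sketch: copy XML from $in to $out, inserting a newline between adjacent tags.
// add_line_breaks() is a hypothetical helper, not part of PHP.
// Assumption: "><" only ever appears between two tags.
function add_line_breaks($in, $out, int $chunkSize = 4096): void
{
    $carry = '';
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buf = $carry . $chunk;
        $carry = '';
        // Hold back a trailing ">" so a "><" pair split across two chunks is still caught.
        if (substr($buf, -1) === '>') {
            $carry = '>';
            $buf = (string) substr($buf, 0, -1);
        }
        fwrite($out, str_replace('><', ">\n<", $buf));
    }
    // Flush whatever was held back from the final chunk.
    fwrite($out, $carry);
}
```

It takes two open stream handles, so it could be called as add_line_breaks(fopen('in.xml', 'rb'), fopen('out.xml', 'wb')), where both file names are placeholders.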
fahrvergnuugen
Forum Newbie
Posts: 2
Joined: Fri Dec 26, 2003 9:47 pm

Post by fahrvergnuugen »

I solved the problem by reading 4096 bytes of the file at a time like this:

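Code:

<?php
// Feed the file to the parser in 4096-byte chunks instead of loading it all at once.
while ($data = @fread($fp, 4096)) {
    // The third argument tells the parser whether this is the final chunk.
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        break; // stop on a parse error
    }
}
?>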
Once my character data handler finds what it's looking for, I call fclose($fp). The next fread() then fails, so the loop bails out as soon as possible, which saves time (especially when the search result is found near the top of the XML file).
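
For anyone who wants the whole thing in one place, here is a minimal, self-contained sketch of the same chunked approach. It is not the exact code from my script: find_first_text() and its parameters are made-up names for illustration, it assumes PHP 7.1 or later, and it stops via a $done flag rather than closing the file inside the handler, since newer PHP versions are stricter about reading from a closed handle.

```php
<?php
// Sketch: stream an XML file through the parser in small chunks and stop
// once the text of the first <$wantedTag> element has been collected.
// find_first_text() is a hypothetical helper, not part of PHP.
function find_first_text(string $path, string $wantedTag, int $chunkSize = 4096): ?string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }

    $inside = false; // true while the parser is inside <$wantedTag>
    $done = false;   // true once the matching closing tag has been seen
    $text = '';

    $parser = xml_parser_create();
    // Keep tag names as written; by default the parser upper-cases them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inside, &$done, $wantedTag) {
            if (!$done && $name === $wantedTag) {
                $inside = true;
            }
        },
        function ($p, $name) use (&$inside, &$done, $wantedTag) {
            if ($inside && $name === $wantedTag) {
                $inside = false;
                $done = true;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inside, &$text) {
            // Character data can arrive in several pieces, so append.
            if ($inside) {
                $text .= $data;
            }
        }
    );

    // Read and parse one chunk at a time; stop early once the value is found.
    while (!$done && !feof($fp)) {
        $data = fread($fp, $chunkSize);
        if ($data === false) {
            break;
        }
        if (!xml_parse($parser, $data, feof($fp))) {
            break; // parse error
        }
    }

    fclose($fp);
    return $done ? $text : null;
}
```

Calling something like find_first_text('project.xml', 'Name') (file and tag names are placeholders) returns the text of the first matching element, or null if the tag never appears or the file fails to parse. Only one chunk is held in memory at a time, and nothing past the chunk containing the match is read.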