Opening XML files over 20GB

EckstaC
Forum Newbie
Posts: 1
Joined: Thu Sep 20, 2007 3:48 am

Post by EckstaC

Hi,

I'm about to receive an XML document that could be anywhere between 20GB and 30GB.

I've got the task of importing specific data fields from this file into a MySQL 5.0 database. I haven't dealt with a file of this size before and am slightly lost as to the best way to go about processing it.

I have read one article that seems to offer a half-decent solution. I can't see any reason why the file size would make any difference to how this script performs... what do you guys think?

Also, if anybody has any other suggestions of how to deal with this file, it'd be most appreciated!

Cheers
Dave
mrkite
Forum Contributor
Posts: 104
Joined: Tue Sep 11, 2007 4:19 am

Post by mrkite

That howto looks like it got mangled. It doesn't seem to actually parse the XML; it just reads it in and writes out a whole crapload of smaller XML files.

To actually parse a 30GB XML file, you're going to want to use a SAX parser, not a DOM parser, unless you hate your hardware and want to punish it. A DOM parser builds the whole document tree in memory before you can touch any of it.

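A SAX parser hands you the document as a stream of events instead of building a tree, so its memory use stays roughly constant no matter how big the file is, as long as your callbacks don't hoard data.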

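In PHP, xml_parser_create() creates an event-based (SAX-style) parser. You then register callbacks: xml_set_element_handler() for start and end tags, xml_set_character_data_handler() for text, and so on. Then you read in your file a chunk at a time and pass each chunk to xml_parse(), setting its last argument to true on the final chunk.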
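
Here's a rough sketch of that read-a-chunk, parse-a-chunk loop. The file layout is made up, since I haven't seen your data: I'm assuming flat records like <items><item><name>..</name><price>..</price></item>..</items>. The function name streamRecords() and the callback are also just mine for illustration. Note that tag names arrive upper-cased because case folding is on by default.

```php
<?php
// Sketch only: assumes a hypothetical flat layout of <item> records,
// each holding simple child elements with text content.
function streamRecords(string $path, callable $onRecord, int $chunkSize = 8192): void
{
    $parser = xml_parser_create();
    $record = null; // fields of the <item> being read, or null when outside one
    $field  = null; // name of the child element being read, or null

    xml_set_element_handler(
        $parser,
        // Start tag: open a new record, or start a new field inside one.
        function ($p, string $name, array $attrs) use (&$record, &$field) {
            if ($name === 'ITEM') {
                $record = [];
            } elseif ($record !== null) {
                $field = $name;
                $record[$field] = '';
            }
        },
        // End tag: hand a finished record to the caller and forget it.
        function ($p, string $name) use (&$record, &$field, $onRecord) {
            if ($name === 'ITEM') {
                $onRecord($record);
                $record = null;
            }
            $field = null;
        }
    );

    // Text can arrive in several pieces, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$record, &$field) {
            if ($record !== null && $field !== null) {
                $record[$field] .= $data;
            }
        }
    );

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($fh)) {
            $chunk = fread($fh, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            // The third argument tells the parser whether this is the last chunk.
            if (xml_parse($parser, $chunk, feof($fh)) !== 1) {
                throw new RuntimeException(sprintf(
                    'XML error at line %d: %s',
                    xml_get_current_line_number($parser),
                    xml_error_string(xml_get_error_code($parser))
                ));
            }
        }
    } finally {
        fclose($fh);
    }
}
```

You'd call it with something like streamRecords('big.xml', function (array $row) { /* insert into MySQL here */ }); so only one record is ever held in memory at a time. If your real records are nested deeper than this, the handlers would need a proper stack of open tags instead of the single $field variable.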