Importing large numbers of xml files to MySQL



swan52
Forum Newbie
Posts: 14
Joined: Fri Jul 09, 2010 1:58 pm

Importing large numbers of xml files to MySQL

Post by swan52 »

Recently I have been using SimpleXML to import a fairly large number of xml files into a database. Due to the constraints of the PHP setup I was using, I could only upload a batch of maybe 50 at a time, which wasn't a big issue when I had about 600 to deal with. However, I am ultimately looking to import a second set of over 40,000 files, and I am not quite sure what the most effective way of doing that would be. I was considering something like Ajax as a solution, but I thought it best to ask for a little advice first.
So er, what would the best approach be to import a very large number of files into MySQL?
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Importing large numbers of xml files to MySQL

Post by Christopher »

First, you probably don't want to process 40k files through the web server, so the command line is probably a better direction. You can do the import file by file (as it sounds like you are doing), but you might want to consider reading the XML files and writing the data out to a delimited file. Then you can use LOAD DATA INFILE to import the data in a fast, single shot.
(#10850)
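[Editor's note: a minimal CLI sketch of the approach above. The directory, output path, table name, and element names (`id`, `name`) are all placeholders, not anything from the original posts; adjust them to your own schema.]

```php
<?php
// Run from the command line: php import.php
// Reads every XML file in a directory with SimpleXML and writes one
// tab-delimited row per file, ready for a single LOAD DATA INFILE.
$out = fopen('/tmp/records.tsv', 'w');

foreach (glob('./xml/*.xml') as $path) {
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        fwrite(STDERR, "Skipping unparsable file: $path\n");
        continue;
    }
    // Element names here are placeholders -- adjust to your XML layout.
    fputcsv($out, [(string) $xml->id, (string) $xml->name], "\t");
}
fclose($out);

// Then, in MySQL, one fast bulk import (adjust quoting options as needed):
// LOAD DATA INFILE '/tmp/records.tsv'
//   INTO TABLE records
//   FIELDS TERMINATED BY '\t'
//   LINES TERMINATED BY '\n';
```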
swan52
Forum Newbie
Posts: 14
Joined: Fri Jul 09, 2010 1:58 pm

Re: Importing large numbers of xml files to MySQL

Post by swan52 »

Is the command line compatible with SimpleXML?

The reason I ask is that I am not entirely familiar with the command line, not at all in fact, and if it isn't compatible it would mean an awful lot of time has been wasted writing the routines I used to extract the .xml data.

As you guessed, my method for inputting the files (basic though it was) was first to get a list of them by putting a .bat in the directory that read "dir /o:gen >C:\List_Files.txt", pasting the contents of that file into a spreadsheet, and then turning it into a long list of filenames to use in a very, very long array to loop through. Obviously this didn't work because it exceeded the memory and timeout limits. So are you saying I can circumvent this with the command line and load one file at a time that way, or will I be required to rewrite everything, including the er, schemas(?) I made for extracting the .xml data, in a different language?
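[Editor's note: to illustrate the point being asked about — SimpleXML works exactly the same from CLI PHP as it does under a web server, and the .bat-plus-spreadsheet file list can be replaced by a directory scan in PHP itself. A sketch, assuming the files live in C:\xml (the path is a placeholder):]

```php
<?php
// CLI PHP defaults max_execution_time to 0 (no limit), so the timeout
// that killed the web version doesn't apply; this call makes it explicit.
set_time_limit(0);
// Assumption: lift the memory cap too, since we process one file at a time.
ini_set('memory_limit', '-1');

// Build the file list directly instead of dir /o:gen + a spreadsheet.
$files = glob('C:/xml/*.xml') ?: [];  // placeholder directory
sort($files);                         // rough stand-in for dir's name ordering

foreach ($files as $path) {
    // Process one file at a time -- memory is released between iterations.
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        continue;
    }
    // ... same SimpleXML extraction code as in the web version ...
    unset($xml);
}
```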