Recently I have been using SimpleXML to import a fairly large number of XML files into a database. Due to the constraints of the PHP setup I was using, I could only upload a batch of maybe 50 at a time, which wasn't a big issue when I had about 600 to deal with. Ultimately, however, I am looking to import a second set of over 40,000 files, and I am not quite sure what the most effective way of doing that would be. I was considering something like Ajax as a solution, but I thought it best to ask for a little advice first.
So er, what would the best approach be to import a very large number of files to MySQL?
Importing large numbers of xml files to MySQL
Moderator: General Moderators
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Importing large numbers of xml files to MySQL
First, you probably don't want to process 40k files through the web server -- so command line is probably a better direction. You can do the import file by file (as it sounds like you are doing). You might want to think of reading the XML files and writing the data to a delimited file. Then you can use LOAD DATA INFILE to import the data in a fast, single shot.
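A minimal sketch of the delimited-file approach described above, meant to be run from the command line (`php import.php`). The `<record>`, `name`, and `price` element names, the `records` table, and the file paths are all hypothetical; substitute the fields and paths from your own XML files and schema.

```php
<?php
// import.php -- run from the command line, not through the web server.
// Reads every .xml file in a directory with SimpleXML, writes one
// delimited (CSV) line per file, then the whole file can be loaded
// into MySQL in a single LOAD DATA INFILE statement.

function recordToRow(SimpleXMLElement $record): array
{
    // Pull out the fields you need from one parsed XML document.
    // These element names are made up for the example.
    return [(string) $record->name, (string) $record->price];
}

function buildDataFile(string $xmlDir, string $outFile): int
{
    $out = fopen($outFile, 'w');
    $count = 0;
    foreach (glob($xmlDir . '/*.xml') as $file) {
        $xml = simplexml_load_file($file);
        if ($xml === false) {
            fwrite(STDERR, "Skipping unparsable file: $file\n");
            continue;
        }
        fputcsv($out, recordToRow($xml));   // one delimited line per file
        $count++;
    }
    fclose($out);
    return $count;                          // number of files converted
}
```

Once the delimited file exists, the import itself is a single statement run from the MySQL client, which is far faster than 40,000 individual INSERTs:

```php
<?php
// Example LOAD DATA statement (table and path are hypothetical):
//   LOAD DATA INFILE '/tmp/records.csv'
//   INTO TABLE records
//   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
//   LINES TERMINATED BY '\n';
```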
Re: Importing large numbers of xml files to MySQL
Is the command line compatible with SimpleXML?
The reason I ask is that I am not at all familiar with the command line, and if it isn't compatible, an awful lot of time has been wasted writing the routines I used to extract the .xml data.
As you guessed, my method for inputting the files (basic though it was) was first to get a list of them by dropping a .bat file in the directory that read "dir /o:gen >C:\List_Files.txt", pasting the contents of that document into a spreadsheet, and then turning it into a long list of filenames to loop through in a very, very long array. Obviously this didn't work because it exceeded the memory and timeout limits. So are you saying I can circumvent this with the command line and load one file at a time that way, or will I be required to rewrite everything, including the, er, schemas(?) I made for extracting the .xml data, in a different language?
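For what it's worth, the .bat / spreadsheet step isn't needed at all: PHP can list the directory itself with `glob()`, and a CLI script isn't subject to the web server's request timeout. A small sketch, with a hypothetical directory path:

```php
<?php
// Run with: php process.php
// glob() replaces the "dir > List_Files.txt" + spreadsheet step.
// The path below is hypothetical; substitute your own XML directory.
$files = glob('C:/xml-data/*.xml');

foreach ($files as $file) {
    // Processing one file per loop iteration keeps memory use flat;
    // each parsed document is released when $xml is reassigned.
    $xml = simplexml_load_file($file);
    if ($xml === false) {
        continue; // skip files that fail to parse
    }
    // ... reuse the same SimpleXML extraction code you already wrote ...
}
```

The existing SimpleXML extraction code should work unchanged in this script; only the file-listing and looping around it moves to the command line.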