
Loading really large files

Posted: Thu Jan 11, 2007 10:52 pm
by Luke
I have a CSV file that is over 5 MB, and I need to parse it and convert it to an array so I can check it against a database. PHP's file_get_contents() reads it just fine, but when I go to explode() it, it chokes and says Allowed memory size of 8388608 bytes exhausted. How should I go about this? I've never worked with a file this large before.

Posted: Thu Jan 11, 2007 11:09 pm
by volka
maybe fgetcsv?

Posted: Thu Jan 11, 2007 11:10 pm
by Kieran Huggins
You could increase the memory limit in php.ini.

If that's not an option and you have PHP 5.1.0 or greater installed, you can use the optional arguments (offset & maxlen):
http://www.php.net/manual/en/function.file-get-contents.php wrote:file_get_contents ( string filename [, bool use_include_path [, resource context [, int offset [, int maxlen]]]] )
to load the file in segments.
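A rough sketch of what segment-by-segment reading could look like (the 1 MB segment size is an arbitrary choice, and a partial line at the end of one segment is carried over into the next so no row gets split):

```php
<?php
// Sketch of segmented reading with file_get_contents()'s offset/maxlen
// arguments (PHP 5.1.0+). Only one segment is ever held in memory.
function read_in_segments($filename, $segment = 1048576)
{
    $offset = 0;
    $carry  = '';      // partial line left over from the previous segment
    $lines  = array();
    while (true) {
        $chunk = file_get_contents($filename, false, null, $offset, $segment);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $offset += strlen($chunk);
        $parts   = explode("\n", $carry . $chunk);
        $carry   = array_pop($parts);   // last piece may be incomplete
        foreach ($parts as $line) {
            $lines[] = $line;           // process each complete row here
        }
    }
    if ($carry !== '') {
        $lines[] = $carry;              // file didn't end with a newline
    }
    return $lines;
}
```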

Posted: Thu Jan 11, 2007 11:31 pm
by feyd
I'll agree with volka here, fgetcsv().
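A minimal sketch of the idea (the tiny sample file is written inline just so the example runs standalone):

```php
<?php
// Minimal fgetcsv() sketch: fgetcsv() reads ONE parsed row per call,
// so memory use stays flat no matter how large the file is.
file_put_contents('sample.csv', "sku100,5\nsku200,3\n");   // sample data

$rows = array();
$fh = fopen('sample.csv', 'r');
while (($row = fgetcsv($fh, 1000, ',')) !== false) {
    $rows[] = $row;   // normally you'd check $row against the DB here
                      // instead of collecting everything into an array
}
fclose($fh);
unlink('sample.csv');
print_r($rows);
```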

Posted: Fri Jan 12, 2007 12:44 am
by Kieran Huggins
fgetcsv()
8O Cool!

Posted: Fri Jan 12, 2007 1:01 am
by Luke
that works perfectly... wow what a cool function! thanks.

Posted: Fri Jan 12, 2007 7:16 am
by s.dot
I've successfully used fgetcsv() to import a postal code database into mysql. The file was about 600 MB. 8O

Posted: Fri Jan 12, 2007 7:25 am
by Kieran Huggins
:cry: I manually cut and uploaded a 50 MB CSV in 3MB chunks once. That's time I'll never get back!

Posted: Wed Jan 17, 2007 2:06 am
by Luke
OK, let me explain why I need this functionality and see if anybody has any better ideas. fgetcsv() works great, except now I'm trying to load two other files (not quite as large, but still big). These three files are exports from vendor databases. I need to take each CSV file, add up the total amount in stock for each SKU (a field in each file), and then update a MySQL database with the results. Anybody know the best way to do this? Just about everything I've tried has resulted in:
Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate xxx bytes)
I'm on a shared server, so my options are limited. :(

Posted: Wed Jan 17, 2007 8:44 am
by feyd
Keep dumping them into the database, query afterward?

How many products are we talking here? It may be possible to keep an aggregate array (or variable) for just this total around.
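Something like this, maybe (a sketch; the file names, the `stock_position` key, and the sample data in the test are invented for illustration):

```php
<?php
// Sketch of the aggregate-array idea: stream every file with fgetcsv()
// and keep only sku => running total in memory (at ~10,000 skus, tiny),
// then run one UPDATE per sku afterward. Each entry in $files is assumed
// to hold a file name and the CSV column index of the stock count.
function tally_stock($files)
{
    $totals = array();
    foreach ($files as $info) {
        $fh = fopen($info['file_name'], 'r');
        while (($row = fgetcsv($fh, 1000, ',')) !== false) {
            $sku = $row[0];
            $qty = (int) $row[$info['stock_position']];
            if (!isset($totals[$sku])) {
                $totals[$sku] = 0;
            }
            $totals[$sku] += $qty;   // accumulate across all vendor files
        }
        fclose($fh);
    }
    return $totals;
}
```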

Posted: Wed Jan 17, 2007 11:11 am
by Luke
I think (and I'll have to double-check) that it's about 10,000 per file, but the first one contains a lot more fields (which is why it's so much larger).

Posted: Wed Jan 17, 2007 11:45 am
by feyd
I'd go with dump to database, pull afterward then.

Posted: Wed Jan 17, 2007 11:47 am
by Luke
like...

Code: Select all

LOAD DATA INFILE ... 
:?:

Posted: Wed Jan 17, 2007 11:48 am
by feyd
If possible, sure, but fgetcsv() works too. :P
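For reference, a rough sketch of what that could look like (the staging table and column names are invented, and note that shared hosts often disable the FILE privilege, so the LOCAL variant may be required, if it's allowed at all):

```sql
-- Assumed staging table: vendor_stock (sku VARCHAR(32), qty INT)
LOAD DATA LOCAL INFILE '/path/to/vendor1.csv'
INTO TABLE vendor_stock
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(sku, qty);
```

After loading all three files into the staging table, a single aggregate query (e.g. SELECT sku, SUM(qty) FROM vendor_stock GROUP BY sku) would give the totals without PHP holding anything in memory.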

Posted: Wed Jan 17, 2007 9:13 pm
by Luke
feyd wrote:If possible, sure, but fgetcsv() works too. :P
:? OK, so something like:

Code: Select all

foreach ($file as $val)
{
	$fh = fopen($val['file_name'], 'r');
	while (($data = fgetcsv($fh, 1000, ',', '"')) !== FALSE)
	{
		$vendor_sku = getAlnum($data[0]);
		$available  = (integer) getDigits($data[$val['stock_position']]);
		// UPDATE products SET in_stock = in_stock + $available WHERE pSku = '$vendor_sku'
	}
	fclose($fh);
}
Another question: if I do it like this, it will add whatever is already in the database's in_stock on top of the three CSV files' in_stock values. I don't want that. So should I set all in_stock to zero before running this? That seems like the solution, but it could potentially lead to problems. What do you think?
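
i.e. something like this (a sketch, with example values; presumably it should all happen in one transaction so stock never reads as zero mid-import, if the table supports that):

```sql
-- Reset once before the import run, so existing stock isn't counted twice
UPDATE products SET in_stock = 0;

-- Then, for each row parsed out of the three CSV files:
UPDATE products SET in_stock = in_stock + 5 WHERE pSku = 'sku100';  -- example values
```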

EDIT: sigh... it seems no matter what I do, it exceeds the allowed memory size. :(