out of memory loading database

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
aparker
Forum Newbie
Posts: 6
Joined: Wed May 10, 2006 8:08 am

out of memory loading database

Post by aparker »

I am having major problems loading a 20+MB comma-delimited file into MySQL. Initially the loader was timing out, so I fixed that with periodic calls to set_time_limit(). Now, even after doing everything I can imagine, the system aborts with an OUT-OF-MEMORY message. Tracking it, I have found that with each block of rows added, the memory usage (determined with memory_get_usage()) climbs, and I can find nothing that frees that memory. I have tried ob_flush(), ob_clean(), and ob_end_clean(), and nothing improves significantly.

Someone suggested pushing all the parsing and database inserts into an external function. The supposed advantage is that the function runs in its own memory space: the main program pulls a record from the text file and passes the string to the function, and as soon as the function finishes the insert it returns true or false. On return, the function should release all its memory and hopefully curb the expanding memory usage in the main program. I tried this and was able to add an insignificant 360 more records before maxing out memory again.

I have completely eliminated all indexes from the table to rule out the possibility of multiple or multi-field indexes dragging the load down and placing excessive load on memory. I actually loaded roughly 1,000 fewer records that way before crashing. Anyone have some options this old man hasn't thought of?

Memory usage:
Loading data from DataAgents20060506.txt. Memory usage at beginning is 398168
Memory used: 8618824 after extension request; added 4500 rows
Memory used: 24891536 after extension request; added 13500 rows
Memory used: 26214290 after extension request; added 14736 rows
Fatal error: Allowed memory size of 26214400 bytes exhausted (tried to allocate 4065 bytes)
User avatar
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

How are you handling the loading of this file?

I reckon the best way to do it will be to NOT load the whole file in one go.

Use fgets() to pull out X lines at a time into a stack, insert that data into the database and then unset() the stack. Repeat the process over and over until fgets() reaches feof() ;)

Maybe if you posted some code we could identify where the issue can be avoided :)

Pseudo:

Code: Select all

$file = './import.txt';
$rows_per_time = 50;

$stack = array();

$rows = 0;

$handle = fopen($file, 'r');

while (!feof($handle))
{
    $line = fgets($handle);
    if ($line === false) break; // guard against a failed read at EOF

    $stack[] = $line;
    $rows++;

    if ($rows >= $rows_per_time)
    {
        //pass the stack to a function to insert the data
        //
        unset($stack);
        $rows = 0;
        $stack = array();
    }
}

if (!empty($stack))
{
    //pass the final partial stack (fewer than $rows_per_time rows) to the same function
}

fclose($handle);
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Post by pickle »

You can also increase the memory allocated to PHP even further - give it 100MB.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
aparker
Forum Newbie
Posts: 6
Joined: Wed May 10, 2006 8:08 am

Post by aparker »

d11wtq wrote:How are you handling the loading of this file?
...
Maybe if you posted some code we could identify where the issue can be avoided :)
Sorry. That would help, wouldn't it?

Code: Select all

<?php
$strstrdiv = "\",\"";  /* delimiter between two quoted string fields: ","  */
$strnumdiv = "\",";    /* quoted string field followed by a number:   ",   */
$numstrdiv = ",\"";    /* number followed by a quoted string field:   ,"   */
$startdiv  = "\"";     /* opening quote of a leading string field          */
$numdiv    = ",";      /* delimiter between two numeric fields             */

if ($datfp = fopen($loadfile, "r")) {  /* open the DATA file for loading */

	/* load strategy for this operation is flush-and-fill, so delete everything in the table first */
	//*
	$nConnID = opendb();
		@mysql_query("DELETE FROM ".$load_order[$sequence]["table"]." WHERE 1", $nConnID) or die("ERROR: ".mysql_error()); 
		@mysql_query("ALTER TABLE ".$load_order[$sequence]["table"]." PACK_KEYS=0 CHECKSUM=0 DELAY_KEY_WRITE=0 AUTO_INCREMENT=1", $nConnID) or die("ERROR: ".mysql_error());
	closedb($nConnID);
	//*/

	while (!feof($datfp) ) {
		$datastr = fgets($datfp);  /* get one record from the buffer.  fgets() will pull up to the \n character that terminates each line. */
		$length = strlen($datastr);
		$ptr = 0;  /* per-record parse state; presumably reset here in the full script */
		$i = 0;

		while ( $ptr < $length ) {
			if ($i == $lastelement) {
				$ptr++;
				$end = $length;
				$var = "";
				for ($j=0; $j < ($end-$ptr); $j++) { $var .= substr($datastr, ($ptr+$j), 1); }
				$ptr = $end;
			}else{
				if ($types[$i] == "c") {
					$ptr++;
					if ($types[($i+1)] == "c" OR $types[($i+1)] == "d" OR $types[($i+1)] == "h") { 
						$end = strpos($datastr, $strstrdiv, $ptr); 
						$nextptr = $end + 2;
					}
					if ($types[($i+1)] == "n") { 
						$end = strpos($datastr, $strnumdiv, $ptr); 
						$nextptr = ($end+1);
					}
					
					$var = "";
					for ($j=0; $j < ($end-$ptr); $j++) { $var .= substr($datastr, ($ptr+$j), 1); }
					$ptr = $nextptr;
				}else{
					$ptr++;
					if ($types[($i+1)] == "c" OR $types[($i+1)] == "d" OR $types[($i+1)] == "h") { 
						$end = strpos ( $datastr, $numstrdiv, $ptr );
						$nextptr = ($end+1);
					}
					if ($types[($i+1)] == "n") { 
						$end = strpos ( $datastr, $numdiv, $ptr );
						$nextptr = $end;
					}
					$var = "";
					for ($j=0; $j < ($end-$ptr); $j++) { $var .= substr($datastr, ($ptr+$j), 1); }
					$ptr = $nextptr;
				}
			}
			if ($types[$i] == "c" OR $types[$i] == "n") {  /* clean the data of questionable or disallowed characters */  }
			$vars[$names[$i]] =  $var;
			$i++; /* this will move to the next array variable to get */
		}
		$update = date("Y-m-d H:i:s", time());  /* set the update value for those tables that use last update dates */
		
		switch ($seq) { /*  build insert statements  */ }
		
		/* have the insert statement constructed now push it to the table and wipe memory clean for the next record. */
		//*
		$nConnID = opendb();
			mysql_query($str, $nConnID) or die("ERROR: ".mysql_error()."<br>SQL: $str<br>");
			if (($rows = mysql_affected_rows($nConnID)) > 0) { $added = $added + $rows; }  /* parenthesize the assignment so > does not bind first */
		closedb($nConnID); 
		//*/
		
		unset($vars);
		unset($datastr);
		ob_flush();
		ob_clean();
	}
}
?>
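As an aside, PHP's built-in fgetcsv() already understands comma-delimited lines with quoted string fields, which would replace the character-by-character $ptr/$end scanning above. A minimal sketch only; $loadfile, $names and the insert-building step are taken from the posted script, and the read length of 4096 is an assumption about the maximum line size:

```php
<?php
/* Sketch: fgetcsv() parses one quoted, comma-delimited line per call,
   returning a numerically indexed array of that row's fields. */
if ($datfp = fopen($loadfile, "r")) {
	while (($fields = fgetcsv($datfp, 4096, ",", "\"")) !== false) {
		/* map $fields onto $names / $vars and build the INSERT as before */
	}
	fclose($datfp);
}
?>
```

This also sidesteps the hand-maintained delimiter strings entirely, since the enclosure character is handled by fgetcsv() itself.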
aparker
Forum Newbie
Posts: 6
Joined: Wed May 10, 2006 8:08 am

Post by aparker »

pickle wrote:You can also increase the memory allocated to PHP even further - give it 100MB.
Hosting company will not allow that. They have it at the maximum they will allow now. Besides, the data file that I have trouble with today is 22.5 MB, approximately 1100 bytes per row, which turns into 21,974 rows in the database. I could possibly increase memory to handle that, but then tomorrow they send me a file that is 28 or 30 megs. What then? This is a daily load and is only one of 18 files that we load. There has to be either something I am doing wrong or some technique in PHP that I can find to stop the growth of memory usage with each pass. My feeling is that increasing the memory allocation is only a temporary patch over a bigger issue.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

First, have you taken a look at MySQL's "LOAD DATA INFILE"? There is no faster way I know of to get data into MySQL, and it can handle all kinds of delimiters.

Second, generally when reading from large files you read line-by-line or chunk-by-chunk, so you never have to worry about a buffer larger than your max line or chunk size.
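For a comma-delimited file with quoted string fields like the one described, the statement might look something along these lines (the table and column names here are invented for illustration; the real schema comes from the file being loaded):

```sql
-- Hypothetical sketch: 'agents' and its column list are placeholder names.
LOAD DATA LOCAL INFILE 'DataAgents20060506.txt'
INTO TABLE agents
FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(agent_id, agent_name, city);
```

Because the server streams the file itself, PHP never holds more than the statement string in memory.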
(#10850)
User avatar
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

Are you freeing the result resource identifier? I think the function is mysql_free_result.
aparker
Forum Newbie
Posts: 6
Joined: Wed May 10, 2006 8:08 am

Post by aparker »

Everah wrote:Are you freeing the result resource identifier? I think the function is mysql_free_result.
There is no effect from using mysql_free_result because mysql_query returns a resource only for SELECT, SHOW, EXPLAIN, and DESCRIBE queries. I am doing only INSERT queries. I did have to read the manual on it, because I was searching for "FREE" and that popped up in the search.
aparker
Forum Newbie
Posts: 6
Joined: Wed May 10, 2006 8:08 am

Post by aparker »

arborint wrote:First, have you taken a look at MySQL's "LOAD DATA INFILE"? There is no faster way I know of to get data into MySQL and can handle all kinds of delimiters.

Second, generally when reading from large files you read line-by-line or chunk-by-chunk so you do not have to worry about any larger buffer than you max line or chunk size.
From experience with this file, reading in chunks of 2048 or 8192 bytes doesn't seem to have any impact. It still bombs at roughly the same point.

I have not specifically looked at LOAD DATA INFILE, but I did try using mysqlimport from the command line. It bombed a lot sooner than the PHP script does. If I am not mistaken, I think that LOAD DATA INFILE formats and then makes a call to mysqlimport, so if the base function bombs, the higher one would probably follow suit.

The issue is not with the file size or record size, but with the fact that every pass through the iterative loop uses more memory, and I can't seem to find a way to RELEASE that memory. Drawing from my C experience, I would like to "alloc"ate some memory for the record, build a structure of the parsed data, insert the structure into the database, and then "free" the memory. PHP doesn't seem to have anything that will do that, unless someone can point me to something I haven't found or looked at correctly yet.

I truly hope no one takes my comments the wrong way, but I have been beating this horse for over a week and have tried many options, only to reach the same point of memory failure within plus or minus 500 records. You don't know how much I appreciate any and ALL input!!!
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

From looking at the code you posted, the only thing I would change is the opendb() / closedb() on every loop iteration. That may be where the memory leak is, plus there is a lot of overhead in opening connections. Just do one opendb() at the very top and one closedb() at the very end.
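Restructured, the outer shell of the posted script would look roughly like this (a sketch only: opendb()/closedb() are the helper functions from the posted code, and the parsing and INSERT-building steps are elided where marked):

```php
<?php
/* Sketch: one connection held for the whole run instead of one per row.
   opendb() / closedb() are the poster's own helpers. */
$nConnID = opendb();
$added = 0;

if ($datfp = fopen($loadfile, "r")) {
	while (!feof($datfp)) {
		$datastr = fgets($datfp);
		if ($datastr === false) break;

		/* ...parse $datastr and build $str (the INSERT) as before... */

		mysql_query($str, $nConnID) or die("ERROR: ".mysql_error());
		if (($rows = mysql_affected_rows($nConnID)) > 0) { $added += $rows; }

		unset($vars);
		unset($datastr);
	}
	fclose($datfp);
}

closedb($nConnID);
?>
```

Opening a connection is comparatively expensive, and any per-connection state the client library allocates is only reclaimed once, at the single closedb() at the end.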
aparker
Forum Newbie
Posts: 6
Joined: Wed May 10, 2006 8:08 am

IT'S ALIVE! It works!

Post by aparker »

It's definitely not the prettiest solution, but the file is loading and the process is automated successfully! I first FTP all the files and then check the sizes. Anything over a predetermined limit gets broken into multiple files (this particularly troublesome one got split into five files of 5,000 records each and one of about 280 records).

Then the process kicks off the loader, which is the same loader for all the files. It knows which file to process by a sequence-version combination number passed to it for the next iteration. If the main file is still intact, there is no versioning, so the loader processes the main file. If the main file gets split, the version is added to the sequence, each "little" file gets loaded in order, and the main file is deleted from the server.

I tried to fork a process after each split file completed, but pcntl_fork() would not work, so I abandoned the idea. As I said, "it ain't purty, but she works"!

In the aftermath, however, it sure seems ridiculous to have to go through this kind of juvenile coding exercise just because PHP can't handle memory management. Wish there was a way to allocate and release that memory!

Thanks for ALL the suggestions and help! Each offering forced me to stop and think it through again. Now to find out how to mark this one resolved.

Cheers!
:lol:
Post Reply