
Error: "couldn't fetch mysqli ..."

Posted: Wed Jun 20, 2007 6:27 pm
by smac
Hi,
I'm writing a PHP program that parses a *lot* of log files into a database. It's completely O-O, and each object opens a database connection to mysql using $db = new mysqli(...).

Here's my problem - the program works great for a limited number of log files, but if I run it over a large number of files, as it gets farther along in processing, I get warnings as shown below.. (although NOT for every query).

"Can't connect to MySQL server on 'localhost' (10048) in..."


However, mysqli is not returning any errors. The connect function seems to work OK, and "mysqli->error" is empty.

This happens for ~50 out of ~10,000 db operations, scattered throughout the dataset.

My working assumption is that somehow I have too many connections open at once, although the code is pretty good about closing them when it's done with them (the destructors in the objects all invoke mysqli->close() as the last thing they do).

I'm running:
Apache/2.0.47 (Win32) PHP/5.0.3 (Win XP)
with
MySQL 4.1
with no limit set on persistent links or number of connections.

Note that this whole process is single threaded (so I'm not doing multiple simultaneous operations).

Any help would be appreciated.

Thanks,
smac

Posted: Wed Jun 20, 2007 6:32 pm
by superdezign
Maybe if you didn't open a new connection every time, it'd be okay. Have you tried doing it all on a single connection?

Posted: Wed Jun 20, 2007 8:58 pm
by AKA Panama Jack
superdezign wrote:Maybe if you didn't open a new connection every time, it'd be okay. Have you tried doing it all on a single connection?
Really, every time he opens a new connection, more memory is allocated for the new connection object. He should create only ONE connection object and then execute the queries one after the other.

He is probably running out of memory on the database server.

Also, DO NOT USE PERSISTENT CONNECTIONS. They were needed on ancient versions of MySQL running on awfully slow servers, but they are not needed anymore. They tend to use far more memory and cause far more problems under MySQL/mysqli. You are not gaining any speed or saving memory.
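The "ONE connection object" advice can be sketched in a few lines. This is only an illustration: the table, the column, and the record format are placeholders, not code from the thread. The mysqli handle is created once by the caller and passed in; only the queries repeat.

```php
<?php
// Sketch of the single-connection approach: the caller opens one mysqli
// handle up front and every query reuses it. Table and column names here
// are placeholders.
function importAll(mysqli $db, array $logRecords)
{
    foreach ($logRecords as $record) {
        $db->query("INSERT INTO hit (urlPath) VALUES ('"
            . $db->real_escape_string($record['urlPath']) . "')");
    }
    // No close() here: the caller closes the one connection at the very end.
}
```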

Posted: Wed Jun 20, 2007 9:10 pm
by Benjamin
Well, he isn't using persistent connections, considering it's mysqli, and I don't think it's a memory or too-many-connections issue, because queries seem to be failing randomly. (Although it would seem that way if he keeps creating new connections.) So without further information, my best guess is that some of the connections are timing out, being dropped or killed for some reason, or he isn't getting the correct error back because he is using the wrong connection ID when he tries to pull it.

Is it always failing in the same place? Different places? Can you post some code?

Posted: Wed Jun 20, 2007 9:46 pm
by Benjamin
Thinking about this some more, it sounds like you could be running large queries. You may be running into the max_allowed_packet size limit. You can increase the size of this in the my.cnf file.
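If that does turn out to be the cause, raising the limit is a one-line change in my.cnf under the [mysqld] section (16M here is just an illustrative value, not a recommendation from the thread):

```ini
[mysqld]
max_allowed_packet=16M
```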

Posted: Wed Jun 20, 2007 10:17 pm
by bdlang
Aside from what's been mentioned, if you haven't already, enable the slow query log. Edit your my.cnf file and add the lines below your [mysqld] section:

Code: Select all

long_query_time=1
log-slow-queries="slow-query-log.txt"
The log file will be created in your 'data' directory and will record all queries that take longer than 1 second.

You can also specify a location, e.g. C:/tmp/mysql-slow-query-log.txt

Status so far . . .

Posted: Thu Jun 21, 2007 9:45 am
by smac
Thanks to all for the comments and suggestions so far.

astions: I'm pretty sure it fails in different places, although I'll verify that in the next day or so. Someone else suggested that I monitor what's going on on the MySQL end by running "show full processlist" repeatedly while the job is running. I'll do that also. I'll also extract some pertinent code and post it.

bdlang: Thanks for the tip on slow query logging. I'll turn it on. No individual query should be that large, but maybe it'll tell me something.

Some test results . . .

Posted: Fri Jun 22, 2007 11:39 pm
by smac
OK, I ran a few tests.

First a little more background - the program parses Apache log files and builds a web access database. The part we're talking about here reads each hit record from the log file, decides whether to log it (I throw away pretty much everything except page views) and allocates collections of hits into visits.
When I'm done with each hit object, I invoke a storeHit() method that saves the hit to a database table. Likewise for each visit.
Each hit and visit opens a database connection when it needs to access the db, runs a query, then closes the connection. At any given time, there may be several visits open (new visits can start before old ones finish). There should only be one hit open at a time.

The hit query is simply an INSERT or an UPDATE that stores a single database record of 22 fields.
The visit query is likewise an INSERT or an UPDATE that stores a single record of 18 fields.

Now, about the tests:

- The errors occur at different places in the input data stream. We're talking about anywhere between 10 and 40 errors out of 14,600 records processed.
- There's nothing in the slow query log.
- I'm routing all db connects and closes through a pair of functions (below). I added some code (removed here for clarity) to log opens, closes, and errors. There are never more than a few connections open when the errors occur. All connections are closed (explicitly, using db_close() below) by the time the program finishes.
- The mysqli_connect_errno() function does successfully detect failures to connect. I've been using it to log when errors occur.
- I tried sleeping (via sleep()) after processing each log file. sleep(20) reduces the errors; sleep(30) eliminates them (repeatably).
- There's an awful lot of code. As a sample, here's the storeHit() method from the hit class. Note that the connection is opened when we start and closed when we finish. I even unset() the mysqli object to try to force it to be released.

All connects and closes are routed through the two functions below: db_connect() and db_close().

Code: Select all

function db_connect() {
	global $hostname, $username, $password, $database;	// connection settings defined at file scope
	$db = new mysqli($hostname, $username, $password, $database);
	return $db;
 }
 
/** db_close - close the connection and return a status.
*	If the handle was invalid, db_close returns true.
*
*	@param	handle	db		Handle to a mysqli object.
*	@return	boolean	status	TRUE if close was successful.  False otherwise.
*/
 function db_close($db) {
	if($db) {
	 	$status = $db->close();
	} else {
		$status = true;
	}
	return $status;
 }
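For what it's worth, Winsock error 10048 (WSAEADDRINUSE) usually means the client side has run out of ephemeral ports: each short-lived connection leaves its socket in TIME_WAIT for a few minutes, which would also explain why sleep(30) makes the errors go away. A minimal sketch of a fix that keeps both call sites intact is to have db_connect() cache and reuse one handle (assuming the same file-scope credential variables as the original code):

```php
<?php
// Hypothetical drop-in variant of db_connect()/db_close() that reuses one
// cached handle instead of reconnecting on every call.
function db_connect() {
	global $hostname, $username, $password, $database;
	static $db = null;              // persists between calls
	if ($db === null) {
		$db = new mysqli($hostname, $username, $password, $database);
	}
	return $db;                     // every caller shares this handle
}

function db_close($db) {
	// Now a no-op so existing call sites keep working; the one shared
	// connection is closed automatically when the script exits.
	return true;
}
```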

Below is the storeHit() method from the Hit class. The storeVisit() method for the Visit class is very similar.

Code: Select all

/** storeHit - Save the hit to the database.
 *  If the hit has already been stored, storeHit updates the record.  Otherwise it does an
 *  INSERT.  If the store was successful, storeHit returns the hitID (generated by the database
 *  upon the first save).
 *  If the store fails, storeHit creates a Logger object and logs the error.
 * 
 *  @param	none.
 *  @return	integer	hitID	If successful.
 *  @return boolean FALSE	If failure.
 * 
 */
	public function storeHit() {

		$this->dbHandle = db_connect();

		$safeQueryString = addslashes($this->queryString);
		$safeReferer = addslashes($this->referer);


		// If the hit is already stored, then do an UPDATE.  Otherwise, an INSERT.		
		if($this->hitIsStored) {
			$query = "update hit set " .
					" hitDate 		= '{$this->hitDate}'," .
					" serverName	= '{$this->serverName}'," .
					" localIP		= '{$this->localIP}'," .
					" requestMethod = '{$this->requestMethod}'," .
					" userID		= '{$this->userID}'," .
					" urlPath		= '{$this->urlPath}'," .
					" queryString	= '$safeQueryString'," .
					" remoteUser	= '{$this->remoteUser}'," .
					" remoteIP		= '{$this->remoteIP}'," .
					" userAgent		= '{$this->userAgent}'," .
					" cookieString	= '{$this->cookieString}'," .
					" userCookie	= '{$this->userCookie}'," .
					" sessionCookie = '{$this->sessionCookie}'," .
					" referer		= '$safeReferer'," .
					" host			= '{$this->host}'," .
					" status		= '{$this->status}'," .
					" bytesSent		= '{$this->bytesSent}'," .
					" timeToServe	= '{$this->timeToServe}'," .
					" timeStamp		= '{$this->timestamp}'," .
					" visitID		= '{$this->visitID}'," .
					" siteID		= '{$this->siteID}'";
		} else {
			$query = "insert into hit (" .
 	 				" hitDate, serverName, localIP, requestMethod, userID, urlPath, queryString, " .
 	 				" remoteUser, remoteIP, userAgent, cookieString, userCookie, sessionCookie, " .
 	 				" referer, host, status, bytesSent, timeToServe, timestamp,	visitID, siteID) ";

 		 	$query .= "values ( " .
					"'{$this->hitDate}',		'{$this->serverName}',	'{$this->localIP}',		'{$this->requestMethod}'," .
					"'{$this->userID}',			'{$this->urlPath}',		'{$safeQueryString}', '{$this->remoteUser}'," .
					"'{$this->remoteIP}',		'{$this->userAgent}',	'{$this->cookieString}','{$this->userCookie}'," .
					"'{$this->sessionCookie}',	'{$safeReferer}',		'{$this->host}',		'{$this->status}'," .
					"'{$this->bytesSent}',		'{$this->timeToServe}',	'{$this->timestamp}',	'{$this->visitID}'," .
					"'{$this->siteID}'" .
					")"; 
		}

		$result = $this->dbHandle->query($query);

		if (!$result) {
			include_once('lib/Logger.php');
			$logger = new Logger("WebLogDBErrorLog.log");
			$err = $this->dbHandle->error;
			$this->error =  "Hit->store() Database error executing |$query|\n" .
	  						"              Error was \"$err\"\n";
			$this->error .= "Hit details\n";
			$this->error .= $this->toString() . "\n";
			$logger->logerror($this->error);
			$returnStatus = false;

		} else {
			$this->hitID = $this->dbHandle->insert_id;		// It was a new record.  Get the key.
			$this->hitIsStored = true;
			$returnStatus = $this->hitID;
		}

		// Close the database connection and release the mysqli object before leaving.
		if(db_close($this->dbHandle)) {
			unset($this->dbHandle);
		}
		return $returnStatus;
	} // End of storeHit.
I'd appreciate any further suggestions!

Thanks for listening,
smac
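One more thought on storeHit(): since the query is assembled by string interpolation with addslashes(), a mysqli prepared statement (available in PHP 5) would do the quoting for you. A minimal sketch for a hypothetical two-column slice of the hit table; the function name and columns are illustrative, not from the post:

```php
<?php
// Hypothetical sketch: the same INSERT via a mysqli prepared statement.
// bind_param() handles the quoting, so no addslashes() is needed.
function storeHitPrepared(mysqli $db, $urlPath, $queryString)
{
    $stmt = $db->prepare(
        'INSERT INTO hit (urlPath, queryString) VALUES (?, ?)'
    );
    $stmt->bind_param('ss', $urlPath, $queryString);  // 's' = string
    $ok = $stmt->execute();
    $stmt->close();
    return $ok ? $db->insert_id : false;              // key of the new row
}
```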

Posted: Sat Jun 23, 2007 12:02 am
by Benjamin
I stopped reading at the line that said:

"Each hit and visit opens a database connection when it needs to access the db"

What I would recommend you do is have ONE connection and run the queries in batches of 500 or so.

Something like..

Code: Select all

class blah
{
    private $counter = 0;
    private $query     = '';

    function add_record($data)
    {
        $this->query .= "INSERT INTO blah (field) VALUES ('" . $this->db->esc($data) . "');";
        $this->counter++;
        if ($this->counter >= 500) $this->process();
    }

    function process()
    {
        $this->db->query($this->query);
        $this->db->cycleResults();
        $this->query = '';
        $this->counter = 0;
    }
}
That's just pseudo-code to give you a rough idea. You'll need to use mysqli_multi_query() for it to work, and cycle through the results after each batch. Don't reconnect on each insert; that's just crazy.
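The result cycling is the non-obvious part: mysqli_multi_query() queues all the result sets, and each one must be consumed before the connection will accept another query. The cycleResults() name in the class above is hypothetical; here is one way it could be implemented:

```php
<?php
// Consume every pending result set after a mysqli_multi_query() batch.
function cycleResults(mysqli $db)
{
    do {
        if ($result = $db->store_result()) {
            $result->free();        // discard any rows (INSERTs return none)
        }
    } while ($db->more_results() && $db->next_result());
}
```

Usage would be along the lines of: `$db->multi_query($batch); cycleResults($db);`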

EDIT: I didn't realize you were creating a new database connection for every single query. I figured you had a few classes here and there that were establishing connections. Like the others have said, you only need ONE connection.

Posted: Sat Jun 23, 2007 2:15 am
by bdlang
smac wrote:First a little more background - the program parses Apache log files and builds a web access database.
I don't know what sort of ability you might have to implement this, but have you looked into mod_log_mysql? I've used it and it's perfect for what you're trying to accomplish.

I guess its refactoring time

Posted: Sat Jun 23, 2007 4:43 pm
by smac
Well, it seems pretty clear that it's all the connects that are killing me. I guess I have some refactoring to do.

When I store the HIT, MySQL creates a unique ID (hitID) using an autoincrement field. I use the hitID in the VISIT record (I store the ID of the first hit of the visit with each VISIT record.)

I also store the visitID (generated the same way) with each hit record, so that later I can pull all the hits belonging to that visit. Obviously this means that I need to have the IDs available when I save the records, which is why I was saving them one at a time. I guess I'll have to do it differently.
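The IDs only become a problem if the INSERTs are batched: with one connection kept open for the whole run and queries issued one at a time, insert_id is still valid immediately after each INSERT, so the hitID/visitID scheme survives dropping the per-query connects. A sketch with illustrative column names ($db is the one shared mysqli handle):

```php
<?php
// Sketch: insert_id remains usable between single queries on one shared
// connection. Column names are illustrative, not the real schema.
function storeHitThenVisit(mysqli $db, $urlPath)
{
    $db->query("INSERT INTO hit (urlPath) VALUES ('"
        . $db->real_escape_string($urlPath) . "')");
    $hitID = $db->insert_id;        // key of the hit just stored

    $db->query("INSERT INTO visit (firstHitID) VALUES ('$hitID')");
    $visitID = $db->insert_id;

    return array($hitID, $visitID);
}
```

So moving to one connection, by itself, already removes the reconnect cost; batching with multi_query() is a further optimization that would need a different ID strategy.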

astions: Thanks for the "blah" class. I like the idea. I'll have to either make it global (yuk!) or refactor the log parser to extract the query information from each HIT and VISIT and pass it to "blah".

bdlang: Thanks for the suggestion. I did look briefly at mod_log_mysql. It might be useful for storing the HITs in a temporary table for later processing into visits and other summary information, but I'll still have a lot of processing to do. I identify robot visits, top query strings, top referers, form posts, summary counts by day, etc... Because I have to do all that other stuff anyway, I didn't really see any advantage to storing the HITs separately, directly from Apache.
I also throw away a lot of hits (images, JavaScript, CSS, for example). I use them to gather some intelligence when I'm forming the VISIT record, then ditch them. For example, robots seldom fetch CSS or JavaScript files, so a complete absence of them in a visit is a good indicator that the visitor was a robot, whether it owned up to it (in the UserAgent string) or not. For those reasons, I rejected using mod_log_mysql.

Thanks for the help guys! I'll let you know how it works out.

smac