Finding Memory Leaks

Posted: Mon Sep 17, 2007 5:23 pm
by pepa
Hi Forum,

I've written an import script for a fairly large image database. The script iterates over 20000 database entries and as many
lines of a CSV file, creates some objects, and updates all database entries. All in one big for-loop.

My problem is that every iteration takes about 20k of extra memory, eventually eating up all the memory I feed it.

I tried to debug it using "get_defined_vars()", but it returns the same number of variables every time. (I think this is because
get_defined_vars only returns the variables accessible in the current scope.)

Now I'm a little out of options. What can I do to find the root of the problem? Is there a way to output _every_ memory eating bastard
or to access the symbol-tables directly and look for stuff which shouldn't be there? Or are there any php memory debuggers which could
help me?

Help would be very much appreciated.

Best regards

PePa

(PS: Please forgive me if this is the wrong forum, but I think the question goes pretty deep.)

Posted: Mon Sep 17, 2007 7:16 pm
by ReDucTor
I don't know if there are tools out there designed specifically for PHP, but it might be useful to show your code. We might spot something that could be causing it.

Posted: Mon Sep 17, 2007 8:00 pm
by pepa
Hi ReDucTor.

Thanks for the offer, but I can't do so. The import code is pretty simple, but the framework it uses isn't. It would involve quite a few thousand lines of code and roughly 10 classes. But perhaps some pseudo-code will illustrate the problem. (The application is an online image database, http://www.altrofoto.de, so the objects are images.)

Code:

open file.csv
for( $line of file.csv ){

    // Image from csv
    $i = new Image();
    $i->readFrom( $line );

    // Image from db
    $idb = (Image) select image from imagedb where $i.id = id

    case( compare $i $idb ){
        changed: update $i in database
        new: insert $i in database
    }

    do some more db things to connect images with other stuff

    throw away all stuff which scope ends here, of course.
}
"Image" is the big thing. Image is composed of many other classes. (References in all directions).
My guess was that some references don't get released, so the garbage collection (does PHP have garbage collection anyway?)
can't free the memory. But all attempts to unset references have failed so far. The one thing I need is some tool/trick/command which
will tell me what variables PHP knows of and why it won't forget them...

Best regards

PePa

Posted: Mon Sep 17, 2007 8:16 pm
by Paw
Hi pepa,

incidentally, I'm currently working on a large PHP-based framework as well, which at one point began leaking lots of memory. It took two days of heavy profiling and debugging to solve the problem. Here are a few things to look for, based on your description.
  • Database query results should be freed as soon as they are not needed anymore; especially in a loop with many iterations.
  • Objects with circular references leak memory, due to a PHP bug (http://bugs.php.net/bug.php?id=33595). If you have a parent object containing a child object which has a back-reference to its parent, neither object can be freed by PHP's garbage collector. To solve this problem, you can write a destructor method for your child class which unsets the parent reference. Make explicit use of unset where you temporarily create such ring-reference objects. Leaving the scope without doing so will litter memory; this also applies to loops:

    Code:

    while(condition) $a = new RingRefObject;
  • create_function litters the global scope with "permanent" functions. They are not discarded when the scope of invocation is left. So be careful in making use of this function.
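
A minimal sketch of the ring-reference point above, assuming PHP 7.1 or later. The class names ParentNode and ChildNode and the release() method are hypothetical and not taken from the thread's framework. Under plain reference counting a destructor cannot fire while the cycle is still intact, so this sketch breaks the cycle with an explicit method rather than relying on __destruct:

```php
<?php
// Hypothetical classes for illustration only.
class ParentNode
{
    public $children = [];
    public static $destroyed = 0;

    public function add(ChildNode $child): void
    {
        $child->parent = $this;        // back-reference: this creates the cycle
        $this->children[] = $child;
    }

    // Break the cycle explicitly so reference counting can free both objects.
    public function release(): void
    {
        foreach ($this->children as $child) {
            $child->parent = null;
        }
        $this->children = [];
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

class ChildNode
{
    public $parent = null;
}

for ($i = 0; $i < 1000; $i++) {
    $p = new ParentNode();
    $p->add(new ChildNode());
    // Without release(), the pair survives unset() until a cycle
    // collector (PHP 5.3+) runs or the script ends.
    $p->release();
    unset($p);
}

echo ParentNode::$destroyed, " parents destroyed inside the loop\n";
```

With release() in place, each parent's destructor runs as soon as unset() drops the last reference, so the count matches the number of iterations.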
It also occurs to me that there might be a problem in your Image class, which presumably uses the GD library. Look for a method that frees the created image resources.

Maybe one of these pointers helps in your situation.

Good luck!

Posted: Mon Sep 17, 2007 8:35 pm
by pepa
Hi Paw,

at last someone who knows what I'm talking about :)

To your post:

1. Are you talking about single rows I iterate over (then there could be a problem) or about the complete result set (which I do indeed free)?
2. This is what I guessed, too. I spent quite some time going over all circular references and unsetting them in the main object as well as in the sub-objects to eliminate this possibility.
3. Luckily, that's something I haven't used in my project, so this isn't it.

What really interests me is: what did you do to find your leaks? The only things I found are "memory_get_usage" to watch my RAM fade away and "get_defined_vars" to tell me absolutely nothing ... What would be _very_ useful is "show_whats_eating_ram" or "show_all_known_vars", but someone forgot to put it in the documentation ... :)

Regards

PePa

Posted: Mon Sep 17, 2007 9:04 pm
by Paw
pepa wrote: 1. Are you talking about single rows I iterate over (then there could be a problem) or about the complete result set (which I do indeed free)?
I meant the whole result set. In our framework, we've got a database result wrapper which automatically frees the result on object destruction (if not already explicitly done by method invocation). Since these result objects have references to database connection objects, they are normally freed at script termination, so actually too late when many queries have been done.
PHP seems to prefer doing its major cleanups and __destruct calls at the end. However, if there are no circular references, cleanup can also be forced during script execution by using unset. That's what I observed while analysing the problem.
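
The result-wrapper idea can be sketched roughly as follows, assuming PHP 7.1 or later. ResultHandle is a hypothetical class, the $free callback stands in for a driver call such as mysqli_free_result(), and the array passed in is only a placeholder for a real result set:

```php
<?php
// Hypothetical wrapper: frees its result explicitly or, at the latest,
// when the wrapper object itself is destroyed.
final class ResultHandle
{
    private $result;
    private $free;
    private $freed = false;

    public function __construct($result, callable $free)
    {
        $this->result = $result;
        $this->free = $free;
    }

    public function free(): void
    {
        if (!$this->freed) {
            ($this->free)($this->result);   // e.g. mysqli_free_result($result)
            $this->result = null;
            $this->freed = true;
        }
    }

    public function __destruct()
    {
        $this->free();   // safety net; does nothing if already freed
    }
}

$freedCount = 0;
$onFree = function ($rows) use (&$freedCount) {
    $freedCount++;
};

for ($i = 0; $i < 3; $i++) {
    $result = new ResultHandle(['row ' . $i], $onFree);
    // ... work with the result here ...
    $result->free();   // explicit: does not wait for script termination
}
unset($result);

echo $freedCount, " results freed\n";
```

Calling free() inside the loop keeps the number of live result sets constant, while the destructor only covers the case where the explicit call was forgotten.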
pepa wrote: 2. This is what I guessed, too. I spent quite some time going over all circular references and unsetting them in the main object as well as in the sub-objects to eliminate this possibility.
<snip>
What really interests me is: what did you do to find your leaks? The only things I found are "memory_get_usage" to watch my RAM fade away and "get_defined_vars" to tell me absolutely nothing ... What would be _very_ useful is "show_whats_eating_ram" or "show_all_known_vars", but someone forgot to put it in the documentation ... :)
Besides crying and cursing :D
At some point I was quite certain that the circular reference problem was a main cause of the memory leaks. So I tested the suspect classes by doing something like

Code:

function __destruct() {
    echo get_class($this),' just passed away<br>';
}
and looked at the points at which these messages appeared. That's pointless if your code doesn't generate any other output, but that wasn't the case for us. In your case, with a loop, you could add some debug output to the loop's body.
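
One way to add such debug output is to measure memory_get_usage() around each step of the loop body and see which step keeps memory. This is only a sketch, assuming PHP 7.0 or later; memoryDelta() and the two example steps are hypothetical stand-ins for the real import steps:

```php
<?php
// Hypothetical helper: returns how many bytes one step leaves allocated.
function memoryDelta(callable $step, int $i): int
{
    $before = memory_get_usage();
    $step($i);
    return memory_get_usage() - $before;
}

$kept = [];   // simulates a registry that quietly holds on to data

$steps = [
    'leaky' => function (int $i) use (&$kept) {
        $kept[] = str_repeat(chr(65 + $i % 26), 20000);   // stays referenced
    },
    'clean' => function (int $i) {
        $tmp = str_repeat(chr(65 + $i % 26), 20000);      // freed on return
    },
];

$totals = ['leaky' => 0, 'clean' => 0];
for ($i = 0; $i < 100; $i++) {
    foreach ($steps as $name => $step) {
        $totals[$name] += memoryDelta($step, $i);
    }
}

foreach ($totals as $name => $bytes) {
    printf("%-5s retained %d bytes over 100 iterations\n", $name, $bytes);
}
```

Wrapping the real steps (read from CSV, load from database, compare, write) in the same way would show which one accounts for the roughly 20k per iteration.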

In our case, these messages appeared at the end of script execution. Some redesigns and explicit resource freeing/unset calls solved the leaking problem for most cases. However, a database data migration script which transforms thousands of various records using a quite complex ActiveRecord class still steadily consumes more and more memory during iteration, even though the "scripted" or intended object count stays the same. But it appears to be negligible, since normal website usage of the framework is stable now.

If your framework does not show memory-leaking behaviour in the context of a "normal web application", you could just break up your data processing script so that it processes fewer images per run.
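
Splitting the import into runs could look roughly like this, assuming PHP 7.0 or later. importBatch(), the batch size of 4, and the demo CSV are all hypothetical; the real script would call the framework's import code in place of the callback:

```php
<?php
// Hypothetical helper: handle at most $limit CSV rows starting at row $offset
// and return how many rows were actually handled.
function importBatch(string $path, int $offset, int $limit, callable $handleRow): int
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $rowNo = 0;
    $done = 0;
    while ($done < $limit && ($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($rowNo++ < $offset) {
            continue;   // handled by an earlier run
        }
        $handleRow($row);
        $done++;
    }
    fclose($fh);
    return $done;
}

// Demo data: ten rows in a temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
$lines = [];
for ($n = 1; $n <= 10; $n++) {
    $lines[] = "$n,image$n.jpg";
}
file_put_contents($path, implode("\n", $lines) . "\n");

// In real use each batch would be a separate PHP process (for example a CLI
// call with the offset as an argument), so leaked memory is gone when the
// process exits. The loop here only demonstrates the offset bookkeeping.
$seen = [];
$offset = 0;
$runs = 0;
do {
    $handled = importBatch($path, $offset, 4, function (array $row) use (&$seen) {
        $seen[] = $row[0];
    });
    $offset += $handled;
    $runs++;
} while ($handled === 4);
unlink($path);

printf("%d rows handled in %d runs\n", count($seen), $runs);
```

Skipping rows from the start of the file on every run is simple but linear in the offset; for 20000 lines that cost is small compared to the database work per row.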


EDIT: We had many memory-wasting problems in our framework. One of them has been an XmlObject class, which builds tree structures. In order to free the occupied memory during run-time, I had to write an explicit method that recursively unsets all references on destruction. So trees and other linked data structures are also something to look for.

And to answer your question about dedicated profiling tools: I'm afraid I have no experience with such tools myself. But as far as I know, they exist. A quick Google search turns up http://xdebug.org/

Posted: Mon Sep 17, 2007 9:53 pm
by Christopher
Have you looked into SQL statements like MySQL's "LOAD DATA INFILE" to do the import for you? They are fairly powerful and blindingly fast.

Posted: Mon Sep 17, 2007 11:26 pm
by s.dot
You could unset() each line after you insert it (or whatever you're doing).

I imported a CSV file of Canadian + US zip codes line by line, with close to a million records, and I didn't have a memory problem.

Just an annoyingly long script run time.

Posted: Tue Sep 18, 2007 6:10 am
by pepa
@Paw:
Ah. Good ol' crying and cursing. Tried that too. Didn't work either :)
But explicitly monitoring the destruction is something I didn't do. I will give it a try and report back if it helped.
But before that I now have a reason to upgrade to php5. Hope it won't create too many new problems ...

@arborint:
I would do that, but the import has to do many other things too (convert explicit entries to foreign key references, convert units, ...)

@scottayy:
my guess is that it's not the (My)SQL calls but the OR-Mapper which causes the problem. One solution would be not to use it
(instead of using "new Image(); $i->readFromDb()" and "$i->writeToDb()" I could use only the MySQL
result and a hand-crafted insert query). But the main reason for using an ORM is not to have to ...
Ah well, and yes, I tried unsetting my Image object after "$i->writeToDb()", but with no luck.

Regards,

PePa

Posted: Tue Sep 18, 2007 7:25 pm
by pepa
So. Tried everything. Upgraded to PHP 5. Unset everything. Even single strings. Freed db results. Installed xdebug and got it running. Even tried crying again. Nothing.

There has to be a way to get PHP to tell me where it's wasting memory. My next attempt would be to reimplement the complete import in another programming language that doesn't have that kind of problem.

Regards,

PePa