Tracking memory usage
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
Developers often spend time ironing out slow sections of applications, and profiling is a good way of finding out when problems like this crop up. However, a lot less attention is paid to the memory usage of an application.
In my case, I'm attempting to tune an algorithm that filters HTML (HTML Purifier). Anecdotal reports say that HTML Purifier can use up to a hundred times the amount of memory of the input data. This is simply unacceptable. A lot of it has to do with the overhead of PHP and HTML Purifier, but there may be some lengthy strings that aren't being properly cleaned up by PHP's built-in garbage collection.
Besides performing a complete code audit, is there a utility similar to a profiler that will let me figure out where in the execution the maximum memory is reached?
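Short of a dedicated tool, one rough way to narrow this down is to record memory_get_usage() at hand-placed checkpoints and look for the largest reading. The sketch below assumes that approach; MemoryCheckpoints and its method names are hypothetical helpers, not part of PHP or HTML Purifier, and the str_repeat() call only stands in for a memory-hungry step.

```php
<?php
// Hypothetical helper: records memory usage at named checkpoints.
final class MemoryCheckpoints
{
    private $samples = array();

    // Record the memory currently allocated by PHP under a label.
    public function mark($label)
    {
        $this->samples[] = array('label' => $label, 'bytes' => memory_get_usage());
    }

    // Return the checkpoint with the highest reading, or null if none were taken.
    public function highest()
    {
        $top = null;
        foreach ($this->samples as $sample) {
            if ($top === null || $sample['bytes'] > $top['bytes']) {
                $top = $sample;
            }
        }
        return $top;
    }
}

$checkpoints = new MemoryCheckpoints();
$checkpoints->mark('start');
$buffer = str_repeat('x', 1024 * 1024); // stand-in for a memory-hungry step
$checkpoints->mark('after allocation');
unset($buffer);
$checkpoints->mark('after cleanup');

$top = $checkpoints->highest();
echo $top['label'], ': ', $top['bytes'], " bytes\n";
```

This only sees memory at the points where mark() is called, so a spike between two checkpoints goes unnoticed; that is the gap a real profiler would fill.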
Wow. I ran the following program:
Code:
require_once 'HTMLPurifier.php';
$purifier = new HTMLPurifier();
$result = $purifier->purify("");
echo count(get_included_files());
The result of which for me was 130! Holy cow, that's a lot of files. Is that right?
Peak memory usage for the above program was 3.2MB.
I didn't spend any time on this, so I might have screwed something up. I just wanted to see what the traces looked like in XDebug 2.
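The post doesn't say how the peak figure was obtained. One way to get a comparable number from inside the script, assuming PHP 5.2 or later, is memory_get_peak_usage(). The format_megabytes helper below is hypothetical and only there to make the output readable.

```php
<?php
// Hypothetical helper: format a byte count as megabytes with one decimal.
function format_megabytes($bytes)
{
    return number_format($bytes / (1024 * 1024), 1) . ' MB';
}

// ... run the code being measured here, e.g. the purify() call ...

echo 'Included files: ', count(get_included_files()), "\n";
echo 'Peak memory: ', format_megabytes(memory_get_peak_usage()), "\n";
```

Numbers from this call and from an Xdebug trace won't necessarily agree exactly, since tracing itself adds overhead.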
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
No, because I am attempting to measure memory, not time.
Spiffy image:

Bottom axis is execution time, left axis is memory usage in bytes. Input document size was 65.6 KB, which is large, but not outrageously so. Timing is not representative due to tracing. Peak memory usage is ~6.3 MB. The initial 4 MB is simply overhead from HTML Purifier's extensive OOP architecture: there's not much I can do about that. The next 2 MB are from the tokenized representation of the HTML.
Starting at 3 sec, our regular strategies, which are the real workhorses of the application, kick in, and memory usage stays constant until 8.6 sec. At that point the HTML is generated and, somehow, memory usage is a lot smaller (I suspect it's because I no longer have parallel copies of the arrays around, even though PHP's quite good about re-allocating memory only when it absolutely needs to). Once that finishes, memory drops to the pre-parsing level of 4 MB, the library's overhead.
This is quite sobering, because it means that the token format that represents the life-blood of this application is extremely memory hungry. DOMDocument->loadHTML, on the other hand, miraculously adds only 1 KB to the application's footprint, which makes me think something fishy is going on: it isn't being caught by the tracer until PHP allocates memory for it locally. Which effectively makes XDebug useless.
ARGH!
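For pinning down where in a trace the peak occurs, the trace file can be scanned for its largest memory reading. The sketch below assumes Xdebug's tab-separated "computerized" trace format, with the entry/exit flag in the third field, the time index in the fourth, memory usage in bytes in the fifth, and the function name in the sixth; that layout should be checked against the Xdebug version in use. find_peak_record and the sample records are made up for illustration and are not taken from the trace discussed here.

```php
<?php
// Scan trace lines (assumed tab-separated: level, function number,
// entry/exit flag, time index, memory in bytes, function name, ...)
// and return the entry record with the largest memory reading.
function find_peak_record(array $lines)
{
    $peak = null;
    foreach ($lines as $line) {
        $fields = explode("\t", rtrim($line, "\r\n"));
        // Skip headers, footers, and exit records (flag "1").
        if (count($fields) < 6 || $fields[2] !== '0' || !is_numeric($fields[4])) {
            continue;
        }
        $bytes = (int) $fields[4];
        if ($peak === null || $bytes > $peak['bytes']) {
            $peak = array(
                'time' => (float) $fields[3],
                'bytes' => $bytes,
                'function' => $fields[5],
            );
        }
    }
    return $peak;
}

// Made-up sample records, not taken from a real trace.
$sample = array(
    "Version: 2.0.0",
    "1\t0\t0\t0.0005\t4194304\t{main}\t1\t\t/tmp/test.php\t0",
    "2\t1\t0\t2.9000\t6606028\texample_tokenize\t1\t\t/tmp/test.php\t12",
    "2\t1\t1\t3.0000\t6500000",
    "2\t2\t0\t8.6000\t5000000\texample_generate\t1\t\t/tmp/test.php\t13",
);

$peak = find_peak_record($sample);
echo $peak['function'], ' at ', $peak['time'], ' sec: ', $peak['bytes'], " bytes\n";
```

With a real trace, the lines could come from file() on the trace path. Memory that PHP's own allocator never sees would be missing from these readings too, so this does not resolve the DOMDocument puzzle.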
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
Googled: are you referring to this: http://support.microsoft.com/kb/q94209/ ?
- kyberfabrikken
- Forum Commoner
- Posts: 84
- Joined: Tue Jul 20, 2004 10:27 am