Memory usage reduced with short attribute names ?!?
Posted: Wed Oct 14, 2009 3:47 am
by Mark Baker
Can anybody explain why memory usage seems better when using shorter attribute names?
I have the following test class:
Code:
<?php
$loopCount = 1024 * 512;
error_reporting(E_ALL);

class testClass
{
    public $veryLongAttributeName0 = NULL;
    public $veryLongAttributeName1 = NULL;
    public $veryLongAttributeName2 = NULL;
    public $veryLongAttributeName3 = NULL;
    public $veryLongAttributeName4 = NULL;
    public $veryLongAttributeName5 = NULL;
    public $veryLongAttributeName6 = NULL;
    public $veryLongAttributeName7 = NULL;
    public $veryLongAttributeName8 = NULL;
    public $veryLongAttributeName9 = NULL;

    function __construct($libraryType = null, $fileID = null)
    {
    }

    // PHP only calls a destructor named __destruct(); a method called _destructor() never runs
    function __destruct()
    {
    } // function destructor
} // class testClass

// Time the creation of $loopCount instances, keeping them all alive in $testArray
$callStartTime = microtime(true);
$testArray = array();
for ($i = 0; $i < $loopCount; ++$i) {
    $testArray[] = new testClass();
}
$callEndTime = microtime(true);
$callTime = $callEndTime - $callStartTime;
echo '<br />Call time to instantiate '.$loopCount.' objects of testClass was '.sprintf('%.4f',$callTime)." seconds<br />\n";
echo date('H:i:s').' Peak memory usage: '.(memory_get_peak_usage(true) / 1024 / 1024).' MB<br />';
which produces the following result:
Code:
Call time to instantiate 524288 objects of testClass was 3.4530 seconds
09:34:08 Peak memory usage: 494.75 MB
If I simply change the attribute names
Code:
public $veryLongAttributeName0 = NULL;
public $veryLongAttributeName1 = NULL;
public $veryLongAttributeName2 = NULL;
public $veryLongAttributeName3 = NULL;
public $veryLongAttributeName4 = NULL;
public $veryLongAttributeName5 = NULL;
public $veryLongAttributeName6 = NULL;
public $veryLongAttributeName7 = NULL;
public $veryLongAttributeName8 = NULL;
public $veryLongAttributeName9 = NULL;
to
Code:
public $v0 = NULL;
public $v1 = NULL;
public $v2 = NULL;
public $v3 = NULL;
public $v4 = NULL;
public $v5 = NULL;
public $v6 = NULL;
public $v7 = NULL;
public $v8 = NULL;
public $v9 = NULL;
I get the following result:
Code:
Call time to instantiate 524288 objects of testClass was 3.2260 seconds
09:37:46 Peak memory usage: 374.75 MB
It appears to run fractionally faster (although that's harder to determine), but it uses significantly less memory (494.75 MB reduced to 374.75 MB).
That shouldn't be right... should it? Even in a semi-compiled language such as PHP, the memory usage (and possibly speed of execution) shouldn't be affected by the length of an attribute name.
Re: Memory usage reduced with short attribute names ?!?
Posted: Wed Oct 14, 2009 2:02 pm
by PHPHorizons
It makes sense to me that longer variable names mean longer execution times.
How many times did you run the test?
It seems to me that running it a few hundred times is the only way to get conclusive results. If you only ran it once, the result is anecdotal.
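One way to follow that advice for the timing side is to repeat the measurement and report the minimum and the middle value of the sorted times, since a single run can be skewed by other activity on the machine. This is a sketch only: the helper name `timeRuns`, the workload, and the run count are hypothetical choices. Peak memory is process-wide, so memory comparisons are simpler to repeat as separate script runs.

```php
<?php
// Run $fn $runs times and return the fastest and the middle (median for an
// odd run count) of the sorted wall-clock times, in seconds.
function timeRuns(callable $fn, $runs)
{
    $times = array();
    for ($r = 0; $r < $runs; ++$r) {
        $start = microtime(true);
        $fn();
        $times[] = microtime(true) - $start;
    }
    sort($times);
    return array(
        'min'    => $times[0],
        'median' => $times[(int) ($runs / 2)],
    );
}

// Example workload: instantiate 10000 plain objects and keep them alive.
$stats = timeRuns(function () {
    $objects = array();
    for ($i = 0; $i < 10000; ++$i) {
        $objects[] = new stdClass();
    }
}, 5);

printf("min %.4f s, median %.4f s\n", $stats['min'], $stats['median']);
```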
Re: Memory usage reduced with short attribute names ?!?
Posted: Wed Oct 14, 2009 2:23 pm
by pickle
Without knowing a whole lot about how the PHP interpreter works, I think I know why it takes more memory. When you instantiate the object, it also sets its properties. Obviously it will take more memory to store the string "veryLongAttributeName0" than "v0". My initial thought was that behind the scenes, PHP would convert both of them to internal memory pointers, and it probably does. However, it's possible the interpreter doesn't release the memory it initially used to load "veryLongAttributeName0" until after the script has completed. That would explain the memory usage. As for the time, it probably takes longer for PHP to interpret "veryLongAttributeName0" than "v0".
Re: Memory usage reduced with short attribute names ?!?
Posted: Wed Oct 14, 2009 3:27 pm
by PHPHorizons
If property names are not stored in a hash table, it would make callback functions very difficult to achieve.
Code:
array_map(array($this, 'veryLongPropertyName'), $some_array);
(That would be valid with lambda functions.)
It also shows that methods must have their actual names stored in a hash table as well. Unless, of course, every string instance of a method/property name is converted. But that would probably be impossible to do.
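A small sketch of that point: callbacks and variable member access name methods and properties with ordinary strings that are only resolved at runtime, so the interpreter has to keep the real names available. The class `Doubler` and its member names are hypothetical.

```php
<?php
class Doubler
{
    public $veryLongPropertyName = 'kept as a string key';

    public function veryLongMethodName($x)
    {
        return $x * 2;
    }
}

$obj = new Doubler();

// The callback names the method with a plain string, looked up when called.
$doubled = array_map(array($obj, 'veryLongMethodName'), array(1, 2, 3));

// Variable property access also resolves the name from a string at runtime.
$name = 'veryLongPropertyName';
$value = $obj->$name;

echo implode(', ', $doubled), "\n"; // 2, 4, 6
echo $value, "\n";                  // kept as a string key
```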
Re: Memory usage reduced with short attribute names ?!?
Posted: Wed Oct 14, 2009 4:29 pm
by Mark Baker
I've run the tests a good few times now, using different versions of PHP and different operating platforms.... the memory differences are a pretty constant 120MB on Windows, and 100MB on Linux. And other people have replicated those results, so (while not definitive) it's a little more than simply anecdotal.
Thinking about this, I'm beginning to understand the reasoning behind it.
Even when compiled to bytecode, PHP still needs to know the actual variable/attribute and function/method names for use in error handling and serialize() (among others); lambda functions are another good example.
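The serialize() case is easy to see directly: the full property name is written into the serialized string, so it must be recoverable at runtime. A sketch with two hypothetical classes that differ only in the length of one property name:

```php
<?php
class LongNames
{
    public $veryLongAttributeName0 = null;
}

class ShortNames
{
    public $v0 = null;
}

// Each serialized object records its class name and every property name.
$long = serialize(new LongNames());
$short = serialize(new ShortNames());

echo $long, "\n";  // O:9:"LongNames":1:{s:22:"veryLongAttributeName0";N;}
echo $short, "\n"; // O:10:"ShortNames":1:{s:2:"v0";N;}
```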
I was assuming that for OOP, it would handle this slightly differently, and maintain a "class definition" with all the long name details, and each instance would just contain pointers to this name map; so that each instance would use a minimal amount of memory, and functions such as serialize would cross reference the data and pointers from the instance with the class name map to generate their output.
However, that could only work if all attributes were predefined in the class definition.... but PHP's loose coding rules allow you to define new attributes or even methods dynamically within a script.... against a specific instance of the class, so these couldn't exist in a "class map" in advance. Therefore, PHP takes the quick and dirty approach of holding the names within each instance.
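The dynamic case described above looks like this in a minimal sketch: a property assigned to one instance exists on that instance only, so its name cannot live in a fixed per-class map. The class name is hypothetical. The attribute line keeps PHP 8.2 and later from emitting a deprecation notice for dynamic properties; PHP 7 reads it as an ordinary `#` comment.

```php
<?php
#[\AllowDynamicProperties]
class DynamicDemo
{
    public $value = null;
}

$a = new DynamicDemo();
$b = new DynamicDemo();

// Adds a property to $a only; $b keeps just the declared $value.
$a->comment = 'only on $a';

$aHas = array_key_exists('comment', get_object_vars($a));
$bHas = array_key_exists('comment', get_object_vars($b));

var_dump($aHas); // bool(true)
var_dump($bHas); // bool(false)
```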
PHPExcel has an instantiated object for every cell in every worksheet in a workbook. With large Excel files, that can easily hit several million instantiated cell objects.... and yes, we do hit memory problems with large files that we've been working hard to alleviate.
We're already looking at a form of caching so that cell instances are only memory resident when they're actually needed.
That would reduce the problem, although it does have speed implications that we need to look at as well. If we can get cell caching working with minimal overhead, then it's less of an issue.... we'd be able to work with one instance of the cell object in memory at any given time, swapping attribute values in and out as necessary.
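The "one cell object in memory at a time" idea could be sketched roughly as below. This is a hypothetical illustration, not PHPExcel's actual caching code: `CellStore` and `CellView` are invented names, cell values sit in a plain array keyed by coordinate, and a single reusable object is loaded with a cell's data on request.

```php
<?php
class CellView
{
    public $coordinate = null;
    public $value = null;
}

class CellStore
{
    private $values = array(); // coordinate => value, no object per cell
    private $view;             // the single cell object kept in memory

    public function __construct()
    {
        $this->view = new CellView();
    }

    public function setValue($coordinate, $value)
    {
        $this->values[$coordinate] = $value;
    }

    // Swap the requested cell's data into the shared object and return it.
    public function getCell($coordinate)
    {
        $this->view->coordinate = $coordinate;
        $this->view->value = isset($this->values[$coordinate])
            ? $this->values[$coordinate]
            : null;
        return $this->view;
    }
}

$store = new CellStore();
$store->setValue('A1', 10);
$store->setValue('B2', 'text');

echo $store->getCell('A1')->value, "\n"; // 10
echo $store->getCell('B2')->value, "\n"; // text
```

Because every call returns the same object, a caller that keeps a reference will see it change on the next lookup, and writes to the view are not copied back to the store in this sketch; a fuller version would need to handle both.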
Re: Memory usage reduced with short attribute names ?!?
Posted: Thu Oct 15, 2009 8:26 am
by Mark Baker
Thanks to everybody that has provided help and advice on this problem. Based on the suggestions that have been given, I've come up with the following code using magic getters/setters:
Code:
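class testClass
{
    // Property names are held once, in a static array shared by every instance
    private static $_propertyList = array( 'longVariableName0',
                                           'longVariableName1',
                                           'longVariableName2',
                                           'longVariableName3',
                                           'longVariableName4',
                                           'longVariableName5',
                                           'longVariableName6',
                                           'longVariableName7',
                                           'longVariableName8',
                                           'longVariableName9'
                                         );
    // Per-instance values, keyed by each name's position in $_propertyList
    private $_data = array();

    public function __set($name, $value) {
        // Static properties need the $ sigil: self::$_propertyList, not self::_propertyList
        $key = array_search($name, self::$_propertyList);
        if ($key !== false) {
            $this->_data[$key] = $value;
        }
    }

    public function __get($name) {
        $key = array_search($name, self::$_propertyList);
        // Return null for unknown or unassigned names instead of raising an undefined-index notice
        if ($key !== false && isset($this->_data[$key])) {
            return $this->_data[$key];
        }
        return null;
    }

    function __construct()
    {
    }

    // PHP only calls a destructor named __destruct()
    function __destruct()
    {
    } // function destructor
} // class testClass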
Compared with my original script (running on the same server):
Original script with long property/attribute names:
Call time to instantiate 524288 objects of testClass was 3.1759 seconds
09:48:34 Peak memory usage: 494.75 MB
Using magic getters/setters, which allows us to retain long property/attribute names:
Call time to instantiate 524288 objects of testClass was 1.8602 seconds
09:48:05 Peak memory usage: 150.75 MB
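A comparison like this can also be made per object rather than by peak usage, by reading memory_get_usage() before and after filling an array with fresh instances. This is a sketch under stated assumptions: the class names and instance count are hypothetical, no values are assigned (matching the instantiation-only test above), and the figures quoted in this thread were measured on 2009-era PHP 5, so later PHP versions may show a different gap.

```php
<?php
class DeclaredProps
{
    public $longVariableName0 = null;
    public $longVariableName1 = null;
    public $longVariableName2 = null;
}

class ArrayProps
{
    private $_data = array(); // values would be added later via magic __set()
}

// Average bytes retained per instance, including its share of the holding array.
function bytesPerInstance($class, $count)
{
    $before = memory_get_usage();
    $objects = array();
    for ($i = 0; $i < $count; ++$i) {
        $objects[] = new $class();
    }
    $after = memory_get_usage(); // measured while $objects is still alive
    return ($after - $before) / $count;
}

printf("DeclaredProps: %.1f bytes/object\n", bytesPerInstance('DeclaredProps', 10000));
printf("ArrayProps: %.1f bytes/object\n", bytesPerInstance('ArrayProps', 10000));
```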
That's an incredible gain, and we're really grateful to everybody on PHP Developers Network and other forums who have helped us explain the cause of the problem, and provided us with a solution that not only gives us the ability to handle significantly larger volumes of data, but to do so with improved speed as well.
We can apply this technique to many of the classes within the library, which should allow us to handle workbooks up to 3 times the size that we can now, without any changes being required by developers who are using the library.
Re: Memory usage reduced with short attribute names ?!?
Posted: Thu Oct 15, 2009 10:39 am
by Eran
That's pretty incredible. Who would have thought magical methods could actually improve performance.
Re: Memory usage reduced with short attribute names ?!?
Posted: Thu Oct 15, 2009 11:19 am
by Mark Baker
pytrin wrote:That's pretty incredible. Who would have thought magical methods could actually improve performance.
It's not going to work in every case.... it all comes down to the number of attributes and the length of their names, and the number of instances of the object that are being created.
The magic getter/setter methods do add time overhead to every access that reads, writes or tests an attribute, but that is offset by a smaller memory footprint (with fewer calls to malloc when instantiating). The potential is there for improving performance, but it'll take a bit more effort using "real world" classes rather than my simplistic test class, and potentially a lot of additional code streamlining to gain real benefits.
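Two pieces of that streamlining could look like the sketch below, which is an assumption-laden extension of the magic-accessor class posted earlier rather than code from the thread. A static name-to-index map built once with array_flip() replaces the array_search() scan on every access, and __isset()/__unset() make isset() and unset() work on the virtual properties (without __isset(), isset() on them always returns false). `CompactRecord` and the helper `key()` are hypothetical names.

```php
<?php
class CompactRecord
{
    private static $_propertyList = array('longVariableName0', 'longVariableName1');
    private static $_index = null; // name => position, built once, shared by all instances
    private $_data = array();

    private static function key($name)
    {
        if (self::$_index === null) {
            self::$_index = array_flip(self::$_propertyList);
        }
        return isset(self::$_index[$name]) ? self::$_index[$name] : null;
    }

    public function __set($name, $value)
    {
        $key = self::key($name);
        if ($key !== null) {
            $this->_data[$key] = $value;
        }
    }

    public function __get($name)
    {
        $key = self::key($name);
        return ($key !== null && isset($this->_data[$key])) ? $this->_data[$key] : null;
    }

    // Called by isset($obj->name) for inaccessible properties.
    public function __isset($name)
    {
        $key = self::key($name);
        return $key !== null && isset($this->_data[$key]);
    }

    // Called by unset($obj->name) for inaccessible properties.
    public function __unset($name)
    {
        $key = self::key($name);
        if ($key !== null) {
            unset($this->_data[$key]);
        }
    }
}

$r = new CompactRecord();
$r->longVariableName1 = 'x';
var_dump(isset($r->longVariableName1)); // bool(true)
var_dump(isset($r->longVariableName0)); // bool(false)
unset($r->longVariableName1);
var_dump(isset($r->longVariableName1)); // bool(false)
```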
Re: Memory usage reduced with short attribute names ?!?
Posted: Thu Oct 15, 2009 11:39 am
by onion2k
It's nice to know, but in the real world if your script is instantiating half a million objects you should probably be rethinking your approach anyway.
Re: Memory usage reduced with short attribute names ?!?
Posted: Thu Oct 15, 2009 12:04 pm
by Christopher
pytrin wrote:That's pretty incredible. Who would have thought magical methods could actually improve performance.
This is really interesting.
However, it is not actually clear that it would improve performance. If you look at what was done, Mark shortened 10 property names by 21 characters each and then instantiated 524288 objects. If you calculate 10 * 21 * 524288 you get about 110MB, which is about the difference in the original memory usage numbers. The times are for instantiation, not execution. It does not say whether code will run faster or slower once instantiated. I recall that magic methods are slower than properties.
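Counting the characters directly, "veryLongAttributeName0" is 22 characters and "v0" is 2, so each name is 20 characters shorter. Assuming, as the thread concluded for the PHP 5 versions tested, that every instance holds its own copy of each name, the saving works out to exactly 100MB (1 MB = 1024 × 1024 bytes), in line with the Linux difference Mark reported. The sketch ignores any per-allocation overhead.

```php
<?php
$saved = strlen('veryLongAttributeName0') - strlen('v0'); // 22 - 2 = 20 bytes per name
$bytes = 10 * $saved * 524288;                            // 10 names, 524288 objects
echo $bytes, " bytes = ", $bytes / 1024 / 1024, " MB\n";   // 104857600 bytes = 100 MB
```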
The question is whether instantiating 524288 objects is a useful real world test? If you reduce the number of objects and increase the number of calls to setters/getters then instantiation time may become a small percentage of execution time.
Note that in his second example, if he changed _propertyList and _data to _p and _d, he would save 7MB of memory.
Re: Memory usage reduced with short attribute names ?!?
Posted: Thu Oct 15, 2009 3:14 pm
by Mark Baker
arborint wrote:pytrin wrote:That's pretty incredible. Who would have thought magical methods could actually improve performance.
This is really interesting.
At the moment it's academic. Producing those results against my simple test class (with its excessively long attribute names) isn't the same as real world code. But the difference in the test class was enough to persuade me that it was an avenue worth exploring as a potential solution to a real world problem.
arborint wrote:However, it is not actually clear that it would improve performance. If you look at what was done, Mark shortened 10 property names by 21 characters each and then instantiated 524288 objects. If you calculate 10 * 21 * 524288 you get about 110MB, which is about the difference in the original memory usage numbers. The times are for instantiation, not execution. It does not say whether code will run faster or slower once instantiated. I recall that magic methods are slower than properties.
In the real world, it isn't so cut and dried. The magic setter/getter methods do increase the code size, and the memory benefits of reducing the property names may not be sufficient to offset this additional code size.
You're right, times are for instantiation, because I was simply testing the memory footprint at this point - the initial issue was all about the difference in memory usage between long and short names - because we had been hitting memory problems.
While speed is important, our current problem is memory usage when people try to work with very large workbooks. This solution may slow the code, but if it reduces the memory footprint by a significant amount, then that might be an overhead which can be justified.
arborint wrote:The question is whether instantiating 524288 objects is a useful real world test? If you reduce the number of objects and increase the number of calls to setters/getters then instantiation time may become a small percentage of execution time.
No it isn't a useful real world test, and that's why I'm running additional tests using the method with our real world code, looking at where it might benefit, and what the trade offs are.
It isn't always practical to reduce the number of objects... at least, not without scrapping the OOP approach completely; and the mechanism may not be appropriate to some of our classes.
arborint wrote:Note that in his second example, if he changed _propertyList and _data to _p and _d, he would save 7MB of memory.
_propertyList is static, and it would seem from my experimentation that statics are held once at class level, so only a single copy exists, no matter how many instances there are of the object in which that static property is defined. Reducing _data to _d (or even d) would reduce memory usage still further.
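The difference between the two members can be shown in a few lines: a static property has one copy for the class, while an ordinary property is separate for each instance. That is why renaming $_propertyList saves only a few bytes in total, while renaming $_data would save a few bytes per object. The class `Shared` is hypothetical.

```php
<?php
class Shared
{
    public static $list = array('a', 'b'); // one copy for the whole class
    public $data = array();                // one copy per instance
}

$x = new Shared();
$y = new Shared();

Shared::$list[] = 'c'; // a single change, seen through every instance
$x->data[] = 1;        // changes $x only

echo count($y::$list), "\n"; // 3
echo count($y->data), "\n";  // 0
```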
Re: Memory usage reduced with short attribute names ?!?
Posted: Fri Oct 16, 2009 8:22 pm
by josh
What happens if you set a memory limit right below the threshold of what you expect it to use? Does lengthening your variable names set it off, or does PHP detect the limit getting closer and free up old unused memory?
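To run that experiment it helps to see how close a script is to its limit. A sketch, assuming memory_limit uses PHP's usual shorthand (a plain byte count, a K/M/G suffix, or -1 for no limit); the helper name `iniBytes` is hypothetical.

```php
<?php
// Convert a memory_limit value such as "128M" into bytes; -1 means "no limit".
function iniBytes($value)
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $unit = strtoupper(substr($value, -1));
    $number = (int) $value; // the cast keeps only the leading digits
    switch ($unit) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

$limit = iniBytes(ini_get('memory_limit'));
$used = memory_get_usage(true);

if ($limit === -1) {
    echo "No memory_limit set\n";
} else {
    printf("Using %.1f MB of %.1f MB\n", $used / 1048576, $limit / 1048576);
}
```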
Re: Memory usage reduced with short attribute names ?!?
Posted: Tue Oct 20, 2009 8:34 am
by BDKR
arborint wrote:The question is whether instantiating 524288 objects is a useful real world test? If you reduce the number of objects and increase the number of calls to setters/getters then instantiation time may become a small percentage of execution time.
Isn't this the kind of circumstance where the Flyweight pattern could be of use?
The link below does a good job of explaining.
http://www.javacamp.org/designPattern/flyweight.html
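A minimal Flyweight sketch in PHP, assuming that a cell's formatting is the shared (intrinsic) state while its coordinate and value are per-cell (extrinsic) state. `CellFormat` and `FormatFactory` are hypothetical names; the thread doesn't show how PHPExcel actually stores formats. The factory creates each distinct format once and hands out the same object to every cell that uses it.

```php
<?php
class CellFormat
{
    public $font;
    public $numberFormat;

    public function __construct($font, $numberFormat)
    {
        $this->font = $font;
        $this->numberFormat = $numberFormat;
    }
}

class FormatFactory
{
    private $pool = array(); // one shared CellFormat per distinct combination

    public function get($font, $numberFormat)
    {
        $key = $font . '|' . $numberFormat;
        if (!isset($this->pool[$key])) {
            $this->pool[$key] = new CellFormat($font, $numberFormat);
        }
        return $this->pool[$key];
    }

    public function poolSize()
    {
        return count($this->pool);
    }
}

$factory = new FormatFactory();
$cells = array();
for ($row = 1; $row <= 1000; ++$row) {
    // Extrinsic state (coordinate, value) stays per cell; the format is shared.
    $cells['A' . $row] = array(
        'value'  => $row,
        'format' => $factory->get('Arial', '0.00'),
    );
}

echo count($cells), " cells share ", $factory->poolSize(), " format object(s)\n";
// 1000 cells share 1 format object(s)
```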