Page 1 of 2

Performance Implications of count()

Posted: Wed Aug 02, 2006 4:13 pm
by Ollie Saunders
So continuing my series of "Performance Implications of" this time we look to count().

The psychology of programming comes into play here. When I first started programming in PHP I used to use sizeof() instead of count(). Partly this was because I was programming in C++ mainly and the other reason was because count() sounds like an expensive operation where sizeof() less so. Of course any PHP dev worth their salt will know that sizeof() is just an alias of count() and are besides the name exactly the same. But one question always lumbered in my mind does count() count? Or does PHP monitor the size of the array as it is operated upon and count() is simply used to return that internal data? I was pretty sure that the latter was the case, lets see if I am right.

Here's the code:

Code: Select all

function microtime_float()
{
    list($usec, $sec) = explode(' ', microtime());
    return (float)($usec + $sec);
}

$array = array(1,5,6,3,7,8,3,6,7,3,1,5,7,8,4,2,5,1............ // it continues for a while
echo 'Array is ' . count($array) . " elements in length.\n\n\n";

/**
 * *****************************************
 */
echo "COUNT()\n";
$time = array();

for ($i=0, $j=10; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        count($array);
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}

print_r($time['length']);
flush();
echo "\n\n\n";

/**
 * *****************************************
 */
echo "SIZEOF()\n";
$time = array();

for ($i=0; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        sizeof($array);
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}
print_r($time['length']);
flush();
echo "\n\n\n";

/**
 * *****************************************
 */
echo "GETINTWITHARRAYPARAM()\n";
$time = array();
function getIntWithArrayParam($a)
{
    return 16;
}

for ($i=0; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        getIntWithArrayParam($array);
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}
print_r($time['length']);
flush();
echo "\n\n\n";

/**
 * *****************************************
 */
echo "GETINT()\n";
$time = array();
function getInt()
{
    return 16;
}

for ($i=0; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        getInt();
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}
print_r($time['length']);
flush();
This performs 4 different tests:
  1. Calling count() on a large array
  2. Calling sizeof() on a large array
  3. Calling of a user function that returns 16 with $array as the argument.
  4. Calling of a user function that returns 16 with no arguments
I predicted that 4 would be fastest followed by 1 then 2 and finally 3.

Here is the output:

Code: Select all

Array is 2700 elements in length.


COUNT()
Array
(
    [0] => 0.0332798957825
    [1] => 0.0332040786743
    [2] => 0.0331971645355
    [3] => 0.0338530540466
    [4] => 0.0339250564575
    [5] => 0.0339081287384
    [6] => 0.0338890552521
    [7] => 0.0338859558105
    [8] => 0.0338299274445
    [9] => 0.03391289711
)



SIZEOF()
Array
(
    [0] => 0.0334188938141
    [1] => 0.0333170890808
    [2] => 0.0333638191223
    [3] => 0.0333349704742
    [4] => 0.0333120822906
    [5] => 0.0340440273285
    [6] => 0.0340280532837
    [7] => 0.0342969894409
    [8] => 0.0340049266815
    [9] => 0.0340030193329
)



GETINTWITHARRAYPARAM()
Array
(
    [0] => 0.0572888851166
    [1] => 0.0563662052155
    [2] => 0.0563371181488
    [3] => 0.0576510429382
    [4] => 0.0576648712158
    [5] => 0.0571188926697
    [6] => 0.0576598644257
    [7] => 0.0571818351746
    [8] => 0.057657957077
    [9] => 0.0562329292297
)



GETINT()
Array
(
    [0] => 0.0364010334015
    [1] => 0.0363349914551
    [2] => 0.0363218784332
    [3] => 0.0364139080048
    [4] => 0.0369350910187
    [5] => 0.0371720790863
    [6] => 0.0372090339661
    [7] => 0.0372529029846
    [8] => 0.0372650623322
    [9] => 0.0362319946289
)
The results show that 50 000 calls to count() an array 2700 elements in length is faster than a call to a function that returns 16!
There is no significant difference between count() and sizeof().
Using an array as an argument is consideribly more expensive.

Hope you learnt something folks.

The reason I did this was because I am creating a class that implements ArrayAccess, Iterator and Countable interfaces and I wondered if I could improve performance of calls to count by caching the length of the array private to my class but the reality is had I done that it would have likely slowed the performance of my class overall

Posted: Wed Aug 02, 2006 5:05 pm
by Ambush Commander
While I'm against this sort of benchmarking because it can vary wildly depending on what system you use it on, there's a good reason why count() is so fast: PHP internally stores the number of elements of an array in the datastructure for the array, so count is just returning a pre-computed value.

Posted: Wed Aug 02, 2006 5:21 pm
by Ollie Saunders
Ambush Commander wrote:While I'm against this sort of benchmarking because it can vary wildly depending on what system you use it on, there's a good reason why count() is so fast: PHP internally stores the number of elements of an array in the datastructure for the array, so count is just returning a pre-computed value.
You should've read the post, mate xD

Posted: Wed Aug 02, 2006 5:23 pm
by Ambush Commander
That's two mistakes today... hmm... maybe I should take a break.

Posted: Wed Aug 02, 2006 5:29 pm
by AKA Panama Jack
Actually, count isn't returning a precomputed value. :) It has to compute the value every time it is executed.

Here is an example...

Code: Select all

function microtime_float() 
{ 
    list($usec, $sec) = explode(' ', microtime()); 
    return (float)($usec + $sec); 
} 

$array = array();
for($i=0; $i<50000; $i++)
{
	$array[] = mt_rand(1,$i);
}
echo 'Array is ' . count($array) . " elements in length.<br><br>"; 

/** 
 * ***************************************** 
 */ 
echo "COUNT() in loop<br>"; 

$timestart = microtime_float(); 
for ($i=0; $i<count($array); $i++) { 
    $k = 0; 
} 
$timeend = microtime_float(); 
$timelength = $timeend - $timestart; 

echo $timelength; 
flush(); 
echo "<br><br><br>";

/** 
 * ***************************************** 
 */ 
echo "COUNT() outside loop<br>"; 

$timestart = microtime_float(); 
$count = count($array);
for ($i=0; $i<$count; $i++) { 
    $k = 0; 
} 
$timeend = microtime_float(); 
$timelength = $timeend - $timestart; 

echo $timelength; 
flush(); 
echo "<br><br><br>";
And the result...

Code: Select all

Array is 50000 elements in length.

COUNT() in loop
0.0366899967194


COUNT() outside loop
0.0190300941467
Count has to calculate the array size each time it is called.

Posted: Wed Aug 02, 2006 5:43 pm
by Ollie Saunders
Compare

Code: Select all

for ($i=0; $i<count($array); $i++) {
to

Code: Select all

function someInt()
{
    return 20;
}
for ($i=0; $i<someInt(); $i++) {
and I bet count will be faster.
Actually, count isn't returning a precomputed value. :) It has to compute the value every time it is executed.
Your comment reminded me actually. That we are speculating here.

Posted: Wed Aug 02, 2006 5:49 pm
by Ambush Commander
The overhead you're seeing is the cost of actually calling a function. Count is pre-computed: counting a small array is the same (actually, for me, slower) as counting a large array:

Code: Select all

<?php

function microtime_float()
{
    list($usec, $sec) = explode(' ', microtime());
    return (float)($usec + $sec);
}

/**
 * *****************************************
 */
echo "COUNT(\$big_array)<br>";
$big_array = array();
for($i=0; $i<50000; $i++)
{
        $big_array[] = mt_rand(25000,$i+25000);
}
$timestart = microtime_float();
for ($i=0; $i<50000; $i++) {
    count($big_array);
}
$timeend = microtime_float();
$timelength = $timeend - $timestart;

echo $timelength;
flush();
echo "<br><br><br>";

/**
 * *****************************************
 */
echo "COUNT(\$small_array)<br>";
$small_array = array('asd', 'f');
$timestart = microtime_float();
for ($i=0; $i<50000; $i++) {
    count($small_array);
}
$timeend = microtime_float();
$timelength = $timeend - $timestart;

echo $timelength;
flush();
echo "<br><br><br>"; 

?>

Code: Select all

COUNT($big_array)
0.37805414199829


COUNT($small_array)
0.56720304489136

Posted: Wed Aug 02, 2006 11:07 pm
by alex.barylski
I'm on AKAPJ's side here, count() is *not* pre-computed or cached, etc...

It's a well known performance boost to *not* store count() inside a "for" loop, etc...

This is obvious when one caches count() manually before a loop and then calls it directly inside the "for" expression...

The difference is noticable...

Posted: Thu Aug 03, 2006 2:59 am
by AKA Panama Jack
Ambush Commander wrote:

Code: Select all

COUNT($big_array)
0.37805414199829


COUNT($small_array)
0.56720304489136
You just proved that it isn't precalculated. :D If it was then there shouldn't be any difference between the time it takes with your example no matter what the size of the array happens to be. They should be virtually the same.

Posted: Thu Aug 03, 2006 3:14 am
by Jenk
Guys, calling a function, even if it returns a pre-computed value, is going to be slower than using a cached value in a variable, whatever the scenario.

That test does not prove it doesn't use a precomputed value at all.

Read the PHP source code for count() if you want to prove/disprove.

Think about it..

Posted: Thu Aug 03, 2006 3:28 am
by AKA Panama Jack
Jenk wrote:Guys, calling a function, even if it returns a pre-computed value, is going to be slower than using a cached value in a variable, whatever the scenario.

That test does not prove it doesn't use a precomputed value at all.

Read the PHP source code for count() if you want to prove/disprove.

Think about it..
But his example for the value being precomputed used the COUNT() function on BOTH a small and large array. Oddly the small array gave a larger time but it was the same loop calling the same count() function. If the value was precalculated the time exection for both loops should have been almost the same.

Posted: Thu Aug 03, 2006 5:05 am
by volka
Jenk wrote:Read the PHP source code for count() if you want to prove/disprove.
good idea.
Let's take a closer look at the php code behind count($anArray) - source is php 5.1.4.

(ext/standard/array.c)

Code: Select all

/* {{{ proto int count(mixed var [, int mode])
   Count the number of elements in a variable (usually an array) */
PHP_FUNCTION(count)
{
	zval *array;
	long mode = COUNT_NORMAL;
	
	if (zend_parse_parameters (ZEND_NUM_ARGS() TSRMLS_CC, "z|l", &array, &mode) == FAILURE)
		return;
	
	switch (Z_TYPE_P(array)) {
		case IS_NULL:
			[...]
		case IS_ARRAY:
			RETURN_LONG (php_count_recursive (array, mode TSRMLS_CC));
			break;
		case IS_OBJECT: {
			[...]
		}
		default:
			RETURN_LONG(1);
			break;
	}
}
/* }}} */
php_count_recursive is called and it looks like
(ext/standard/array.c)

Code: Select all

static int php_count_recursive(zval *array, long mode TSRMLS_DC)
{
	long cnt = 0;
	zval **element;

	if (Z_TYPE_P(array) == IS_ARRAY) {
		cnt = zend_hash_num_elements(Z_ARRVAL_P(array));
		if (mode == COUNT_RECURSIVE) {
			[...]
	}

	return cnt;
}
We called count without parameter, second defaults to COUNT_NORMAL, so only zend_hash_num_elements is invoked.
And here it is
(Zend/zend_hash.c)

Code: Select all

ZEND_API int zend_hash_num_elements(HashTable *ht)
{
	IS_CONSISTENT(ht);

	return ht->nNumOfElements;
}
After some function calls and conditional branches a pre-computed (each time the array is altered) value is returned.

Posted: Thu Aug 03, 2006 5:14 am
by Benjamin
Very impressive volka

Posted: Thu Aug 03, 2006 6:51 am
by Ollie Saunders
Well done volka!
Thanks for settling that.

Posted: Thu Aug 03, 2006 6:34 pm
by alex.barylski
Thats what I should have done :oops:

Although I did remember reading about count() in a blog a while back and how it doesn't return a cached value...

This is something like what i remember: http://ilia.ws/archives/12-PHP-Optimiza ... html#c7488

What I meant was the above, in words like: COUNT() is executed every iteration the value is not cached by the ZE. Yes arrays keep a counter & current element pointer, but the Zend engine doesn't cache the COUNT() return value.

So it is more efficient to store a count() return in a variable outside the for loop and refer to that variable instead of calling COUNT() inside the for function, despite returning a cache of the element count.

In anycase, you may want to read that blog anyways as it has some helpful hints