Performance Implications of count()

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Performance Implications of count()

Post by Ollie Saunders »

So continuing my series of "Performance Implications of" this time we look to count().

The psychology of programming comes into play here. When I first started programming in PHP I used to use sizeof() instead of count(). Partly this was because I was programming in C++ mainly and the other reason was because count() sounds like an expensive operation where sizeof() less so. Of course any PHP dev worth their salt will know that sizeof() is just an alias of count() and are besides the name exactly the same. But one question always lumbered in my mind does count() count? Or does PHP monitor the size of the array as it is operated upon and count() is simply used to return that internal data? I was pretty sure that the latter was the case, lets see if I am right.

Here's the code:

Code: Select all

function microtime_float()
{
    list($usec, $sec) = explode(' ', microtime());
    return (float)($usec + $sec);
}

$array = array(1,5,6,3,7,8,3,6,7,3,1,5,7,8,4,2,5,1............ // it continues for a while
echo 'Array is ' . count($array) . " elements in length.\n\n\n";

/**
 * *****************************************
 */
echo "COUNT()\n";
$time = array();

for ($i=0, $j=10; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        count($array);
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}

print_r($time['length']);
flush();
echo "\n\n\n";

/**
 * *****************************************
 */
echo "SIZEOF()\n";
$time = array();

for ($i=0; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        sizeof($array);
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}
print_r($time['length']);
flush();
echo "\n\n\n";

/**
 * *****************************************
 */
echo "GETINTWITHARRAYPARAM()\n";
$time = array();
function getIntWithArrayParam($a)
{
    return 16;
}

for ($i=0; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        getIntWithArrayParam($array);
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}
print_r($time['length']);
flush();
echo "\n\n\n";

/**
 * *****************************************
 */
echo "GETINT()\n";
$time = array();
function getInt()
{
    return 16;
}

for ($i=0; $i<$j; $i++) {
    $k = 0;
    $time['start'] = microtime_float();
    for (;$k<50000; $k++) {
        getInt();
    }
    $time['end'] = microtime_float();
    $time['length'][] = $time['end'] - $time['start'];
}
print_r($time['length']);
flush();
This performs 4 different tests:
  1. Calling count() on a large array
  2. Calling sizeof() on a large array
  3. Calling of a user function that returns 16 with $array as the argument.
  4. Calling of a user function that returns 16 with no arguments
I predicted that 4 would be fastest followed by 1 then 2 and finally 3.

Here is the output:

Code: Select all

Array is 2700 elements in length.


COUNT()
Array
(
    [0] => 0.0332798957825
    [1] => 0.0332040786743
    [2] => 0.0331971645355
    [3] => 0.0338530540466
    [4] => 0.0339250564575
    [5] => 0.0339081287384
    [6] => 0.0338890552521
    [7] => 0.0338859558105
    [8] => 0.0338299274445
    [9] => 0.03391289711
)



SIZEOF()
Array
(
    [0] => 0.0334188938141
    [1] => 0.0333170890808
    [2] => 0.0333638191223
    [3] => 0.0333349704742
    [4] => 0.0333120822906
    [5] => 0.0340440273285
    [6] => 0.0340280532837
    [7] => 0.0342969894409
    [8] => 0.0340049266815
    [9] => 0.0340030193329
)



GETINTWITHARRAYPARAM()
Array
(
    [0] => 0.0572888851166
    [1] => 0.0563662052155
    [2] => 0.0563371181488
    [3] => 0.0576510429382
    [4] => 0.0576648712158
    [5] => 0.0571188926697
    [6] => 0.0576598644257
    [7] => 0.0571818351746
    [8] => 0.057657957077
    [9] => 0.0562329292297
)



GETINT()
Array
(
    [0] => 0.0364010334015
    [1] => 0.0363349914551
    [2] => 0.0363218784332
    [3] => 0.0364139080048
    [4] => 0.0369350910187
    [5] => 0.0371720790863
    [6] => 0.0372090339661
    [7] => 0.0372529029846
    [8] => 0.0372650623322
    [9] => 0.0362319946289
)
The results show that 50 000 calls to count() an array 2700 elements in length is faster than a call to a function that returns 16!
There is no significant difference between count() and sizeof().
Using an array as an argument is consideribly more expensive.

Hope you learnt something folks.

The reason I did this was because I am creating a class that implements ArrayAccess, Iterator and Countable interfaces and I wondered if I could improve performance of calls to count by caching the length of the array private to my class but the reality is had I done that it would have likely slowed the performance of my class overall
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

While I'm against this sort of benchmarking because it can vary wildly depending on what system you use it on, there's a good reason why count() is so fast: PHP internally stores the number of elements of an array in the datastructure for the array, so count is just returning a pre-computed value.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Ambush Commander wrote:While I'm against this sort of benchmarking because it can vary wildly depending on what system you use it on, there's a good reason why count() is so fast: PHP internally stores the number of elements of an array in the datastructure for the array, so count is just returning a pre-computed value.
You should've read the post, mate xD
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

That's two mistakes today... hmm... maybe I should take a break.
User avatar
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Actually, count isn't returning a precomputed value. :) It has to compute the value every time it is executed.

Here is an example...

Code: Select all

function microtime_float() 
{ 
    list($usec, $sec) = explode(' ', microtime()); 
    return (float)($usec + $sec); 
} 

$array = array();
for($i=0; $i<50000; $i++)
{
	$array[] = mt_rand(1,$i);
}
echo 'Array is ' . count($array) . " elements in length.<br><br>"; 

/** 
 * ***************************************** 
 */ 
echo "COUNT() in loop<br>"; 

$timestart = microtime_float(); 
for ($i=0; $i<count($array); $i++) { 
    $k = 0; 
} 
$timeend = microtime_float(); 
$timelength = $timeend - $timestart; 

echo $timelength; 
flush(); 
echo "<br><br><br>";

/** 
 * ***************************************** 
 */ 
echo "COUNT() outside loop<br>"; 

$timestart = microtime_float(); 
$count = count($array);
for ($i=0; $i<$count; $i++) { 
    $k = 0; 
} 
$timeend = microtime_float(); 
$timelength = $timeend - $timestart; 

echo $timelength; 
flush(); 
echo "<br><br><br>";
And the result...

Code: Select all

Array is 50000 elements in length.

COUNT() in loop
0.0366899967194


COUNT() outside loop
0.0190300941467
Count has to calculate the array size each time it is called.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Compare

Code: Select all

for ($i=0; $i<count($array); $i++) {
to

Code: Select all

function someInt()
{
    return 20;
}
for ($i=0; $i<someInt(); $i++) {
and I bet count will be faster.
Actually, count isn't returning a precomputed value. :) It has to compute the value every time it is executed.
Your comment reminded me actually. That we are speculating here.
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

The overhead you're seeing is the cost of actually calling a function. Count is pre-computed: counting a small array is the same (actually, for me, slower) as counting a large array:

Code: Select all

<?php

function microtime_float()
{
    list($usec, $sec) = explode(' ', microtime());
    return (float)($usec + $sec);
}

/**
 * *****************************************
 */
echo "COUNT(\$big_array)<br>";
$big_array = array();
for($i=0; $i<50000; $i++)
{
        $big_array[] = mt_rand(25000,$i+25000);
}
$timestart = microtime_float();
for ($i=0; $i<50000; $i++) {
    count($big_array);
}
$timeend = microtime_float();
$timelength = $timeend - $timestart;

echo $timelength;
flush();
echo "<br><br><br>";

/**
 * *****************************************
 */
echo "COUNT(\$small_array)<br>";
$small_array = array('asd', 'f');
$timestart = microtime_float();
for ($i=0; $i<50000; $i++) {
    count($small_array);
}
$timeend = microtime_float();
$timelength = $timeend - $timestart;

echo $timelength;
flush();
echo "<br><br><br>"; 

?>

Code: Select all

COUNT($big_array)
0.37805414199829


COUNT($small_array)
0.56720304489136
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

I'm on AKAPJ's side here, count() is *not* pre-computed or cached, etc...

It's a well known performance boost to *not* store count() inside a "for" loop, etc...

This is obvious when one caches count() manually before a loop and then calls it directly inside the "for" expression...

The difference is noticable...
User avatar
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Ambush Commander wrote:

Code: Select all

COUNT($big_array)
0.37805414199829


COUNT($small_array)
0.56720304489136
You just proved that it isn't precalculated. :D If it was then there shouldn't be any difference between the time it takes with your example no matter what the size of the array happens to be. They should be virtually the same.
User avatar
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

Guys, calling a function, even if it returns a pre-computed value, is going to be slower than using a cached value in a variable, whatever the scenario.

That test does not prove it doesn't use a precomputed value at all.

Read the PHP source code for count() if you want to prove/disprove.

Think about it..
User avatar
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Jenk wrote:Guys, calling a function, even if it returns a pre-computed value, is going to be slower than using a cached value in a variable, whatever the scenario.

That test does not prove it doesn't use a precomputed value at all.

Read the PHP source code for count() if you want to prove/disprove.

Think about it..
But his example for the value being precomputed used the COUNT() function on BOTH a small and large array. Oddly the small array gave a larger time but it was the same loop calling the same count() function. If the value was precalculated the time exection for both loops should have been almost the same.
User avatar
volka
DevNet Evangelist
Posts: 8391
Joined: Tue May 07, 2002 9:48 am
Location: Berlin, ger

Post by volka »

Jenk wrote:Read the PHP source code for count() if you want to prove/disprove.
good idea.
Let's take a closer look at the php code behind count($anArray) - source is php 5.1.4.

(ext/standard/array.c)

Code: Select all

/* {{{ proto int count(mixed var [, int mode])
   Count the number of elements in a variable (usually an array) */
PHP_FUNCTION(count)
{
	zval *array;
	long mode = COUNT_NORMAL;
	
	if (zend_parse_parameters (ZEND_NUM_ARGS() TSRMLS_CC, "z|l", &array, &mode) == FAILURE)
		return;
	
	switch (Z_TYPE_P(array)) {
		case IS_NULL:
			[...]
		case IS_ARRAY:
			RETURN_LONG (php_count_recursive (array, mode TSRMLS_CC));
			break;
		case IS_OBJECT: {
			[...]
		}
		default:
			RETURN_LONG(1);
			break;
	}
}
/* }}} */
php_count_recursive is called and it looks like
(ext/standard/array.c)

Code: Select all

static int php_count_recursive(zval *array, long mode TSRMLS_DC)
{
	long cnt = 0;
	zval **element;

	if (Z_TYPE_P(array) == IS_ARRAY) {
		cnt = zend_hash_num_elements(Z_ARRVAL_P(array));
		if (mode == COUNT_RECURSIVE) {
			[...]
	}

	return cnt;
}
We called count without parameter, second defaults to COUNT_NORMAL, so only zend_hash_num_elements is invoked.
And here it is
(Zend/zend_hash.c)

Code: Select all

ZEND_API int zend_hash_num_elements(HashTable *ht)
{
	IS_CONSISTENT(ht);

	return ht->nNumOfElements;
}
After some function calls and conditional branches a pre-computed (each time the array is altered) value is returned.
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Very impressive volka
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Well done volka!
Thanks for settling that.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Thats what I should have done :oops:

Although I did remember reading about count() in a blog a while back and how it doesn't return a cached value...

This is something like what i remember: http://ilia.ws/archives/12-PHP-Optimiza ... html#c7488

What I meant was the above, in words like: COUNT() is executed every iteration the value is not cached by the ZE. Yes arrays keep a counter & current element pointer, but the Zend engine doesn't cache the COUNT() return value.

So it is more efficient to store a count() return in a variable outside the for loop and refer to that variable instead of calling COUNT() inside the for function, despite returning a cache of the element count.

In anycase, you may want to read that blog anyways as it has some helpful hints
Post Reply