PHP5 foreach behaviour and some questions about array_keys

Posted: Mon Nov 13, 2006 11:44 am
by BDKR
I had an interview last week that ultimately sucked when you get right down to it.

One of the things I was asked about during what I think was the 3rd wave of attackers was how I deal with iterating over or traversing multi-dimensional arrays. Part of my answer was, of course, foreach.

From there he started talking about how array_keys was faster and more efficient than foreach. This is the second time I've heard this. I didn't believe it the first time, but after hearing this squawk I came home and wrote some code to see if I could get to the bottom of it.

First things first: why are people saying this? If it's faster, faster in comparison to what? Does it depend on the nature of the data you are iterating over? I hate being questioned on my ability to express logic, so some FRACKIN' (Sorry. :oops: Too much Battlestar Galactica) context would go a long way.

Anyway, I did some benchmarking and found that using foreach over an array of objects is far faster than array_keys. So which is it? Is there something I'm missing?

And did the behaviour of PHP5 and foreach change? It seems to be passing elements by ref by default now. Before, the default behaviour was to do it by copy.

Cheers

Posted: Mon Nov 13, 2006 1:05 pm
by RobertGonzalez
PHP5 does everything by reference natively, from what I understand. And I think that the folks that were interviewing you may have been looking for your response countering their claims. Maybe they wanted to know for sure that you knew your stuff.

Posted: Mon Nov 13, 2006 1:15 pm
by BDKR
Everah wrote:PHP5 does everything by reference natively, from what I understand. And I think that the folks that were interviewing you may have been looking for your response countering their claims. Maybe they wanted to know for sure that you knew your stuff.
My earliest recollection of PHP5 and iterating over an array of objects using foreach was a disaster. That's why the manual says...
Note: Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself. Therefore, the array pointer is not modified as with the each() construct, and changes to the array element returned are not reflected in the original array. However, the internal pointer of the original array is advanced with the processing of the array. Assuming the foreach loop runs to completion, the array's internal pointer will be at the end of the array.

As of PHP 5, you can easily modify array's elements by preceding $value with &. This will assign reference instead of copying the value.
Now you are right that they may have been trying to trick me. Some of the stuff seemed rather strange.

Oh well...
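
For anyone following along, here is a minimal sketch of what the manual note above describes, assuming PHP 5 or later. The Counter class and the variable names are made up for the illustration and are not taken from any code in this thread.

```php
<?php
// Hypothetical class, used only for this illustration.
class Counter
{
    public $n = 0;
}

// 1. Scalars, by value: $v is a copy, so the array is unchanged.
$nums = array(1, 2, 3);
foreach ($nums as $v) {
    $v *= 10;
}
echo implode(',', $nums), "\n";   // 1,2,3

// 2. Scalars, by reference: $v aliases each element in turn.
foreach ($nums as &$v) {
    $v *= 10;
}
unset($v);                        // break the lingering reference to the last element
echo implode(',', $nums), "\n";   // 10,20,30

// 3. Objects, by value: the handle is copied, not the object,
//    so method calls and property writes still reach the original.
$objs = array(new Counter, new Counter);
foreach ($objs as $o) {
    $o->n++;
}
echo $objs[0]->n, ',', $objs[1]->n, "\n";   // 1,1
```

Case 3 is probably why foreach over an array of objects can look like it passes by reference by default: the loop variable is a copy, but what it copies is the handle to the same object.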

Posted: Mon Nov 13, 2006 1:35 pm
by Christopher
Everah wrote:PHP5 does everything by reference natively, from what I understand. And I think that the folks that were interviewing you may have been looking for your response countering their claims. Maybe they wanted to know for sure that you knew your stuff.
I am not sure this is correct. It is my understanding that PHP5 deals with objects using handles -- which are different than references (i.e. no count). Otherwise other variables are handled generally the same in PHP4 and PHP5. Perhaps someone else can shed more light on this.

Posted: Mon Nov 13, 2006 1:47 pm
by RobertGonzalez
arborint wrote:Perhaps someone else can shed more light on this.
I would certainly appreciate it. I can't effectively answer people's questions if I don't understand the answer myself :oops:.

Posted: Mon Nov 13, 2006 2:35 pm
by Maugrim_The_Reaper
My would-be answer:

It depends on whether the performance gain amounts to a significant benefit in the target application. If it doesn't, then it's not all that important (premature optimisation). If it does, then I don't know, but I could easily set up a quick benchmark to check.

Silly questions invite curt responses... If that wasn't good enough, I'd start wondering if the interviewer had a clue. Would the next question ponder the performance benefits of using require() over require_once()? ;).

Posted: Mon Nov 13, 2006 2:35 pm
by feyd
All variables are reference counted in PHP, no exceptions. The only real difference in variables between 4 and 5 is how assignment and passing (of objects) works. Instead of copying, it now passes a reference. If you want a copy of an object, you have to clone it.

All other variables are copied.
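
A minimal sketch of the assignment behaviour described above, assuming PHP 5 or later. The Point class and the variable names are hypothetical and exist only for this example.

```php
<?php
// Hypothetical class, used only for this illustration.
class Point
{
    public $x = 0;
}

// Arrays (like other non-object values) are copied on assignment.
$a = array('x' => 0);
$b = $a;
$b['x'] = 5;
echo $a['x'], "\n";      // 0

// Objects: assignment does not copy the object, so both
// variables end up pointing at the same instance.
$p = new Point;
$q = $p;
$q->x = 5;
echo $p->x, "\n";        // 5

// clone produces an independent (shallow) copy.
$r = clone $p;
$r->x = 9;
echo $p->x, "\n";        // 5
```

The same holds when passing arguments to a function: an array parameter is a copy unless the function declares it with &, while an object parameter still reaches the caller's instance.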

Posted: Mon Nov 13, 2006 3:19 pm
by BDKR
Maugrim_The_Reaper wrote: It depends on whether the performance gain amounts to a significant benefit in the target application. If it doesn't, then it's not all that important (premature optimisation). If it does, then I don't know, but I could easily set up a quick benchmark to check.
This is the interesting thing. They are a dead serious OO shop using PHP5, but they are concerned about performance and efficiency to a fault. I was a little surprised, to be honest with you, but oh well.

As for the benchmark, I've already proven to myself via some code that using array_keys over an associative array of objects is slower. The slowest was the built-in SPL array iterator object, and the fastest was simply foreach($array as &$val).

South Florida must be in some different dimension. :roll:

Posted: Mon Nov 13, 2006 3:45 pm
by Christopher
feyd wrote:All variables are reference counted in PHP, no exceptions. The only real difference in variables between 4 and 5 is how assignment and passing (of objects) works. Instead of copying, it now passes a reference. If you want a copy of an object, you have to clone it.
I have read in a number of places that objects use a different internal data structure called a handle, which is different from a reference, though the two seem to have similarities. Does anyone know what the specific differences are?

Posted: Mon Nov 13, 2006 5:04 pm
by dbevfat
From php|architect (September 2006), article Is PHP 4 Really Faster Than PHP 5? by Andi Gutmans and Dmitry Stogov:
In PHP 4, objects were treated as primitive data types. On assignment, parameter passing and function returns, the default behaviour was to copy the entire object. In order to avoid this automatic cloning of objects, programmers were required to master by-reference assignment, parameter passing and function returns.
...
In PHP 5, objects are no longer native types, but are represented instead by a handle that refers to the object. The operations mentioned previously [assignments, function returns] no longer auto-clone the object, but its handle, i.e. the value that tells us where the object is located in memory - a simple integer value.
I hope I'm not in serious violation of any copyright rules by quoting the article. As the authors tell us, the handle gets cloned. I believe this means that the handle alone is not enough for reference counting, so I'm guessing that there must be additional logic somewhere, which should work just like good old references.
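
One observable difference between a copied handle and a real PHP reference can be sketched as follows, assuming PHP 5 or later. This only shows behaviour visible from userland, not the engine internals, and the Box class is hypothetical.

```php
<?php
// Hypothetical class, used only for this illustration.
class Box
{
    public $label = 'original';
}

// Plain assignment: $b receives a copy of the handle held by $a.
$a = new Box;
$b = $a;
$b->label = 'changed via b';   // same object, so $a sees the change
echo $a->label, "\n";          // changed via b

$b = new Box;                  // rebinding $b does not touch $a
echo $a->label, "\n";          // changed via b

// Reference assignment: $c and $d are two names for one variable.
$c = new Box;
$d = &$c;
$d = new Box;                  // rebinding $d rebinds $c as well
$d->label = 'replaced';
echo $c->label, "\n";          // replaced
```

So a copied handle lets two variables share one object, while a reference makes two names share one variable, whatever that variable currently holds.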

As for the speed test, see http://www.blueshoes.org/en/developer/php_bench/ and http://www.php.lt/benchmark/phpbench.php for a reference comparison with your benchmarks, although I think the tests are run in PHP 4.

Regards

Posted: Tue Nov 14, 2006 12:23 am
by jmut
Hm, could someone give me a code example of array_keys vs foreach usage?
I have some idea of what is meant, but it sounds so weird and pointless.
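
A minimal sketch of the two styles presumably being compared, using made-up data. The $prices array and the variable names are hypothetical, and this says nothing about which style is faster.

```php
<?php
// Hypothetical data, used only for this illustration.
$prices = array('apple' => 3, 'pear' => 5, 'plum' => 2);

// Style 1: array_keys() plus an indexed for loop.
$keys   = array_keys($prices);
$count  = count($keys);
$total1 = 0;
for ($i = 0; $i < $count; ++$i) {
    $total1 += $prices[$keys[$i]];
}

// Style 2: plain foreach over keys and values.
$total2 = 0;
foreach ($prices as $name => $price) {
    $total2 += $price;
}

echo $total1, ' ', $total2, "\n";   // 10 10
```

Both loops visit the same elements in the same order. Style 1 needs an extra array of keys and two lookups per element, which is the extra setup being questioned here.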

Posted: Tue Nov 14, 2006 10:40 am
by BDKR
dbevfat wrote: As for the speed test, see http://www.blueshoes.org/en/developer/php_bench/ and http://www.php.lt/benchmark/phpbench.php for a reference comparison with your benchmarks, although I think the tests are run in PHP 4.
A lot of those results were to be expected I suppose. However, my tests are a bit different. I'm not using the while(list()=each()) mechanism and within each iteration there is a sub-loop that's happening where the true differences between various constructs are being tested.

There was one big surprise, though it's perfectly understandable when you pull over and think about it for a sec. The length of the variable name has a huge effect on performance. I have two separate tests using the array_keys()/for() loop construct, but one was made faster simply by reducing the lengths of the names to something ridiculous and unmaintainable. But even doing that, foreach() with the '&' operator was still faster.

So yes, array_keys() does seem to be pretty fast, but not faster, so using it doesn't make sense to me, as there are just more hoops to jump through to set it up.

Code: Select all

<?php

class testObject
	{	
	protected $y=0;
	protected $z=0;
	var $name='';
	
	public function meth1($val)
		{ $this->y=($val+1); }
		
	public function meth2()
		{ ++$this->z;	}
		
	public function meth3($val)
		{ $this->name=$val; }		
		
	function reset()
		{ 
		$this->y=0;
		$this->z=0;
		}
	}

# For some reason, calling this before getting started has a good effect on the performance 
# of each subsequent call
getmicrotime(); 			
$iters=1000000;
$num_arrays=4;
$list='jean_grey|rouge|storm|sprite|psylocke';
$girls=explode('|', $list);
$girls_size=count($girls);
$girl_objs1=$girl_objs2=$girl_objs3=$girl_objs4=array();	// Initialise the four arrays the tests below actually use
for($x=0; $x<$girls_size; ++$x)
	{ 
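	# Note: this inner loop rebuilds the same four objects $num_arrays times; a single pass would give the same result
	for($xx=0; $xx<$num_arrays; ++$xx)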
		{
		$girl_objs1[$girls[$x]]=new testObject; 
		$girl_objs1[$girls[$x]]->name=$girls[$x];
	
		$girl_objs2[$girls[$x]]=new testObject; 	
		$girl_objs2[$girls[$x]]->name=$girls[$x];

		$girl_objs3[$girls[$x]]=new testObject; 	
		$girl_objs3[$girls[$x]]->name=$girls[$x];
	
		$girl_objs4[$girls[$x]]=new testObject; 	
		$girl_objs4[$girls[$x]]->name=$girls[$x];	
		}
	}
reset($girls);

############################################################################
# Using the spl arrayIterator
############################################################################
$time_start=0;
$arrayObj = new arrayObject($girl_objs3);
$iterator=$arrayObj->getIterator();
$time_start=getmicrotime();												// Start the timer here
$x=0;																	// Reset the counter; the setup loop above left it at $girls_size
while($x<$iters)
	{ 
	$iterator->rewind();
	while($iterator->valid())
		{
		$girl=$iterator->current();
		$girl->meth1($x);
		$girl->meth2();
		$iterator->next();
		}
	++$x; 	
	}
echo "Using the spl arrayIterator Object took: " . number_format( ((getmicrotime()) - $time_start),  4) . " seconds.\n\n";
$time_start=0;
############################################################################


############################################################################
# Using the existing key list / array size generated while creating the array of objects
############################################################################
$q=&$girl_objs4;
$z=array_keys($girl_objs4);
$s=sizeof($z);
$time_start=getmicrotime();												// Start the timer here
$x=0;
while($x<$iters)
	{ 
	for($i=0; $i<$s; ++$i)
		{ 
		$q[$z[$i]]->meth1($x);
		$q[$z[$i]]->meth2();		
		}
	++$x; 	
	}
echo "Using array keys and an alias with small var name sizes took: " . number_format( ((getmicrotime()) - $time_start), 4) . " seconds.\n\n";
$time_start=0;
############################################################################


############################################################################
# Using array keys. This is awkward anyways as the object array is essentially an associative array.
############################################################################
$gn=array_keys($girl_objs1);
$j=sizeof($gn);
$time_start=getmicrotime();												// Start the timer here
$x=0;
while($x<$iters)
	{ 
	for($i=0; $i<$j; ++$i)
		{ 
		$girl_objs1[$gn[$i]]->meth1($x);
		$girl_objs1[$gn[$i]]->meth2();
		}
	++$x; 	
	}
echo 'Using array_keys() took: ' . number_format( ((getmicrotime()) - $time_start), 4) . " seconds.\n\n";
$time_start=0;
############################################################################


############################################################################
# Using the plain jane foreach() 
############################################################################
$time_start=getmicrotime();												// Start the timer here
$x=0;
while($x<$iters)
	{ 
	foreach($girl_objs2 as &$girl)
		{ 
		$girl->meth1($x);
		$girl->meth2();
		}
	++$x; 	
	}
echo 'Using foreach() with the \'&\' operator took: ' . number_format( ((getmicrotime()) - $time_start), 4) . " seconds.\n\n";
############################################################################

# Timing courtesy of
function getmicrotime()
  {
  list($usec, $sec) = explode(" ",microtime());
  return ((float)$usec + (float)$sec);
  }

function my_print_r($val)
	{ echo '<pre>'; print_r($val); echo '</pre>'; }

?>
I ran these tests on the command line using PHP 5.0.5. I know, it's old as dirt. I've been too busy writing code and working on my car to install 5.2, but I'll get that done soon enough.

If anyone can see any problems in the code that have an effect on the performance, please try it or chime in. I've made mistakes in the past with this stuff, so it wouldn't be a first.
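
For anyone who wants to rerun this kind of test, here is a small timing sketch that uses microtime(true), which returns a float directly as of PHP 5.0.0. The time_callback() and sample_work() functions are hypothetical helpers and are not part of the benchmark above. Note that call_user_func() adds its own overhead per iteration, so this is better suited to comparing whole loops than single statements.

```php
<?php
// Hypothetical helper, not part of the benchmark above.
// Calls $callback $iterations times and returns the elapsed seconds.
function time_callback($callback, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; ++$i) {
        call_user_func($callback, $i);
    }
    return microtime(true) - $start;
}

// Hypothetical workload: any function taking the iteration number.
function sample_work($i)
{
    return $i + 1;
}

$elapsed = time_callback('sample_work', 100000);
echo 'sample_work took: ' . number_format($elapsed, 4) . " seconds.\n";
```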

Cheers

Posted: Tue Nov 14, 2006 10:44 am
by BDKR
A quick note about the above code. I used a mad number of iterations (1 million I believe) so the differences in performance are more easily grokked.

Cheers