surprising benchmark result for arrays

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
mindplay
Forum Commoner
Posts: 25
Joined: Sat Sep 16, 2006 1:32 pm

surprising benchmark result for arrays

Post by mindplay »

I'm an optimization freak.

Today I decided to find out which is faster, numerically indexed arrays, or name/value (hash) type arrays.

The result is shocking.

Code: Select all

Average for numeric lookup: 0.00121617 msec
Average for hash-based lookup: 0.00000271 msec
Hash-based vs numeric ratio = 1 : 449.59
 
Average for numeric lookup: 0.00126696 msec
Average for hash-based lookup: 0.00000232 msec
Hash-based vs numeric ratio = 1 : 545.14
 
Average for numeric lookup: 0.00126004 msec
Average for hash-based lookup: 0.00000213 msec
Hash-based vs numeric ratio = 1 : 591.56
 
Average for numeric lookup: 0.00153804 msec
Average for hash-based lookup: 0.00000202 msec
Hash-based vs numeric ratio = 1 : 761.45
 
Average for numeric lookup: 0.00145197 msec
Average for hash-based lookup: 0.00000234 msec
Hash-based vs numeric ratio = 1 : 620.23
 
Average for numeric lookup: 0.00121593 msec
Average for hash-based lookup: 0.00000214 msec
Hash-based vs numeric ratio = 1 : 567.11
 
Average for numeric lookup: 0.00118804 msec
Average for hash-based lookup: 0.00000201 msec
Hash-based vs numeric ratio = 1 : 590.47
 
Average for numeric lookup: 0.00122190 msec
Average for hash-based lookup: 0.00000198 msec
Hash-based vs numeric ratio = 1 : 616.43
 
Average for numeric lookup: 0.00119185 msec
Average for hash-based lookup: 0.00000214 msec
Hash-based vs numeric ratio = 1 : 558.24
 
Average for numeric lookup: 0.00118995 msec
Average for hash-based lookup: 0.00000198 msec
Hash-based vs numeric ratio = 1 : 600.10
It appears that using hashes is about 600 times faster than using numeric indexes. Needless to say, I was expecting to see the exact opposite - a hash-based lookup is far more complicated by nature than a simple numeric index. Not so in PHP, apparently.

Run the benchmark and see for yourself:

Code: Select all

<?php
 
header('Content-Type: text/plain');
 
$n = array();
$n[] = 'one';
$n[] = 'two';
$n[] = 'three';
$n[] = 'four';
$n[] = 'five';
$n[] = 'six';
$n[] = 'seven';
$n[] = 'eight';
$n[] = 'nine';
$n[] = 'ten';
 
$h = array(
  "1" => "one",
  "2" => "two",
  "3" => "three",
  "4" => "four",
  "5" => "five",
  "6" => "six",
  "7" => "seven",
  "8" => "eight",
  "9" => "nine",
  "10" => "ten"
);
 
define('REPEAT', 1000);
 
for ($tests=0; $tests<10; $tests++) {
  
  // --- Benchmark numeric lookup:
  
  $s = microtime(true);
  
  for ($i=0; $i<REPEAT; $i++) {
    $test = $n[0];
    $test = $n[1];
    $test = $n[2];
    $test = $n[3];
    $test = $n[4];
    $test = $n[5];
    $test = $n[6];
    $test = $n[7];
    $test = $n[8];
    $test = $n[9];
  }
  
  $e = microtime(true);
  
  $num_lag = 1000*(($e-$s)/REPEAT);
  
  echo "Average for numeric lookup: " . number_format($num_lag,8) . " msec\n";
  
  // --- Benchmark hash-based lookup:
  
  $s = microtime(true);
  
  for ($i=0; $i<REPEAT; $i++) {
    $test = $h['1'];
    $test = $h['2'];
    $test = $h['3'];
    $test = $h['4'];
    $test = $h['5'];
    $test = $h['6'];
    $test = $h['7'];
    $test = $h['8'];
    $test = $h['9'];
    $test = $h['10'];
  }
  
  $e = microtime(true);
  
  $hash_lag = ($e-$s)/REPEAT;
  
  echo "Average for hash-based lookup: " . number_format($hash_lag,8) . " msec\n";
  
  echo "Hash-based vs numeric ratio = 1 : " . number_format($num_lag/$hash_lag,2) . "\n\n";
  
}
 
?>
I find this rather puzzling. Nothing appears to be wrong with the benchmark though? Does anyone know why numeric indexes would be so much slower in PHP than hashes?
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: surprising benchmark result for arrays

Post by requinix »

This was a fun distraction.

Code: Select all

<?php
 
header("Content-Type: text/plain");
date_default_timezone_set("XXX");
 
// number of individual loop iterations
define("TOP", 5000);
// number of passes to make
define("PASSES", 200);
 
// create arrays
$m = $n = $o = array();
for ($i = 0; $i < TOP; $i++) $m[] = $n[$i] = $o["i=$i"] = "i=$i";
 
// test types and variables used
$tests = array("linear", "random");
$vars = array("m", "n", "o");
 
// create results and statistics arrays
$results = $stats = array();
foreach ($tests as $test) {
    $results[$test] = $stats[$test] = array();
    foreach ($vars as $var) {
        $results[$test][$var] = array();
        $stats[$test][$var] = array("sum" => 0, "avg" => 0, "stdev" => 0, "median" => 0);
    }
}
 
/*
For all I know, PHP may optimize code enough to recognize when a variable isn't used.
It wouldn't be fair to assign (eg) $k="i=$j" for each array type when only one or two actually use it.
 
Thus the echo statements to force PHP to definitely create the variable.
*/
 
echo "Start: ", date('r'), "\n";
 
ob_start(); // discard echoes
for ($i = 0; $i <= PASSES; $i++) {
 
    /* linear */
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = "i=$j"; $variable = $m[$j]; } $b = microtime(true);
    $results["linear"]["m"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = "i=$j"; $variable = $n[$j]; } $b = microtime(true);
    $results["linear"]["n"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = "i=$j"; $variable = $o[$k]; } $b = microtime(true);
    $results["linear"]["o"][] = 1000 * ($b - $a);
 
    /* random */
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = mt_rand(1, TOP) - 1, $l = "i=$k"; $variable = $m[$k]; } $b = microtime(true);
    $results["random"]["m"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = mt_rand(1, TOP) - 1, $l = "i=$k"; $variable = $n[$k]; } $b = microtime(true);
    $results["random"]["n"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = mt_rand(1, TOP) - 1, $l = "i=$k"; $variable = $m[$l]; } $b = microtime(true);
    $results["random"]["o"][] = 1000 * ($b - $a);
 
}
ob_end_clean();
 
// remove the first test pass, sort the rest, and calculate statistics
foreach ($tests as $test) foreach ($vars as $var) {
    unset($results[$test][$var][0]);
    sort($results[$test][$var], SORT_NUMERIC);
 
    $stats[$test][$var]["sum"] = array_sum($results[$test][$var]);
    $stats[$test][$var]["avg"] = $stats[$test][$var]["sum"] / PASSES;
    foreach ($results[$test][$var] as $value) $stats[$test][$var]["stdev"] += pow($value - $stats[$test][$var]["avg"], 2);
    $stats[$test][$var]["stdev"] = sqrt($stats[$test][$var]["stdev"] / PASSES);
    $stats[$test][$var]["median"] = ($results[$test][$var][floor(PASSES / 2 + .5)] + $results[$test][$var][ceil(PASSES / 2 + .1)]) / 2;
}
 
echo "End:   ", date('r'), "\n";
 
print_r($stats);

Code: Select all

Start: Fri, 05 Dec 2008 XX:15:12 -XXXX
End:   Fri, 05 Dec 2008 XX:15:56 -XXXX
Array
(
    [linear] => Array
        (
            [m] => Array
                (
                    [sum] => 2936.50579453
                    [avg] => 14.6825289726
                    [stdev] => 30.6740097145
                    [median] => 5.37157058716
                )
 
            [n] => Array
                (
                    [sum] => 2796.56147957
                    [avg] => 13.9828073978
                    [stdev] => 28.1920883239
                    [median] => 4.99498844147
                )
 
            [o] => Array
                (
                    [sum] => 2543.72763634
                    [avg] => 12.7186381817
                    [stdev] => 25.7298647139
                    [median] => 5.5159330368
                )
 
        )
 
    [random] => Array
        (
            [m] => Array
                (
                    [sum] => 4815.42158127
                    [avg] => 24.0771079063
                    [stdev] => 36.1579655166
                    [median] => 10.7699632645
                )
 
            [n] => Array
                (
                    [sum] => 5013.40985298
                    [avg] => 25.0670492649
                    [stdev] => 35.537875557
                    [median] => 10.6528997421
                )
 
            [o] => Array
                (
                    [sum] => 25817.7006245
                    [avg] => 129.088503122
                    [stdev] => 78.9223973214
                    [median] => 116.837978363
                )
 
        )
 
)
Milliseconds.

Remember, you can't compare the times across test types.
Post Reply