Page 1 of 1

surprising benchmark result for arrays

Posted: Fri Dec 05, 2008 1:10 pm
by mindplay
I'm an optimization freak.

Today I decided to find out which is faster, numerically indexed arrays, or name/value (hash) type arrays.

The result is shocking.

Code: Select all

Average for numeric lookup: 0.00121617 msec
Average for hash-based lookup: 0.00000271 msec
Hash-based vs numeric ratio = 1 : 449.59
 
Average for numeric lookup: 0.00126696 msec
Average for hash-based lookup: 0.00000232 msec
Hash-based vs numeric ratio = 1 : 545.14
 
Average for numeric lookup: 0.00126004 msec
Average for hash-based lookup: 0.00000213 msec
Hash-based vs numeric ratio = 1 : 591.56
 
Average for numeric lookup: 0.00153804 msec
Average for hash-based lookup: 0.00000202 msec
Hash-based vs numeric ratio = 1 : 761.45
 
Average for numeric lookup: 0.00145197 msec
Average for hash-based lookup: 0.00000234 msec
Hash-based vs numeric ratio = 1 : 620.23
 
Average for numeric lookup: 0.00121593 msec
Average for hash-based lookup: 0.00000214 msec
Hash-based vs numeric ratio = 1 : 567.11
 
Average for numeric lookup: 0.00118804 msec
Average for hash-based lookup: 0.00000201 msec
Hash-based vs numeric ratio = 1 : 590.47
 
Average for numeric lookup: 0.00122190 msec
Average for hash-based lookup: 0.00000198 msec
Hash-based vs numeric ratio = 1 : 616.43
 
Average for numeric lookup: 0.00119185 msec
Average for hash-based lookup: 0.00000214 msec
Hash-based vs numeric ratio = 1 : 558.24
 
Average for numeric lookup: 0.00118995 msec
Average for hash-based lookup: 0.00000198 msec
Hash-based vs numeric ratio = 1 : 600.10
It appears that using hashes is about 600 times faster than using numeric indexes. Needless to say, I was expecting to see the exact opposite - a hash-based lookup is far more complicated by nature than a simple numeric index. Not so in PHP, apparently.

Run the benchmark and see for yourself:

Code: Select all

<?php
 
header('Content-Type: text/plain');
 
$n = array();
$n[] = 'one';
$n[] = 'two';
$n[] = 'three';
$n[] = 'four';
$n[] = 'five';
$n[] = 'six';
$n[] = 'seven';
$n[] = 'eight';
$n[] = 'nine';
$n[] = 'ten';
 
$h = array(
  "1" => "one",
  "2" => "two",
  "3" => "three",
  "4" => "four",
  "5" => "five",
  "6" => "six",
  "7" => "seven",
  "8" => "eight",
  "9" => "nine",
  "10" => "ten"
);
 
define('REPEAT', 1000);
 
for ($tests=0; $tests<10; $tests++) {
  
  // --- Benchmark numeric lookup:
  
  $s = microtime(true);
  
  for ($i=0; $i<REPEAT; $i++) {
    $test = $n[0];
    $test = $n[1];
    $test = $n[2];
    $test = $n[3];
    $test = $n[4];
    $test = $n[5];
    $test = $n[6];
    $test = $n[7];
    $test = $n[8];
    $test = $n[9];
  }
  
  $e = microtime(true);
  
  $num_lag = 1000*(($e-$s)/REPEAT);
  
  echo "Average for numeric lookup: " . number_format($num_lag,8) . " msec\n";
  
  // --- Benchmark hash-based lookup:
  
  $s = microtime(true);
  
  for ($i=0; $i<REPEAT; $i++) {
    $test = $h['1'];
    $test = $h['2'];
    $test = $h['3'];
    $test = $h['4'];
    $test = $h['5'];
    $test = $h['6'];
    $test = $h['7'];
    $test = $h['8'];
    $test = $h['9'];
    $test = $h['10'];
  }
  
  $e = microtime(true);
  
  $hash_lag = ($e-$s)/REPEAT;
  
  echo "Average for hash-based lookup: " . number_format($hash_lag,8) . " msec\n";
  
  echo "Hash-based vs numeric ratio = 1 : " . number_format($num_lag/$hash_lag,2) . "\n\n";
  
}
 
?>
I find this rather puzzling. Nothing appears to be wrong with the benchmark though? Does anyone know why numeric indexes would be so much slower in PHP than hashes?

Re: surprising benchmark result for arrays

Posted: Fri Dec 05, 2008 4:18 pm
by requinix
This was a fun distraction.

Code: Select all

<?php
 
header("Content-Type: text/plain");
date_default_timezone_set("XXX");
 
// number of individual loop iterations
define("TOP", 5000);
// number of passes to make
define("PASSES", 200);
 
// create arrays
$m = $n = $o = array();
for ($i = 0; $i < TOP; $i++) $m[] = $n[$i] = $o["i=$i"] = "i=$i";
 
// test types and variables used
$tests = array("linear", "random");
$vars = array("m", "n", "o");
 
// create results and statistics arrays
$results = $stats = array();
foreach ($tests as $test) {
    $results[$test] = $stats[$test] = array();
    foreach ($vars as $var) {
        $results[$test][$var] = array();
        $stats[$test][$var] = array("sum" => 0, "avg" => 0, "stdev" => 0, "median" => 0);
    }
}
 
/*
For all I know, PHP may optimize code enough to recognize when a variable isn't used.
It wouldn't be fair to assign (eg) $k="i=$j" for each array type when only one or two actually use it.
 
Thus the echo statements to force PHP to definitely create the variable.
*/
 
echo "Start: ", date('r'), "\n";
 
ob_start(); // discard echoes
for ($i = 0; $i <= PASSES; $i++) {
 
    /* linear */
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = "i=$j"; $variable = $m[$j]; } $b = microtime(true);
    $results["linear"]["m"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = "i=$j"; $variable = $n[$j]; } $b = microtime(true);
    $results["linear"]["n"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = "i=$j"; $variable = $o[$k]; } $b = microtime(true);
    $results["linear"]["o"][] = 1000 * ($b - $a);
 
    /* random */
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = mt_rand(1, TOP) - 1, $l = "i=$k"; $variable = $m[$k]; } $b = microtime(true);
    $results["random"]["m"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = mt_rand(1, TOP) - 1, $l = "i=$k"; $variable = $n[$k]; } $b = microtime(true);
    $results["random"]["n"][] = 1000 * ($b - $a);
 
    $a = microtime(true); for ($j = 0; $j < TOP; $j++) { echo $k = mt_rand(1, TOP) - 1, $l = "i=$k"; $variable = $m[$l]; } $b = microtime(true);
    $results["random"]["o"][] = 1000 * ($b - $a);
 
}
ob_end_clean();
 
// remove the first test pass, sort the rest, and calculate statistics
foreach ($tests as $test) foreach ($vars as $var) {
    unset($results[$test][$var][0]);
    sort($results[$test][$var], SORT_NUMERIC);
 
    $stats[$test][$var]["sum"] = array_sum($results[$test][$var]);
    $stats[$test][$var]["avg"] = $stats[$test][$var]["sum"] / PASSES;
    foreach ($results[$test][$var] as $value) $stats[$test][$var]["stdev"] += pow($value - $stats[$test][$var]["avg"], 2);
    $stats[$test][$var]["stdev"] = sqrt($stats[$test][$var]["stdev"] / PASSES);
    $stats[$test][$var]["median"] = ($results[$test][$var][floor(PASSES / 2 + .5)] + $results[$test][$var][ceil(PASSES / 2 + .1)]) / 2;
}
 
echo "End:   ", date('r'), "\n";
 
print_r($stats);

Code: Select all

Start: Fri, 05 Dec 2008 XX:15:12 -XXXX
End:   Fri, 05 Dec 2008 XX:15:56 -XXXX
Array
(
    [linear] => Array
        (
            [m] => Array
                (
                    [sum] => 2936.50579453
                    [avg] => 14.6825289726
                    [stdev] => 30.6740097145
                    [median] => 5.37157058716
                )
 
            [n] => Array
                (
                    [sum] => 2796.56147957
                    [avg] => 13.9828073978
                    [stdev] => 28.1920883239
                    [median] => 4.99498844147
                )
 
            [o] => Array
                (
                    [sum] => 2543.72763634
                    [avg] => 12.7186381817
                    [stdev] => 25.7298647139
                    [median] => 5.5159330368
                )
 
        )
 
    [random] => Array
        (
            [m] => Array
                (
                    [sum] => 4815.42158127
                    [avg] => 24.0771079063
                    [stdev] => 36.1579655166
                    [median] => 10.7699632645
                )
 
            [n] => Array
                (
                    [sum] => 5013.40985298
                    [avg] => 25.0670492649
                    [stdev] => 35.537875557
                    [median] => 10.6528997421
                )
 
            [o] => Array
                (
                    [sum] => 25817.7006245
                    [avg] => 129.088503122
                    [stdev] => 78.9223973214
                    [median] => 116.837978363
                )
 
        )
 
)
Milliseconds.

Remember, you can't compare the times across test types.