Page 1 of 1

Extension Validation - Loop vs. in_array()

Posted: Sun Aug 12, 2007 9:44 pm
by SidewinderX
I'm storing a set of valid files extensions (which are allowed to be uploaded) an an array. I've looked through the source of about 10 upload functions/classes from various sites [Zend, PHP Classes, other] that have file validation and they all seem to use a loop to check if the $_FILE['name']['extension'] is in the "AllowedExtensions" array.

Is there a reason everyone has opted to use a loop as opposed to the in_array() function?

Posted: Sun Aug 12, 2007 10:10 pm
by iknownothing
case-sensitivity in in_array() maybe?

.JPG is valid, even though the array might only contain .jpg.

Posted: Sun Aug 12, 2007 11:07 pm
by s.dot
Since I check the types of files and manually add on the extension based on the type of file, I use the in_array() function.

Posted: Sun Aug 12, 2007 11:11 pm
by RobertGonzalez
Some people may not know that there is an in_array() function. Or perhaps their benchmarks show that a loop is faster than in_array() (not saying it is, just saying that some benchmarks could show it that way).

Posted: Mon Aug 13, 2007 12:09 am
by John Cartwright
Had a couple minutes to write a quick benchmark. The only issue I have with this benchmark it that is assume you will iterate the entire array, however I can't think of a proper way otherwise to benchmark this..

What my benchmark did was run 2000 iterations of the different kinds of methods on a 200 row array. The benchmark would then reset itself after a shortpause and reset to take the average of 10 test cases. I used both systematically defined array keys and user defined array keys to see what kind of difference that would have as well.
testStrIndex - in_array()
Total: 0.0352233886719 seconds

testStrIndex - for()
Total: 0.170114207268 seconds

testNumIndex - in_array()
Total: 0.0718070030212 seconds

testNumIndex - for()
Total: 0.170265245438 seconds
As you can see, in_array() seems to be more efficient than looping the entire array (assuming you do not break out of it). Although even still, in_array() still has a relatively large deferential. I also changing the array size to a very small array as well as a large large array and the differential ratio was pretty much the same.

And heres the code..

Code: Select all

<?php
set_time_limit(0);
error_reporting(E_ALL);
session_start();

if (empty($_SESSION['testCases'])) {
	$_SESSION['testCases'] = 0;
	$_SESSION['iteration']['testStrIndex1'] = array();
	$_SESSION['iteration']['testStrIndex2'] = array();
	$_SESSION['iteration']['testNumIndex1'] = array();
	$_SESSION['iteration']['testNumIndex2'] = array();
}

function microtime_float() 
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

/** 
 * Setup
*/
$arraySize = 100;
$iterations = 2000;
$maxTestCases = 10;

$testNumIndex = range(0, 200);
$testStrIndex = array();
for ($x = 0; $x < $arraySize; $x++) {
	$testStrIndex['foo'. $x] = $x;
}

/** 
 * Testing in_array() with user defined keys
*/
$testStrIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	if (in_array(199, $testStrIndex)) {}
}
$testStrIndex1_end = microtime_float();

/** 
 * Testing for() loop with user defined keys
*/
$testStrIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	for ($i = 0; $i <= $arraySize; $i++) { 
		if ($testNumIndex[$i] == $arraySize) {}
	}
}
$testStrIndex2_end = microtime_float();

/** 
 * Testing in_array() with system defined keys
*/
$testNumIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	if (in_array(199, $testNumIndex)) {}
}
$testNumIndex1_end = microtime_float();

/** 
 * Testing for() loop with system defined keys
*/
$testNumIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	for ($i = 0; $i <= $arraySize; $i++) { 
		if ($testNumIndex[$i] == $arraySize) {}
	}
}
$testNumIndex2_end = microtime_float();

$_SESSION['testCases']++;
$_SESSION['iteration']['testStrIndex1'][] = ($testStrIndex1_end - $testStrIndex1_start); 
$_SESSION['iteration']['testStrIndex2'][] = ($testStrIndex2_end - $testStrIndex2_start); 
$_SESSION['iteration']['testNumIndex1'][] = ($testNumIndex1_end - $testNumIndex1_start); 
$_SESSION['iteration']['testNumIndex2'][] = ($testNumIndex2_end - $testNumIndex2_start); 

if ($_SESSION['testCases'] != $maxTestCases) { 
	header('Location: /index.php?iterate=');
	sleep(1); //give processor a breath
	exit();
}

?>
<h3>TestCases Run: <?php echo $_SESSION['testCases']; ?></h3>
<fieldset>
	<legend>testStrIndex - in_array()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
	<legend>testStrIndex - for()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex2']) / $_SESSION['testCases']) ?>
</fieldset>
<fieldset>
	<legend>testNumIndex - in_array()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
	<legend>testNumIndex - for()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex2']) / $_SESSION['testCases']) ?>
</fieldset>

<?php 

if ($_SESSION['testCases'] == $maxTestCases) {
	$_SESSION['testCases'] = 0;
}

?>

Posted: Mon Aug 13, 2007 12:54 am
by Benjamin
This is what I would do.

Code: Select all

$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));

if (isset($valid[$extension]))
{
    // it's ok
}
Not that using in_array or looping through them is going to kill the server or anything. I just write code from a performance based perspective a lot of the times.

Posted: Mon Aug 13, 2007 2:26 am
by Ollie Saunders
astions wrote:This is what I would do.

Code: Select all

$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));

if (isset($valid[$extension]))
{
    // it's ok
}
You should only do that if you are planning on doing checks against it lots of times. array_flip() is more expensive than a single search but cheaper than several, I don't know at what point it starts to become worth it.

Posted: Mon Aug 13, 2007 2:34 am
by Benjamin
Then you can just populate the array with keys. I'm pointing out alternative solutions here.

Posted: Mon Aug 13, 2007 3:00 am
by stereofrog
The simplest (and probably fastest) would be

Code: Select all

if(preg_match('/\.(gif|jpg|jpeg|png)$/Di', $filename))....
BTW, file extension check is not sufficient from the security standpoint, you should always check what is actually uploaded i.e. the file content.