Page 1 of 1
Extension Validation - Loop vs. in_array()
Posted: Sun Aug 12, 2007 9:44 pm
by SidewinderX
I'm storing a set of valid files extensions (which are allowed to be uploaded) an an array. I've looked through the source of about 10 upload functions/classes from various sites [Zend, PHP Classes, other] that have file validation and they all seem to use a loop to check if the $_FILE['name']['extension'] is in the "AllowedExtensions" array.
Is there a reason everyone has opted to use a loop as opposed to the in_array() function?
Posted: Sun Aug 12, 2007 10:10 pm
by iknownothing
case-sensitivity in in_array() maybe?
.JPG is valid, even though the array might only contain .jpg.
Posted: Sun Aug 12, 2007 11:07 pm
by s.dot
Since I check the types of files and manually add on the extension based on the type of file, I use the in_array() function.
Posted: Sun Aug 12, 2007 11:11 pm
by RobertGonzalez
Some people may not know that there is an
in_array() function. Or perhaps their benchmarks show that a loop is faster than
in_array() (not saying it is, just saying that some benchmarks could show it that way).
Posted: Mon Aug 13, 2007 12:09 am
by John Cartwright
Had a couple minutes to write a quick benchmark. The only issue I have with this benchmark it that is assume you will iterate the entire array, however I can't think of a proper way otherwise to benchmark this..
What my benchmark did was run 2000 iterations of the different kinds of methods on a 200 row array. The benchmark would then reset itself after a shortpause and reset to take the average of 10 test cases. I used both systematically defined array keys and user defined array keys to see what kind of difference that would have as well.
testStrIndex - in_array()
Total: 0.0352233886719 seconds
testStrIndex - for()
Total: 0.170114207268 seconds
testNumIndex - in_array()
Total: 0.0718070030212 seconds
testNumIndex - for()
Total: 0.170265245438 seconds
As you can see, in_array() seems to be more efficient than looping the entire array (assuming you do not break out of it). Although even still, in_array() still has a relatively large deferential. I also changing the array size to a very small array as well as a large large array and the differential ratio was pretty much the same.
And heres the code..
Code: Select all
<?php
set_time_limit(0);
error_reporting(E_ALL);
session_start();
if (empty($_SESSION['testCases'])) {
$_SESSION['testCases'] = 0;
$_SESSION['iteration']['testStrIndex1'] = array();
$_SESSION['iteration']['testStrIndex2'] = array();
$_SESSION['iteration']['testNumIndex1'] = array();
$_SESSION['iteration']['testNumIndex2'] = array();
}
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
/**
* Setup
*/
$arraySize = 100;
$iterations = 2000;
$maxTestCases = 10;
$testNumIndex = range(0, 200);
$testStrIndex = array();
for ($x = 0; $x < $arraySize; $x++) {
$testStrIndex['foo'. $x] = $x;
}
/**
* Testing in_array() with user defined keys
*/
$testStrIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
if (in_array(199, $testStrIndex)) {}
}
$testStrIndex1_end = microtime_float();
/**
* Testing for() loop with user defined keys
*/
$testStrIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
for ($i = 0; $i <= $arraySize; $i++) {
if ($testNumIndex[$i] == $arraySize) {}
}
}
$testStrIndex2_end = microtime_float();
/**
* Testing in_array() with system defined keys
*/
$testNumIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
if (in_array(199, $testNumIndex)) {}
}
$testNumIndex1_end = microtime_float();
/**
* Testing for() loop with system defined keys
*/
$testNumIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
for ($i = 0; $i <= $arraySize; $i++) {
if ($testNumIndex[$i] == $arraySize) {}
}
}
$testNumIndex2_end = microtime_float();
$_SESSION['testCases']++;
$_SESSION['iteration']['testStrIndex1'][] = ($testStrIndex1_end - $testStrIndex1_start);
$_SESSION['iteration']['testStrIndex2'][] = ($testStrIndex2_end - $testStrIndex2_start);
$_SESSION['iteration']['testNumIndex1'][] = ($testNumIndex1_end - $testNumIndex1_start);
$_SESSION['iteration']['testNumIndex2'][] = ($testNumIndex2_end - $testNumIndex2_start);
if ($_SESSION['testCases'] != $maxTestCases) {
header('Location: /index.php?iterate=');
sleep(1); //give processor a breath
exit();
}
?>
<h3>TestCases Run: <?php echo $_SESSION['testCases']; ?></h3>
<fieldset>
<legend>testStrIndex - in_array()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
<legend>testStrIndex - for()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex2']) / $_SESSION['testCases']) ?>
</fieldset>
<fieldset>
<legend>testNumIndex - in_array()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
<legend>testNumIndex - for()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex2']) / $_SESSION['testCases']) ?>
</fieldset>
<?php
if ($_SESSION['testCases'] == $maxTestCases) {
$_SESSION['testCases'] = 0;
}
?>
Posted: Mon Aug 13, 2007 12:54 am
by Benjamin
This is what I would do.
Code: Select all
$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));
if (isset($valid[$extension]))
{
// it's ok
}
Not that using in_array or looping through them is going to kill the server or anything. I just write code from a performance based perspective a lot of the times.
Posted: Mon Aug 13, 2007 2:26 am
by Ollie Saunders
astions wrote:This is what I would do.
Code: Select all
$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));
if (isset($valid[$extension]))
{
// it's ok
}
You should only do that if you are planning on doing checks against it lots of times. array_flip() is more expensive than a single search but cheaper than several, I don't know at what point it starts to become worth it.
Posted: Mon Aug 13, 2007 2:34 am
by Benjamin
Then you can just populate the array with keys. I'm pointing out alternative solutions here.
Posted: Mon Aug 13, 2007 3:00 am
by stereofrog
The simplest (and probably fastest) would be
Code: Select all
if(preg_match('/\.(gif|jpg|jpeg|png)$/Di', $filename))....
BTW, file extension check is not sufficient from the security standpoint, you should always check what is actually uploaded i.e. the file content.