Extension Validation - Loop vs. in_array()
- SidewinderX
- Forum Contributor
- Posts: 407
- Joined: Fri Jul 16, 2004 9:04 pm
- Location: NY
I'm storing a set of valid file extensions (those allowed to be uploaded) in an array. I've looked through the source of about 10 upload functions/classes from various sites [Zend, PHP Classes, other] that have file validation, and they all seem to use a loop to check whether the uploaded file's extension (taken from its name in $_FILES) is in the "AllowedExtensions" array.
Is there a reason everyone has opted to use a loop as opposed to the in_array() function?
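To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The function names, the $allowedExtensions list, and the example file name are made up for illustration; they are not taken from any of the upload classes mentioned above.

```php
<?php
// Hypothetical whitelist and helper names, for illustration only.
$allowedExtensions = array('jpg', 'jpeg', 'gif', 'png');

// Manual loop: walks the list and stops at the first match.
function isAllowedLoop($extension, array $allowed)
{
    foreach ($allowed as $candidate) {
        if ($candidate === $extension) {
            return true;
        }
    }
    return false;
}

// in_array() performs the same search; the third argument requests
// strict (===) comparison.
function isAllowedInArray($extension, array $allowed)
{
    return in_array($extension, $allowed, true);
}

// Extract and lower-case the extension of an example file name.
$extension = strtolower(pathinfo('Holiday.Photo.JPG', PATHINFO_EXTENSION));

var_dump(isAllowedLoop($extension, $allowedExtensions));    // bool(true)
var_dump(isAllowedInArray($extension, $allowedExtensions)); // bool(true)
```

Both functions should give the same answer for any input, so the choice between them comes down to readability and speed rather than behaviour.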
- iknownothing
- Forum Contributor
- Posts: 337
- Joined: Sun Dec 17, 2006 11:53 pm
- Location: Sunshine Coast, Australia
Since I check the types of files and manually add on the extension based on the type of file, I use the in_array() function.
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA
Some people may not know that there is an in_array() function. Or perhaps their benchmarks show that a loop is faster than in_array() (not saying it is, just saying that some benchmarks could show it that way).
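For anyone who wants to check this on their own setup, a rough timing sketch could look like the following. The extension list, the needle, and the iteration count are arbitrary assumptions, and the numbers will vary by PHP version and machine, so treat the output as indicative only.

```php
<?php
// Hypothetical data and iteration count, chosen only for illustration.
$allowed    = array('jpg', 'jpeg', 'gif', 'png');
$needle     = 'png';   // last entry, so both searches scan the whole list
$iterations = 100000;

// Time in_array() with strict comparison.
$start = microtime(true);
for ($x = 0; $x < $iterations; $x++) {
    $foundInArray = in_array($needle, $allowed, true);
}
$inArrayTime = microtime(true) - $start;

// Time a manual loop that breaks out at the first match.
$start = microtime(true);
for ($x = 0; $x < $iterations; $x++) {
    $foundLoop = false;
    foreach ($allowed as $candidate) {
        if ($candidate === $needle) {
            $foundLoop = true;
            break;
        }
    }
}
$loopTime = microtime(true) - $start;

printf("in_array(): %.6f s\n", $inArrayTime);
printf("loop:       %.6f s\n", $loopTime);
```

Moving $needle to the front of the list shows how much an early break helps the loop in the best case.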
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
Had a couple minutes to write a quick benchmark. The only issue I have with this benchmark is that it assumes you will iterate the entire array; however, I can't think of a proper way to benchmark this otherwise.
What my benchmark did was run 2000 iterations of each method on a 200-row array. The benchmark would then reset itself after a short pause and repeat, to take the average of 10 test cases. I used both system-defined array keys and user-defined array keys to see what kind of difference that would make as well.
testStrIndex - in_array()
Total: 0.0352233886719 seconds
testStrIndex - for()
Total: 0.170114207268 seconds
testNumIndex - in_array()
Total: 0.0718070030212 seconds
testNumIndex - for()
Total: 0.170265245438 seconds
As you can see, in_array() seems to be more efficient than looping over the entire array (assuming you do not break out of the loop). Even so, in_array() still shows a relatively large differential. I also tried changing the array size to a very small array as well as a very large array, and the differential ratio was pretty much the same.
And here's the code:
Code: Select all
<?php
set_time_limit(0);
error_reporting(E_ALL);
session_start();

// Initialise the per-session accumulators on the first run.
if (empty($_SESSION['testCases'])) {
    $_SESSION['testCases'] = 0;
    $_SESSION['iteration']['testStrIndex1'] = array();
    $_SESSION['iteration']['testStrIndex2'] = array();
    $_SESSION['iteration']['testNumIndex1'] = array();
    $_SESSION['iteration']['testNumIndex2'] = array();
}

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

/**
 * Setup
 */
$arraySize    = 200;
$iterations   = 2000;
$maxTestCases = 10;
$needle       = $arraySize - 1; // last value, so every search scans the whole array

$testNumIndex = range(0, $arraySize - 1);
$testStrIndex = array();
for ($x = 0; $x < $arraySize; $x++) {
    $testStrIndex['foo' . $x] = $x;
}

/**
 * Testing in_array() with user defined keys
 */
$testStrIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
    if (in_array($needle, $testStrIndex)) {}
}
$testStrIndex1_end = microtime_float();

/**
 * Testing for() loop with user defined keys
 */
$testStrIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
    for ($i = 0; $i < $arraySize; $i++) {
        if ($testStrIndex['foo' . $i] == $needle) {}
    }
}
$testStrIndex2_end = microtime_float();

/**
 * Testing in_array() with system defined keys
 */
$testNumIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
    if (in_array($needle, $testNumIndex)) {}
}
$testNumIndex1_end = microtime_float();

/**
 * Testing for() loop with system defined keys
 */
$testNumIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
    for ($i = 0; $i < $arraySize; $i++) {
        if ($testNumIndex[$i] == $needle) {}
    }
}
$testNumIndex2_end = microtime_float();

// Record this run's timings.
$_SESSION['testCases']++;
$_SESSION['iteration']['testStrIndex1'][] = ($testStrIndex1_end - $testStrIndex1_start);
$_SESSION['iteration']['testStrIndex2'][] = ($testStrIndex2_end - $testStrIndex2_start);
$_SESSION['iteration']['testNumIndex1'][] = ($testNumIndex1_end - $testNumIndex1_start);
$_SESSION['iteration']['testNumIndex2'][] = ($testNumIndex2_end - $testNumIndex2_start);

// Reload the page until all test cases have run.
if ($_SESSION['testCases'] != $maxTestCases) {
    header('Location: /index.php?iterate=');
    sleep(1); //give processor a breath
    exit();
}
?>
<h3>TestCases Run: <?php echo $_SESSION['testCases']; ?></h3>
<fieldset>
<legend>testStrIndex - in_array()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
<legend>testStrIndex - for()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex2']) / $_SESSION['testCases']) ?>
</fieldset>
<fieldset>
<legend>testNumIndex - in_array()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
<legend>testNumIndex - for()</legend>
Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex2']) / $_SESSION['testCases']) ?>
</fieldset>
<?php
if ($_SESSION['testCases'] == $maxTestCases) {
$_SESSION['testCases'] = 0;
}
?>
- astions
This is what I would do.
Code: Select all
$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));
if (isset($valid[$extension]))
{
    // it's ok
}
Not that using in_array() or looping through them is going to kill the server or anything. I just write code from a performance-based perspective a lot of the time.
- Ollie Saunders
- DevNet Master
- Posts: 3179
- Joined: Tue May 24, 2005 6:01 pm
- Location: UK
astions wrote:This is what I would do.
Code: Select all
$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));
if (isset($valid[$extension]))
{
    // it's ok
}
You should only do that if you are planning on doing checks against it lots of times. array_flip() is more expensive than a single search but cheaper than several; I don't know at what point it starts to become worth it.
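As a rough sketch of the "lots of checks" case, the flipped array can be built once and reused for every file name. The list of extensions and the file names below are invented for the example.

```php
<?php
// Hypothetical extension list and file names, for illustration only.
$allowed = array('jpg', 'jpeg', 'gif', 'png');

// Pay the array_flip() cost once: values become keys ('jpg' => 0, ...).
$lookup = array_flip($allowed);

$fileNames = array('a.png', 'b.exe', 'c.JPG', 'noextension');
$results   = array();

foreach ($fileNames as $name) {
    // pathinfo() returns an empty string here when there is no extension.
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    // isset() on a key is a hash lookup rather than a scan of the values.
    $results[$name] = isset($lookup[$extension]);
}

print_r($results);
```

With only one file to check per request, the flip probably costs more than it saves, which is the trade-off described above.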
- stereofrog
- Forum Contributor
- Posts: 386
- Joined: Mon Dec 04, 2006 6:10 am
The simplest (and probably fastest) would be
Code: Select all
if(preg_match('/\.(gif|jpg|jpeg|png)$/Di', $filename))....
BTW, a file extension check is not sufficient from a security standpoint; you should always check what is actually uploaded, i.e. the file content.
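One simple way to look at the content is to compare the first bytes of the file against known image signatures. The sketch below assumes only GIF, PNG, and JPEG are accepted; detectImageType() is a made-up helper name, and a matching signature is only a first filter, not proof that the file is a valid image. PHP's getimagesize() or the fileinfo extension perform a more thorough check.

```php
<?php
// Hypothetical helper: returns 'gif', 'png', 'jpeg', or null when no
// known signature is found at the start of the file.
function detectImageType($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $header = fread($handle, 8);
    fclose($handle);
    if ($header === false) {
        return null;
    }

    // PNG files start with a fixed 8-byte signature.
    if (strncmp($header, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'png';
    }
    // GIF files start with "GIF87a" or "GIF89a".
    if (strncmp($header, 'GIF87a', 6) === 0 || strncmp($header, 'GIF89a', 6) === 0) {
        return 'gif';
    }
    // JPEG files start with the bytes FF D8 FF.
    if (strncmp($header, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }
    return null;
}

// Usage with throwaway temp files standing in for an upload.
$tmp = tempnam(sys_get_temp_dir(), 'upl');
file_put_contents($tmp, 'GIF89a' . str_repeat("\0", 10));
var_dump(detectImageType($tmp)); // string(3) "gif"
file_put_contents($tmp, 'just some text renamed to photo.gif');
var_dump(detectImageType($tmp)); // NULL
unlink($tmp);
```

In a real upload handler the path would be the temporary file PHP stores for the upload, and the detected type would be compared against the extension check above.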