Extension Validation - Loop vs. in_array()



SidewinderX
Forum Contributor
Posts: 407
Joined: Fri Jul 16, 2004 9:04 pm
Location: NY


Post by SidewinderX »

I'm storing a set of valid file extensions (the ones allowed to be uploaded) in an array. I've looked through the source of about 10 upload functions/classes from various sites [Zend, PHP Classes, other] that do file validation, and they all seem to use a loop to check whether the extension of the uploaded file's name (from $_FILES) is in the "AllowedExtensions" array.

Is there a reason everyone has opted to use a loop as opposed to the in_array() function?
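For reference, here is a minimal sketch of the two approaches side by side, assuming the extension is taken from the client-supplied file name with pathinfo(). The allow-list and the function names isAllowedLoop() and isAllowedInArray() are hypothetical. Passing true as the third argument to in_array() makes the comparison strict (type and value).

```php
<?php
// Hypothetical allow-list; substitute your own extensions.
$allowedExtensions = array('jpg', 'jpeg', 'gif', 'png');

// Approach seen in many upload classes: walk the array by hand.
function isAllowedLoop($extension, array $allowed)
{
    foreach ($allowed as $candidate) {
        if ($candidate === $extension) {
            return true; // stop at the first match
        }
    }
    return false;
}

// Same check with the built-in; true makes the comparison strict.
function isAllowedInArray($extension, array $allowed)
{
    return in_array($extension, $allowed, true);
}

// pathinfo() returns the part after the last dot of the name.
$extension = pathinfo('holiday.png', PATHINFO_EXTENSION);

var_dump(isAllowedLoop($extension, $allowedExtensions));    // bool(true)
var_dump(isAllowedInArray($extension, $allowedExtensions)); // bool(true)
```

Both functions give the same answer; the loop just spells out by hand what in_array() already does.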
iknownothing
Forum Contributor
Posts: 337
Joined: Sun Dec 17, 2006 11:53 pm
Location: Sunshine Coast, Australia

Post by iknownothing »

case-sensitivity in in_array() maybe?

.JPG is valid, even though the array might only contain .jpg.
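If case is the concern, one option (a sketch, not necessarily what those classes do) is to keep the allow-list in lower case and normalise the extension with strtolower() before calling in_array(). The list and the function name hasAllowedExtension() are hypothetical.

```php
<?php
// Hypothetical allow-list, stored in lower case.
$allowedExtensions = array('jpg', 'jpeg', 'gif', 'png');

// in_array() compares strings case-sensitively, so 'JPG' does not match 'jpg'.
// Lower-casing the extension before the lookup avoids that.
function hasAllowedExtension($filename, array $allowed)
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($extension, $allowed, true);
}

var_dump(in_array('JPG', $allowedExtensions, true));            // bool(false)
var_dump(hasAllowedExtension('PHOTO.JPG', $allowedExtensions)); // bool(true)
```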
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Since I check the types of files and manually add on the extension based on the type of file, I use the in_array() function.
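A rough sketch of that approach: look up the detected type in an allow-list with in_array() and append the matching extension yourself. It assumes the MIME type has already been detected from the file content (for example with the Fileinfo extension); the mapping, extensionForType() and the stored name are hypothetical.

```php
<?php
// Hypothetical mapping from detected MIME type to the extension to store under.
$mimeToExtension = array(
    'image/jpeg' => 'jpg',
    'image/gif'  => 'gif',
    'image/png'  => 'png',
);

// Returns the extension to append, or null if the type is not allowed.
// $mimeType is assumed to come from content detection, not from the client.
function extensionForType($mimeType, array $map)
{
    $allowedTypes = array_keys($map);
    if (!in_array($mimeType, $allowedTypes, true)) {
        return null;
    }
    return $map[$mimeType];
}

$stored = 'upload_123'; // hypothetical server-side name without an extension
$ext = extensionForType('image/png', $mimeToExtension);
if ($ext !== null) {
    $stored .= '.' . $ext;
}
echo $stored, "\n"; // upload_123.png
```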
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

Some people may not know that there is an in_array() function. Or perhaps their benchmarks show that a loop is faster than in_array() (not saying it is, just saying that some benchmarks could show it that way).
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Had a couple of minutes to write a quick benchmark. The only issue I have with it is that it assumes you will iterate over the entire array, but I can't think of a proper way to benchmark this otherwise.

My benchmark runs 2000 iterations of each method on a 200-element array. After a short pause the page redirects to itself and runs again, and the figure shown is the average over 10 test cases. I used both system-defined (numeric) array keys and user-defined (string) array keys to see what difference that would make.
testStrIndex - in_array()
Total: 0.0352233886719 seconds

testStrIndex - for()
Total: 0.170114207268 seconds

testNumIndex - in_array()
Total: 0.0718070030212 seconds

testNumIndex - for()
Total: 0.170265245438 seconds
As you can see, in_array() seems to be more efficient than looping over the entire array (assuming you do not break out of the loop early), and it keeps a relatively large lead either way. I also tried a very small array as well as a very large one, and the ratio stayed pretty much the same.

And here's the code:


<?php
set_time_limit(0);
error_reporting(E_ALL);
session_start();

if (empty($_SESSION['testCases'])) {
	$_SESSION['testCases'] = 0;
	$_SESSION['iteration']['testStrIndex1'] = array();
	$_SESSION['iteration']['testStrIndex2'] = array();
	$_SESSION['iteration']['testNumIndex1'] = array();
	$_SESSION['iteration']['testNumIndex2'] = array();
}

function microtime_float() 
{
    // microtime(true) returns the current Unix timestamp as a float
    return microtime(true);
}

/** 
 * Setup
*/
$arraySize = 200;
$iterations = 2000;
$maxTestCases = 10;
$needle = $arraySize - 1; // last value, so every method scans the whole array

$testNumIndex = range(0, $arraySize - 1);
$testStrIndex = array();
for ($x = 0; $x < $arraySize; $x++) {
	$testStrIndex['foo'. $x] = $x;
}

/** 
 * Testing in_array() with user defined keys
*/
$testStrIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	if (in_array($needle, $testStrIndex)) {}
}
$testStrIndex1_end = microtime_float();

/** 
 * Testing a loop with user defined keys
 * (string keys can't be reached with a numeric for() index, so use foreach)
*/
$testStrIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	foreach ($testStrIndex as $value) {
		if ($value == $needle) {}
	}
}
$testStrIndex2_end = microtime_float();

/** 
 * Testing in_array() with system defined keys
*/
$testNumIndex1_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	if (in_array($needle, $testNumIndex)) {}
}
$testNumIndex1_end = microtime_float();

/** 
 * Testing for() loop with system defined keys
*/
$testNumIndex2_start = microtime_float();
for ($x = 0; $x < $iterations; $x++) {
	for ($i = 0; $i < $arraySize; $i++) { 
		if ($testNumIndex[$i] == $needle) {}
	}
}
$testNumIndex2_end = microtime_float();

$_SESSION['testCases']++;
$_SESSION['iteration']['testStrIndex1'][] = ($testStrIndex1_end - $testStrIndex1_start); 
$_SESSION['iteration']['testStrIndex2'][] = ($testStrIndex2_end - $testStrIndex2_start); 
$_SESSION['iteration']['testNumIndex1'][] = ($testNumIndex1_end - $testNumIndex1_start); 
$_SESSION['iteration']['testNumIndex2'][] = ($testNumIndex2_end - $testNumIndex2_start); 

if ($_SESSION['testCases'] != $maxTestCases) { 
	header('Location: ' . $_SERVER['PHP_SELF'] . '?iterate='); // reload this same script
	sleep(1); //give processor a breath
	exit();
}

?>
<h3>TestCases Run: <?php echo $_SESSION['testCases']; ?></h3>
<fieldset>
	<legend>testStrIndex - in_array()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
	<legend>testStrIndex - for()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testStrIndex2']) / $_SESSION['testCases']) ?>
</fieldset>
<fieldset>
	<legend>testNumIndex - in_array()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex1']) / $_SESSION['testCases']); ?>
</fieldset>
<fieldset>
	<legend>testNumIndex - for()</legend>
	Total: <?php echo (array_sum($_SESSION['iteration']['testNumIndex2']) / $_SESSION['testCases']) ?>
</fieldset>

<?php 

if ($_SESSION['testCases'] == $maxTestCases) {
	$_SESSION['testCases'] = 0;
}

?>
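One way to see what happens when you do break out of the loop is to time a loop that returns at the first match against in_array(), which also stops searching as soon as it finds the value. This is a sketch only; containsWithBreak() and timeIt() are hypothetical helpers, and the timings will vary by machine, so none are shown here.

```php
<?php
// Loop that stops at the first match, like in_array() does.
function containsWithBreak(array $haystack, $needle)
{
    foreach ($haystack as $value) {
        if ($value == $needle) { // loose comparison, matching in_array()'s default
            return true;
        }
    }
    return false;
}

// Runs $check $iterations times and returns the elapsed seconds.
function timeIt($check, $iterations)
{
    $start = microtime(true);
    for ($x = 0; $x < $iterations; $x++) {
        $check();
    }
    return microtime(true) - $start;
}

$data = range(0, 199);
$iterations = 2000;

// Needle at the start vs. at the end shows how much the match position matters.
foreach (array(0, 199) as $needle) {
    $loop = timeIt(function () use ($data, $needle) {
        return containsWithBreak($data, $needle);
    }, $iterations);
    $builtin = timeIt(function () use ($data, $needle) {
        return in_array($needle, $data);
    }, $iterations);
    printf("needle %d: loop %.5f s, in_array %.5f s\n", $needle, $loop, $builtin);
}
```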
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

This is what I would do.


$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));

if (isset($valid[$extension]))
{
    // it's ok
}
Not that using in_array() or looping through them is going to kill the server or anything. I just write code from a performance-based perspective a lot of the time.
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

astions wrote: This is what I would do.


$valid = array_flip(array('jpg', 'gif', 'png', 'yadda', 'etc'));

if (isset($valid[$extension]))
{
    // it's ok
}
You should only do that if you are planning on doing checks against it lots of times. array_flip() is more expensive than a single search but cheaper than several, I don't know at what point it starts to become worth it.
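If the same list is checked many times in one request, one way to pay for array_flip() only once is to cache the flipped array in a static variable. A minimal sketch; the function name isAllowedExtension() and the extension list are hypothetical.

```php
<?php
// Builds the flipped lookup table once per request and reuses it for every check.
function isAllowedExtension($extension)
{
    static $lookup = null;
    if ($lookup === null) {
        // array_flip() runs only on the first call.
        $lookup = array_flip(array('jpg', 'jpeg', 'gif', 'png'));
    }
    return isset($lookup[strtolower($extension)]);
}

foreach (array('a.jpg', 'b.GIF', 'c.exe') as $name) {
    $ok = isAllowedExtension(pathinfo($name, PATHINFO_EXTENSION));
    echo $name, ': ', $ok ? 'ok' : 'rejected', "\n";
}
```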
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Then you can just populate the array with keys. I'm pointing out alternative solutions here.
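Populating the array with keys directly might look like this (a sketch; the values are just placeholders, since only the keys are looked up):

```php
<?php
// Allow-list written with extensions as keys, so no array_flip() is needed.
$valid = array(
    'jpg'  => true,
    'jpeg' => true,
    'gif'  => true,
    'png'  => true,
);

$extension = strtolower(pathinfo('scan.PNG', PATHINFO_EXTENSION));

if (isset($valid[$extension])) {
    echo "it's ok\n";
}
```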
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog »

The simplest (and probably fastest) would be


if(preg_match('/\.(gif|jpg|jpeg|png)$/Di', $filename))....
BTW, an extension check is not sufficient from a security standpoint; you should always check what was actually uploaded, i.e. the file content.
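A sketch of combining the two checks, assuming the Fileinfo extension is available: the extension must be on the list and the MIME type detected from the file's bytes must match it. The mapping and isAcceptableUpload() are hypothetical; in a real upload handler $tmpPath and $clientName would come from the tmp_name and name entries of $_FILES.

```php
<?php
// Allowed extension => MIME type the file content must actually have.
$allowed = array(
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'gif'  => 'image/gif',
    'png'  => 'image/png',
);

function isAcceptableUpload($tmpPath, $clientName, array $allowed)
{
    $extension = strtolower(pathinfo($clientName, PATHINFO_EXTENSION));
    if (!isset($allowed[$extension])) {
        return false; // extension not on the list
    }
    // Inspect the bytes on disk rather than trusting the name or browser type.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $detected = $finfo->file($tmpPath);
    return $detected === $allowed[$extension];
}

// A PHP script renamed to .jpg passes the extension check but not the content check.
$tmp = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($tmp, "<?php echo 'hi';\n");
$result = isAcceptableUpload($tmp, 'shell.jpg', $allowed);
var_dump($result); // bool(false)
unlink($tmp);
```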