Scan through directories with wildcards

Posted: Mon Oct 30, 2006 11:38 pm
by John Cartwright
PHP 5 Version

A class that scans a directory for files matching one or more patterns specified by the user. Read the class comments for further details.

Code: Select all

<?php
	/**
	 * class scanfiles (php5)
	 *
	 * Object used to scan through a directory, matching filenames
	 *
	 * Features :
	 *  - case insensitive (by default)
	 *  - can pass array of patterns to match multiple filenames
	 *  - checks whether directory exists
	 *
	 * Usage :
	 *  - Array of supplied patterns
	 *			$files = new ScanFiles(array('index.html', 'config.php'));
	 *			print_r($files->search());
	 *  - Array of supplied patterns with supplied directory (placeholder path)
	 *			$files = new ScanFiles(array('index.html', 'config.php'), '/path/to/dir');
	 *			print_r($files->search());
	 *  - Array of supplied patterns with wildcard
	 *			$files = new ScanFiles(array('index.*', 'ind*.php'));
	 *			print_r($files->search());
	 *  - String supplied pattern with case sensitivity
	 *			$files = new ScanFiles('index.php', '', true);
	 *			print_r($files->search());
	 *
	 * Feel free to modify and use this code at will
	 *
	 * Created by Jcart at http://devnetwork.net
	 * Special thanks to Feyd for helping with regex
	*/ 
	class scanFiles
	{
		protected $pattern, $directory, $sensitive;
		/**
		 * Constructor
		 *
		 * Checks for invalid values and assigns default values
		 *
		 * @param {mixed} $pattern
		 * @param {string} $dir
		 * @param {bool} $sensitive		 
		*/ 
		public function __construct($pattern, $dir = '', $sensitive = false) 
		{		
			$this->directory = $this->getDir($dir);	
			$this->sensitive = $this->getSensitive($sensitive);
			$this->pattern = $this->getPattern($pattern);			
		}
		/**
		 * Search
		 *
		 * Perform search of directory
		 *
		 * @returns {array}
		*/ 
		public function search() 
		{
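			$stack = array();
			// glob() returns false on error; treat that as "no files"
			$entries = glob($this->directory .'*');
			if ($entries === false) {
				$entries = array();
			}
			foreach ($entries as $file) {
				$file = basename($file);
				// keep only non-directory entries whose name matches the pattern
				if (preg_match($this->pattern, $file) && !is_dir($this->directory.$file)) {
					$stack[] = $file;
				}
			}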
			
			return $stack;
		}
		/**
		 * getPattern
		 *
		 * Format pattern into regular expression
		 *
		 * @param  {string} $pattern
		 * @returns {string}
		*/ 		
		protected function getPattern($pattern) 
		{		
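			if (is_array($pattern) && count($pattern))  {
				// Escape regex metacharacters in each pattern and join them as alternatives.
				// A closure (PHP 5.3+) replaces create_function(), which was removed in PHP 8.
				$quote = function ($a) { return preg_quote($a, '#'); };
				$pattern = implode('|', array_map($quote, $pattern));
			} elseif (is_string($pattern)) {
				// Escape a single string pattern too, so "." only matches a literal dot
				$pattern = preg_quote($pattern, '#');
			} else {
				throw new Exception('Pattern must be string or array');
			}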
		
			$pattern = '#^(?:' . $pattern .')$#';
			
			if (!$this->sensitive) {
				$pattern .= 'i';
			}
				
			$pattern = str_replace('\*', '*', $pattern);
			$pattern = str_replace('*', '.*?', $pattern);
			
			return $pattern;
		}					
		/**
		 * getSensitive
		 *
		 * Sets sensitivity of pattern matching
		 * 
		 * @param {bool} $sensitive
		 * @returns {bool}
		*/    
		protected function getSensitive($sensitive) 
		{
			if (!is_bool($sensitive)) {	
				throw new Exception('Sensitivity must be boolean');
			}
			
			return $sensitive;
		}
		/**
		 * getDir
		 *
		 * If a directory was supplied we want to make sure it
		 * exists. Also checks whether to add an additional
		 * directory separator
		 *
		 * @param {string} $dir
		 * @returns {string}
		*/  
		protected function getDir($dir) 
		{
			if (!empty($dir) && !is_dir($dir)) {
				throw new Exception('Directory "'. $dir .'" cannot be found');
			}

			return (substr($dir, -1, 1) == DIRECTORY_SEPARATOR ? $dir : $dir . DIRECTORY_SEPARATOR);
		}
	}
	
?>

Code: Select all

echo '<pre>';

	echo '<h3>Listing all available files in directory</h3>';
	print_r(glob('/*'));

	echo '<h3>Searching for array of matches - Expect Pass</h3>';
	$files = new ScanFiles(array('config.sys', 'boot.bak'));
	print_r($files->search());	
	
	echo '<h3>Searching for array of matches case sensitive - Expect Empty</h3>';
	$files = new ScanFiles(array('config.sys', 'boot.bak'), '', true);
	print_r($files->search());		
	
	echo '<h3>Searching for array of matches case sensitive - Expect Pass</h3>';
	$files = new ScanFiles(array('CONFIG.SYS', 'BOOT.BAK'), '', true);
	print_r($files->search());			

	echo '<h3>Searching for array of matches case sensitive with wildcard - Expect Pass</h3>';
	$files = new ScanFiles(array('CONFIG*', 'B*.BAK'), '', true);
	print_r($files->search());		

	echo '<h3>Searching for string matches with wildcard- Expect Pass</h3>';
	$files = new ScanFiles('*fig*');
	print_r($files->search());			
	
	#echo '<h3>Searching in non-existent directory - Expect Fail</h3>';
	#$files = new ScanFiles('config', '/dir/does/not/exists/');
	#print_r($files->search());
Because you guys couldn't see which files I was matching against, here is a portion of the list:

Code: Select all

Array
(
    [0] => /ATI
    [1] => /AUTOEXEC.BAT
    [2] => /Aplpications
    [3] => /Applications
    [4] => /BOOT.BAK
    [5] => /CONFIG.SYS
)

Posted: Tue Oct 31, 2006 12:48 am
by Mr Tech
Hey mate... I'm using the PHP 4 version and I'm getting this error:


Parse error: parse error, unexpected T_STRING, expecting T_OLD_FUNCTION or T_FUNCTION or T_VAR or '}' in W:\www\test.php on line 49

Posted: Tue Oct 31, 2006 1:31 am
by jmut
The PHP 4 version still has one "public" keyword that has to be removed, and the exception too.

Posted: Tue Oct 31, 2006 2:05 am
by Chris Corbyn
I've just removed the "public" keyword and the thrown exception.

Posted: Tue Oct 31, 2006 3:20 am
by Jenk
Perhaps allowing the user/dev to specify the full pattern would be better?

I'm just thinking of instances where they might not want to find a wildcard match, or want to specify beginning/end values only.

And force type (array) the glob return val maybe?
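
One caveat on the cast: in PHP, `(array) false` produces a one-element array containing `false` rather than an empty array, so an explicit check may be the safer guard against `glob()` failing. A minimal sketch of that idea follows; `listEntries` is a hypothetical helper name, not part of the class in the first post.

```php
<?php
/**
 * Hypothetical helper (not part of the original class): return the base
 * names of everything directly inside $dir.
 */
function listEntries($dir)
{
    $paths = glob($dir . DIRECTORY_SEPARATOR . '*');

    // glob() returns false on error; fall back to an empty array so the
    // caller can always loop over the result.
    if ($paths === false) {
        $paths = array();
    }

    return array_map('basename', $paths);
}
```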

Also on a semantic point - when throwing exceptions, you should try{}catch them, even if you are just re-throwing it.

e.g.:

Code: Select all

class Foo
{
    public function fooBar()
    {
        try
        {
            $this->bar();
        }
        catch(Exception $e)
        {
            // re-throw so the possible exception is visible at this level
            throw $e;
        }
    }

    protected function bar()
    {
        throw new Exception('Blah');
    }
}

Posted: Tue Oct 31, 2006 4:19 am
by Chris Corbyn
Is it just me or is that try/catch completely useless? If I declare a method to throw an exception I wouldn't bother catching it simply to throw it all over again. Why catch it just to throw it again? The call to the method which may trigger such an exception should catch it, however. Java's nice at forcing such practices.
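
To illustrate catching at the call site instead, here is a minimal sketch. `openDirectory` and `describeDirectory` are hypothetical names; the first stands in for any call that may throw, such as constructing the scanner with a missing directory.

```php
<?php
/**
 * Hypothetical function that may throw, standing in for something like
 * the directory check in the class's constructor.
 */
function openDirectory($dir)
{
    if (!is_dir($dir)) {
        throw new Exception('Directory "' . $dir . '" cannot be found');
    }
    return $dir;
}

/**
 * The caller knows how to recover, so this is where the catch lives.
 */
function describeDirectory($dir)
{
    try {
        return 'Scanning ' . openDirectory($dir);
    } catch (Exception $e) {
        return 'Error: ' . $e->getMessage();
    }
}

echo describeDirectory('/dir/does/not/exist'), "\n";
```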

Posted: Tue Oct 31, 2006 4:32 am
by Jenk
It's a preference thing primarily, but it also helps readability in a way: when scanning through, you don't see that a method will throw an exception until you burrow down into the method which actually throws. So when re-throwing you can see early on which type of exception needs catching at the higher level.

In this case it's not much difference but in some cases it can be, and with PHP just spitting "uncaught exception" when the exception occurs, not as a warning that it will be uncaught, they can be overlooked.

Also as you say, Java enforces this behaviour and that's where I get it from. :)

Posted: Tue Oct 31, 2006 7:11 am
by Chris Corbyn
Jenk wrote: It's a preference thing primarily, but it also helps readability in a way: when scanning through, you don't see that a method will throw an exception until you burrow down into the method which actually throws. So when re-throwing you can see early on which type of exception needs catching at the higher level.

In this case it's not much difference but in some cases it can be, and with PHP just spitting "uncaught exception" when the exception occurs, not as a warning that it will be uncaught, they can be overlooked.

Also as you say, Java enforces this behaviour and that's where I get it from. :)
True, that's why PHPDoc has @throws tags :)

It's also why Java has the "throws" declaration following methods which throw exceptions. So I guess in PHP, if you're putting it in the actual code it does make things a bit more obvious.

Code: Select all

public class MyClass
{
    public void myMethod(byte[] input) throws MyException
    {
        //No need to catch it and throw it since it's obvious from the above
        String something = com.somepackage.SomeClass.someMethodWhichThrowsException(input);
        this.someProperty = something;
    }
}
It would have been quite nice to see that go into PHP6 but since it's so loosely typed it won't ever be wanted by many I guess.
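
For comparison with the Java `throws` clause, a PHPDoc `@throws` tag documents the same thing without the language enforcing it. A minimal sketch; `readSetting` and its settings array are made up purely for illustration.

```php
<?php
/**
 * Look up a key in a settings array (hypothetical example).
 *
 * @param  array  $settings
 * @param  string $key
 * @return mixed
 * @throws InvalidArgumentException if the key is not present
 */
function readSetting(array $settings, $key)
{
    if (!array_key_exists($key, $settings)) {
        throw new InvalidArgumentException('Unknown setting: ' . $key);
    }
    return $settings[$key];
}
```

Nothing stops a caller from ignoring the tag, so it is documentation rather than a contract.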

Posted: Tue Oct 31, 2006 8:05 am
by xpgeek
It is a class for one line of code.

Isn't this simpler?

Code: Select all

if ( false === ($res = glob("*.txt", GLOB_MARK|ANOTHER_FLAG)) )
{
    throw new Exception(...);
}
//read $res

Posted: Tue Oct 31, 2006 8:42 am
by John Cartwright
xpgeek wrote: It is a class for one line of code.

Isn't this simpler?

Code: Select all

if ( false === ($res = glob("*.txt", GLOB_MARK|ANOTHER_FLAG)) )
{
    throw new Exception(...);
}
//read $res
Welcome to the world of object-oriented code. Sometimes an object can simply be an interface to a simple function, adding a little bit of functionality.

There are a couple of reasons why I made this a class. Mostly it is an abstraction that adds flexibility: multiple keywords (searching before and after the keyword), easily defining a file extension, and checking that the directory you're searching in actually exists. I was hoping to add case-insensitivity, but that would require using something other than glob().

Posted: Tue Oct 31, 2006 9:50 am
by Jenk
You could integrate http://us2.php.net/manual/en/function.fnmatch.php or just plain old preg_match().

Code: Select all

$files = array(); // collect matches here
foreach (glob($dir . '/*') as $file)
{
    // case-insensitive regex match
    if (preg_match('/' . $pattern . '/i', $file)) $files[] = $file;
}
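The `fnmatch()` route could look like the sketch below. It assumes the platform defines `FNM_CASEFOLD` (a GNU extension, so it may be missing on some systems, hence the `defined()` check), and `filterNames` is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: keep the names that match a shell-style wildcard
 * pattern, ignoring case when the platform supports FNM_CASEFOLD.
 */
function filterNames(array $names, $pattern)
{
    // Fall back to case-sensitive matching if the flag is unavailable.
    $flags = defined('FNM_CASEFOLD') ? FNM_CASEFOLD : 0;

    $matched = array();
    foreach ($names as $name) {
        if (fnmatch($pattern, $name, $flags)) {
            $matched[] = $name;
        }
    }
    return $matched;
}

print_r(filterNames(array('CONFIG.SYS', 'BOOT.BAK', 'AUTOEXEC.BAT'), 'config.*'));
```

This still loops over every name, but it keeps the shell-style wildcard syntax instead of asking the caller for a regular expression.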

Posted: Tue Oct 31, 2006 12:13 pm
by John Cartwright
Jenk wrote:You could integrate http://us2.php.net/manual/en/function.fnmatch.php or just plain old preg_match().

Code: Select all

$files = array(); // collect matches here
foreach (glob($dir . '/*') as $file)
{
    // case-insensitive regex match
    if (preg_match('/' . $pattern . '/i', $file)) $files[] = $file;
}
I was trying to keep a focus on performance for this class, and I don't like the idea of looping through every single file to see if it matches (again). glob() already does this internally, so I don't want to iterate over the array once more. I don't see case-insensitivity as much of an issue, and I could live without it.

Just for the sake of curiosity, how would I modify this to support matching for multiple keywords at once? My regex is very limited. Using glob() we can use GLOB_BRACE.
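
One way to match several keywords in a single `preg_match()` call is to escape each keyword, translate the wildcard, and join the results with `|`. A minimal sketch of that idea, similar in spirit to `getPattern()` in the first post; `buildRegex` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: turn a list of wildcard patterns into one
 * anchored regular expression.
 */
function buildRegex(array $patterns, $sensitive = false)
{
    $parts = array();
    foreach ($patterns as $p) {
        // Escape regex metacharacters first, then turn the escaped "*"
        // back into "match anything".
        $parts[] = str_replace('\*', '.*', preg_quote($p, '#'));
    }

    // The alternation is wrapped in a group so ^ and $ apply to every branch.
    return '#^(?:' . implode('|', $parts) . ')$#' . ($sensitive ? '' : 'i');
}

$regex = buildRegex(array('index.*', 'ind*.php'));
var_dump(preg_match($regex, 'INDEX.HTML')); // int(1)
var_dump(preg_match($regex, 'config.php')); // int(0)
```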



Thanks for all the input everyone.

Posted: Tue Oct 31, 2006 12:34 pm
by John Cartwright
Jenk wrote: I'm just thinking of instances where they might not want to find a wildcard match, or want to specify beginning/end values only.

And force type (array) the glob return val maybe?
Sorry I had missed these comments, good ideas. Going to add this later today if I get a chance, or feel free to take a shot at it yourself :wink:

Posted: Tue Oct 31, 2006 4:48 pm
by Jenk
Because you are using GLOB_BRACE, you can simply remove the wildcard additions and the extension handling, and allow the developer to structure the patterns like so (or leave the extension in, and just append it without the wildcard):

Code: Select all

$patterns = array
(
    'index*',
    'homepage.html',
    '*.php'
);

$files = new scanFiles($patterns);

/**
 * final pattern string will appear as:
 * "{index*,homepage.html,*.php}"
 * which is valid.
 */
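How the array becomes that brace string can be sketched as follows. `bracePattern` is a hypothetical helper name, and `GLOB_BRACE` is not defined on every platform, hence the guard.

```php
<?php
/**
 * Hypothetical helper: join patterns into a single GLOB_BRACE expression.
 */
function bracePattern(array $patterns)
{
    return '{' . implode(',', $patterns) . '}';
}

$pattern = bracePattern(array('index*', 'homepage.html', '*.php'));
echo $pattern, "\n"; // {index*,homepage.html,*.php}

if (defined('GLOB_BRACE')) {
    // glob() may return false on error, so fall back to an empty array.
    $found = glob($pattern, GLOB_BRACE);
    print_r($found === false ? array() : $found);
}
```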

Posted: Tue Oct 31, 2006 10:17 pm
by John Cartwright
I've absorbed your suggestions; please take a look at the first post for the revamped code.