read md5 directory recursively and return duplicate files

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
furrball
Forum Newbie
Posts: 2
Joined: Mon Aug 30, 2010 7:43 pm

read md5 directory recursively and return duplicate files

Post by furrball »

Hey there, I'm trying to write a script which takes in an md5 hash, reads a directory recursively and returns files in the directory with similar md5 hashes.

I've managed to get the user input as well as the md5 hash of the files recursively, but I've got a problem with trying to compare the input value to the indexed array of md5 values.

I've tried comparing it as a string to an array, as well as from an index array to another using the in_array, diff_array, array_search and array_intersect functions but could not get it to work.

Here's what I've got so far.

Code: Select all

<?php

set_time_limit(99999);
ini_set("max_execution_time",99999); 

print "Input your md5 hash. \n";

{

$fh = fopen('php://stdin', 'r');
$last_line = false;
$message = '';
$fh_string = fgets($fh, 1024); // read the special file to get the user input from keyboard


function findDuplicateFiles($dirName){
        global $fh_string;
	$fh_array = array("$fh_string");
        $dirName=trim($dirName);
        if(empty($dirName)){ die("Fatal Error 0x01: Directory Name can NOT be empty"); }
        if(!is_dir($dirName)){ die("Fatal Error 0x02: $dir is not a valid or readable Directory"); }
        $filesArray=parseDirectory($dirName);
	$c=count($filesArray);
        for($i=0;$i<$c;$i++){
            $md5FilesArray[$i]=md5_file($filesArray[$i]);
        }
        $duplicateFilesArray=array();
	$duplicateFiles=array_values($md5FilesArray);
	$identical=array_intersect_assoc($fh_array, $md5FilesArray);
	print_r($identical);
		if($identical==1){
        		foreach($duplicateFiles as $key=>$value){
            		if($value!==1){ 
                		$names=array_values($md5FilesArray);
				$duplicate=array();
                		foreach($names as $name){
                    		$duplicate[]=$filesArray[$name];
                		}
                		$duplicateFilesArray[]=$duplicate;
				}
   	}

	}
     
        return $duplicateFilesArray;
    }    
    
        function parseDirectory($rootPath,$returnOnlyFiles=true, $seperator="/"){
        $fileArray=array();
        if (($handle = opendir($rootPath))!==false) {
            while( ($file = readdir($handle))!==false) {
                if($file !='.' && $file !='..'){
                    if (is_dir($rootPath.$seperator.$file)){
                        $array=parseDirectory($rootPath.$seperator.$file);
                        $fileArray=array_merge($array,$fileArray);
                        if($returnOnlyFiles!==true){
                            $fileArray[]=$rootPath.$seperator.$file;
                        }
                    }
                    else {
                        $fileArray[]=$rootPath.$seperator.$file;
                    }
                }
            }
        }
        return $fileArray;
    }
    
    
} 

$dir="C:\Documents and Settings\Administrator\Desktop\php-5.3.3-Win32-VC6-x86/";
$duplicateFiles=findDuplicateFiles($dir);

//Display Results
$c=count($duplicateFiles);
if($c===0){ echo "No Duplicate Files were found"; }
else{
    echo "Duplicate files found in ";
    $col=0;
    foreach($duplicateFiles as $duplicate){
        $c=count($duplicate);        
        for($a=0;$a<$c;$a++){
            $file=$duplicate[$a];            
            echo $file;            
        }    
        $col++;    
    }

  }

?>
Any form of help is really much appreciated, I'm pretty new to this so I hope you guys could point me in the right direction! Thanks!
User avatar
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: read md5 directory recursively and return duplicate file

Post by requinix »

Rather than try to build a huge list of all the files, just (recursively) go through the directories looking for a matching hash. No extra arrays needed.

Code: Select all

function lookForDuplicate(hash, dir) {
	files = empty array

	for each file in dir {
		path = dir . "/" . file
		if (path is a file && md5 hash of path == hash) {
			add file to files
		} else if (path is a directory) {
			add lookForDuplicates(hash, path) to files
		}
	}
	return files
}
furrball
Forum Newbie
Posts: 2
Joined: Mon Aug 30, 2010 7:43 pm

Re: read md5 directory recursively and return duplicate file

Post by furrball »

thanks a lot tasairis, I'll try to work on that.
Post Reply