Thanks for pointing out the post with the plain-text detection script. I have yet to try it, though.
I have started on a script to check file types.
Code:
$hex_file_idents = array(
    "zip"=>"50 4B 03 04",
    "pdf"=>"25 50 44 46 2D 31 2E",
    "fla"=>"D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "swf"=>"46 57 53",
    "mp3"=>"FF FB 30",
    "mp3(2nd identifier test)"=>"49 44 33",
    "wmv"=>"30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C",
    "avi"=>"52 49 46 46",
    "mpg1"=>"00 00 01 BA 21 00 01",
    "midi"=>"4D 54 68 64",
    "eps"=>"C5 D0 D3 C6"
);
$content = file_get_contents($uploaded_file); // open the file and read its contents
$hex = bin2hex($content); // convert it to hex
foreach ($hex_file_idents as $file_ext => $file_ident) { // loop through our array of identifiers
    $ident = strtolower(str_replace(" ", "", $file_ident)); // lower the case and remove spaces from our array entry
    $ident_length = strlen($ident); // now get the length of our array entry
    $extract_file_ident = strtolower(substr($hex, 0, $ident_length)); // extract the same length from our open file and lower the case
    if ($extract_file_ident === $ident) { // strict string match (== could compare hex strings such as "0e12" as numbers); output the file type
        echo "Your file is ".$file_ext;
        break;
    }
}
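Since every identifier above is at most 24 bytes long, one option might be to read only the start of the file instead of all of it. This is a minimal sketch, not tested against real uploads; read_header_hex() is a hypothetical helper name, and 24 is simply the length of the longest identifier in my array.

```php
<?php
// Hypothetical helper: return the first $length bytes of a file as hex,
// or false if the file cannot be opened or read.
function read_header_hex($path, $length = 24)
{
    $handle = fopen($path, 'rb'); // binary-safe read mode
    if ($handle === false) {
        return false;
    }
    $bytes = fread($handle, $length); // read at most $length bytes
    fclose($handle);
    // bin2hex() returns lowercase hex digits with no spaces
    return $bytes === false ? false : bin2hex($bytes);
}
```

With this, $hex = read_header_hex($uploaded_file); could stand in for the file_get_contents() and bin2hex() lines, and the rest of the loop would stay the same.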
At the moment, I have a few issues.
1. To get a match, I am removing spaces and lowering the case. How could I do this efficiently for an entire uploaded file? That said, I haven't needed to do it yet, as the hex output for every uploaded file so far has had no spaces and been lower case.
2. I'm no good with regex, but would a preg_match search be more efficient than extracting a substring and comparing it?
3. My method might find a false match if one file type's identifier starts with the shorter identifier of another file type, e.g.
.fla = D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.doc = D0 CF 11 E0 A1 B1 1A E1
so a check for one of these types can report the other (the first matching entry in the array wins).
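For issues 2 and 3, one possible approach is to keep the longest identifier that matches rather than stopping at the first match, and to compare with strncmp() so no regex or substring extraction is needed. This is only a sketch: match_longest_ident() is a hypothetical name, and it assumes $hex is the lowercase hex of the file's leading bytes. As far as I know, .doc and .fla are both OLE compound files, so their first 24 bytes may well be identical, in which case the header alone cannot tell them apart.

```php
<?php
// Hypothetical helper: return the extension whose identifier is the longest
// prefix of $hex (lowercase hex of the file header), or null if none match.
function match_longest_ident($hex, array $idents)
{
    $best = null;
    $best_length = 0;
    foreach ($idents as $ext => $ident) {
        // normalise the array entry: no spaces, lower case
        $ident = strtolower(str_replace(' ', '', $ident));
        $length = strlen($ident);
        // strncmp() compares only the first $length characters
        if ($length > $best_length && strncmp($hex, $ident, $length) === 0) {
            $best = $ext;
            $best_length = $length;
        }
    }
    return $best;
}
```

Because the longest match wins, the order of the entries in the array no longer matters: a header that matches only the 8-byte entry is reported as that type, while one that also matches the 24-byte entry is reported as the longer one.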