Allow limited MIME types for upload
Posted: Sat Mar 11, 2006 11:51 am
by ed209
Hi,
I want to be able to upload files, but only certain kinds. It's easy enough to check for image files with
Code:
$uploaded_file = $_FILES['formfile']['tmp_name'];
$file_information = getimagesize($uploaded_file); // returns false if it isn't a recognised image
$mime = ($file_information !== false) ? $file_information['mime'] : false;
but what happens when someone wants to upload a PDF or ZIP? I don't want to rely on the extension, as someone might upload file.exe.jpg or upload a file with no extension.
Also, my web host doesn't allow mime_content_type(), so I'm not sure how to get the actual file type. Has anyone else encountered this problem?
Posted: Sat Mar 11, 2006 12:44 pm
by feyd
Analyze the file. Most file types have identifying marks that tell a program they are what they claim to be. Both PDF and ZIP files have signatures (it's a really basic check, but it's a starting point):
http://filext.com/detaillist.php?extdetail=pdf
http://filext.com/detaillist.php?extdetail=zip
Posted: Sat Mar 11, 2006 3:25 pm
by ed209
The bit I'm not sure of is how to access the identifying marks. The two obvious options I would have gone for are:
1. Check the uploaded file extension. In my experience, that's unreliable. Also, a quote from php.net:
$_FILES['userfile']['type']
The mime type of the file, if the browser provided this information. An example would be "image/gif". This mime type is however not checked on the PHP side and therefore don't take its value for granted.
2. mime_content_type(), but it's not accessible on my server.
Is there any other way I can get the file type? All I want is to check a file type against a list of allowed file types array("application/msword", "application/pdf", "image/png", "image/gif"). Would checking if it's a .zip by opening it be too cumbersome? Not sure of the PDF or MS Word equivalent.
http://uk2.php.net/manual/en/function.zip-open.php
thanks for the help.
Posted: Sat Mar 11, 2006 3:47 pm
by feyd
mime_content_type() isn't reliable either as it uses magic.mime which just maps extensions to mime-types.
To analyze the file, fopen() it ($_FILES['userfile']['tmp_name']) and read however many bytes are necessary to determine the type based on the identifying marks given on filext.
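A minimal sketch of that approach, checked against the allowed list from the previous post. The signature table is an illustrative assumption, not exhaustive, and detect_mime() and is_allowed_upload() are hypothetical helper names. It reads only the first few bytes of the temp file and never looks at the extension or the browser-supplied type. Note that the "application/msword" entry is really the generic OLE2 container header, so Excel or PowerPoint files would match it as well.

```php
<?php
// Sketch: identify an upload by its leading bytes rather than by its
// extension or the browser-supplied $_FILES[...]['type'].
// Assumption: this signature table is illustrative, not exhaustive.
$signatures = array(
    'application/pdf'    => '%PDF-',
    'image/png'          => "\x89PNG\r\n\x1A\n",
    'image/gif'          => 'GIF8', // covers GIF87a and GIF89a
    // Generic OLE2 compound-document header: .xls/.ppt files match it too.
    'application/msword' => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",
);

// Hypothetical helper: returns the matching MIME type, or false.
function detect_mime($path, array $signatures)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    // Read only as many bytes as the longest signature needs.
    $data = (string) fread($fp, max(array_map('strlen', $signatures)));
    fclose($fp);
    foreach ($signatures as $mime => $magic) {
        if (strncmp($data, $magic, strlen($magic)) === 0) {
            return $mime;
        }
    }
    return false;
}

// Hypothetical helper: true only if the detected type is on the allowed list.
function is_allowed_upload($path, array $signatures, array $allowed)
{
    $mime = detect_mime($path, $signatures);
    return $mime !== false && in_array($mime, $allowed, true);
}

// Usage with the field name from the thread:
// $allowed = array('application/msword', 'application/pdf', 'image/png', 'image/gif');
// if (!is_allowed_upload($_FILES['formfile']['tmp_name'], $signatures, $allowed)) {
//     // reject the upload
// }
```

Passing true as the third argument to in_array() keeps the comparison strict, so a false result from detect_mime() can never slip through as a match.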
Posted: Sat Mar 11, 2006 3:53 pm
by feyd
Offhand, I remember a previous mention of a Unix command called "file" which interrogates the file in some fashion, but I don't remember how well it does that.
http://node1.yo-linux.com/cgi-bin/man2h ... nd=file(1)
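If the host lets PHP run shell commands, the "file" utility can be called directly. A minimal sketch, assuming shell_exec() is enabled, "file" is on the PATH of a Unix host, and the installed version supports --mime-type (older releases may only offer -i/--mime, whose output can also carry a charset). The helper name file_command_mime() is hypothetical; it returns null when the command can't be run.

```php
<?php
// Sketch: ask the Unix "file" utility for a MIME type.
// Assumptions: shell_exec() is enabled on the host, "file" is on the PATH,
// and it supports --mime-type (older releases may only have -i/--mime).
function file_command_mime($path)
{
    // Bail out if shell_exec() is not available in this PHP setup.
    if (!function_exists('shell_exec')) {
        return null;
    }
    // -b leaves the file name out of the output; escapeshellarg() stops the
    // path from being interpreted by the shell; stderr is discarded.
    $out = shell_exec('file -b --mime-type ' . escapeshellarg($path) . ' 2>/dev/null');
    if (!is_string($out) || trim($out) === '') {
        return null; // command missing or printed nothing
    }
    return trim($out);
}

// Usage (hypothetical):
// $mime = file_command_mime($_FILES['formfile']['tmp_name']);
```

Many shared hosts disable shell functions, so a byte-signature check in PHP remains the fallback either way.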
Posted: Sat Mar 11, 2006 4:00 pm
by ed209
I was just looking at that on the php.net site, but I got scared off when I saw this post (http://uk.php.net/manual/en/function.file.php#48420) saying the whole file has to be loaded before you can do anything (using file(), that is). I thought it might get a bit cumbersome. I'll take a look at the fopen approach.
Posted: Sat Mar 11, 2006 4:06 pm
by ed209
Small world... contributions to 'file' development come from a guy just down the road from me!
Posted: Sat Mar 11, 2006 4:26 pm
by feyd
file() is different from the Unix command "file".
file() loads the file line by line into an array, while "file" attempts to return file-type information.
Posted: Sat Mar 11, 2006 4:58 pm
by ed209
I got that, bad timing on the posts I think!
I've set up a check using file_get_contents(): it reads the first few characters and checks them against the ASCII identifying characters from the links you sent.
It actually seems pretty straightforward, though I've only tried uploading .zip and .pdf files.
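Code:
$uploaded_file = $_FILES['formfile']['tmp_name'];
$content = file_get_contents($uploaded_file, false, null, 0, 8); // the first few bytes are enough
// if it's a PDF it will start with %PDF-1
// if it's a ZIP it will start with PK
if (strncmp($content, '%PDF-1', 6) === 0) { $file_type = 'pdf'; }
elseif (strncmp($content, 'PK', 2) === 0) { $file_type = 'zip'; }
else { $file_type = false; }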
I don't think the above method will work for plain text files, though. I also tried it with a Flash file, but there's only a hex identifier for that, which would mean converting the content with bin2hex($content) and then checking it against the identifiers. I guess you'd need a library of identifiers and have to check the file in both ASCII and hex to find the file type. That'll be some function!
Posted: Sat Mar 11, 2006 5:09 pm
by feyd
Just need either hex or binary, not both.
For identifying potential text files, and some other files, in more detail than just a signature, there's a post with a plain-text detection script.
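A rough heuristic for spotting plain text, sketched here as an assumption (it is not the script from that post): treat the file as text if its first 512 bytes contain no NUL bytes and only a small share of other control characters. The function name looks_like_text(), the sample size and the 5% threshold are all arbitrary choices. Run it only after the signature checks, since some binary formats such as PDF begin with plain ASCII.

```php
<?php
// Heuristic sketch: NUL bytes or many control characters suggest binary data.
// Sample size and ratio are arbitrary assumptions; tune them as needed.
function looks_like_text($path, $sample_size = 512, $max_ratio = 0.05)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    $data = fread($fp, $sample_size);
    fclose($fp);
    if ($data === false) {
        return false;
    }
    if ($data === '') {
        return true; // an empty file is trivially "text"
    }
    if (strpos($data, "\0") !== false) {
        return false; // NUL bytes almost never appear in text files
    }
    // Count control characters other than tab, LF, form feed and CR.
    $stripped = preg_replace('/[\x01-\x08\x0B\x0E-\x1F\x7F]/', '', $data);
    $control = strlen($data) - strlen($stripped);
    return ($control / strlen($data)) <= $max_ratio;
}
```

Bytes above 0x7F are deliberately left alone, so UTF-8 or Latin-1 text still passes.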
Posted: Sun Mar 12, 2006 3:36 am
by ed209
Thanks for pointing out the post with a plain-text detection script; I have yet to try it, though.
I have started on a script to check file types.
Code:
$uploaded_file = $_FILES['formfile']['tmp_name'];
$hex_file_idents = array(
"zip"=>"50 4B 03 04",
"pdf"=>"25 50 44 46 2D 31 2E",
"fla"=>"D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
"swf"=>"46 57 53",
"mp3"=>"FF FB 30",
"mp3(2nd identifier test)"=>"49 44 33",
"wmv"=>"30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C",
"avi"=>"52 49 46 46",
"mpg1"=>"00 00 01 BA 21 00 01",
"midi"=>"4D 54 68 64",
"eps"=>"C5 D0 D3 C6"
);
$content = file_get_contents($uploaded_file); // open the file and read its contents
$hex = bin2hex($content); // convert it to hex
foreach($hex_file_idents as $file_ext => $file_ident){ // loop through our array of identifiers
$ident = strtolower(str_replace(" ", "", $file_ident)); // lower-case our array entry and remove its spaces
$ident_length = strlen($ident); // now get the length of our array entry
$extract_file_ident = strtolower(substr($hex, 0, $ident_length)); // take the same length from the start of the file + lower-case it
if($extract_file_ident == $ident){ // if there's a match, output the file type
echo "Your file is ".$file_ext;
break;
}
}
At the moment, I have a few issues.
1. To get a match I am removing spaces and lowering the case. How could I do this efficiently for an entire uploaded file? Having said that, I haven't needed to yet, as the hex of every uploaded file seems to have no spaces and be lower case.
2. I'm no good with regex, but would a preg_match search be more efficient than extracting a substring and comparing it?
3. My method might find the wrong match if one file type's identifier begins with the hex value of a shorter identifier for another file type, e.g.
.fla = D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.doc = D0 CF 11 E0 A1 B1 1A E1
Checking a .doc file returns .fla.
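For point 3, one option is to try the longest identifier first, so a short identifier only matches when no longer, more specific one does. A sketch, assuming PHP 5.3+ for the anonymous function; detect_by_signature() is a hypothetical name. It converts each spaced hex string to raw bytes once with pack('H*'), so nothing needs lower-casing (point 1), uses a plain strncmp() prefix comparison with no regex (point 2), and reads only as many bytes as the longest identifier. One caveat for this particular pair: the 24-byte "fla" identifier is just the generic OLE2 compound-document header (the 8-byte signature plus a 16-byte class ID that the format requires to be all zeros), so a real .doc file matches it too. Telling those two apart means looking inside the container, not at the first bytes.

```php
<?php
// Sketch: match hex identifiers against the start of a file, trying the
// longest identifier first so a longer, more specific signature wins over
// a shorter one that happens to be its prefix. Needs PHP 5.3+ (closure).
function detect_by_signature($path, array $hex_idents)
{
    if (empty($hex_idents)) {
        return false;
    }
    // Normalise each identifier once: drop the spaces, then hex -> raw bytes.
    // pack('H*') accepts upper- or lower-case hex, so no strtolower() needed.
    $bin_idents = array();
    foreach ($hex_idents as $type => $hex) {
        $bin_idents[$type] = pack('H*', str_replace(' ', '', $hex));
    }
    // Longest first; uasort() keeps the type names as keys.
    uasort($bin_idents, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    // Only read as many bytes as the longest identifier.
    $data = (string) fread($fp, strlen(reset($bin_idents)));
    fclose($fp);

    foreach ($bin_idents as $type => $signature) {
        // Plain prefix comparison; no regex required.
        if (strncmp($data, $signature, strlen($signature)) === 0) {
            return $type;
        }
    }
    return false;
}
```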
Posted: Sun Mar 12, 2006 9:06 am
by feyd
Try using the following. You'll need to integrate it into your existing code.
Code:
function condense($value)
{
return pack('H*',str_replace(' ','',$value));
}
function get_signature($file,$hex_idents)
{
$fp = fopen($file,'rb');
if(!$fp)
{
return null;
}
$bin_idents = array_map('condense', $hex_idents);
$size = array_map('strlen',$bin_idents);
$read = max($size);
$data = fread($fp,$read);
fclose($fp);
foreach($bin_idents as $type => $signature)
{
$found = (substr($data, 0, strlen($signature) === $signature);
if($found)
{
break;
}
}
return ($found ? $type : false);
}
Posted: Sun Mar 12, 2006 10:03 am
by ed209
Thanks.
Code:
// location of the uploaded file
$uploaded_file = $_FILES['formfile']['tmp_name'];
// hex values for file types
$hex_file_idents = array( ".pdf"=>"25 50 44 46 2D 31 2E", ".doc"=>"D0 CF 11 E0 A1 B1 1A E1", ".zip"=>"50 4B 03 04");
function condense($value){
return pack('H*',str_replace(' ','',$value));
}
function get_signature($file,$hex_idents){
$fp=fopen($file, 'rb');
if(!$fp){
return null;
}
$bin_idents = array_map('condense', $hex_idents);
$size = array_map('strlen', $bin_idents);
$read = max($size);
$data = fread($fp, $read);
fclose($fp);
foreach($bin_idents as $type => $signature){
$found = (substr($data, 0, strlen($signature) === $signature));
echo '$data = '.$data.' & $signature = '.$signature."<br />"; // for testing
if($found){
break;
}
}
return($found ? $type : false);
}
// check the file types
$file_extension = get_signature($uploaded_file, $hex_file_idents);
/* RETURNS
$data = %PDF-1.3 & $signature = %PDF-1. // this should be a match
$data = %PDF-1.3 & $signature = ÐÏࡱá
$data = %PDF-1.3 & $signature = PK
*/
Looks like the $read value is too long. I haven't seen some of the functions you've used before so I'll need a bit more time to get it working.
I have a 'good enough' file type detection working for now, I'll update this post once I have a decent enough function for doing this though. Thanks for your time.
Posted: Sun Mar 12, 2006 10:09 am
by feyd
oops, I missed a paren in the right place...
Code:
function condense($value)
{
return pack('H*',str_replace(' ','',$value));
}
function get_signature($file,$hex_idents)
{
$fp = fopen($file,'rb');
if(!$fp)
{
return null;
}
$bin_idents = array_map('condense', $hex_idents);
$size = array_map('strlen',$bin_idents);
$read = max($size);
$data = fread($fp,$read);
fclose($fp);
$found = false;
foreach($bin_idents as $type => $signature)
{
$found = (substr($data, 0, strlen($signature)) === $signature);
if($found)
{
break;
}
}
return ($found ? $type : false);
}
Posted: Sun Mar 12, 2006 10:22 am
by ed209
works! (plus a couple of other tweaks to var names).
I will update once I have tested it out with various file formats.
thanks.