Allow limited MIME types for upload

ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Allow limited MIME types for upload

Post by ed209 »

Hi,

I want to be able to upload files, but only certain kinds. It's easy enough to check for image files with

Code: Select all

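$uploaded_file = $_FILES['formfile']['tmp_name'];

// getimagesize() returns false when the file is not a recognised image
$file_information = getimagesize($uploaded_file);

$mime_type = $file_information ? $file_information['mime'] : false;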
but what happens when someone wants to upload a PDF or ZIP? I don't want to rely on the extension, as someone might upload file.exe.jpg or a file with no extension.

Also, my web host doesn't allow mime_content_type(), so I'm not sure how to get the actual file type. Has anyone else encountered this problem?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Analyze the file. Most file types have identifying marks (signatures) that tell a program they are what they claim to be. Both PDF and ZIP files have them. Checking the signature is a really basic test, but it's a starting point:

http://filext.com/detaillist.php?extdetail=pdf
http://filext.com/detaillist.php?extdetail=zip
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

The bit I'm not sure of is how to access the identifying marks. The two obvious options I would have gone for are:

1. Check the uploaded file extension. In my experience - unreliable. Also, quote from php.net
$_FILES['userfile']['type']
The mime type of the file, if the browser provided this information. An example would be "image/gif". This mime type is however not checked on the PHP side and therefore don't take its value for granted.
2. mime_content_type() - but it's not accessible on my server

Is there any other way I can get the file type? All I want is to check a file type against a list of allowed file types array("application/msword", "application/pdf", "image/png", "image/gif"). Would checking if it's a .zip by opening it be too cumbersome? Not sure of the PDF or MS Word equivalent.

http://uk2.php.net/manual/en/function.zip-open.php


thanks for the help.
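
The allow-list comparison itself might be sketched like this, separately from however the type ends up being detected. The helper name is_allowed_type is made up for illustration, and $detected_type stands for whatever string the detection step produces:

```php
<?php
// Hypothetical helper: $detected_type is whatever type string the
// detection step produced, or false/null when detection failed.
function is_allowed_type($detected_type, array $allowed_types)
{
    if (!is_string($detected_type) || $detected_type === '') {
        return false;
    }
    // strict comparison avoids PHP's loose type juggling
    return in_array($detected_type, $allowed_types, true);
}

// the list of allowed types from the question
$allowed_types = array("application/msword", "application/pdf", "image/png", "image/gif");
```

Anything that is not an exact match for an entry in the list, including a failed detection, is rejected.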
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

mime_content_type() isn't reliable either as it uses magic.mime which just maps extensions to mime-types.

To analyze the file, fopen() it ($_FILES['userfile']['tmp_name']) and read however many bytes are necessary to determine the type based on the identifying marks given on filext.
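
As a rough sketch of that approach, reading just the start of the upload might look like the following. The function name read_leading_bytes and the 8-byte default are made up for illustration; how many bytes are really needed depends on the signatures being checked.

```php
<?php
// Hypothetical helper: returns the first $length bytes of a file,
// or false if the file cannot be opened.
function read_leading_bytes($path, $length = 8)
{
    $fp = fopen($path, 'rb');   // binary mode so bytes are not translated
    if (!$fp) {
        return false;
    }
    $bytes = fread($fp, $length);
    fclose($fp);
    return $bytes;
}

// Usage with an upload would be something like:
// $head = read_leading_bytes($_FILES['userfile']['tmp_name']);
// $is_pdf = ($head !== false && strncmp($head, '%PDF-', 5) === 0);
```

Only the requested number of bytes is read, so the size of the upload does not matter for this check.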
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Offhand, I remember a previous mention of a unix command called "file" which interrogates the file in some fashion, but I don't remember how well it does that.

http://node1.yo-linux.com/cgi-bin/man2h ... nd=file(1)
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

I was just looking at that on the php.net site, but I got scared off when I saw this post (http://uk.php.net/manual/en/function.file.php#48420) saying the whole file has to be loaded before you can do anything (using file(), that is). I thought it might get a bit cumbersome. I'll take a look at the fopen approach.
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

small world ... contributions to 'file' development are from a guy just down the road from me!
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

file() is different than the unix command "file."

file() loads the file by lines into an array, while "file" attempts to return file-type information
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

I got that, bad timing on the posts I think!

I've set up a check using file_get_contents(), reading the first few characters and checking them against the ASCII identifying characters from the links you sent.

It actually seems pretty straightforward, though I've only looked at uploading .zip and .pdf.

Code: Select all

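$uploaded_file = $_FILES['formfile']['tmp_name'];

// only the first few bytes are needed, so limit the read to 8 bytes
$content = file_get_contents($uploaded_file, false, null, 0, 8);

// if it's a PDF it will start with %PDF-1
$is_pdf = (strpos($content, '%PDF-1') === 0);
// if it's a ZIP it will start with PK
$is_zip = (strpos($content, 'PK') === 0);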
I don't think the above method will work for plain text files, though. I also tried it with a Flash file, but the listing only gives a hex identifier for those. This would mean converting the content with bin2hex($content), then checking it against the identifiers. I guess you'd have to have a library of identifiers, and check the file in ASCII and hex to find the file type. That'll be some function!
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Just need either hex or binary, not both.

For identifying potential text files and some other files in more detail than just a signature:
Useful Posts wrote: Some helpful information on determining a file's actual type: Upload Script
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

Thanks for pointing out the post with a plain text detection script - I have yet to try it though.

I have started on a script to check file types.

Code: Select all

$hex_file_idents = array(
"zip"=>"50 4B 03 04",
"pdf"=>"25 50 44 46 2D 31 2E",
"fla"=>"D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
"swf"=>"46 57 53",
"mp3"=>"FF FB 30",
"mp3(2nd identifier test)"=>"49 44 33",
"wmv"=>"30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C",
"avi"=>"52 49 46 46",
"mpg1"=>"00 00 01 BA 21 00 01",
"midi"=>"4D 54 68 64",
"eps"=>"C5 D0 D3 C6"
);



$content = file_get_contents($uploaded_file);				// read the file contents
$hex = bin2hex($content);									// convert them to hex (lower case, no spaces)

foreach($hex_file_idents as $file_ext => $file_ident){		// loop through our array of identifiers
	
	$ident = strtolower(str_replace(" ", "", $file_ident)); // lower the case, remove spaces from our array entry
	$ident_length = strlen($ident); 						// now get the length of our array entry
	
	$extract_file_ident = strtolower(substr($hex, 0, $ident_length)); 	// extract the same length from our open file + lower case

	if($extract_file_ident === $ident){						// strict match, so all-digit hex strings are not compared as numbers
		echo "Your file is ".$file_ext;
		break;
	}
}
At the moment, I have a few issues.

1. to get a match I am removing spaces and lowering the case. How could I do this efficiently for an entire uploaded file? Having said that, I haven't needed to do that yet as all uploaded files seem to have no spaces and be lower case.

2. I'm no good with regex, but would a preg_match search be more efficient than extracting a substring and comparing it?

3. my method might find a match if one file type identifier contains the hex value of a shorter identifier for another file type i.e.



.fla = D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.doc = D0 CF 11 E0 A1 B1 1A E1

searching for a .doc returns a .fla
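
One possible way to handle issue 3, sketched here as an assumption rather than a tested fix, is to collect every identifier that matches instead of stopping at the first one, and then decide between the candidates some other way (the file extension, for instance). A real .doc may well begin with the same bytes as the longer .fla entry, which would explain the result above, so the leading bytes alone might not settle it. The function name matching_types is hypothetical; the two identifiers are the ones quoted above.

```php
<?php
// Hypothetical helper: returns every type whose signature matches the
// start of $data, instead of stopping at the first hit.
function matching_types($data, array $hex_idents)
{
    $matches = array();
    foreach ($hex_idents as $type => $hex) {
        // convert "D0 CF 11 ..." into the raw bytes it stands for
        $signature = pack('H*', str_replace(' ', '', $hex));
        // binary-safe comparison of the leading bytes only
        if (strncmp($data, $signature, strlen($signature)) === 0) {
            $matches[] = $type;
        }
    }
    return $matches;
}

$idents = array(
    'fla' => 'D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00',
    'doc' => 'D0 CF 11 E0 A1 B1 1A E1',
);
```

If the returned array has more than one entry, the signature is ambiguous for that file and a second check is needed before accepting it.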
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Try using the following. You'll need to integrate it into your existing code.

Code: Select all

function condense($value)
{
  return pack('H*',str_replace(' ','',$value));
}

function get_signature($file,$hex_idents)
{
	$fp = fopen($file,'rb');
	if(!$fp)
	{
		return null;
	}
	
	$bin_idents = array_map('condense', $hex_idents);
	$size = array_map('strlen',$bin_idents);
	$read = max($size);

	$data = fread($fp,$read);
	fclose($fp);
	
	foreach($bin_idents as $type => $signature)
	{
		$found = (substr($data, 0, strlen($signature) === $signature);
		if($found)
		{
			break;
		}
	}
	
	return ($found ? $type : false);
}
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

Thanks.

Code: Select all

// location of the uploaded file
$uploaded_file = $_FILES['formfile']['tmp_name'];
// hex values for file types
$hex_file_idents = array( ".pdf"=>"25 50 44 46 2D 31 2E", ".doc"=>"D0 CF 11 E0 A1 B1 1A E1", ".zip"=>"50 4B 03 04");



	function condense($value){
		return pack('H*',str_replace(' ','',$value));
	}

	function get_signature($file,$hex_idents){
		$fp=fopen($file, 'rb');
		if(!$fp){
			return null;
		}

		$bin_idents = array_map('condense', $hex_idents);
		$size = array_map('strlen', $bin_idents);
		$read = max($size);
	
		$data = fread($fp, $read);
		fclose($fp);

		foreach($bin_idents as $type => $signature){

			$found = (substr($data, 0, strlen($signature) === $signature));

			echo '$data = '.$data.' & $signature = '.$signature."<br />"; // for testing
			if($found){
				break;
			}
		}
		
		return($found ? $type : false);
	}

// check the file types
$file_extension = get_signature($uploaded_file, $hex_file_idents);

/* RETURNS
$data = %PDF-1.3 & $signature = %PDF-1.  // this should be a match
$data = %PDF-1.3 & $signature = ÐÏࡱá
$data = %PDF-1.3 & $signature = PK
*/
Looks like the $read value is too long. I haven't seen some of the functions you've used before so I'll need a bit more time to get it working.

I have a 'good enough' file type detection working for now, I'll update this post once I have a decent enough function for doing this though. Thanks for your time.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Oops, I put a paren in the wrong place...

Code: Select all

function condense($value)
{
  return pack('H*',str_replace(' ','',$value));
}

function get_signature($file,$hex_idents)
{
    $fp = fopen($file,'rb');
    if(!$fp)
    {
        return null;
    }
    
    // convert each hex signature to its binary form
    $bin_idents = array_map('condense', $hex_idents);
    // read only as many bytes as the longest signature needs
    $size = array_map('strlen',$bin_idents);
    $read = max($size);

    $data = fread($fp,$read);
    fclose($fp);
    
    $found = false;
    foreach($bin_idents as $type => $signature)
    {
        // compare the start of the file against this signature
        $found = (substr($data, 0, strlen($signature)) === $signature);
        if($found)
        {
            break;
        }
    }
    
    return ($found ? $type : false);
}
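
The difference the paren position makes can be seen in isolation. This is only a small illustration; the variable names $misplaced and $intended are made up, and the two strings are taken from the test output earlier in the thread.

```php
<?php
$data = '%PDF-1.3';
$signature = '%PDF-1.';

// Paren closes too late: the === comparison becomes substr()'s length
// argument, and an integer is never identical to a string, so it is false.
$misplaced = (substr($data, 0, strlen($signature) === $signature));

// Paren closes after strlen(): the leading bytes are compared as intended.
$intended = (substr($data, 0, strlen($signature)) === $signature);
```

With the late paren, substr() is asked for a zero-length string, so $misplaced is never truthy regardless of the file, which would account for the PDF not matching in the test run.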
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

works! (plus a couple of other tweaks to var names).

I will update once I have tested it out with various file formats.
thanks.