Handling file types - safely

ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

Is it just me, or does file handling take quite a lot of effort? What I'm after is something that will tell me what type of file has just been uploaded. I have put up a couple of posts about this and had some great responses. With those responses and some other info I've searched for, I have cobbled together a class that determines the type of an uploaded file.

I haven't tested it much, but it seems to work ok.

Code:

/*
THIS CLASS IS DESIGNED TO CHECK THE FILE TYPE OF AN UPLOADED FILE
WITHOUT DEPENDING ON THE INFORMATION SENT BY THE BROWSER.

THIS CLASS HAS NOT BEEN TESTED VERY MUCH - SO DON'T USE IT FOR REAL... YET
*/


class file_type{

	# ADD MORE OF THESE FROM http://filext.com/ #
	var $idents = array(
	"application/pdf"=>"25 50 44 46 2D 31 2E", 
	"application/msword"=>"D0 CF 11 E0 A1 B1 1A E1", 
	"application/zip"=>"50 4B 03 04"
	);

#####################################################
# DETERMINES THE FILE TYPE           
#####################################################
function get_file_type ($file){
	# IS THE FILE AN IMAGE FILE #
	if($file_info = getimagesize($file)){
	
		return $file_info['mime'];
	
	# IS THE FILE TYPE LISTED IN OUR CATALOGUED TYPES $this->idents #
	}else if($file_type = $this->get_ident($file,$this->idents)){
	
		return $file_type;
	
	# IS THE UPLOADED FILE A TEXT FILE #
	}else if($this->is_text_file($file)){
	
		return "text/plain";
	
	# NOT A RECOGNISED FILE TYPE BY THIS CLASS #
	}else{
	
		return false;
	
	}

}


#####################################################
# GET OTHER FILE TYPES FROM $this->idents            
#####################################################
# Check post http://forums.devnetwork.net/viewtopic. ... 028#246028 #

function get_ident($file,$hex_idents){

	# OPEN THE FILE FOR READING (BINARY) #
	$fp=fopen($file, 'rb');
	if(!$fp){
		return null;
	}
	
	# GET THE (converted to bin) HEX IDENTIFIER LENGTH TO EXTRACT THAT AMOUNT OF BYTES FROM OUR UPLOADED FILE #
	$bin_idents = array_map(array($this, 'condense'), $hex_idents);
	$size = array_map('strlen', $bin_idents);
	$read = max($size);

	# STORE THE READ DATA #
	$data = fread($fp, $read);
	fclose($fp);

	# CHECK OUR DATA AGAINST THE ARRAY OF CATALOGUED FILE TYPES $this->idents #
	# RETURN ON THE FIRST MATCH SO $type IS NEVER USED WHEN NOTHING MATCHED #
	foreach($bin_idents as $type => $signature){

		if(substr($data, 0, strlen($signature)) === $signature){
			return $type;
		}
	}
	
	return false;
}

function condense($value){
	return pack('H*',str_replace(' ','',$value));
}


#####################################################
# CHECKS WHETHER THE FILE IS A PLAIN TEXT FILE       
#####################################################
# Check http://forums.devnetwork.net/viewtopic.php?t=23517 #

function is_text_file($filename){
	if(!is_readable($filename)) return false;
	$data = file_get_contents($filename);
	$bad = false;
	# ANY BYTE ABOVE 127 MEANS THE FILE IS NOT PLAIN 7-BIT ASCII TEXT #
	for( $x = 0 , $y = strlen($data); !$bad && $x < $y; $x++){
		$bad = ( ord($data[$x]) > 127 );
	}
	return !$bad;
}


#####################################################
# CREATES NEW FILE NAME WITH ONLY ALPHA NUMERIC CHARS
#####################################################
function websafe_rename($string, $mime_type){

	// remove the current extension (strrchr returns false when there is none)
	$chop_to = strlen((string) strrchr($string, "."));
	$string = substr($string,0,(strlen($string)-$chop_to));
	
	// replace non-alphanumeric characters (preg_replace, as ereg_replace was removed in PHP 7)
	$new_string=preg_replace("/[^A-Za-z0-9]/","_",$string); 
	$final_string=preg_replace("/_+/","_",$new_string);
	
	// get the correct extension type for the file
	$extension=$this->get_mime_extension($mime_type);
	
	return $final_string.$extension;
}

#####################################################
# GETS THE CORRECT EXTENSION FOR FILE TYPES          
#####################################################
function get_mime_extension($mime_type){ 
	$mime = array('application/msword'=>'.doc','image/gif'=>'.gif','image/jpeg'=>'.jpg','application/pdf'=>'.pdf','image/png'=>'.png','application/zip'=>'.zip','text/plain'=>'.txt'); 
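	// unknown mime types get no extension rather than an undefined index notice
	return isset($mime[$mime_type]) ? $mime[$mime_type] : '';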
}


}//end class

And here's some example usage

Code:

$uploaded_file_type = new file_type;

# GET THE MIME TYPE #
$mime_type = $uploaded_file_type->get_file_type($_FILES['formfile']['tmp_name']);

if(!$mime_type){
	echo "Sorry, not really sure what you've uploaded.";
}else{
	# CREATE A NEAT NEW NAME + EXTENSION #
	$new_file_name = $uploaded_file_type->websafe_rename($_FILES['formfile']['name'], $mime_type);
	echo "Your file ( ".$new_file_name." ) is ".$mime_type."<br />";
}

You can also add file types to check:

Code:

$uploaded_file_type->idents["application/zip"] = "50 4B 03 04";

Are there any scripts out there that do this better? There are a few flaws with this, but it works at the moment. Suggestions / ideas welcome.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

You're right, it is a nightmare. The most reliable and quickest way to check image mime types is getimagesize(); for anything else you need something like what you've done.

Checking the header bytes in a file is certainly the way to go with other types, and that's exactly how I've done it in the past ;) I think I may have even posted something in snippets.

You could store all your mime headers in a database or flat file too so you don't need to keep adding to your arrays ;)
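
A minimal sketch of the flat file idea, assuming one "mime/type = HEX BYTES" entry per line. The function name load_idents_from_lines() and the file name signatures.txt are hypothetical, not something posted in this thread; the result has the same shape as $idents in the class above.

```php
<?php
// Hypothetical helper: turn "mime/type = HEX BYTES" lines into an array
// shaped like the $idents property of the file_type class.
function load_idents_from_lines(array $lines) {
    $idents = array();
    foreach ($lines as $line) {
        $line = trim($line);
        // skip blank lines and comment lines
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $parts = explode('=', $line, 2);
        if (count($parts) !== 2) {
            continue; // malformed line, ignore it
        }
        $mime = trim($parts[0]);
        $hex  = strtoupper(trim($parts[1]));
        // accept only space-separated pairs of hex digits
        if ($mime !== '' && preg_match('/^[0-9A-F]{2}( [0-9A-F]{2})*$/', $hex)) {
            $idents[$mime] = $hex;
        }
    }
    return $idents;
}

// In practice the lines might come from file('signatures.txt') (assumed name).
$idents = load_idents_from_lines(array(
    '# mime type = signature bytes',
    'application/pdf = 25 50 44 46 2D 31 2E',
    'application/zip = 50 4b 03 04',
    'not a valid line',
));
print_r($idents);
```

Validating each line keeps a typo in the file from turning into a signature that can never match.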
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

If you're looking for a place to find the identifying bytes of various file types you might want to look at:

http://filext.com/
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

The checks in the class are ordered by what I'd expect people to upload most often, i.e.

1. Check if it's an image with getimagesize()
2. Check the uploaded file against my allowed list - which, as you say, could be referenced from a db
3. Check if it's a text file - as I don't think you can apply the previous function to that.

Some of the problems are with the second item. Checking the header is more reliable for some files than others. For example, these two start with the same eight bytes:

.fla = D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.doc = D0 CF 11 E0 A1 B1 1A E1

Also, with something like .mov, how do you know how it has been encoded?

Hex (position 4): 6D could lead to any of the following mime types:

video/quicktime
video/x-quicktime
image/mov
audio/aiff
audio/x-midi
audio/x-wav
video/avi


The class works for the most obvious file types - with video being a secondary requirement.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Although it's blatantly going to be slower, the only way I can see past that issue is to find the longest set of bytes you have on record, start with that number of bytes and check for a match, then drop a byte and check again, drop another byte and check again, and so on (recursively).

I dread to think how slow it would be if you had thousands of types in a database though ;)
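
One way to get the same effect without dropping bytes one at a time is to sort the known signatures longest first and return the first one that matches. This is only a sketch: match_longest_signature() and the 'doc' / 'fla' labels are made up for illustration, and the byte values are the ones quoted earlier in the thread.

```php
<?php
// Hypothetical helper: $signatures maps a label to raw signature bytes.
// The longest signatures are tried first, so a long signature wins over a
// shorter one that is merely a prefix of it.
function match_longest_signature($data, array $signatures) {
    uasort($signatures, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($signatures as $type => $signature) {
        // an empty signature would match everything, so skip it
        if ($signature !== '' && strncmp($data, $signature, strlen($signature)) === 0) {
            return $type;
        }
    }
    return false;
}

$doc = pack('H*', 'D0CF11E0A1B11AE1');
$signatures = array(
    'doc' => $doc,
    'fla' => $doc . str_repeat("\x00", 16),
);

var_dump(match_longest_signature($doc . str_repeat("\x00", 16) . 'rest', $signatures)); // string(3) "fla"
var_dump(match_longest_signature($doc . "\x3E\x00\x03\x00", $signatures));              // string(3) "doc"
```

The cost is one sort plus a single pass over the list, so it should scale with the number of signatures rather than with their length.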
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

The only time I can see this technique being of any use would be in the case where you wanted to ensure the image you are about to display is indeed a valid GIF or JPEG and not some PDF that got renamed either by accident or maliciously...

As for security, why isn't a simple extension check at upload good enough?

In any case, not sure if you know about these guys yet, but if not...

Here's an excellent source for file structures: http://www.wotsit.org

Cheers :)
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

You can't rely on an extension check to determine file type. For a start, what happens if there is no extension? Even worse, someone could upload app.exe.jpg.

The only way to be sure is to check the file yourself (not literally; with PHP) once it's on the server, and not rely on anything sent by the user/browser.

Thanks for the link..
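
For what it's worth, PHP's fileinfo extension can do this kind of content-based check too. Nobody in this thread has used it, so treat the following as a sketch under that assumption: detect_mime_with_fileinfo() is a hypothetical wrapper, and the result depends on the magic database installed on the server.

```php
<?php
// Sketch: detect a mime type from the file contents with the fileinfo
// extension. Returns false when the extension is not loaded or the file
// cannot be read.
function detect_mime_with_fileinfo($path) {
    if (!class_exists('finfo') || !is_readable($path)) {
        return false;
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->file($path);
}

// A throwaway file stands in for $_FILES['formfile']['tmp_name'].
$tmp = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($tmp, "%PDF-1.4\n%example\n");
var_dump(detect_mime_with_fileinfo($tmp));
unlink($tmp);
```

As with the class above, nothing sent by the browser is consulted; only the bytes on disk are.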
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Sure you can. Why not?

What happens if there is no file extension? Simple: the upload script says, wait a minute! No extension = banned extension.

You use a principle of least privilege approach, whereby if the file you're uploading doesn't have one of the extensions GIF, BMP, JPEG, JPG, TIFF or PCX...

Sorry pal, your file doesn't get uploaded...

As for the worst case scenario, someone uploads an EXE disguised as a JPG...

So long as you can't rename that file (change extension) the image won't be rendered; it'll be considered a corrupted file. To the best of my knowledge, no code will be executed either.

Cheers :)
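
The extension allow-list described above might look roughly like this. It is a sketch only; has_allowed_extension() is a hypothetical name, and the list is the one given in the post.

```php
<?php
// Hypothetical helper: accept a file name only if its final extension is
// on the allow-list. No extension counts as a banned extension.
function has_allowed_extension($filename, array $allowed) {
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $allowed, true);
}

$allowed = array('gif', 'bmp', 'jpeg', 'jpg', 'tiff', 'pcx');

var_dump(has_allowed_extension('holiday.JPG', $allowed)); // bool(true)
var_dump(has_allowed_extension('README', $allowed));      // bool(false)
var_dump(has_allowed_extension('app.exe.jpg', $allowed)); // bool(true)
```

Note that app.exe.jpg passes, because only the last extension is looked at and the contents are never inspected.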
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

That's incredibly complacent if you ask me. Also, in case you don't know, Linux *doesn't care* about file extensions and thus plenty of people omit them. Why discriminate against your end users?

Also... if you did allow an upload of an MS Word document named foo.exe.doc there are two big risks. Firstly, you may allow an attacker to execute arbitrary code on your server === BAD. Secondly, you could be responsible for (unknowingly) distributing viruses to other people's computers. If you decide to rely on file extensions, more fool you IMO.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Really? :oops:

So Linux actually reads the byte codes in a file to determine which application to run?

Even still, however, you can't start an executable on Linux via an HTTP request, can you?

I fail to see how, if someone uploads a document like foo.exe.doc onto a web server, they would be able to execute that application, unless they had shell access or a carelessly developed PHP script which executed shell commands.

How else would someone ever execute an exe through a web server?

Besides, IMHO if they could, via shell or whatever, isn't that a problem for system admins instead of developers?

I think checking files using a byte code is going too far. What happens if you're working with a web based file manager... it's impossible to keep track of every file's byte codes.

My biggest concern with using extensions only is: if someone uploads a PHP script named myimage.gif and later renames it to myscript.php, it is easily executed and could possibly carry out some savage tasks...

But in this case, you could:
1) Limit file uploads to anything but PHP, INC, etc...
2) Prevent renaming of files to different extensions
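
A rough sketch of those two rules, assuming the checks are made on file names only. Both function names and the blocked list are hypothetical.

```php
<?php
// Hypothetical helper for rule 1: refuse uploads whose final extension is
// on a block-list.
function is_blocked_extension($filename, array $blocked) {
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($ext, $blocked, true);
}

// Hypothetical helper for rule 2: allow a rename only when the final
// extension stays the same (compared case-insensitively).
function rename_keeps_extension($old_name, $new_name) {
    $old_ext = strtolower(pathinfo($old_name, PATHINFO_EXTENSION));
    $new_ext = strtolower(pathinfo($new_name, PATHINFO_EXTENSION));
    return $old_ext === $new_ext;
}

$blocked = array('php', 'inc');

var_dump(is_blocked_extension('shell.PHP', $blocked));           // bool(true)
var_dump(is_blocked_extension('myimage.gif', $blocked));         // bool(false)
var_dump(rename_keeps_extension('myimage.gif', 'logo.gif'));     // bool(true)
var_dump(rename_keeps_extension('myimage.gif', 'myscript.php')); // bool(false)
```

Neither check looks inside the file, so a PHP script uploaded as myimage.gif is still stored; it just cannot be given a .php name afterwards.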

This is a bit of an important issue, especially for me, 'cause I designed a pretty powerful file manager and I'm pretty sure it's secure yet flexible.

We should keep this topic rolling to see if perhaps I am missing something and possibly jeopardizing my app :(

Thanks for the feedback :)

Cheers
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

alex.barylski wrote:I fail to see how if someone uploads a document like foo.exe.doc onto a web server, how they would be able to execute that application? Unless they had shell access or a PHP script which executed shell commands was carelessly developed.
What about maliciousFileDeleter.php.jpg, which deletes any file the PHP user has permission to?

What if you distribute this file manager, and a user installs it on his own server where an .htaccess file treats JPGs as PHP files, like many of us do (or at least used to) for dynamic sigs?
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

I wasn't aware .htaccess was capable of such a thing. 8O It's cool...although arguably not very useful IMHO...

Have any links on the subject? Or care to give me details on how this is done so I can add it to my security notes :)

Again, obviously this is a problem, but IMHO outside the domain of the developer. A system admin should be aware of a file manager script which uploads arbitrary file types and should not use .htaccess for that purpose, or should use a workaround such as only executing GIF or JPG files as PHP scripts in one set directory rather than in all directories.

Thanks for the details :)
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

Security is the main issue here, and I want to concentrate on one particular element of it: I want to know for sure what type of file has been uploaded. This does seem possible to an extent (with the most popular file types at least) by checking for images with getimagesize(), checking other file types by reading the first few bytes of the file, and using feyd's function to check for text files.

I would like to extend this functionality to movie file types (mov, wmv etc.) but it's not that easy. http://www.youtube.com/ do it really well, and convert files to flash video files (this is my ultimate destination).

So, bringing it back to the class I originally posted: is the direction I'm going in reasonable, OTT, or not strict enough?