Identifying text vs. binary files

benwei
Forum Newbie
Posts: 1
Joined: Wed Jun 08, 2005 2:36 pm

Identifying text vs. binary files

Post by benwei »

Hello there,

I am trying to find a simple way of differentiating plaintext from binary files without resorting to either the Fileinfo PECL module or the (now deprecated) mime_content_type() function. Does anybody know of one?

Thanks,
Ben
ryuuka
Forum Contributor
Posts: 128
Joined: Tue Sep 05, 2006 8:18 am
Location: the netherlands

Post by ryuuka »

Use the explode() function to get the extension of the file, but this is a very insecure way to do it, since the extension says nothing about the file's actual contents.
shwanky
Forum Commoner
Posts: 45
Joined: Thu Feb 15, 2007 1:21 am

Re: Identifying text vs. binary files

Post by shwanky »

benwei wrote: Hello there,

I am trying to find a simple way of differentiating plaintext from binary files without resorting to either the Fileinfo PECL module or the (now deprecated) mime_content_type() function. Does anybody know of one?

Thanks,
Ben
If exec isn't disabled on your server you can do exec("file -b " . escapeshellarg($filename), $result); ^.^
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Just look for NULL bytes.

Code: Select all

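function is_binary_file($path)
{
    if (is_file($path) && is_readable($path))
    {
        $handle = fopen($path, "rb");
        // fread() returns "" at EOF rather than false, so loop on feof() and stop on an empty read
        while ($handle && !feof($handle))
        {
            $chunk = fread($handle, 8192);
            if ($chunk === false || $chunk === "") break;
            if (strpos($chunk, "\0") !== false)
            {
                fclose($handle);
                return true;
            }
        }
        if ($handle) fclose($handle);
    }
    // Not binary: no NULL bytes detected
    return false;
}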
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I'm not sure all binary files will have NULL bytes. While the probability they do is likely high, it's not guaranteed.

"Standard" text files use a maximum of 7 bits per byte. Scanning the file for bytes outside that range may give more reliable detection. However, that depends on what you consider a text file.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Unicode is still text. I would certainly say a text file contains no NULL bytes, but I can't say for sure that a binary file WILL contain NULL bytes. However, it's unlikely that a binary file won't contain at least one.

I saw another app (haven't got the foggiest where now!) which took a substr() of the first 1000 bytes then checked that for strpos() of \0. I'm not sure if they had a reason for only taking the first 1000 bytes other than memory/speed issues.
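A rough sketch of that prefix check, for illustration only. The function name looks_binary_prefix() and the $length parameter are made up here, and the 1000-byte default simply mirrors the number mentioned above; this is not the code from the app in question.

```php
<?php
// Hypothetical helper: look for a NULL byte in only the first $length bytes.
function looks_binary_prefix($path, $length = 1000)
{
    // Offset 0 plus a maximum length makes file_get_contents() read just the prefix
    $head = @file_get_contents($path, false, null, 0, $length);
    if ($head === false)
    {
        // Unreadable file: this sketch reports "not binary" rather than raising an error
        return false;
    }
    return strpos($head, "\0") !== false;
}
```

The trade-off is that a NULL byte appearing only after the first $length bytes goes unnoticed, so this is faster and lighter on memory but less thorough than scanning the whole file.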
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

d11wtq wrote: Unicode is still text.
That's why I put "standard" in quotes and added that what the user considers text may or may not fall into it.


Since all files are binary, I would actually suggest analyzing the file data to determine if it is what the user considers as text.
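One possible way to analyze the data along these lines is the 7-bit scan mentioned earlier in the thread. This is only a sketch: has_high_bit_bytes() and is_seven_bit_file() are names invented for the example, and the 8192-byte chunk size is an arbitrary choice.

```php
<?php
// Hypothetical helper: true if $data contains any byte above 0x7F.
function has_high_bit_bytes($data)
{
    // No /u modifier, so the character class matches raw bytes, not code points
    return preg_match('/[\x80-\xFF]/', $data) === 1;
}

// Hypothetical helper: true if every byte in the file fits in 7 bits.
function is_seven_bit_file($path, $chunkSize = 8192)
{
    $handle = @fopen($path, "rb");
    if ($handle === false)
    {
        // Unreadable file: this sketch simply reports false
        return false;
    }
    $sevenBit = true;
    while (!feof($handle))
    {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === "") break;
        if (has_high_bit_bytes($chunk))
        {
            $sevenBit = false;
            break;
        }
    }
    fclose($handle);
    return $sevenBit;
}
```

Note that UTF-8 text containing accented or non-Latin characters fails this test, which is exactly the "depends on what you consider text" caveat.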
the DtTvB
Forum Newbie
Posts: 11
Joined: Sun Feb 11, 2007 6:10 am

Post by the DtTvB »

I would check for any characters that have the character code between 0 - 31.
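A minimal sketch of that idea, with one deviation that is my own assumption rather than part of the suggestion: tab (9), line feed (10) and carriage return (13) are exempted, since almost every text file contains them. The function name has_control_chars() is invented for the example.

```php
<?php
// Hypothetical helper: true if $data contains a control character (codes 0 - 31),
// excluding tab, line feed and carriage return (an assumption, see above).
function has_control_chars($data)
{
    // Ranges cover 0-8, 11, 12 and 14-31; codes 9, 10 and 13 are deliberately left out
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $data) === 1;
}
```

It takes a string, so it can be applied to whatever portion of the file has been read with fread() or file_get_contents().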