Hello there,
I am trying to find a simple way of differentiating plaintext from binary files without resorting to either the Fileinfo PECL module or the (now deprecated) mime_content_type() function. Does anybody know of one?
Thanks,
Ben
Identifying text vs. binary files
Use explode() to get the extension of the file, but this is a very insecure way to do this.
Re: Identifying text vs. binary files
If exec isn't disabled on your server, you can do exec("file -b " . escapeshellarg($filename), $result); ^.^
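To make the exec() suggestion concrete, here is a rough sketch. It assumes a Unix-like server where exec() is enabled and the `file` utility is installed, and that this version of `file` supports the `--mime-encoding` option, which reports `binary` for files it does not recognise as text. The function name is_binary_via_file() is made up for illustration.

```php
<?php
// Hypothetical helper: returns true/false, or null when the answer is unknown.
// Assumes exec() is enabled and the `file` utility is on the PATH.
function is_binary_via_file($filename)
{
    if (!is_file($filename)) {
        return null; // nothing to inspect
    }
    $output = array();
    $status = 1;
    // escapeshellarg() stops unusual filenames from being interpreted by the shell
    exec('file -b --mime-encoding ' . escapeshellarg($filename) . ' 2>/dev/null', $output, $status);
    if ($status !== 0 || !isset($output[0])) {
        return null; // command missing or failed
    }
    // `file` prints the detected encoding, or "binary" when it finds none
    return trim($output[0]) === 'binary';
}
```

A null result means the check could not be run, so the caller should fall back to another method rather than treat it as "text".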
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Just look for NULL bytes.
Code:
function is_binary_file($path)
{
    if (is_file($path) && is_readable($path))
    {
        $handle = fopen($path, "rb");
        // fread() returns "" at EOF rather than false, so loop on feof()
        while (!feof($handle))
        {
            $chunk = fread($handle, 8192);
            if ($chunk === false) break;
            if (strpos($chunk, "\0") !== false)
            {
                fclose($handle);
                return true;
            }
        }
        fclose($handle);
    }
    // Not binary, no NULLs detected
    return false;
}
- Chris Corbyn
Unicode is still text. I would certainly say a text file contains no NULL bytes, but I can't say for sure that a binary file WILL contain NULL bytes. It's unlikely that you won't find a NULL byte in a binary file, however.
I saw another app (haven't got the foggiest where now!) which took a substr() of the first 1000 bytes then checked that for strpos() of \0. I'm not sure if they had a reason for only taking the first 1000 bytes other than memory/speed issues.
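For reference, the "first 1000 bytes" idea might look something like the sketch below. It is my own guess at the approach, not the code from that app; the name looks_binary() and the $limit parameter are made up for illustration.

```php
<?php
// Hypothetical helper: checks only the first $limit bytes for a NULL byte.
function looks_binary($path, $limit = 1000)
{
    // Read at most $limit bytes from the start of the file
    $head = @file_get_contents($path, false, null, 0, $limit);
    if ($head === false) {
        return false; // unreadable: nothing to base a "binary" verdict on
    }
    return strpos($head, "\0") !== false;
}
```

The trade-off is the obvious one: it stays fast and memory-cheap on large files, but a NULL byte that first appears after the limit is missed.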