Remove Byte Order Marks from UTF-8 Encoded Files

Post by Pyrite

Y0, time for another phat "why the hell would I ever need that" release by yours truly!

This is a simple, and I do mean simple, PHP script to remove UTF-8 BOMs from files in the cwd.

Since most web browsers and text editors don't show the BOM, I recommend you copy and paste this useless code into a terminal-type editor like ... "pico" (you know vi sucks).
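If your editor hides the BOM, one way to see whether a file has one is to dump its first three bytes as hex; a UTF-8 BOM is the byte sequence EF BB BF. Here is a minimal sketch of that check. The helper name `first_bytes_hex` is made up for illustration and is not part of the script below.

```php
<?php
// Hypothetical helper (an assumption, not part of defusebom.php):
// returns the first three bytes of a file as lowercase hex,
// or null if the file can't be opened or read.
function first_bytes_hex(string $path): ?string
{
    $handle = @fopen($path, "rb");
    if ($handle === false) {
        return null;
    }
    $bytes = fread($handle, 3);
    fclose($handle);
    return $bytes === false ? null : bin2hex($bytes);
}
```

A file that starts with a UTF-8 BOM should come back as "efbbbf"; anything else means the file has no UTF-8 BOM at the start.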

Code:

#!/usr/bin/php -q
<?php
/*
   Description: Script to Remove Byte Order Marks (BOM) from UTF-8 Files in Current Working Directory
   Q. What Is A BOM?
   A. http://en.wikipedia.org/wiki/Byte_Order_Mark
   Usage: 1. Copy this script into a directory with files to check for BOMs.
          2. Make it executable (chmod +x defusebom.php)
          3. Execute (./defusebom.php)
*/

$cwd        = getcwd(); // Current working directory
$dh         = opendir($cwd);
$haystack   = "";    // Initialized
$needle     = "\xEF\xBB\xBF"; // UTF-8 BOM bytes (EF BB BF). DO NOT EDIT
$reportonly = FALSE; // Set to TRUE to Only Report Which Files Have a BOM

while (false !== ($file = readdir($dh))) {
    if (!is_dir("$cwd/$file") && $file != basename($_SERVER["PHP_SELF"])) {
        $haystack = file_get_contents("$cwd/$file");
        // A BOM only counts at the very start of the file
        if (strncmp($haystack, $needle, strlen($needle)) === 0) {
            if (!$reportonly) {
                // Uh oh, a BOM was found, defuse it!
                $newfile = (string) substr($haystack, strlen($needle));
                if (is_writable("$cwd/$file")) {
                    // Open File For Writing
                    $handle = fopen("$cwd/$file", "w");
                    fwrite($handle, $newfile);
                    fclose($handle);
                    echo "Defused BOM In: ".$file."\n";
                } else {
                    echo "File: ".$file." had a BOM that couldn't be defused.\n";
                }
            } else {
                echo "File: ".$file." Has a BOM!\n";
            }
        }
    }
}
?>
http://www.pysquared.com/files/defusebom.phps
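If you want the same thing outside this script, stripping a leading BOM can be pulled into a small function. This is only a sketch: the name `strip_utf8_bom` is hypothetical, and it assumes you only care about a BOM at the very start of the data, so an EF BB BF sequence later in the string is left alone.

```php
<?php
// Hypothetical helper (an assumption, not part of defusebom.php):
// removes a single UTF-8 BOM from the start of a string, if present.
function strip_utf8_bom(string $contents): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($contents, $bom, strlen($bom)) === 0) {
        // Cast guards against substr() returning false on older PHP versions
        return (string) substr($contents, strlen($bom));
    }
    return $contents;
}
```

You could call it on the result of file_get_contents() and write the return value back only when it differs from the input.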