Restrict User Choice of Encodings

Coding Critique is the place to post source code for peer review by other members of DevNetwork. Any kind of code can be posted. Code posted does not have to be limited to PHP. All members are invited to contribute constructive criticism with the goal of improving the code. Posted code should include some background information about it and what areas you specifically would like help with.

Popular code excerpts may be moved to "Code Snippets" by the moderators.

Moderator: General Moderators

User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

I think it boils down to preference.
sike
Forum Commoner
Posts: 84
Joined: Wed Aug 02, 2006 8:33 am

Post by sike »

ole wrote: And then dotted around the entire library will be things like this:

Code: Select all

if (self::$encoding->isUtf8()) {
   $out.= self::_makeAttrib('class', utf8_bad_replace($this->wrapClass, ''));
} else {
   $out.= self::_makeAttrib('class', $this->wrapClass);
}
can't you move the cleaning and conversions into the encoding class?

something like

Code: Select all

$out.= self::_makeAttrib('class', self::$encoding->cleanup($this->wrapClass, ''));
in my eyes this would move responsibilities where they should be - into the encoding class.

cheers
Chris
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Big it up for sike!!!!
Woooo Wooo WOOO!!
YEAH!

I mean, thanks; that's a brilliant idea. :D
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Warning long post
Preamble

OK from the seed that sike planted a great tree grew and things have developed the idea a bit.....well a lot. So here's some new code to criticise.

@arborint: thanks for your crit so far, even if I do disagree with you its great that you challenge me because when I don't have an answer for something I might have a revelation. So you are most welcome to do your worst once again!

The scope of the code has developed a bit. This is designed to allow the user to choose an encoding (specified with just a string) and then in rest of the library methods can be used to do encoding specific functionality which is different depending on the encoding the user specified. Needless to say this is an example of a strategy pattern.

I finally managed to remember to write the tests before the code with this so this is officially the first totally sucessful product of TDD by me. :D

Anyway here are those sections again:

How you use it

Code: Select all

OF::init('UTF-8');

class OF
{
    <<snip>>
    static public function init($enc)
    {
        static $called = false;
        if ($called) {
            $errStr = 'You may not reinitialize ' . __CLASS__;
            throw new OF_Exception($errStr, OF_Exception::MISC);
        }

        self::$enc = new OF_Enc($enc);
        self::$registry = new OF_Registry();
        self::$_idRegistry = new OF_Registry(null);
        $called = true;
    }
    <<snip>>
}
The tests for it

Code: Select all

<?php
class TestOfEnc extends OF_UnitTest
{
    public function testInstanitation()
    {
        $iso88591 = new OF_EncStrategySingle('ISO-8859-1');
        $cp1252 = new OF_EncStrategySingle('cp1252');
        $utf8 = new OF_EncStrategyMultiUtf8();

        $defaultEnc = new OF_Enc(null);
    }
    public function testConstruction()
    {
        $enc = new OF_Enc('UTF-8');
        $this->assertIsA(self::expose($enc, '_strategy'), 'OF_EncStrategyMultiUtf8');
        $enc = new OF_Enc('ISO-8859-1');
        $this->assertIsA(self::expose($enc, '_strategy'), 'OF_EncStrategySingle');
        try {
            $enc = new OF_Enc('unsupported encoding');
        } catch (OF_Exception $e) {}
        $this->assertNotNull(@$e);
    }
    public function testStrategy()
    {
        $encoding = new OF_Enc('UTF-8');
        $clean = 'just a string';
        $cleaned = $encoding->inputClean($clean);
        $this->assertEqual($cleaned, $clean);

        // 170 is invalid UTF-8
        $dirty = chr(30) . chr(170);
        $cleaned = $encoding->inputClean($dirty);
        $this->assertEqual($cleaned, $dirty[0]);

        try {
            $warn = $encoding->inputWarn($clean);
            $warn = $encoding->inputWarn($dirty);
        } catch (OF_Exception $e) {};
        $this->assertNotNull(@$e);
        $this->assertEqual($warn, $clean);
        try {
            $this->assertEqual($encoding->someNonExistantThing($clean), $clean);
        } catch (OF_Exception $e) {}
        $this->assertNotNull(@$e);
    }
}
The code itself

Code: Select all

<?php
/**
 * LICENSE
 *
 * This source file is subject to the new BSD license that is bundled with this
 * package in the file LICENSE.txt. If you did not receive a copy of the
 * license, please send an email to oliver@osinternetservices.com so I can send
 * you a copy immediately.
 *
 * @package OF
 */
/**
 * OF_Enc
 *
 * --- Class Details -----------------
 *
 * OF_Enc contains an encoding strategy subclass. See methods.
 *
 * --- Maintenance Details -----------
 *
 * Add new strategies to the switch
 *
 * -----------------------------------
 *
 * @package		OF
 * @author 		Oliver Saunders
 * @copyright	Copyright (c) 2006 OS Internet Services (http://www.osinternetservices.com)
 * @license		See LICENSE.txt	New BSD License
 */
class OF_Enc
{
    /**
     * The strategy implementation
     *
     * @var OF_EncStrategy
     */
    private $_strategy;
    /**
     * Instantiate with a strategy. You may specify a string encoding name which
     * will then use the appropriate strategy, null to grab the default_charset
     * from php.ini and use the appropriate strategy for that or an instance of
     * OF_EncStrategy.
     *
     * @param null|string|OF_EncStrategy $encoding
     * @throws OF_Exception if an unknown strategy is specified
     */
    public function __construct($encoding = null)
    {
        $iniGot = false;
        if ($encoding === null) {
            $encoding = ini_get('default_charset');
            $iniGot = true;
        }
        if (is_string($encoding)) {
            switch ($encoding) {
                case 'cp1252':
                case 'ISO-8859-1':
                    $this->_strategy = new OF_EncStrategySingle($encoding);
                    break;
                case 'UTF-8':
                    $this->_strategy = new OF_EncStrategyMultiUtf8();
                    break;
                default:
                    $errStr = 'Unsupported encoding (' . $encoding . ')';
                    throw new OF_Exception($errStr, OF_Exception::EXISTANCE);
            }
        } else if ($encoding instanceof OF_EncStrategy) {
            $this->_strategy = $encoding;
        } else {
            $errStr = 'Only instances of OF_EncStrategy acceptable';
            throw new OF_Exception($errStr, OF_Exception::INITIALIZATION);
        }
        if (!$iniGot) {
            ini_set('default_charset', $this->_strategy->getName());
        }
    }
    /**
     * Passes on any calls to the strategy.
     *
     * If the method exists you are returned the result if the method does not
     * exist you are returned an unmodified first argument. This is usually a
     * string to process.
     *
     * @param string $funcName
     * @param array $params
     * @return mixed usually a string
     */
    public function __call($funcName, $params)
    {
        if (!method_exists($this->_strategy, $funcName)) {
            // Notify of mistakes
            $errStr = 'Unknown method (' . $funcName . ')';
            throw new OF_Exception($errStr, OF_Exception::EXISTANCE);
        }
        return call_user_func_array(array($this->_strategy, $funcName), $params);
    }
}
/**
 * OF_EncStrategy
 *
 * --- Class Details -----------------
 *
 * OF_EncStrategy is the base encoding strategy class.
 *
 * --- Maintenance Details -----------
 *
 * IMPORTANT: If you add a new method to any strategy you must either:
 *   a. Specify it as abstract here and implement it in all encodings
 *     OR
 *   b. Specify a default implementation here
 *
 * -----------------------------------
 *
 * @package		OF
 * @author 		Oliver Saunders
 * @copyright	Copyright (c) 2006 OS Internet Services (http://www.osinternetservices.com)
 * @license		See LICENSE.txt	New BSD License
 */
abstract class OF_EncStrategy
{
    /**
     * Get the name of the encoding. This is the value that will be used in the
     * charset part of the HTTP response Content-Type header.
     *
     * @return string
     */
    abstract public function getName();
    /**
     * Following methods are default implementations for encoding specific
     * functionality
     */
    public function inputWarn($str)
    {
        return $str;
    }
    public function inputClean($str)
    {
        return $str;
    }
}
/**
 * OF_EncStrategySingle
 *
 * --- Class Details -----------------
 *
 * OF_EncSingle is a strategy of OF_Enc that handles single byte charater
 * encodings. These are supported natively by PHP and thus require no extra
 * functionality.
 *
 * --- Maintenance Details -----------
 *
 * -----------------------------------
 *
 * @package		OF
 * @author 		Oliver Saunders
 * @copyright	Copyright (c) 2006 OS Internet Services (http://www.osinternetservices.com)
 * @license		See LICENSE.txt	New BSD License
 */
class OF_EncStrategySingle extends OF_EncStrategy
{
    /**
     * The name of the encoding to use
     *
     * @var string
     */
    protected $_encoding;
    /**
     * @param string $encoding
     */
    public function __construct($encoding)
    {
        $this->_encoding = $encoding;
    }
    /**
     * @see OF_EncStrategy::getName()
     */
    public function getName()
    {
        return $this->_encoding;
    }
}
/**
 * OF_EncStrategyMultiUtf8
 *
 * --- Class Details -----------------
 *
 * OF_EncSingle is a strategy of OF_Enc that handles single byte charater
 * encodings. These are supported natively by PHP and thus require no extra
 * functionality.
 *
 * --- Maintenance Details -----------
 *
 * -----------------------------------
 *
 * @package		OF
 * @author 		Oliver Saunders
 * @copyright	Copyright (c) 2006 OS Internet Services (http://www.osinternetservices.com)
 * @license		See LICENSE.txt	New BSD License
 */
class OF_EncStrategyMultiUtf8 extends OF_EncStrategy
{
    /**
     * Load up the utf8 library.
     * Add any additional requires here
     */
    public function __construct()
    {
        require_once 'utf8/utf8.php';
        require_once 'utf8/utils/bad.php';
    }
    /**
     * Remove bad characters from $str.
     *
     * Use this on input where bad characters are to be expected i.e. $r_source
     *
     * @param string $str
     * @return string
     */
    public function inputClean($str)
    {
        return utf8_bad_replace($str, '');
    }
    /**
     * Warn if bad characters are present in $str
     *
     * Use this on input that is supposed to be safe and not actually be used
     * as source of input i.e. public properties or parameters on public methods.
     *
     * @param string $str
     * @throws OF_Exception if bad characters are contained in $str
     * @return string
     */
    public function inputWarn($str)
    {
        if (utf8_bad_find($str) !== false) {
            $errStr = 'Encoding error (' . $this->getName() . ')';
            throw new OF_Exception($errStr, OF_Exception::TYPE_SCALAR);
        }
        return $str;
    }
    /**
     * @see OF_EncStrategy::getName()
     */
    public function getName()
    {
        return 'UTF-8';
    }
}
Postamble

One thing I would like to do is have it so that when you specify an unsupported encoding and it default to a single byte mode (OF_EncStrategySingle; where no encoding specific implementations are provided) and warn that is it "assuming this to be a single byte encoding". I can't think of a good way of doing a warning without using trigger_error(). I don't want to use that because then people might use the error supression operator and get no output is an exception occured.

Ideas for a user warning that can be hidden?
Post Reply