Page 1 of 2

Restrict User Choice of Encodings

Posted: Mon Aug 28, 2006 10:52 am
by Ollie Saunders
I'm posting this code because I think its perfect. Want to proove me wrong? I'm sure you do. Can you find fault?

What it is designed to do is aid the user of OsisForms to specify a character encoding to use for their applications from a list of approved encodings and then allow me to write code based on different encodings by testing against the object. PHP's doesn't really have any good facilities for restricting input to a series of approved values so there is quite a lot of code but is very straight forward.

How it is used

Code: Select all

class OF
{
    << snip >>
    /**
     * Initialize OF without creating an instance
     *
     * @param OF_Encoding $encoding
     */
    static public function init(OF_Encoding $encoding)
    {
        self::$_encoding = $encoding
        if ($encoding->isUtf8()) {
            require_once 'utf8/utf8.php';
        }
    
        self::$registry = new OF_Registry();
        self::$_idRegistry = new OF_Registry(null);
    }
    << snip >>
}

OF::init(OF_Encoding::getInstance('UTF-8'));
Unit Test Case

Code: Select all

<?php
class TestOfEncoding extends OF_UnitTest
{
    private $_instance;

    public function testConstruction()
    {
        // smoketest this because when it fails (which it is supposed to) its
        // fatal 
        //$a = new OF_Encoding();
        //$this->assertError(true);
    }
    public function testGetInstance()
    {
        try {
            $this->_instance = OF_Encoding::getInstance('no exist');
        } catch (OF_Exception $e) {}
        $this->assertNotNull($e);
        $this->_instance = OF_Encoding::getInstance('UTF-8');
    }
    public function testIsUtf8()
    {
        $this->assertTrue($this->_instance->isUtf8());
        // deliberately inexact case:
        $this->_instance = OF_Encoding::getInstance('Iso-8859-1');
        $this->assertFalse($this->_instance->isUtf8());
    }
    public function testToStringAndGet()
    {
        $this->assertEqual($this->_instance->get(), 'ISO-8859-1');
        $this->assertEqual($this->_instance->__tostring(), 'OF_Encoding(ISO-8859-1)');
    }
}
Class

Code: Select all

<?php
/**
 * @package OF
 */
/**
 * OF_Encoding
 *
 * --- Class Details -----------------
 *
 * Ensures only a supported encoding is used.
 * This will be replaced by declare(encoding = 'x'); in PHP 6
 *
 * --- Maintenance Details -----------
 *
 * If you need support for more encodeding add to the static $_supported
 * variable these must be all uppercase. If you need to add any more is* tests
 * remember you can cache them on construction if you think it will aid
 * performance.
 *
 * ------------------------------------
 *
 * @package     OF
 * @author      Oliver Saunders
 * @copyright	Copyright (c) 2006 OS Internet Services (http://www.osinternetservices.com)
 * @license		See LICENSE.txt	New BSD License
 */
class OF_Encoding
{
    static protected $_supported = array('UTF-8', 'ISO-8859-1', 'CP1252');
    /**
     * Key of self::$_support, the encoding to use
     *
     * @var int
     */
    protected $_encoding;
    /**
     * @var bool;
     */
    protected $_isUtf8;
    /**
     * Privatised to ensure construction is only done internally with getInstance()
     */
    private function __construct() {}
    /**
     * Creates a controller instance of Encoding
     *
     * @param string $encoding
     * @return Encoding
     */
    static public function getInstance($encoding)
    {
        $encoding = strtoupper($encoding); // ensures case insensitivity
        $supportedKey = array_search($encoding, self::$_supported);

        if ($supportedKey === false) {
            throw new OF_Exception('Unsupported encoding specified (' . $encoding . ')');
        }

        $instance = new self();
        $instance->_encoding = $supportedKey;
        // a very common test, result is cached here at construction
        $instance->_isUtf8 = $encoding == 'UTF-8';
        return $instance;
    }
    /**
     * Access to $_isUtf8
     *
     * @return bool
     */
    public function isUtf8()
    {
        return $this->_isUtf8;
    }
    /**
     * Return encoding
     *
     * @return string
     */
    public function get()
    {
        return self::$_supported[$this->_encoding];
    }
    /**
     * @return string
     */
    public function __tostring()
    {
        return __CLASS__ . '(' . $this->get() . ')';
    }
}

Posted: Mon Aug 28, 2006 12:06 pm
by Christopher
I am obviously missing a big something because that looks like an immense about of code for what should be just a setEncoding() method. And I never understand the attraction of statics, such as that "without creating an instance" which creates two objects internally. What's wrong with __construct()? I prefer simpler.

Posted: Mon Aug 28, 2006 12:16 pm
by Ollie Saunders
Is there was something wrong with this code it would be that. But I was thinking about performance you see. I didn't want to have to do string comparisons where I would cache booleans.

Posted: Mon Aug 28, 2006 1:05 pm
by Christopher
ole wrote:But I was thinking about performance you see. I didn't want to have to do string comparisons where I would cache booleans.
The problem is that the string comparison is probably done only once a request anyway, so you have perhaps harmed performace by adding an assignment and more memory usage due to extra code.

Posted: Mon Aug 28, 2006 2:30 pm
by Ollie Saunders
Uh.

Err...well. Crap.

Really are you sure that is the case?

Posted: Mon Aug 28, 2006 2:44 pm
by Benjamin
It's test time!

Posted: Mon Aug 28, 2006 2:49 pm
by Ollie Saunders
I'm failing to do one at the moment.

Posted: Mon Aug 28, 2006 2:52 pm
by Ollie Saunders
OK somebody check this is right. I'm having a brain fuzz at the moment so the logic could be wrong. That's what you get for using silly arrays.

Code: Select all

class Uncached
{
    protected $_encoding;
    public function __construct($enc)
    {
        $this->_encoding = $enc;
    }
    public function get()
    {
        return $this->_encoding;
    }
}
define('TIMES',300);
$time = array();

$time[0]['start'] = microtime();
for ($i=0;$i<TIMES;$i++) {
    $a = new Uncached('UTF-8');
    for ($k=0;$k<TIMES;$k++) {
        if ($a->get() == 'UTF-8');
    }
}
$time[0]['end'] = microtime();


class Cached extends Uncached
{
    protected $_isUtf8;
    public function __construct($enc)
    {
        $this->_encoding = $enc;
        $this->_isUtf8 = $enc == 'UTF-8';
    }
    public function isUtf8()
    {
        return $this->_isUtf8;
    }
}

$time[1]['start'] = microtime();
for ($i=0;$i<TIMES;$i++) {
    $a = new Cached('UTF-8');
    for ($k=0;$k<TIMES;$k++) {
        if ($a->isUtf8());
    }
}
$time[1]['end'] = microtime();

foreach ($time as $k => $v) {
    $time[$k] = elapsed_time_in_milliseconds($v['start'], $v['end']);
}
function elapsed_time_in_milliseconds($start, $end){
    $start = explode(' ', $start);
    $start = $start[0] + $start[1];
    $end = explode(' ', $end);
    $end = $end[0] + $end[1];
    return $end - $start;
}

print_r($time);
if($time[0] < $time[1]) {
    echo 'Uncached is '.round($time[0] / $time[1], 1).' times faster than cached';
} else {
    echo 'Cached is '.round($time[1] / $time[0], 1).' times faster than uncached';
}

Code: Select all

Array ( [0] => 0.160835981369 [1] => 0.124831914902 ) Cached is 0.8 times faster than uncached

Posted: Mon Aug 28, 2006 3:24 pm
by feyd
This thread needs a better (more descriptive) name.

Posted: Mon Aug 28, 2006 3:36 pm
by Ollie Saunders
And now its got one.

Posted: Mon Aug 28, 2006 5:08 pm
by Ollie Saunders
Used a > instead of an <. Which means the actual output is:

Code: Select all

Array ( [0] => 0.162651062012 [1] => 0.126176834106 ) Uncached is 1.29 times faster than cached
So I've changed it now:

Code: Select all

class OF_Encoding
{
    static protected $_supported = array('UTF-8', 'ISO-8859-1', 'CP1252');
    /**
     * The encoding to use
     *
     * @var string
     */
    protected $_encoding;
    /**
     * Privatised to ensure construction is only done internally with getInstance()
     */
    private function __construct() {}
    /**
     * Creates a controller instance of Encoding
     *
     * @param string $encoding
     * @return Encoding
     */
    static public function getInstance($encoding)
    {
        $encoding = strtoupper($encoding); // ensures case insensitivity
        if (!in_array($encoding, self::$_supported)) {
            $errStr = 'Unsupported encoding specified (' . $encoding . ')';
            throw new OF_Exception($errStr, OF_Exception::TYPE_SCALAR);
        }
        $instance = new self();
        $instance->_encoding = $encoding;
        return $instance;
    }
    /**
     * @return bool
     */
    public function isUtf8()
    {
        return $this->_encoding == 'UTF-8';
    }
    /**
     * @return bool
     */
    public function isIso88591()
    {
        return $this->_encoding == 'ISO-8859-1';
    }
    /**
     * @return bool
     */
    public function isCp1252()
    {
        return $this->_encoding == 'CP1252';
    }
    /**
     * Return encoding
     *
     * @return string
     */
    public function get()
    {
        return self::$_supported[$this->_encoding];
    }
    /**
     * @return string
     */
    public function __tostring()
    {
        return __CLASS__ . '(' . $this->get() . ')';
    }
}
Perfect now?

Posted: Mon Aug 28, 2006 5:35 pm
by Christopher
So what is a use case?

Posted: Mon Aug 28, 2006 5:55 pm
by Ollie Saunders
I wrote:

Code: Select all

class OF
{
    << snip >>
    /**
     * Initialize OF without creating an instance
     *
     * @param OF_Encoding $encoding
     */
    static public function init(OF_Encoding $encoding)
    {
        self::$_encoding = $encoding
        if ($encoding->isUtf8()) {
            require_once 'utf8/utf8.php';
        }
   
        self::$registry = new OF_Registry();
        self::$_idRegistry = new OF_Registry(null);
    }
    << snip >>
}

OF::init(OF_Encoding::getInstance('UTF-8'));
And then dotted around the entire library will be things like this:

Code: Select all

if (self::$encoding->isUtf8()) {
   $out.= self::_makeAttrib('class', utf8_bad_replace($this->wrapClass, ''));
} else {
   $out.= self::_makeAttrib('class', $this->wrapClass);
}

Posted: Mon Aug 28, 2006 6:36 pm
by Christopher
I guess I am wondering why OF_Encoding is a separate class. It is all of about five lines of code. The information does not really need to be public and the OF class still has to include the utf8.php file so the responsiblities are spread around anyway. Plus you mix string identifiers like 'UTF-8' with named getters. The strings are standard names anyway, just use the strings or maybe define class constants. I would pare it down to me minimum necessary.

Posted: Tue Aug 29, 2006 6:09 am
by Ollie Saunders
Hehehe. I have a response for every clause. :)
I guess I am wondering why OF_Encoding is a separate class.
Modularization. I don't want to ask OF what encoding it is using when I can ask a dedicated object.

Code: Select all

OF::$encoding->isUtf8(); // is more readable than...
OF::isUtf8();
Its very easy for me to junk loads of stuff into OF. Its already got quite a lot of random extras. Besides a dedicated object scales a lot better, can be inherited, I can have more than one instance if I desire etc.
It is all of about five lines of code.
All the best classes are small. Why is that a problem?
The information does not really need to be public
What Utf8()? No I need that to be public besides what would I be hiding if I made it protected or private the object was created with a string that describes the encoding so that information is already public knowledge.
the OF class still has to include the utf8.php file so the responsiblities are spread around anyway
That's a library for handling UTF-8, its very large and performance oriented. I tried to lump it into a big class but it just got massive and the parsing overhead got a little silly so instead (and this is what you are supposed to do) you include the things you need as you need them.
Plus you mix string identifiers like 'UTF-8' with named getters.
So?
just use the strings
Do this you mean?

Code: Select all

if (OF::$encoding == 'UTF-8')
and risk the chance of me typing a single character wrong in that second string when I can do this

Code: Select all

if (OF::$encoding->isUtf8())
where PHP will shout at me if I get the function wrong and I don't have to think about the formatting of the encoding name, which I, and many other people, are not particularly familiar with.
or maybe define class constants
I thought at first that constants would provide the solution. They don't. They are one-way only. If I have 'UTF-8' stored in a constant there is no way to work out the name of the constant that give me that string without searching through the output of get_defined_constants().
I would pare it down to me minimum necessary.
I would write it in a way that gives me most control, flexibility and extendibility. These are things that are much more important than simplicity alone. That's not to say my solution isn't simple because it is. Like you said its only about 5 lines of code.