Restrict User Choice of Encodings

Coding Critique is the place to post source code for peer review by other members of DevNetwork. Any kind of code can be posted. Code posted does not have to be limited to PHP. All members are invited to contribute constructive criticism with the goal of improving the code. Posted code should include some background information about it and what areas you specifically would like help with.

Popular code excerpts may be moved to "Code Snippets" by the moderators.

Moderator: General Moderators

User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Restrict User Choice of Encodings

Post by Ollie Saunders »

I'm posting this code because I think its perfect. Want to proove me wrong? I'm sure you do. Can you find fault?

What it is designed to do is aid the user of OsisForms to specify a character encoding to use for their applications from a list of approved encodings and then allow me to write code based on different encodings by testing against the object. PHP's doesn't really have any good facilities for restricting input to a series of approved values so there is quite a lot of code but is very straight forward.

How it is used

Code: Select all

class OF
{
    << snip >>
    /**
     * Initialize OF without creating an instance
     *
     * @param OF_Encoding $encoding
     */
    static public function init(OF_Encoding $encoding)
    {
        self::$_encoding = $encoding
        if ($encoding->isUtf8()) {
            require_once 'utf8/utf8.php';
        }
    
        self::$registry = new OF_Registry();
        self::$_idRegistry = new OF_Registry(null);
    }
    << snip >>
}

OF::init(OF_Encoding::getInstance('UTF-8'));
Unit Test Case

Code: Select all

<?php
class TestOfEncoding extends OF_UnitTest
{
    private $_instance;

    public function testConstruction()
    {
        // smoketest this because when it fails (which it is supposed to) its
        // fatal 
        //$a = new OF_Encoding();
        //$this->assertError(true);
    }
    public function testGetInstance()
    {
        try {
            $this->_instance = OF_Encoding::getInstance('no exist');
        } catch (OF_Exception $e) {}
        $this->assertNotNull($e);
        $this->_instance = OF_Encoding::getInstance('UTF-8');
    }
    public function testIsUtf8()
    {
        $this->assertTrue($this->_instance->isUtf8());
        // deliberately inexact case:
        $this->_instance = OF_Encoding::getInstance('Iso-8859-1');
        $this->assertFalse($this->_instance->isUtf8());
    }
    public function testToStringAndGet()
    {
        $this->assertEqual($this->_instance->get(), 'ISO-8859-1');
        $this->assertEqual($this->_instance->__tostring(), 'OF_Encoding(ISO-8859-1)');
    }
}
Class

Code: Select all

<?php
/**
 * @package OF
 */
/**
 * OF_Encoding
 *
 * --- Class Details -----------------
 *
 * Ensures only a supported encoding is used.
 * This will be replaced by declare(encoding = 'x'); in PHP 6
 *
 * --- Maintenance Details -----------
 *
 * If you need support for more encodeding add to the static $_supported
 * variable these must be all uppercase. If you need to add any more is* tests
 * remember you can cache them on construction if you think it will aid
 * performance.
 *
 * ------------------------------------
 *
 * @package     OF
 * @author      Oliver Saunders
 * @copyright	Copyright (c) 2006 OS Internet Services (http://www.osinternetservices.com)
 * @license		See LICENSE.txt	New BSD License
 */
class OF_Encoding
{
    static protected $_supported = array('UTF-8', 'ISO-8859-1', 'CP1252');
    /**
     * Key of self::$_support, the encoding to use
     *
     * @var int
     */
    protected $_encoding;
    /**
     * @var bool;
     */
    protected $_isUtf8;
    /**
     * Privatised to ensure construction is only done internally with getInstance()
     */
    private function __construct() {}
    /**
     * Creates a controller instance of Encoding
     *
     * @param string $encoding
     * @return Encoding
     */
    static public function getInstance($encoding)
    {
        $encoding = strtoupper($encoding); // ensures case insensitivity
        $supportedKey = array_search($encoding, self::$_supported);

        if ($supportedKey === false) {
            throw new OF_Exception('Unsupported encoding specified (' . $encoding . ')');
        }

        $instance = new self();
        $instance->_encoding = $supportedKey;
        // a very common test, result is cached here at construction
        $instance->_isUtf8 = $encoding == 'UTF-8';
        return $instance;
    }
    /**
     * Access to $_isUtf8
     *
     * @return bool
     */
    public function isUtf8()
    {
        return $this->_isUtf8;
    }
    /**
     * Return encoding
     *
     * @return string
     */
    public function get()
    {
        return self::$_supported[$this->_encoding];
    }
    /**
     * @return string
     */
    public function __tostring()
    {
        return __CLASS__ . '(' . $this->get() . ')';
    }
}
Last edited by Ollie Saunders on Mon Aug 28, 2006 3:34 pm, edited 1 time in total.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

I am obviously missing a big something because that looks like an immense about of code for what should be just a setEncoding() method. And I never understand the attraction of statics, such as that "without creating an instance" which creates two objects internally. What's wrong with __construct()? I prefer simpler.
(#10850)
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Is there was something wrong with this code it would be that. But I was thinking about performance you see. I didn't want to have to do string comparisons where I would cache booleans.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

ole wrote:But I was thinking about performance you see. I didn't want to have to do string comparisons where I would cache booleans.
The problem is that the string comparison is probably done only once a request anyway, so you have perhaps harmed performace by adding an assignment and more memory usage due to extra code.
(#10850)
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Uh.

Err...well. Crap.

Really are you sure that is the case?
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

It's test time!
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

I'm failing to do one at the moment.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

OK somebody check this is right. I'm having a brain fuzz at the moment so the logic could be wrong. That's what you get for using silly arrays.

Code: Select all

class Uncached
{
    protected $_encoding;
    public function __construct($enc)
    {
        $this->_encoding = $enc;
    }
    public function get()
    {
        return $this->_encoding;
    }
}
define('TIMES',300);
$time = array();

$time[0]['start'] = microtime();
for ($i=0;$i<TIMES;$i++) {
    $a = new Uncached('UTF-8');
    for ($k=0;$k<TIMES;$k++) {
        if ($a->get() == 'UTF-8');
    }
}
$time[0]['end'] = microtime();


class Cached extends Uncached
{
    protected $_isUtf8;
    public function __construct($enc)
    {
        $this->_encoding = $enc;
        $this->_isUtf8 = $enc == 'UTF-8';
    }
    public function isUtf8()
    {
        return $this->_isUtf8;
    }
}

$time[1]['start'] = microtime();
for ($i=0;$i<TIMES;$i++) {
    $a = new Cached('UTF-8');
    for ($k=0;$k<TIMES;$k++) {
        if ($a->isUtf8());
    }
}
$time[1]['end'] = microtime();

foreach ($time as $k => $v) {
    $time[$k] = elapsed_time_in_milliseconds($v['start'], $v['end']);
}
function elapsed_time_in_milliseconds($start, $end){
    $start = explode(' ', $start);
    $start = $start[0] + $start[1];
    $end = explode(' ', $end);
    $end = $end[0] + $end[1];
    return $end - $start;
}

print_r($time);
if($time[0] < $time[1]) {
    echo 'Uncached is '.round($time[0] / $time[1], 1).' times faster than cached';
} else {
    echo 'Cached is '.round($time[1] / $time[0], 1).' times faster than uncached';
}

Code: Select all

Array ( [0] => 0.160835981369 [1] => 0.124831914902 ) Cached is 0.8 times faster than uncached
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

This thread needs a better (more descriptive) name.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

And now its got one.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Used a > instead of an <. Which means the actual output is:

Code: Select all

Array ( [0] => 0.162651062012 [1] => 0.126176834106 ) Uncached is 1.29 times faster than cached
So I've changed it now:

Code: Select all

class OF_Encoding
{
    static protected $_supported = array('UTF-8', 'ISO-8859-1', 'CP1252');
    /**
     * The encoding to use
     *
     * @var string
     */
    protected $_encoding;
    /**
     * Privatised to ensure construction is only done internally with getInstance()
     */
    private function __construct() {}
    /**
     * Creates a controller instance of Encoding
     *
     * @param string $encoding
     * @return Encoding
     */
    static public function getInstance($encoding)
    {
        $encoding = strtoupper($encoding); // ensures case insensitivity
        if (!in_array($encoding, self::$_supported)) {
            $errStr = 'Unsupported encoding specified (' . $encoding . ')';
            throw new OF_Exception($errStr, OF_Exception::TYPE_SCALAR);
        }
        $instance = new self();
        $instance->_encoding = $encoding;
        return $instance;
    }
    /**
     * @return bool
     */
    public function isUtf8()
    {
        return $this->_encoding == 'UTF-8';
    }
    /**
     * @return bool
     */
    public function isIso88591()
    {
        return $this->_encoding == 'ISO-8859-1';
    }
    /**
     * @return bool
     */
    public function isCp1252()
    {
        return $this->_encoding == 'CP1252';
    }
    /**
     * Return encoding
     *
     * @return string
     */
    public function get()
    {
        return self::$_supported[$this->_encoding];
    }
    /**
     * @return string
     */
    public function __tostring()
    {
        return __CLASS__ . '(' . $this->get() . ')';
    }
}
Perfect now?
Last edited by Ollie Saunders on Mon Aug 28, 2006 5:50 pm, edited 2 times in total.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

So what is a use case?
(#10850)
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

I wrote:

Code: Select all

class OF
{
    << snip >>
    /**
     * Initialize OF without creating an instance
     *
     * @param OF_Encoding $encoding
     */
    static public function init(OF_Encoding $encoding)
    {
        self::$_encoding = $encoding
        if ($encoding->isUtf8()) {
            require_once 'utf8/utf8.php';
        }
   
        self::$registry = new OF_Registry();
        self::$_idRegistry = new OF_Registry(null);
    }
    << snip >>
}

OF::init(OF_Encoding::getInstance('UTF-8'));
And then dotted around the entire library will be things like this:

Code: Select all

if (self::$encoding->isUtf8()) {
   $out.= self::_makeAttrib('class', utf8_bad_replace($this->wrapClass, ''));
} else {
   $out.= self::_makeAttrib('class', $this->wrapClass);
}
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

I guess I am wondering why OF_Encoding is a separate class. It is all of about five lines of code. The information does not really need to be public and the OF class still has to include the utf8.php file so the responsiblities are spread around anyway. Plus you mix string identifiers like 'UTF-8' with named getters. The strings are standard names anyway, just use the strings or maybe define class constants. I would pare it down to me minimum necessary.
(#10850)
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Hehehe. I have a response for every clause. :)
I guess I am wondering why OF_Encoding is a separate class.
Modularization. I don't want to ask OF what encoding it is using when I can ask a dedicated object.

Code: Select all

OF::$encoding->isUtf8(); // is more readable than...
OF::isUtf8();
Its very easy for me to junk loads of stuff into OF. Its already got quite a lot of random extras. Besides a dedicated object scales a lot better, can be inherited, I can have more than one instance if I desire etc.
It is all of about five lines of code.
All the best classes are small. Why is that a problem?
The information does not really need to be public
What Utf8()? No I need that to be public besides what would I be hiding if I made it protected or private the object was created with a string that describes the encoding so that information is already public knowledge.
the OF class still has to include the utf8.php file so the responsiblities are spread around anyway
That's a library for handling UTF-8, its very large and performance oriented. I tried to lump it into a big class but it just got massive and the parsing overhead got a little silly so instead (and this is what you are supposed to do) you include the things you need as you need them.
Plus you mix string identifiers like 'UTF-8' with named getters.
So?
just use the strings
Do this you mean?

Code: Select all

if (OF::$encoding == 'UTF-8')
and risk the chance of me typing a single character wrong in that second string when I can do this

Code: Select all

if (OF::$encoding->isUtf8())
where PHP will shout at me if I get the function wrong and I don't have to think about the formatting of the encoding name, which I, and many other people, are not particularly familiar with.
or maybe define class constants
I thought at first that constants would provide the solution. They don't. They are one-way only. If I have 'UTF-8' stored in a constant there is no way to work out the name of the constant that give me that string without searching through the output of get_defined_constants().
I would pare it down to me minimum necessary.
I would write it in a way that gives me most control, flexibility and extendibility. These are things that are much more important than simplicity alone. That's not to say my solution isn't simple because it is. Like you said its only about 5 lines of code.
Post Reply