Custom serialization functions

Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

Yet another part of my framework, but it's loosely coupled.

Code:

<?php
class Ori{
    const UNBRACE = 1;
    const BRACE = 2;
    const NO_NULLS = 4;
    const NO_EMPTY = 8;
    const NO_WORDS = 16;
    const READABLE = 32;
    
    static function fold($var, $flags = 0, $level = 0){
        if ($flags & self::BRACE && $flags & self::UNBRACE) {
            $brace = FALSE; //unbrace takes precedence
        } elseif ($flags & self::BRACE) {
            $brace = TRUE;
        } elseif ($flags & self::UNBRACE) {
            $brace = FALSE;
        } else {
            $brace = NULL;
        }
        $skipNull = $flags & self::NO_NULLS;
        $skipEmpty = $flags & self::NO_EMPTY;
        $useWords = !($flags & self::NO_WORDS);
        $readable = $flags & self::READABLE;
        $code = '';
        
        if ($skipEmpty && empty($var)) {
            return '';
        }
        switch(gettype($var)) {
            case 'array':
            if ($level > 0 || $brace === TRUE || ($brace === NULL && sizeof($var) <= 1)) {
                $code .= '(';
                if ($readable) { $code .= "\n"; } 
            } else {
                $code .= '';
            }
            
            $defaultKey = 0;
            foreach ($var as $key => $value) {
                if ($skipEmpty && empty($value)
                or $skipNull && $value === NULL) {
                    if (is_int($key) && $key >= $defaultKey) {
                        $defaultKey = $key + 1; //keep track of index even if value is skipped
                    }
                    continue;
                }
                if ($defaultKey === $key) {
                    if ($readable) { $code .= str_repeat(' ', 4 * $level); } 
                    $code .= self::fold($value, $flags | self::BRACE, $level + 1).',';
                    if ($readable) { $code .= "\n"; } 
                    ++$defaultKey;
                } else {
                    if ($readable) { $code .= str_repeat(' ', 4 * $level); } 
                    $eq = '=';
                    if ($readable) {
                        $eq = ' = ';
                    }
                    $code .= self::fold($key, $flags | self::BRACE, $level + 1)
                        .$eq.self::fold($value, $flags | self::BRACE, $level + 1).',';
                    if ($readable) { $code .= "\n"; }
                    if (is_int($key) && $key >= $defaultKey) {
                        $defaultKey = $key + 1; //unkeyed values continue counting from an explicit integer key
                    }
                }
            }
            if (!$readable || $level == 0) {
                $code = chop($code, ",\n"); //remove unnecessary trailing comma
            }
            if ($level > 0 || $brace === NULL && sizeof($var) <= 1) {
                if ($readable) { $code .= str_repeat(' ', 4 * ($level - 1)); }
                $code .= ')';
            } elseif ($brace === TRUE) {
                if ($readable) { $code .= str_repeat(' ', 4 * ($level - 1)); }
                $code .= ')';
            } else { //$brace === FALSE
               //do nothing
            }
            return $code;
            break;
        
            case 'integer':
            case 'double':
            return $var;
            break;

            case 'string':
            if (!empty($var) && ctype_alnum($var) && $useWords) {
                return $var;
            } else {
                if (strpos($var, "'") !== FALSE) {
                    $var = str_replace("'", "''", $var);
                }
                return "'".$var."'";
            }
            break;

            case 'boolean':
            return ($var ? '+' : '-');
            break;

            default:
            if ($skipNull) {
                return '';
            } else {
                return '~';
            }
            //break;
        }
    }
    
    static function unfold($str, $flags = 0){
        $brace = $flags & self::BRACE;
        
        if ($str === NULL || $str === '') { //nothing to parse; empty() is avoided because it also matches 0 and '0'
            if ($brace) {
                return array();
            } else {
                return NULL;
            }
        }
        
        $matches = array();
        preg_match_all("/'(.*?)'/s", $str, $matches); //find all quoted strings
        $sStack = $matches[1];
        $str = preg_replace("/'.*?'/s", "&", $str); //cut them out
    
        if (strpos($str, "'") !== FALSE) { //strict check: a stray quote at position 0 must be caught too
            user_error("Unpaired quotes", E_USER_WARNING);
            return;
        }
        
        $str = preg_replace('/\s/', '', $str); //cut out whitespace
        $str = str_replace(',)', ')', $str); //normalize array closures
    
        preg_match_all('/([^\&\=\,\(\)]+)/', $str, $matches); //find everything else
        $xStack = $matches[1];
        $str = preg_replace('/[^\&\=\,\(\)]+/', '#', $str); //cut it out
        
        if ($brace || ($str[0] != '(' && strpos($str, ",") !== FALSE)) {
            $str = '('.$str.')';
        }
        
        $heap = array();
        $ptr = strlen($str) - 1;
        $scalar = TRUE;
        
        while ($ptr >= 0) { //tokenize
            if ($str[$ptr] == '&') { //is a quoted string
                $string = array_pop($sStack);
                while ($ptr > 0 && $str[$ptr - 1] == '&') {
                    $string = array_pop($sStack)."'".$string; // resolve escaped single quote
                    $str[$ptr] = '_';
                    --$ptr;
                }
                $str[$ptr] = '$';
                $heap[$ptr] = $string;
            } elseif ($str[$ptr] == '#') {
                $value = array_pop($xStack);
                $str[$ptr] = '$';
                
                if ($value == '+') {
                    $heap[$ptr] = TRUE;
                } elseif ($value == '-') {
                    $heap[$ptr] = FALSE;
                } elseif ($value == '~') {
                    $heap[$ptr] = NULL;
                } elseif (is_numeric($value)) { //it is a number
                    $heap[$ptr] = $value + 0;
                } else { //is an unquoted string (hopefully)
                    $heap[$ptr] = (string) $value;
                }
            } else { //is an array markup character
                $scalar = FALSE;
            }
            --$ptr;
        }
        
        if ($scalar) {// if it's a single value do not bother with array processing
            return $heap[0];
        }
        
        while (($aStart = strrpos($str, '(')) !== FALSE) { //reduce to a single value
            $aEnd = strpos($str, ')', $aStart);
            if ($aEnd === FALSE) {
                user_error("Array beginning at [$aStart] is not closed", E_USER_WARNING);
                return;
            }
            
            $str[$aStart] = '$'; //array will be reduced to a single variable
            $str[$aEnd] = ','; //comma serves as a trigger, so array should now end with one
            
            if ($aStart == $aEnd - 1) { //empty array?
                $str[$aEnd] = '_';
                $heap[$aStart] = array();
                continue;
            }
            
            $ptr = $aStart + 1;
            if ($str[$ptr] == ',' || $str[$ptr] == '=') {
                user_error("Invalid array entry [$ptr]", E_USER_WARNING);
                return;
            }
            
            $aStack = array();
            while ($ptr < $aEnd) {
                switch ($str[$ptr]) {
                    case '=':
                    $str[$ptr] = '_';
                    $keyPtr = $ptr - 1;
                    while ($str[$keyPtr] != '$') { //walk back to the key token
                        if ($str[$keyPtr] != '_') {
                            user_error("Invalid character sequence in array [$keyPtr]", E_USER_WARNING);
                            return;
                        }
                        $str[$keyPtr] = '_';
                        --$keyPtr;
                    }
                    $str[$keyPtr] = '_';
                    $valPtr = $ptr + 1;
                    if ($str[$valPtr] != '$') { //a value token must follow '=' directly
                        user_error("Invalid character sequence in array [$valPtr]", E_USER_WARNING);
                        return;
                    }
                    $str[$valPtr] = '_';
                    $aStack[$heap[$keyPtr]]= $heap[$valPtr];
                    unset($heap[$keyPtr], $heap[$valPtr]);
                    $commaPtr = $valPtr + 1;
                    while ($str[$commaPtr] != ',') {
                        if ($str[$commaPtr] != '_') {
                            user_error("Invalid character sequence in array [$commaPtr]", E_USER_WARNING);
                            return;
                        }
                        ++$commaPtr;
                    }
                    $str[$commaPtr] = '_';
                    $ptr = $commaPtr;
                    break;
                    
                    case ',':
                    $str[$ptr] = '_';
                    $valPtr = $ptr - 1;
                    while ($str[$valPtr] != '$') { //walk back to the value token
                        if ($str[$valPtr] != '_') {
                            user_error("Invalid character sequence in array [$valPtr]", E_USER_WARNING);
                            return;
                        }
                        $str[$valPtr] = '_';
                        --$valPtr;
                    }
                    $str[$valPtr] = '_';
                    $aStack[] = $heap[$valPtr];
                    unset($heap[$valPtr]);
                    break;
                    
                    //do nothing on default
                }
                ++$ptr;
            }
            $heap[$aStart] = $aStack;
        }
        
        return $heap[0]; //everything should have been reduced to a single value
    }
}
?>
Not really serialization, since it does not work with objects; it is more of a data storage function. Its output is highly compact and human-editable. It has various formatting options and is reasonably fast, too.

If you want a way for your users to pass complex data structures to your script, it might be quite useful. Instead of writing a horribly complex GUI with a gazillion fields and (possibly) tons of JavaScript, you can use a single textarea to create an array of any complexity.
Last edited by Gambler on Fri Jun 23, 2006 2:38 pm, edited 2 times in total.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

example input and output?

Post by Gambler »

Code:

array(
    'name' => 'O\'Neal',
    'occupation' => 'webdev',
    'attrs' => array(
        'bla' => 12,
        'isSomething' => FALSE,
    ),
    'groups' => array('x', 'y', 'z'),
    'params' => array(TRUE, FALSE, NULL, 1, -12.3),
)
becomes

Code:

name='O''Neal',occupation=webdev,attrs=(bla=12,isSomething=!),groups=(x,y,z),params=(*,!,~,1,-12.3)
Compare with serialize:

Code:

a:5:{s:4:"name";s:6:"O'Neal";s:10:"occupation";s:6:"webdev";s:5:"attrs";
a:2:{s:3:"bla";i:12;s:11:"isSomething";b:0;}s:6:"groups";a:3:{i:0;s:1:"x";i:1;s:1:
"y";i:2;s:1:"z";}s:6:"params";a:5:{i:0;b:1;i:1;b:0;i:2;N;i:3;i:1;i:4;
d:-12.300000000000000710542735760100185871124267578125;}}
(I inserted a few spaces so this line wouldn't stretch the page.)

Also, there are several flags that tweak my functions' behavior. You can, for example, tell it to skip nulls, or to output formatted code with indentation and newlines.
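
The flags are plain bit masks, so they combine with | and are tested with &. Here is a minimal standalone sketch of that mechanism. The ORI_FLAGS table and the activeFlags() helper are hypothetical names made up for illustration; only the numeric values are copied from the class constants.

```php
<?php
// Hypothetical copy of the Ori flag values, so this snippet runs on its own.
const ORI_FLAGS = array(
    'UNBRACE'  => 1,
    'BRACE'    => 2,
    'NO_NULLS' => 4,
    'NO_EMPTY' => 8,
    'NO_WORDS' => 16,
    'READABLE' => 32,
);

// Return the names of all flags that are set in a combined bit mask.
function activeFlags($flags) {
    $names = array();
    foreach (ORI_FLAGS as $name => $bit) {
        if ($flags & $bit) { // bitwise AND is non-zero only if this bit is set
            $names[] = $name;
        }
    }
    return $names;
}

// Combine two options with bitwise OR, then list what was requested.
echo implode(', ', activeFlags(4 | 32)), "\n"; // NO_NULLS, READABLE
```

With the class itself loaded, the equivalent call would presumably look like Ori::fold($data, Ori::NO_NULLS | Ori::READABLE).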

Post by Gambler »

Updated. It should be a bit faster now, and it now uses + and - (instead of * and !) for TRUE and FALSE for the sake of readability.

BTW, I forgot to mention that it uses a smart indexing algorithm.

So

Code:

array(TRUE, FALSE, 16=>NULL, 1, '12\'9\"dsd', -12.01, array())
becomes

Code:

+,-,16=~,1,'12''9dsd',-12.01,()
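
The indexing idea can be sketched in isolation: a key is written out only when it differs from the index PHP would assign automatically. This is a simplified illustration, assuming a flat array of integers and simple words; foldKeys() is a hypothetical name and not part of the class.

```php
<?php
// Simplified sketch: flat arrays only, values are printed as-is.
function foldKeys(array $items) {
    $parts = array();
    $next = 0; // index PHP would give to the next unkeyed element
    foreach ($items as $key => $value) {
        if ($key === $next) {
            $parts[] = $value;              // key is implied, so omit it
        } else {
            $parts[] = $key . '=' . $value; // key must be written out
        }
        if (is_int($key) && $key >= $next) {
            $next = $key + 1;               // counting continues after the largest integer key
        }
    }
    return implode(',', $parts);
}

echo foldKeys(array('a', 'b', 16 => 'c', 'd')), "\n"; // a,b,16=c,d
```

The element 'd' has key 17 in PHP, but since that is exactly what an unkeyed element after 16 would get, the key can be left out.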
Last edited by Gambler on Thu Jun 01, 2006 3:54 pm, edited 1 time in total.

Post by feyd »

the string looks like a typo/error in your new example.

Post by Gambler »

Oops, the original values are

Code:

array(TRUE, FALSE, 16=>NULL, 1, '12\'9dsd', -12.01, array())
Thanks for checking.
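
For reference, the quoting rule in that example is that a single quote inside a string is doubled. A minimal sketch of the rule on its own follows. quoteValue() and unquoteValue() are hypothetical helpers; the unquoting shown here is a simpler approach than what unfold does internally, and it assumes a non-empty, well-formed quoted string.

```php
<?php
// Wrap a string in single quotes, doubling any quotes inside it.
function quoteValue($text) {
    return "'" . str_replace("'", "''", $text) . "'";
}

// Reverse of quoteValue(): strip the outer quotes, then undouble the inner ones.
function unquoteValue($quoted) {
    return str_replace("''", "'", substr($quoted, 1, -1));
}

echo quoteValue("12'9dsd"), "\n";       // '12''9dsd'
echo unquoteValue("'12''9dsd'"), "\n";  // 12'9dsd
```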

Post by Gambler »

Update. Added the NO_EMPTY option, which removes all empty variables and keys. Tweaked for a tiny bit of performance. Also, since the class has shrunk significantly from its original version, I'm posting it here directly, rather than giving a link.

Has anyone except me tried using this thing for anything? I'm thinking about improving it in two ways.
1) Maybe I should distinguish between hashes {} and arrays []? This might be a stepping stone on the way towards a consistent, language-independent syntax. Usually hashes and arrays are not the same thing.
2) Maybe I should add support for marshalled objects? This would require objects to implement marshal and unmarshal methods.
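
For the first idea, the folding side would need to tell the two kinds of PHP array apart. A rough sketch of one way to do that, assuming "array" means keys that are exactly 0, 1, 2, ... in order; isSequentialList() and bracketsFor() are hypothetical names, not existing parts of the class.

```php
<?php
// Hypothetical helper: true when the keys are exactly 0, 1, 2, ... in order.
function isSequentialList(array $items) {
    $expected = 0;
    foreach ($items as $key => $unused) {
        if ($key !== $expected) {
            return false; // a string key or a gap in the numbering
        }
        ++$expected;
    }
    return true;
}

// Pick the delimiters the proposed syntax might use for a given array.
function bracketsFor(array $items) {
    return isSequentialList($items) ? array('[', ']') : array('{', '}');
}

echo implode('', bracketsFor(array('x', 'y', 'z'))), "\n"; // []
echo implode('', bracketsFor(array('bla' => 12))), "\n";   // {}
```

Note that under this definition an array such as array(16 => NULL) counts as a hash, so the smart indexing shortcut would only apply inside {}.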

Edit: On second thought, these changes will make the syntax more complex, and parsing speed will drop.