PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Posted: Sat Aug 06, 2005 9:16 pm
Forum Regular

Joined: Tue Sep 23, 2003 11:07 pm
Posts: 769
Location: The Republic of Texas
So yeah, I have UTF-8 data in a database, but since it's MySQL 4.0 and NOT 4.1, some data gets encoded and stored as &#somenumber; (dunno what you call those, first question?). If I document.write it, it works fine. But if I use the new Option object to insert it into a select box, the select just shows the &#number; and not the UTF-8 character. Any ideas?
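
For reference, the &#NNNN; sequences are usually called numeric character references. document.write() hands its argument to the HTML parser, which decodes them, while new Option() takes its text literally, so the references show up as-is. One way around it is to decode on the PHP side before the string reaches JavaScript. A minimal sketch, assuming a PHP version where html_entity_decode() can produce UTF-8 (PHP 5 and later) and a page served as UTF-8; the sample string and variable names are made up for illustration:

```php
<?php
// Hypothetical sample value as it might come back from the database.
$stored = 'Thai: &#3585;&#3586; / Cyrillic: &#1072;';

// Decode the references into real UTF-8 bytes.
// Note: this also decodes named entities such as &amp; and &quot;.
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Show the resulting bytes as hex for inspection.
echo bin2hex($decoded), "\n";
```

The decoded string can then be written into the new Option(...) call in place of the raw database value.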


Last edited by Pyrite on Sun Aug 07, 2005 9:22 am, edited 1 time in total.

Posted: Sat Aug 06, 2005 9:28 pm
Neighborhood Spidermoddy

Joined: Mon Mar 29, 2004 4:24 pm
Posts: 31559
Location: Bothell, Washington, USA


Posted: Sat Aug 06, 2005 9:36 pm
Forum Regular

Joined: Tue Sep 23, 2003 11:07 pm
Posts: 769
Location: The Republic of Texas
Would love anything you'll throw my way ...

Actually, just found this function now that I knew what to search for. Works perfectly for me. Khap Khun Maak Khrap! :D



Posted: Sun Aug 07, 2005 10:47 am
Neighborhood Spidermoddy

Joined: Mon Mar 29, 2004 4:24 pm
Posts: 31559
Location: Bothell, Washington, USA
His is rather large compared to mine, which includes unit tests and conforms to Unicode 4.1.0. The following was tested on PHP 5.0.4.
<?php

/*******************************************************************************
 * This code references:
 *------------------------------------------------------------------------------
 * The Unicode Consortium. The Unicode Standard, Version 4.1.0, defined by:
 * The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003.
 * ISBN 0-321-18578-1), as amended by
 * Unicode 4.0.1 (http://www.unicode.org/versions/Unicode4.0.1) and by
 * Unicode 4.1.0 (http://www.unicode.org/versions/Unicode4.1.0).
 */

//  Used two ways: as a preg_replace_callback() callback (receives the match
//  array and returns the original text if the number is not a valid code
//  point), or directly with a numeric code point (returns the UTF-8 bytes,
//  or false if the code point is invalid).
function makeUTF8($match)
{
  if(is_array($match))
  {
    $ret = makeUTF8($match[1]);
    if($ret === false)
    { //  the value is not a valid unicode character.
      return $match[0];
    }
    else
    {
      return $ret;
    }
  }
  else
  {
    $code = intval($match);
    //  +-----------------+-----------------+-----------------+-----------------+
    //  | 3 3 2 2 2 2 2 2 | 2 2 2 2 1 1 1 1 | 1 1 1 1 1 1     |                 |
    //  | 1 0 9 8 7 6 5 4 | 3 2 1 0 9 8 7 6 | 5 4 3 2 1 0 9 8 | 7 6 5 4 3 2 1 0 | bit
    //  +-----------------+-----------------+-----------------+-----------------+
    //  |                 |                 |                 | 0 x x x x x x x | 1 byte 0x00000000..0x0000007F
    //  |                 |                 | 1 1 0 y y y y y | 1 0 x x x x x x | 2 byte 0x00000080..0x000007FF
    //  |                 | 1 1 1 0 z z z z | 1 0 y y y y y y | 1 0 x x x x x x | 3 byte 0x00000800..0x0000FFFF
    //  | 1 1 1 1 0 w w w | 1 0 w w z z z z | 1 0 y y y y y y | 1 0 x x x x x x | 4 byte 0x00010000..0x0010FFFF
    //  +-----------------+-----------------+-----------------+-----------------+
    //  | 0 0 0 0 0 0 0 0 | 0 0 0 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | Theoretical upper limit of legal scalars: 2097151 (0x001FFFFF)
    //  | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | Defined upper limit of legal scalar codes
    //  +-----------------+-----------------+-----------------+-----------------+
    if($code > 1114111 or $code < 0 or ($code >= 55296 and $code <= 57343))
    { //  bits are set outside the "valid" range as defined by UNICODE 4.1.0
      return false;
    }
    else
    {
      $x = $y = $z = $w = 0;
      if($code < 128)
      {
        $x = $code;
      }
      else
      {
        $x = ($code & 63) | 128;
        if($code < 2048)
        {
          $y = (($code & 2047) >> 6) | 192;
        }
        else
        {
          $y = (($code & 4032) >> 6) | 128;
          if($code < 65536)
          {
            $z = (($code >> 12) & 15) | 224;
          }
          else
          {
            $z = (($code >> 12) & 63) | 128;
            $w = (($code >> 18) & 7)  | 240;
          }
        }
      }

      $ret = '';
      if($w)
      {
        $ret = chr($w).chr($z).chr($y);
      }
      elseif($z)
      {
        $ret = chr($z).chr($y);
      }
      elseif($y)
      {
        $ret = chr($y);
      }
      $ret .= chr($x);

      return $ret;
    }
  }
}

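// test stuff from here on, pretty much...

//  Returns the bytes of $string as an uppercase hex string.
function hexerize($string)
{
  $ret = '';
  for($i = 0, $j = strlen($string); $i < $j; $i++)
  {
    //  square-bracket offsets; the old $string{$i} form was removed in PHP 8
    $ret .= sprintf('%02X',ord($string[$i]));
  }
  return $ret;
}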

function utfTest($code, $expectedReturn, $expectedPass = true)
{
  $expect = ($expectedPass ? 'pass' : 'fail');
  $ret = 'Expecting '.$expect.': ';

  $utf = makeUTF8($code);
  if(is_string($expectedReturn))
  {
    $hex = hexerize($utf);
    $test = ($hex === $expectedReturn);
    $hex = '('.$hex.')';
  }
  else
  {
    $hex = '';
    $test = ($utf === $expectedReturn);
  }
  if($test)
  {
    $ret .= 'pass';
    if(!$expectedPass)
    {
      $ret .= "\n\t".var_export($utf,true).$hex.' == '.var_export($expectedReturn,true);
    }
  }
  else
  {
    $ret .= 'fail';
    if($expectedPass)
    { //  the run failed, output the returns
      $ret .= "\n\t".var_export($utf,true).$hex.' != '.var_export($expectedReturn,true);
    }
  }

  return array($test === $expectedPass,$ret);
}

function testUTF()
{
  $results = array();
  $results['passed'] = array();
  $results['failed'] = array();

  $args = array();
  $args[] = array(1114112,false     );
  $args[] = array(1114111,'F48FBFBF'); // 0x0010FFFF
  $args[] = array(1048576,'F4808080'); // 0x00100000
  $args[] = array(1048575,'F3BFBFBF'); // 0x000FFFFF
  $args[] = array(262144, 'F1808080'); // 0x00040000
  $args[] = array(262143, 'F0BFBFBF'); // 0x0003FFFF
  $args[] = array(65536,  'F0908080'); // 0x00010000
  $args[] = array(65535,  'EFBFBF'  ); // 0x0000FFFF
  $args[] = array(57344,  'EE8080'  ); // 0x0000E000
  $args[] = array(57343,  false     ); // 0x0000DFFF  these are ill-formed
  $args[] = array(56040,  false     ); // 0x0000DAE8  these are ill-formed
  $args[] = array(55296,  false     ); // 0x0000D800  these are ill-formed
  $args[] = array(55295,  'ED9FBF'  ); // 0x0000D7FF
  $args[] = array(53248,  'ED8080'  ); // 0x0000D000
  $args[] = array(53247,  'ECBFBF'  ); // 0x0000CFFF
  $args[] = array(4096,   'E18080'  ); // 0x00001000
  $args[] = array(4095,   'E0BFBF'  ); // 0x00000FFF
  $args[] = array(2048,   'E0A080'  ); // 0x00000800
  $args[] = array(2047,   'DFBF'    ); // 0x000007FF
  $args[] = array(128,    'C280'    ); // 0x00000080
  $args[] = array(127,    '7F'      ); // 0x0000007F
  $args[] = array(0,      '00'      ); // 0x00000000

  $args[] = array(20108,  'E4BA8C'  ); // 0x00004E8C
  $args[] = array(77,     '4D'      ); // 0x0000004D
  $args[] = array(66306,  'F0908C82'); // 0x00010302
  $args[] = array(1072,   'D0B0'    ); // 0x00000430

  foreach($args as $argList)
  {
    list($pass,$ret) = call_user_func_array('utfTest',$argList);
    $results[$pass ? 'passed' : 'failed'][] = $ret;
  }

  if(count($results['failed']))
  {
    echo "One or more tests failed:\n";
    echo implode("\n",$results['failed']);
  }
  else
  {
    echo "All tests passed.\n";
  }
}

//testUTF();

//  $text is assumed to already hold the string containing &#NNN; references
echo '<pre>Before:
'.htmlentities(var_export($text,true)).'
</pre>';

$text = preg_replace_callback('/&#([0-9]+?);/','makeUTF8',$text);

echo '<pre>After:
'.htmlentities(var_export($text,true)).'
</pre>';

?>


Obviously enough, to run the unit tests, uncomment the call to testUTF() near the end of the script.
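
To make the bit diagram in the code comments concrete, here is a worked example for one entry from the test table, 20108 (0x4E8C), which falls in the 3-byte range. This is only an illustrative sketch using the same shifts and masks as makeUTF8(); the variable names mirror the z/y/x labels in the diagram:

```php
<?php
// Code point 20108 (U+4E8C) lies in 0x0800..0xFFFF, so it needs 3 bytes:
// 1110zzzz 10yyyyyy 10xxxxxx
$code = 20108;

$z = (($code >> 12) & 15) | 224; // top 4 bits    -> 0xE4
$y = (($code >> 6) & 63) | 128;  // middle 6 bits -> 0xBA
$x = ($code & 63) | 128;         // low 6 bits    -> 0x8C

$hex = sprintf('%02X%02X%02X', $z, $y, $x);
echo $hex, "\n"; // prints E4BA8C, matching the test table entry
```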


Posted: Sun Aug 07, 2005 11:48 am
Forum Regular

Joined: Tue Sep 23, 2003 11:07 pm
Posts: 769
Location: The Republic of Texas
Hmmm, well I don't understand your code, but it doesn't work for me.

If I do this to populate my select (with his code) it works:

$i = 0;
while (!$step2->EOF) {
        $lid = $step2->fields[0];
        $lnm = utf8Encode($step2->fields[1]);
        ?>
        document.forms['frmEnroll'].step2.options[<?=$i;?>] = new Option('<?=$lnm;?>','<?=$lid;?>');
        <?php
        $i++;
        $step2->MoveNext();
}


But with yours:

$i = 0;
while (!$step2->EOF) {
        $lid = $step2->fields[0];
        $lnm = MakeUTF8($step2->fields[1]);
        ?>
        document.forms['frmEnroll'].step2.options[<?=$i;?>] = new Option('<?=$lnm;?>','<?=$lid;?>');
        <?php
        $i++;
        $step2->MoveNext();
}


All I get is blank lines in the select. Hmmm.
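
A likely explanation for the blank lines: makeUTF8() is being handed the whole field. For a plain string it falls through to intval(), which gives 0 for text that starts with "&#", so the function returns a single NUL byte. The function in the post above is written to be called through preg_replace_callback(), as the last lines of that script do. A minimal sketch of that wiring, assuming PHP 5.3 or later for the closure; codePointToUtf8() and decodeNumericRefs() are hypothetical names, and the first is only a compact stand-in for makeUTF8() so the snippet runs on its own:

```php
<?php
// Hypothetical compact stand-in for makeUTF8(): returns the UTF-8 bytes
// for a code point, or false if it is not a valid Unicode scalar value.
function codePointToUtf8($code)
{
    if ($code < 0 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
        return false;
    }
    if ($code < 0x80) {
        return chr($code);
    }
    if ($code < 0x800) {
        return chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

// Hypothetical wrapper: replace every &#NNN; reference in a string,
// leaving invalid references and all other text untouched.
function decodeNumericRefs($text)
{
    return preg_replace_callback('/&#([0-9]+);/', function ($m) {
        $utf8 = codePointToUtf8((int) $m[1]);
        return $utf8 === false ? $m[0] : $utf8;
    }, $text);
}

// What happens when the whole field goes straight to intval():
$direct = intval('&#3585;&#3586;'); // 0, i.e. a NUL byte after encoding

// What happens when the field goes through the wrapper instead:
$name = decodeNumericRefs('&#3585;&#3586;');
echo bin2hex($name), "\n";
```

In the loop above, that would mean something like $lnm = decodeNumericRefs($step2->fields[1]); in place of the direct MakeUTF8() call.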


Posted: Sun Aug 07, 2005 2:27 pm
Neighborhood Spidermoddy

Joined: Mon Mar 29, 2004 4:24 pm
Posts: 31559
Location: Bothell, Washington, USA

