[SOLVED] Help with using &#codes in new Option

JavaScript and client side scripting.

Pyrite
Forum Regular
Posts: 769
Joined: Tue Sep 23, 2003 11:07 pm
Location: The Republic of Texas

Post by Pyrite

So yeah, I have UTF-8 data in a database, but since it's MySQL 4.0 and NOT 4.1, some of it gets encoded and stored as &#somenumber; (I don't know what those are called; that's my first question). If I document.write() it, it displays fine. But if I use a new Option object to insert it into a select box, the select just shows the literal &#number instead of the UTF-8 character. Any ideas?
Last edited by Pyrite on Sun Aug 07, 2005 9:22 am, edited 1 time in total.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Re: Help with using &#codes in new Option

Post by feyd

  1. They are called entities (strictly, numeric character references).
  2. It should be simple enough to create a mapping/algorithm to convert them back to UTF-8 binary data.
If you'd like, I can see if I can help or cook up an algorithm...
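The mapping in point 2 is direct: the digits in &#NNNN; are the decimal Unicode code point of the character. As an illustration only, here is a minimal JavaScript sketch (codePointToChar is a hypothetical name, and it assumes an engine with String.fromCodePoint) that turns one such number into a character and rejects values that are not Unicode scalar values, the same range the PHP makeUTF8() later in this thread rejects:

```javascript
// Hypothetical helper: convert one decimal code point (the NNNN in &#NNNN;)
// into a JavaScript string. Returns null for anything that is not a
// Unicode scalar value: negative, above 0x10FFFF, or a surrogate (D800-DFFF).
function codePointToChar(code) {
  if (!Number.isInteger(code) || code < 0 || code > 0x10FFFF) {
    return null;
  }
  if (code >= 0xD800 && code <= 0xDFFF) {
    return null; // surrogate halves are not characters on their own
  }
  return String.fromCodePoint(code);
}

console.log(codePointToChar(20108)); // "二" (U+4E8C)
console.log(codePointToChar(55296)); // null (lone surrogate)
```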
Pyrite
Forum Regular
Posts: 769
Joined: Tue Sep 23, 2003 11:07 pm
Location: The Republic of Texas

Post by Pyrite

Would love anything you'll throw my way ...

Actually, I just found this function now that I know what to search for. Works perfectly for me. Khap Khun Maak Khrap (Thai for "thank you very much")! :D

http://www.zend.com/codex.php?id=838&single=1
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd

His is rather large compared to mine, which includes unit tests and conforms to Unicode 4.1.0. The following was tested on PHP 5.0.4:

Code:

<?php

/*******************************************************************************
 * This code references:
 *------------------------------------------------------------------------------
 * The Unicode Consortium. The Unicode Standard, Version 4.1.0, defined by:
 * The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003.
 * ISBN 0-321-18578-1), as amended by
 * Unicode 4.0.1 (http://www.unicode.org/versions/Unicode4.0.1) and by
 * Unicode 4.1.0 (http://www.unicode.org/versions/Unicode4.1.0).
 */

function makeUTF8($match)
{
  if(is_array($match))
  {
    $ret = makeUTF8($match[1]);
    if($ret === false)
    { //  the value is not a valid unicode character.
      return $match[0];
    }
    else
    {
      return $ret;
    }
  }
  else
  {
    $code = intval($match);
    //  +-----------------+-----------------+-----------------+-----------------+
    //  | 3 3 2 2 2 2 2 2 | 2 2 2 2 1 1 1 1 | 1 1 1 1 1 1     |                 |
    //  | 1 0 9 8 7 6 5 4 | 3 2 1 0 9 8 7 6 | 5 4 3 2 1 0 9 8 | 7 6 5 4 3 2 1 0 | bit
    //  +-----------------+-----------------+-----------------+-----------------+
    //  |                 |                 |                 | 0 x x x x x x x | 1 byte 0x00000000..0x0000007F
    //  |                 |                 | 1 1 0 y y y y y | 1 0 x x x x x x | 2 byte 0x00000080..0x000007FF
    //  |                 | 1 1 1 0 z z z z | 1 0 y y y y y y | 1 0 x x x x x x | 3 byte 0x00000800..0x0000FFFF
    //  | 1 1 1 1 0 w w w | 1 0 w w z z z z | 1 0 y y y y y y | 1 0 x x x x x x | 4 byte 0x00010000..0x0010FFFF
    //  +-----------------+-----------------+-----------------+-----------------+
    //  | 0 0 0 0 0 0 0 0 | 0 0 0 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | Theoretical upper limit of legal scalars: 2097151 (0x001FFFFF)
    //  | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | Defined upper limit of legal scalar codes
    //  +-----------------+-----------------+-----------------+-----------------+
    if($code > 1114111 or $code < 0 or ($code >= 55296 and $code <= 57343))
    { //  bits are set outside the "valid" range as defined by UNICODE 4.1.0
      return false;
    }
    else
    {
      $x = $y = $z = $w = 0;
      if($code < 128)
      {
        $x = $code;
      }
      else
      {
        $x = ($code & 63) | 128;
        if($code < 2048)
        {
          $y = (($code & 2047) >> 6) | 192;
        }
        else
        {
          $y = (($code & 4032) >> 6) | 128;
          if($code < 65536)
          {
            $z = (($code >> 12) & 15) | 224;
          }
          else
          {
            $z = (($code >> 12) & 63) | 128;
            $w = (($code >> 18) & 7)  | 240;
          }
        }
      }

      $ret = '';
      if($w)
      {
        $ret = chr($w).chr($z).chr($y);
      }
      elseif($z)
      {
        $ret = chr($z).chr($y);
      }
      elseif($y)
      {
        $ret = chr($y);
      }
      $ret .= chr($x);

      return $ret;
    }
  }
}

// test stuff from here on, pretty much...

function hexerize($string)
{
  $ret = '';
  for($i = 0, $j = strlen($string); $i < $j; $i++)
  {
    $ret .= sprintf('%02X',ord($string[$i])); // [] offset syntax; {} was removed in PHP 8
  }
  return $ret;
}

function utfTest($code, $expectedReturn, $expectedPass = true)
{
  $expect = ($expectedPass ? 'pass' : 'fail');
  $ret = 'Expecting '.$expect.': ';

  $utf = makeUTF8($code);
  if(is_string($expectedReturn))
  {
    $hex = hexerize($utf);
    $test = ($hex === $expectedReturn);
    $hex = '('.$hex.')';
  }
  else
  {
    $hex = '';
    $test = ($utf === $expectedReturn);
  }
  if($test)
  {
    $ret .= 'pass';
    if(!$expectedPass)
    {
      $ret .= "\n\t".var_export($utf,true).$hex.' == '.var_export($expectedReturn,true);
    }
  }
  else
  {
    $ret .= 'fail';
    if($expectedPass)
    { //  the run failed, output the returns
      $ret .= "\n\t".var_export($utf,true).$hex.' != '.var_export($expectedReturn,true);
    }
  }

  return array($test === $expectedPass,$ret);
}

function testUTF()
{
  $results = array();
  $results['passed'] = array();
  $results['failed'] = array();

  $args = array();
  $args[] = array(1114112,false     );
  $args[] = array(1114111,'F48FBFBF'); // 0x0010FFFF
  $args[] = array(1048576,'F4808080'); // 0x00100000
  $args[] = array(1048575,'F3BFBFBF'); // 0x000FFFFF
  $args[] = array(262144, 'F1808080'); // 0x00040000
  $args[] = array(262143, 'F0BFBFBF'); // 0x0003FFFF
  $args[] = array(65536,  'F0908080'); // 0x00010000
  $args[] = array(65535,  'EFBFBF'  ); // 0x0000FFFF
  $args[] = array(57344,  'EE8080'  ); // 0x0000E000
  $args[] = array(57343,  false     ); // 0x0000DFFF  these are ill-formed
  $args[] = array(56040,  false     ); // 0x0000DAE8  these are ill-formed
  $args[] = array(55296,  false     ); // 0x0000D800  these are ill-formed
  $args[] = array(55295,  'ED9FBF'  ); // 0x0000D7FF
  $args[] = array(53248,  'ED8080'  ); // 0x0000D000
  $args[] = array(53247,  'ECBFBF'  ); // 0x0000CFFF
  $args[] = array(4096,   'E18080'  ); // 0x00001000
  $args[] = array(4095,   'E0BFBF'  ); // 0x00000FFF
  $args[] = array(2048,   'E0A080'  ); // 0x00000800
  $args[] = array(2047,   'DFBF'    ); // 0x000007FF
  $args[] = array(128,    'C280'    ); // 0x00000080
  $args[] = array(127,    '7F'      ); // 0x0000007F
  $args[] = array(0,      '00'      ); // 0x00000000

  $args[] = array(20108,  'E4BA8C'  ); // 0x00004E8C
  $args[] = array(77,     '4D'      ); // 0x0000004D
  $args[] = array(66306,  'F0908C82'); // 0x00010302
  $args[] = array(1072,   'D0B0'    ); // 0x00000430

  foreach($args as $argList)
  {
    list($pass,$ret) = call_user_func_array('utfTest',$argList);
    $results[$pass ? 'passed' : 'failed'][] = $ret;
  }

  if(count($results['failed']))
  {
    echo "One or more tests failed:\n";
    echo implode("\n",$results['failed']);
  }
  else
  {
    echo "All tests passed.\n";
  }
}

//testUTF();

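// hypothetical sample input; &#55296; is a lone surrogate, so makeUTF8() rejects it and it is left as-is
$text = 'Sample: &#20108; &#1072; &#77; &#55296;';

echo '<pre>Before:
'.htmlentities(var_export($text,true), ENT_QUOTES, 'UTF-8').'
</pre>';

$text = preg_replace_callback('/&#([0-9]+?);/','makeUTF8',$text);

echo '<pre>After:
'.htmlentities(var_export($text,true), ENT_QUOTES, 'UTF-8').'
</pre>';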

?>
Obviously enough, to run the unit tests, uncomment line 196 of the listing (the //testUTF(); call).
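The comment table in makeUTF8() is the standard UTF-8 bit layout. To make the shifts and masks easier to follow, here is a minimal JavaScript sketch of the same encoding (utf8Hex is a hypothetical name, not part of the code above). It returns the bytes as an uppercase hex string, the format hexerize() produces, so its results can be compared with the expected values in testUTF():

```javascript
// Hypothetical helper: encode one Unicode scalar value as UTF-8 following
// the 1- to 4-byte rows of the table in makeUTF8(). Returns the bytes as an
// uppercase hex string, or false for values makeUTF8() also rejects.
function utf8Hex(code) {
  if (code < 0 || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)) {
    return false;
  }
  let bytes;
  if (code < 0x80) {
    bytes = [code];                                 // 0xxxxxxx
  } else if (code < 0x800) {
    bytes = [0xC0 | (code >> 6),                    // 110yyyyy
             0x80 | (code & 0x3F)];                 // 10xxxxxx
  } else if (code < 0x10000) {
    bytes = [0xE0 | (code >> 12),                   // 1110zzzz
             0x80 | ((code >> 6) & 0x3F),           // 10yyyyyy
             0x80 | (code & 0x3F)];                 // 10xxxxxx
  } else {
    bytes = [0xF0 | (code >> 18),                   // 11110www
             0x80 | ((code >> 12) & 0x3F),          // 10wwzzzz
             0x80 | ((code >> 6) & 0x3F),           // 10yyyyyy
             0x80 | (code & 0x3F)];                 // 10xxxxxx
  }
  return bytes.map(b => b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

console.log(utf8Hex(20108));   // "E4BA8C"
console.log(utf8Hex(1114111)); // "F48FBFBF"
console.log(utf8Hex(55296));   // false
```

For valid code points, a modern engine can produce the same bytes with new TextEncoder().encode(String.fromCodePoint(code)); the manual version is shown only because it maps one-to-one onto the rows of the table.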
Pyrite
Forum Regular
Posts: 769
Joined: Tue Sep 23, 2003 11:07 pm
Location: The Republic of Texas

Post by Pyrite

Hmmm, well I don't understand your code, but it doesn't work for me.

If I do this to populate my select (with his code), it works:

Code:

$i = 0;
	while (!$step2->EOF) {
		$lid = $step2->fields[0];
		$lnm = utf8Encode($step2->fields[1]);
		?>
		document.forms['frmEnroll'].step2.options[<?=$i;?>] = new Option('<?=$lnm;?>','<?=$lid;?>');
		<?php
		$i++;
		$step2->MoveNext();
	}
But with yours:

Code:

$i = 0;
	while (!$step2->EOF) {
		$lid = $step2->fields[0];
		$lnm = MakeUTF8($step2->fields[1]);
		?>
		document.forms['frmEnroll'].step2.options[<?=$i;?>] = new Option('<?=$lnm;?>','<?=$lid;?>');
		<?php
		$i++;
		$step2->MoveNext();
	}
All I get is blank lines in the select. Hmmm.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd

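makeUTF8() only translates a single code to UTF-8; it doesn't parse text for you. When you hand it a whole field, intval() returns 0 because the string doesn't start with a digit, so you get back a NUL character, hence the blank entries. That's why I placed the preg_replace_callback() call at the bottom; in your loop, use $lnm = preg_replace_callback('/&#([0-9]+?);/', 'makeUTF8', $step2->fields[1]); instead. :)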
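The same split, a converter for one code plus a pass over the text that finds each reference, also works on the client side if you'd rather decode there than in PHP (an alternative not used in this thread). A minimal JavaScript sketch, with decodeNumericRefs as a hypothetical name: it replaces every decimal &#NNNN; reference with its character and, like makeUTF8() returning $match[0], leaves invalid ones untouched. The result is an ordinary JavaScript string, so it can be passed to new Option(text, value), which shows its text argument literally rather than parsing it as HTML.

```javascript
// Hypothetical helper: decode decimal numeric character references (&#NNNN;)
// in a string. References whose number is not a Unicode scalar value
// (above 0x10FFFF or in the surrogate block D800-DFFF) are left as-is.
function decodeNumericRefs(text) {
  return text.replace(/&#([0-9]+);/g, (whole, digits) => {
    const code = parseInt(digits, 10);
    const invalid = code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF);
    return invalid ? whole : String.fromCodePoint(code);
  });
}

// Browser usage (assumes the frmEnroll form and step2 select from the thread):
// document.forms['frmEnroll'].step2.options[i] = new Option(decodeNumericRefs(label), value);

console.log(decodeNumericRefs("&#20108; &#77;")); // "二 M"
```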