[SOLVED] Help with using &#codes in new Option
Posted: Sat Aug 06, 2005 9:16 pm
by Pyrite
So yea, I have like utf-8 data in a database, but since it's MySQL 4.0, and NOT 4.1, some data gets encoded and stored as &#somenumber; (dunno what you call those, first question?). If I document.write it, it works fine. But if I use the new Option object to insert it into a select box, the select just shows the &#number and not the utf-8 character. Any ideas?
Re: Help with using &#codes in new Option
Posted: Sat Aug 06, 2005 9:28 pm
by feyd
Pyrite wrote:So yea, I have like utf-8 data in a database, but since it's MySQL 4.0, and NOT 4.1, some data gets encoded and stored as &#somenumber; (dunno what you call those, first question?). If I document.write it, it works fine. But if I use the new Option object to insert it into a select box, the select just shows the &#number and not the utf-8 character. Any ideas?
- They are called entities (numeric character references, strictly speaking).
- It should be simple enough to write a small mapping/algorithm that converts them back to UTF-8 bytes.
If you'd like, I can see if I can help or cook up an algorithm...
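For reference, a minimal sketch of that kind of conversion. It assumes a reasonably modern PHP, where the built-in html_entity_decode() accepts a 'UTF-8' charset and decodes numeric entities; the sample string is made up for illustration and is not from the database in question.

```php
<?php
// Made-up sample of what the stored data might look like (an assumption).
$stored = 'caf&#233; &#20108;';

// Decode the entities straight to UTF-8 bytes.
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Show the raw bytes as hex so the result does not depend on the terminal encoding.
echo bin2hex($decoded), "\n";
```

Older PHP builds may not handle this the same way, which is one reason a hand-written converter can still be useful.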
Posted: Sat Aug 06, 2005 9:36 pm
by Pyrite
Would love anything you'll throw my way ...
Actually, just found this function now that I know what to search for. Works perfectly for me. Khap Khun Maak Khrap! (Thank you very much!)
http://www.zend.com/codex.php?id=838&single=1
Posted: Sun Aug 07, 2005 10:47 am
by feyd
His is rather large compared to mine, and mine comes with unit tests and conforms to Unicode 4.1.0. The following was tested on PHP 5.0.4.
Code:
<?php
/*******************************************************************************
* This code references:
*------------------------------------------------------------------------------
* The Unicode Consortium. The Unicode Standard, Version 4.1.0, defined by:
* The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, 2003.
* ISBN 0-321-18578-1), as amended by
* Unicode 4.0.1 (http://www.unicode.org/versions/Unicode4.0.1) and by
* Unicode 4.1.0 (http://www.unicode.org/versions/Unicode4.1.0).
*/
function makeUTF8($match)
{
    if(is_array($match))
    {
        $ret = makeUTF8($match[1]);
        if($ret === false)
        { // the value is not a valid unicode character.
            return $match[0];
        }
        else
        {
            return $ret;
        }
    }
    else
    {
        $code = intval($match);
        // +-----------------+-----------------+-----------------+-----------------+
        // | 3 3 2 2 2 2 2 2 | 2 2 2 2 1 1 1 1 | 1 1 1 1 1 1     |                 |
        // | 1 0 9 8 7 6 5 4 | 3 2 1 0 9 8 7 6 | 5 4 3 2 1 0 9 8 | 7 6 5 4 3 2 1 0 | bit
        // +-----------------+-----------------+-----------------+-----------------+
        // |                 |                 |                 | 0 x x x x x x x | 1 byte 0x00000000..0x0000007F
        // |                 |                 | 1 1 0 y y y y y | 1 0 x x x x x x | 2 byte 0x00000080..0x000007FF
        // |                 | 1 1 1 0 z z z z | 1 0 y y y y y y | 1 0 x x x x x x | 3 byte 0x00000800..0x0000FFFF
        // | 1 1 1 1 0 w w w | 1 0 w w z z z z | 1 0 y y y y y y | 1 0 x x x x x x | 4 byte 0x00010000..0x0010FFFF
        // +-----------------+-----------------+-----------------+-----------------+
        // | 0 0 0 0 0 0 0 0 | 0 0 0 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | Theoretical upper limit of legal scalars: 2097151 (0x001FFFFF)
        // | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | Defined upper limit of legal scalar codes
        // +-----------------+-----------------+-----------------+-----------------+
        if($code > 1114111 or $code < 0 or ($code >= 55296 and $code <= 57343))
        { // bits are set outside the "valid" range as defined by UNICODE 4.1.0
            return false;
        }
        else
        {
            $x = $y = $z = $w = 0;
            if($code < 128)
            {
                $x = $code;
            }
            else
            {
                $x = ($code & 63) | 128;
                if($code < 2048)
                {
                    $y = (($code & 2047) >> 6) | 192;
                }
                else
                {
                    $y = (($code & 4032) >> 6) | 128;
                    if($code < 65536)
                    {
                        $z = (($code >> 12) & 15) | 224;
                    }
                    else
                    {
                        $z = (($code >> 12) & 63) | 128;
                        $w = (($code >> 18) & 7) | 240;
                    }
                }
            }
            $ret = '';
            if($w)
            {
                $ret = chr($w).chr($z).chr($y);
            }
            elseif($z)
            {
                $ret = chr($z).chr($y);
            }
            elseif($y)
            {
                $ret = chr($y);
            }
            $ret .= chr($x);
            return $ret;
        }
    }
}
// test stuff from here on, pretty much...
function hexerize($string)
{
    $ret = '';
    for($i = 0, $j = strlen($string); $i < $j; $i++)
    {
        // square brackets: the $string{$i} offset syntax was removed in PHP 8
        $ret .= sprintf('%02X',ord($string[$i]));
    }
    return $ret;
}
function utfTest($code, $expectedReturn, $expectedPass = true)
{
    $expect = ($expectedPass ? 'pass' : 'fail');
    $ret = 'Expecting '.$expect.': ';
    $utf = makeUTF8($code);
    if(is_string($expectedReturn))
    {
        $hex = hexerize($utf);
        $test = ($hex === $expectedReturn);
        $hex = '('.$hex.')';
    }
    else
    {
        $hex = '';
        $test = ($utf === $expectedReturn);
    }
    if($test)
    {
        $ret .= 'pass';
        if(!$expectedPass)
        {
            $ret .= "\n\t".var_export($utf,true).$hex.' == '.var_export($expectedReturn,true);
        }
    }
    else
    {
        $ret .= 'fail';
        if($expectedPass)
        { // the run failed, output the returns
            $ret .= "\n\t".var_export($utf,true).$hex.' != '.var_export($expectedReturn,true);
        }
    }
    return array($test === $expectedPass,$ret);
}
function testUTF()
{
    $results = array();
    $results['passed'] = array();
    $results['failed'] = array();
    $args = array();
    $args[] = array(1114112, false     );
    $args[] = array(1114111, 'F48FBFBF'); // 0x0010FFFF
    $args[] = array(1048576, 'F4808080'); // 0x00100000
    $args[] = array(1048575, 'F3BFBFBF'); // 0x000FFFFF
    $args[] = array(262144,  'F1808080'); // 0x00040000
    $args[] = array(262143,  'F0BFBFBF'); // 0x0003FFFF
    $args[] = array(65536,   'F0908080'); // 0x00010000
    $args[] = array(65535,   'EFBFBF'  ); // 0x0000FFFF
    $args[] = array(57344,   'EE8080'  ); // 0x0000E000
    $args[] = array(57343,   false     ); // 0x0000DFFF these are ill-formed
    $args[] = array(56040,   false     ); // 0x0000DAE8 these are ill-formed
    $args[] = array(55296,   false     ); // 0x0000D800 these are ill-formed
    $args[] = array(55295,   'ED9FBF'  ); // 0x0000D7FF
    $args[] = array(53248,   'ED8080'  ); // 0x0000D000
    $args[] = array(53247,   'ECBFBF'  ); // 0x0000CFFF
    $args[] = array(4096,    'E18080'  ); // 0x00001000
    $args[] = array(4095,    'E0BFBF'  ); // 0x00000FFF
    $args[] = array(2048,    'E0A080'  ); // 0x00000800
    $args[] = array(2047,    'DFBF'    ); // 0x000007FF
    $args[] = array(128,     'C280'    ); // 0x00000080
    $args[] = array(127,     '7F'      ); // 0x0000007F
    $args[] = array(0,       '00'      ); // 0x00000000
    $args[] = array(20108,   'E4BA8C'  ); // 0x00004E8C
    $args[] = array(77,      '4D'      ); // 0x0000004D
    $args[] = array(66306,   'F0908C82'); // 0x00010302
    $args[] = array(1072,    'D0B0'    ); // 0x00000430
    foreach($args as $argList)
    {
        list($pass,$ret) = call_user_func_array('utfTest',$argList);
        $results[$pass ? 'passed' : 'failed'][] = $ret;
    }
    if(count($results['failed']))
    {
        echo "One or more tests failed:\n";
        echo implode("\n",$results['failed']);
    }
    else
    {
        echo "All tests passed.\n";
    }
}
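//testUTF();
// NOTE: $text must already hold the string that contains the &#NNN; entities.
// The 'UTF-8' charset argument keeps htmlentities() from mangling the multi-byte output.
echo '<pre>Before:
'.htmlentities(var_export($text,true), ENT_QUOTES, 'UTF-8').'
</pre>';
$text = preg_replace_callback('/&#([0-9]+);/','makeUTF8',$text);
echo '<pre>After:
'.htmlentities(var_export($text,true), ENT_QUOTES, 'UTF-8').'
</pre>';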
?>
Obviously enough, to run the unit tests, uncomment the call to testUTF() near the bottom of the script.
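The bit layout in the comment table can be worked through by hand for a single value. The sketch below does that for 20108 (0x4E8C), one of the three-byte cases in the test list, using hex masks instead of the decimal ones in makeUTF8(); the variable names mirror the x/y/z letters from the table and are otherwise just for illustration.

```php
<?php
$code = 20108; // 0x4E8C = 0100 1110 1000 1100, falls in the 3-byte range

// Lead byte: 1110zzzz, carrying bits 15..12 of the code.
$z = (($code >> 12) & 0x0F) | 0xE0;
// First continuation byte: 10yyyyyy, carrying bits 11..6.
$y = (($code >> 6) & 0x3F) | 0x80;
// Second continuation byte: 10xxxxxx, carrying bits 5..0.
$x = ($code & 0x3F) | 0x80;

$utf8 = chr($z) . chr($y) . chr($x);

printf("%02X %02X %02X\n", $z, $y, $x); // prints "E4 BA 8C"
```

That matches the 'E4BA8C' the test list expects for 20108.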
Posted: Sun Aug 07, 2005 11:48 am
by Pyrite
Hmmm, well I don't understand your code, but it doesn't work for me.
If I do this to populate my select (with his code) it works:
Code:
$i = 0;
while (!$step2->EOF) {
$lid = $step2->fields[0];
$lnm = utf8Encode($step2->fields[1]);
?>
document.forms['frmEnroll'].step2.options[<?=$i;?>] = new Option('<?=$lnm;?>','<?=$lid;?>');
<?php
$i++;
$step2->MoveNext();
}
But with yours:
Code:
$i = 0;
while (!$step2->EOF) {
$lid = $step2->fields[0];
$lnm = MakeUTF8($step2->fields[1]);
?>
document.forms['frmEnroll'].step2.options[<?=$i;?>] = new Option('<?=$lnm;?>','<?=$lid;?>');
<?php
$i++;
$step2->MoveNext();
}
All I get is blank lines in the select. Hmmm.
Posted: Sun Aug 07, 2005 2:27 pm
by feyd
makeUTF8() only translates a single code to UTF-8; it doesn't parse a text for you. That's why I placed the preg_replace_callback() call at the bottom: run your field through that call instead of handing the whole string to makeUTF8() directly.
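A small sketch of the difference, under some assumptions: entityToUtf8() is a hypothetical stand-in for makeUTF8() (it leans on the built-in html_entity_decode() to keep the example short), and the field value is made up. When the whole field is treated as a code, intval() finds no leading digits and yields 0, which is probably why the select entries came out blank.

```php
<?php
// Hypothetical stand-in for makeUTF8(): receives the regex match array and
// returns the UTF-8 bytes for the one code point captured in it.
function entityToUtf8($match)
{
    // $match[0] is the full entity (e.g. "&#20108;"), $match[1] only the digits.
    return html_entity_decode('&#' . intval($match[1]) . ';', ENT_QUOTES, 'UTF-8');
}

$field = 'Level &#20108;'; // made-up sample value

// Treating the whole field as a single code: no leading digits, so this is 0.
$asCode = intval($field);

// Letting the regex find each entity and converting only the captured digits.
$decoded = preg_replace_callback('/&#([0-9]+);/', 'entityToUtf8', $field);

echo $asCode, "\n";
echo bin2hex($decoded), "\n";
```

In the select-populating loop, the same idea would mean wrapping the field in the preg_replace_callback() call rather than passing it to the converter as-is.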
