Unicode character test cases

Posted: Wed Dec 03, 2008 9:22 pm
by farhan00
I am trying to run a test case against a Unicode character. For example:

if (mb_substr($text, 0, 1, "UTF-8") == [INSERT FOREIGN CHARACTER HERE])

The problem is, I'm having trouble with the [INSERT FOREIGN CHARACTER HERE] part. Let's suppose I want the Arabic character ب, which according to Unicode is U+0628. Do I write "\u0628"? Do I write "\x0628"?

I tried: echo "\x0628"; but it actually echoed \x0628 instead of the Arabic character.

Any ideas on how a successful test case with a Unicode character would work?

Re: Unicode character test cases

Posted: Fri Dec 05, 2008 7:32 am
by dml
As far as I know, there's currently no Unicode string literal in PHP, so you'll have to use the UTF-8 representation of the code point. There might be a way of doing it directly with the mb_ functions, but if there isn't, there's a library at http://hsivonen.iki.fi/php-utf8/ that you can use to convert numeric code points into UTF-8 strings.

Code:

 
require 'php-utf8/utf8.inc';
$unicode_string = array(0x0628);
$php_string = unicodeToUtf8($unicode_string);
 
// should print out: ب	d8a8
echo $php_string, "\t", bin2hex($php_string), "\n";
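
If pulling in a library is not an option, the same conversion can be sketched in plain PHP. The two function names below (codepoint_to_utf8 and first_char_is) are made up for illustration and are not part of PHP or of the php-utf8 library. The sketch assumes the input is a valid code point (no surrogate or range checking) and that the mbstring extension is loaded for the comparison. Later PHP versions should also accept "\u{0628}" in double-quoted strings (PHP 7.0+) and provide mb_chr() (PHP 7.2+), neither of which existed when this thread was written.

```php
<?php
// Hypothetical helper: encode a single Unicode code point as a UTF-8 byte string.
// Assumes $cp is a valid code point; surrogates and out-of-range values are not rejected.
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: the test case from the question, comparing the first
// character of $text (not the first byte) against a one-character UTF-8 string.
function first_char_is($text, $char)
{
    return mb_substr($text, 0, 1, 'UTF-8') === $char;
}

$ba = codepoint_to_utf8(0x0628);   // Arabic letter beh
echo bin2hex($ba), "\n";           // d8a8
```

With these in place the test from the first post could be written as if (first_char_is($text, codepoint_to_utf8(0x0628))). Writing the bytes directly as "\xD8\xA8" should give the same string, since \x in a double-quoted PHP string takes at most two hex digits and so describes one byte rather than a code point.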