php & Unicode 5.1 characters
php & Unicode 5.1 characters
Hi all
I need to create a function that returns the Unicode 5.1 code point of a character.
For example:
if I give the character "Đ" to the function, it should return the number "0110"
(please check the picture).
Can somebody point me in the right direction?
What functions should I use?
Thank you and kind regards
- Attachments
- ScreenShot015.jpg (129.28 KiB)
- Chris Corbyn
- Breakbeat Nuttzer
- Posts: 13098
- Joined: Wed Mar 24, 2004 7:57 am
- Location: Melbourne, Australia
Re: php & Unicode 5.1 characters
Until PHP 6 comes out, PHP is not Unicode aware.
Even then, I don't think we'll have a new "char" type that would hold the Unicode value you're in need of. I'm in the process of writing encoders and decoders for PHP 5 (that are Unicode aware). So far I can decode UTF-8 streams into sequences of octets logically grouped by character, and into a series of integers representing the Unicode values (UCS-4) of those characters.
I just had a quick flick through the Multibyte String functions and can't see anything in there that returns the Unicode value of a character, but you may find something useful anyway.
Perhaps parsing the output of this:
http://au2.php.net/manual/en/function.m ... entity.php
What character encoding are you working with?
PHP really needs char and byte types
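One route through the Multibyte String functions is to convert the string to UCS-4 and read it back as 32-bit integers. The following is only a sketch under stated assumptions: the mbstring extension is loaded, the input is valid UTF-8, and `utf8_code_points()` is a made-up helper name, not something PHP provides. On PHP 7.2 or later, `mb_ord()` returns the code point of a single character directly.

```php
<?php
// Assumes the mbstring extension is loaded and that $s is valid UTF-8.
// utf8_code_points() is a hypothetical helper name, not part of PHP.
function utf8_code_points($s)
{
    // UCS-4BE stores every character as one 32-bit big-endian integer.
    $ucs4 = mb_convert_encoding($s, 'UCS-4BE', 'UTF-8');
    // unpack() returns a 1-based array; array_values() makes it 0-based.
    return array_values(unpack('N*', $ucs4));
}

foreach (utf8_code_points("\xC4\x90") as $code) { // "\xC4\x90" is 'Đ' in UTF-8
    printf("%04X\n", $code); // prints 0110
}
```

The `%04X` format pads to four hex digits, which gives the "0110" form asked for in the first post.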
Re: php & Unicode 5.1 characters
Hi Chris, and thank you for the quick reply.
All characters are in UTF-8, and the database is too.
- Chris Corbyn
Re: php & Unicode 5.1 characters
I can confirm that I am able to decode that character to Hex 0110 using my code.
If you're using UTF-8 I'd be happy to share.
- Chris Corbyn
Re: php & Unicode 5.1 characters
2 seconds... I'll put some code up. It's terribly unfinished, but the stuff you need is there.
Re: php & Unicode 5.1 characters
great
hardly waiting
Re: php & Unicode 5.1 characters
If you have a string representing 'Đ' in UTF-8 encoding, it's no problem at all to convert that to 0x110. It has nothing to do with whether your particular PHP version is Unicode compliant. Just decode the UTF-8 by hand. I guess Chris Corbyn is about to post this, but if he has something else in mind, I'll post another solution.
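To make "decode by hand" concrete, here is the arithmetic for this one character, written out as a small sketch. It covers a single two-byte sequence only and is not a general decoder; the variable names are just for illustration.

```php
<?php
// Worked example for one two-byte sequence only; not a general decoder.
$bytes = "\xC4\x90";               // 'Đ' encoded as UTF-8

$lead = ord($bytes[0]);            // 0xC4 = 110 00100 (two-byte lead byte)
$cont = ord($bytes[1]);            // 0x90 = 10 010000 (continuation byte)

// Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
$code = (($lead & 0x1F) << 6) | ($cont & 0x3F);

printf("%04X\n", $code);           // prints 0110
```

The lead byte contributes 0x04, shifted left by six bits to 0x100, and the continuation byte contributes 0x10, which together give 0x110.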
- Chris Corbyn
Re: php & Unicode 5.1 characters
Sorry, took me a while to strip out the development stuff I'm building and to wrap it with a convenient function.
Here's how you use it:
Code: Select all
<?php
require_once dirname(__FILE__) . '/../get_ucs4_value.php';
echo dechex(get_ucs4_value('Đ')); //110
There's also a version that gets an array of unicode characters from a string:
Code: Select all
<?php
require_once dirname(__FILE__) . '/../get_ucs4_value.php';
$ucs4 = get_ucs4_values('??? ?? ??????? ???????? ??????????????, ??? ???? ????? ?????? ??.');
foreach ($ucs4 as $value) { //PHP's integers are decimal... we'll present them as hexadecimal
    printf("%08X\n", $value);
}
/*
0000041C
0000043E
00000433
00000020
0000043D
00000435
00000020
0000043F
0000043E
0000043C
0000043D
00000438
00000442
0000044C
00000020
0000043D
00000438
0000043A
00000430
0000043A
0000043E
... and so on ...
*/
NOTE: My code works but is very much a half-built development version. I haven't added the support for replacing ill-formed data yet (you'll see it commented out).
- Attachments
- ucs4-handling.zip (12.83 KiB)
Re: php & Unicode 5.1 characters
Thank you, Chris, for the code example.
I'm getting the value 400 instead of 0110.
Here is the complete code:
Code: Select all
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Untitled 1</title>
</head>
<body>
<?php
require_once 'get_ucs4_value.php';
echo dechex(get_ucs4_value('Đ')); //110
?>
</body>
</html>
- Chris Corbyn
Re: php & Unicode 5.1 characters
Something is not UTF-8 in that case. I certainly get 110.
What happens if you change it to this?
Code: Select all
echo dechex(get_ucs4_value(utf8_encode('Đ'))); //110
Re: php & Unicode 5.1 characters
Already tried this
I get as response
d0
Re: php & Unicode 5.1 characters
sorry - my mistake
file was not saved as utf-8
it is working now
thank you for help
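The two wrong results earlier in the thread (d0 and 400) are consistent with the file having been saved in a single-byte encoding in which 'Đ' is the byte 0xD0. That is an assumption, since the thread never says which encoding it was; ISO-8859-2 is used below as one such encoding, and the mbstring extension is assumed to be loaded. The sketch shows what that byte turns into under each interpretation:

```php
<?php
// Assumptions: mbstring is loaded, and the source file was saved in a
// Latin-2 style encoding (ISO-8859-2 here) in which 'Đ' is the byte 0xD0.
$raw = "\xD0";

// utf8_encode() treats its input as ISO-8859-1, so 0xD0 becomes U+00D0.
// mb_convert_encoding() with 'ISO-8859-1' performs the same conversion.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
// Converting from the encoding the file was really in gives U+0110.
$asLatin2 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-2');

echo bin2hex($asLatin1), "\n"; // c390, the UTF-8 bytes of U+00D0
echo bin2hex($asLatin2), "\n"; // c490, the UTF-8 bytes of U+0110

// Reading the lone byte 0xD0 as a two-byte lead with no continuation bits:
printf("%x\n", (0xD0 & 0x1F) << 6); // 400
```

The last line is only one plausible reading of how a decoder without ill-formed-data handling could arrive at 400; it is not taken from the attached code.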
- Chris Corbyn
Re: php & Unicode 5.1 characters
No problem 
Re: php & Unicode 5.1 characters
Chris, I'm sure your code works fine, but it looks way more complicated than necessary. Here's my version:
Code: Select all
function ExtractUtf8Codes( $s ) // convert utf8 encoded string to array of separate unicode codes
{
    $invalid = 0x3f; // code for invalid chars (0x3f = '?')
    $codes = array();
    $len = strlen($s); // stop at the real end of the string, not at the first zero byte
    for ($i = 0; $i < $len; )
    {
        $c = ord($s[$i++]);
        if (!($c & 0x80)) { $codes[] = $c; continue; } // single byte char
        $n = 0;
        while ($c & (128 >> $n)) $n++; // count leading 1 bits = length of the sequence
        if ($n<2 || $n>6) { $codes[] = $invalid; continue; } // invalid char (should be 11etc)
        $x = $c & ((1 << (8-$n))-1); // get top bits
        for (; $n>1; $n--)
        {
            $c = ($i < $len) ? ord($s[$i]) : 0; // a truncated sequence counts as invalid
            if (($c & 0xC0)!=0x80) { $codes[] = $invalid; continue 2; } // invalid char (subsequent chars should be 10etc); resume the outer loop
            $x = ($x << 6) | ($c & 0x3F); // append bits
            $i++;
        }
        $codes[] = $x;
    }
    return $codes;
}
$example = chr(0x61).chr(0xc4).chr(0x90).chr(0xe2).chr(0x82).chr(0xac); // $example contains 'aĐ€' in utf8 encoding
$codes = ExtractUtf8Codes( $example );
// $codes is now array(0x61,0x110,0x20AC)
- Chris Corbyn
Re: php & Unicode 5.1 characters
Yeah, mine's taken from a larger OOP system that needs to handle multiple character encodings, where the input stream may come from different sources (file, string).
I agree that using yours for this particular problem would be better
I didn't write mine to solve this problem, I just had it lying around from part of a much larger project.
EDIT | Yours will be a lot slower for larger strings BTW due to the repeated ord() usage. Some of the verbosity of mine is because it needs to be fast (it's part of Swift Mailer).
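For anyone wondering what avoiding the repeated ord() calls might look like, one option is to pull all bytes out in a single `unpack()` call and work on the resulting integer array. This is a sketch only: `decode_utf8_bytes()` is a made-up name, it is not the Swift Mailer code, it assumes well-formed UTF-8 of at most four bytes per character, and whether it is actually faster depends on the PHP version and input size, so measure before relying on it.

```php
<?php
// decode_utf8_bytes() is a hypothetical name; this is not Swift Mailer's code.
// Assumes well-formed UTF-8 input; malformed input is not detected.
function decode_utf8_bytes($s)
{
    if ($s === '') {
        return array();
    }
    // One unpack() call instead of one ord() call per byte.
    $bytes = array_values(unpack('C*', $s));
    $count = count($bytes);
    $codes = array();
    for ($i = 0; $i < $count; ) {
        $b = $bytes[$i++];
        if ($b < 0x80) {             // single-byte character
            $codes[] = $b;
            continue;
        }
        // Number of continuation bytes, from the lead byte:
        // 110xxxxx -> 1, 1110xxxx -> 2, 11110xxx -> 3.
        $extra = ($b >= 0xF0) ? 3 : (($b >= 0xE0) ? 2 : 1);
        $x = $b & (0x3F >> $extra);  // payload bits of the lead byte
        while ($extra-- > 0 && $i < $count) {
            $x = ($x << 6) | ($bytes[$i++] & 0x3F); // append 6 bits per byte
        }
        $codes[] = $x;
    }
    return $codes;
}

$codes = decode_utf8_bytes("a\xC4\x90\xE2\x82\xAC"); // 'aĐ€'
// $codes is array(0x61, 0x110, 0x20AC)
```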