Converting encoded strings
Posted: Mon Sep 01, 2014 9:49 am
by Nunners
We are doing various encoding work on strings stored in various databases. For application reasons, we have to store them in a Unicode format, which for us is @U followed by a hex string. Before being converted to that format the strings are in UCS-2; however, when we bring them back to view in a web browser they need to be in UTF-8. Yes, I know it sounds like a nightmare way of doing things, but unfortunately it has to be!
So an example:
The Š character is 0160 in Unicode/UCS-2
When pulling that back into the strings, using the following:
Code:
$message = '0160';
$_message = hex2bin($message); //return "`�" that's back tick and a special char
$message = mb_convert_encoding($_message, 'UTF-8', 'UCS-2'); // returns "Å "
Can anyone suggest a better way to convert the string so that it works, even if it means producing HTML entities? (By the way, I have tried htmlentities, but it doesn't support UCS-2!)
Thanks
James
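On the HTML-entity idea: a minimal sketch, assuming the stored hex is a sequence of big-endian UCS-2 code units (as in the 0160 example). The function name ucs2_hex_to_entities is hypothetical and not from any library; it only uses pack() and unpack(), so it does not depend on mbstring or Intl.

```php
<?php
// Hypothetical helper (not from the thread): turn a hex string of
// big-endian UCS-2 code units into numeric HTML entities.
function ucs2_hex_to_entities($hex)
{
    $out = '';
    // 'n*' unpacks every 16-bit big-endian unsigned integer in the data.
    foreach (unpack('n*', pack('H*', $hex)) as $codeUnit) {
        $out .= '&#' . $codeUnit . ';';
    }
    return $out;
}

echo ucs2_hex_to_entities('0160'), "\n"; // &#352;
```

Numeric entities are plain ASCII, so they render the same whatever charset the page is served in.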
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 9:55 am
by Celauran
If you're using PHP 5.5+ with the Intl extension, check out UConverter::transcode. It's not currently documented, but the parameters are self-explanatory.
Code:
php > $string = hex2bin('0160');
php > echo UConverter::transcode($string, 'UTF-8', 'UCS-2') . "\n";
Š
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:10 am
by Nunners
Ah - great idea... however we're stuck on 5.3
Do you know if there's a version of it available for lower PHP version?!
Thank you though...
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:16 am
by Celauran
php.net says
(PHP 5.5.0, PECL >= 3.0.0a1)
so maybe try a PECL install?
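If the Intl route turns out not to be available on 5.3, iconv might be another option. A minimal sketch, assuming the iconv extension is enabled, that the bytes are big-endian UCS-2, and that the local iconv library accepts the name 'UCS-2BE' (that name is an assumption about the underlying library, so it is worth checking on the target server):

```php
<?php
// Raw UCS-2 bytes 0x01 0x60 for the example character.
$bytes = pack('H*', '0160');

// iconv() takes the source encoding first, then the target encoding.
$utf8 = iconv('UCS-2BE', 'UTF-8', $bytes);

if ($utf8 === false) {
    echo "conversion failed\n";
} else {
    // Print the bytes as hex so the result does not depend on the terminal.
    echo bin2hex($utf8), "\n"; // c5a0
}
```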
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:25 am
by Nunners
Did a package search, and no results for it in there!? Odd!
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:31 am
by Celauran
It may also be something with your locale, as I just used mb_convert_encoding and got the expected output.
Code:
php > echo mb_convert_encoding($string, 'UTF-8', 'UCS-2') . "\n";
Š
What's your environment like?
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:39 am
by Nunners
Our environment is Apache2/PHP 5.3.3 on RHEL 6.5
I've tried mb_convert_encoding, but it comes back with the A-ring and a space.

Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:44 am
by Celauran
Found a CentOS 6.5 image. Close enough. I'll spin up a Vagrant box real quick and see if I have any luck.
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:48 am
by Nunners
Brilliant - thanks...
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:54 am
by Celauran
Hmm.
hex2bin() is only available as of PHP 5.4. What implementation are you using? Could the error lie there?
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 10:58 am
by Nunners
Good point well made! A copy of the function from the php docs...
Code:
if (!function_exists('hex2bin')) {
    function hex2bin($hexstr) {
        $n = strlen($hexstr);
        $sbin = "";
        $i = 0;
        while ($i < $n) {
            $a = substr($hexstr, $i, 2);
            $c = pack("H*", $a);
            if ($i == 0) {
                $sbin = $c;
            } else {
                $sbin .= $c;
            }
            $i += 2;
        }
        return $sbin;
    }
}
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 11:01 am
by Celauran
CentOS 6.5 VM
Code:
php > echo phpversion();
5.3.3
php > $string = pack('H*', '0160');
php > echo mb_convert_encoding($string, 'UTF-8', 'UCS-2') . "\n";
Š
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 11:13 am
by Celauran
May not be that implementation after all.
Code:
<?php
function hex2bin($string) {
    $n = strlen($string);
    $output = '';
    $i = 0;
    while ($i < $n) {
        $a = substr($string, $i, 2);
        $c = pack('H*', $a);
        if ($i == 0) {
            $output = $c;
        } else {
            $output .= $c;
        }
        $i += 2;
    }
    return $output;
}

$input = '0160';
var_dump(mb_convert_encoding(hex2bin($input), 'UTF-8', 'UCS-2'));
yields the expected result.
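One way to check the result without depending on how a terminal or browser draws it is to look at the raw bytes. A minimal sketch, assuming mbstring is loaded; the second conversion is only a simulation of a viewer that reads the UTF-8 bytes as ISO-8859-1, not something the thread's code does:

```php
<?php
// Same conversion as above, using pack() in place of hex2bin().
$utf8 = mb_convert_encoding(pack('H*', '0160'), 'UTF-8', 'UCS-2');
echo bin2hex($utf8), "\n"; // c5a0, the UTF-8 encoding of U+0160

// Simulate a viewer that treats those two bytes as ISO-8859-1 text.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misread), "\n"; // c385c2a0: "Å" followed by a no-break space
```

If the bytes come out as c5a0, the conversion itself is fine, and an "Å " on screen would point at how the output is being decoded rather than at mb_convert_encoding.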
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 2:48 pm
by Nunners
There must be something strange I'm doing then. Our main platform runs 5.3.3, our local dev server at work runs 5.3.21, and my dev server at home runs 5.4.12; all of them have the same problem.
I've now simplified everything we're doing into one script, and tested on all of the above platforms.
Code:
<?php
if (isset($_POST["message"])) {
    $message = $_POST["message"];
    $__message = mb_convert_encoding($message, 'UCS-2', 'auto');
    $unicode_message = strtoupper(bin2hex($__message));
    echo ("Unicode: " . $unicode_message . "<br />");
    echo ("Message:");
    var_dump(mb_convert_encoding(hex2bin($unicode_message), 'UTF-8', 'UCS-2'));
}
?>
<form accept-charset="utf-8" method="post">
    <textarea name="message"><?=$_POST["message"]?></textarea>
    <input type="submit" />
</form>
In all cases, "Unicode" comes out as 0160 (which is correct), but "Message" always returns "Å ".
This is really baffling me, and I wonder if I'm missing something really obvious.
BTW - this last test script is literally just the above, so everything else is defined in the php.ini files etc, which are out of the box and haven't been changed (except some file locations etc).
Again, any help gratefully received...
Thanks
Re: Converting encoded strings
Posted: Mon Sep 01, 2014 4:12 pm
by Celauran
Has to be something environment-specific. I have tried on:
OS X 10.9, Apache 2.2.26, PHP 5.5.11
Ubuntu 14.04, nginx 1.6.0, PHP 5.5.15
CentOS 6.5, Apache 2.2.15, PHP 5.3.3
and consistently had the same result, which has always been the expected output.
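One environment difference that may be worth ruling out is the charset the page is served with: the UTF-8 bytes for Š are 0xC5 0xA0, and a browser that decodes them as ISO-8859-1 shows "Å" plus a no-break space. A minimal sketch, assuming mbstring is loaded; the helper name ucs2_hex_to_utf8 is hypothetical, and whether the header is actually missing on the affected servers is an assumption to verify:

```php
<?php
// Hypothetical helper name; same conversion as used in the thread.
function ucs2_hex_to_utf8($hex)
{
    return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UCS-2');
}

// Declare the charset before any output so the browser does not guess.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Escape for HTML, telling htmlspecialchars() the string is UTF-8.
echo htmlspecialchars(ucs2_hex_to_utf8('0160'), ENT_QUOTES, 'UTF-8'), "\n";
```

The same effect can usually be had with the default_charset setting in php.ini or a meta charset tag in the page.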