Handling Double byte characters for anytype of encoding in b

Posted: Tue Aug 26, 2008 6:51 am
by virendrachandak
Hello All,

We are working on a translation project using PHP, MySQL and Apache, and we are facing problems in the IE browser when displaying non-English and double-byte characters.
The interface should be able to save English, non-English and double-byte characters from the browser. When we save from Mozilla Firefox, it converts those characters to their hex codes and saves them in the database; when displaying, we just convert the hex codes back, so the characters display correctly. But when we save non-English and double-byte characters from IE, they get saved in the database in some binary format (they are not saved as their hex codes), so these characters are not displayed correctly in any browser (neither IE nor Mozilla).

As I understand it, the problem is in saving to the database in the correct format, and I am not sure whether there is any way to display the garbled characters already saved from IE.

Please suggest a solution if anybody has come across or worked on this type of issue.

Please note: we cannot fix the character encoding to UTF-8 in the browser. The encoding can be anything the user sets.

Example

Input | Language | Saved in database in this format | Browser name
купите бездисковый лицензионный пакет | Russian | купите бездисковый лицензионный пакет | Mozilla Firefox
купите бездисковый лицензионный пакет | Russian | êóïèòå áåçäèñêîâûé ëèöåíçèîííûé ïàêåò | IE

Re: Handling Double byte characters for anytype of encoding in b

Posted: Tue Aug 26, 2008 9:07 am
by dml
That's interesting: when Firefox sends a form submission with characters that aren't in the page character set, it sends an HTML numeric character reference instead. For example, if I paste the Cyrillic к into the form below, it submits it as the seven characters &#1082;. You might test this in IE, because I bet it's doing something different: it looks like it's submitting it encoded as 0xea in cp1251, and it's showing up as ê for you because that 0xea is assumed to be cp1252.
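
To make the two behaviours concrete, here is a minimal PHP sketch. It assumes the iconv extension is available and that the file itself is saved as UTF-8; the variable names are only illustrative, and the last step assumes the database holds the cp1252 misreading as ordinary text, which I haven't verified for your setup. It decodes the numeric character reference form (&#1082; for к), shows how the single byte 0xea reads under cp1251 versus cp1252, and reverses the misreading for the first word of your example.

```php
<?php
// Firefox case: the character arrives as a numeric character reference.
$fromFirefox = "&#1082;";
$decoded = html_entity_decode($fromFirefox, ENT_QUOTES, "UTF-8");

// Suspected IE case: the character arrives as the raw byte 0xEA.
$fromIe = "\xEA";
$asCp1251 = iconv("CP1251", "UTF-8", $fromIe); // intended reading: к
$asCp1252 = iconv("CP1252", "UTF-8", $fromIe); // mistaken reading: ê

echo bin2hex($decoded), "\n";  // d0ba (UTF-8 bytes of к)
echo bin2hex($asCp1251), "\n"; // d0ba
echo bin2hex($asCp1252), "\n"; // c3aa (UTF-8 bytes of ê)

// Assumption: the stored value is the cp1252 misreading, kept as text.
// Turning it back into bytes and re-reading them as cp1251 undoes the damage.
$mojibake = "êóïèòå";
$repaired = iconv("CP1251", "UTF-8", iconv("UTF-8", "CP1252", $mojibake));
echo $repaired, "\n"; // купите
```

This only works if you know (or can guess) which encoding the bytes were really in, which is what the ideas below are trying to find out.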

Given that you can't keep the page encoding fixed, I can think of a couple of things, neither of which I've tried in production, so I don't know the practicalities of getting them to work. The first is to check the request headers for an indication of the encoding - there's nothing promising showing up in the Firefox headers, but maybe IE is helpful enough to indicate what encoding it's sending the data back in. Another trick I've heard of but not tried is to have a hidden field that acts as an encoding fingerprint - for example, if you put that Cyrillic к in the field and got it back as the byte 0xea you'd infer cp1251, if you got the bytes 0xd0 0xba you'd infer UTF-8, etc.

Code:

 
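<?php
// Serve the page in a charset that cannot represent Cyrillic,
// so we can see how the browser submits characters outside it.
header("Content-Type: text/html; charset=iso-8859-15");
// Avoid an undefined-index notice on the first (GET) request.
$t = isset($_POST["t"]) ? $_POST["t"] : "";
?>
<html>
<body>
<?php
// Dump the submitted value as received, then the same value with HTML entities escaped.
var_dump($t);
echo " (", htmlentities($t, ENT_QUOTES, "ISO-8859-15"), ")";
?>
<br/>
<form method="POST">
<input type="text" name="t"/>
</form>
</body>
</html>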
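
The hidden-field fingerprint idea mentioned above might look roughly like the following. This is an untested-in-production sketch, not a finished solution: the function names guessEncodingFromFingerprint and toUtf8 are made up for illustration, the lookup table only covers the two encodings discussed in this thread, and it assumes the iconv extension is available. The page would need to emit a hidden field pre-filled with к and pass the submitted value of that field to the first function.

```php
<?php
// Hypothetical helper: guess the submission encoding from a hidden
// fingerprint field that the page pre-filled with the Cyrillic letter к.
function guessEncodingFromFingerprint($fingerprint)
{
    // Byte patterns for к in the encodings discussed here; extend as needed.
    $known = array(
        "\xD0\xBA" => "UTF-8",
        "\xEA"     => "CP1251",
    );
    return isset($known[$fingerprint]) ? $known[$fingerprint] : null;
}

// Hypothetical helper: normalise a submitted value to UTF-8 before storing it.
function toUtf8($value, $encoding)
{
    if ($encoding === null) {
        return null; // unknown encoding: let the caller decide what to do
    }
    return $encoding === "UTF-8" ? $value : iconv($encoding, "UTF-8", $value);
}

// Example: the word "купите" as it would arrive in cp1251 bytes.
$submitted = "\xEA\xF3\xEF\xE8\xF2\xE5";
$encoding = guessEncodingFromFingerprint("\xEA");
echo $encoding, "\n";                     // CP1251
echo toUtf8($submitted, $encoding), "\n"; // купите
```

Storing everything as UTF-8 after this step would let the display side stay the same regardless of which encoding the user's browser was set to.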
 

Re: Handling Double byte characters for anytype of encoding in b

Posted: Tue Aug 26, 2008 6:44 pm
by Weirdan
Couldn't you just switch to UTF-8? In my experience it greatly simplifies supporting mixed-language websites (Cyrillic included).

Re: Handling Double byte characters for anytype of encoding in b

Posted: Tue Aug 26, 2008 11:32 pm
by virendrachandak
Yes, UTF-8 encoding simplifies the task, but our requirement is that we cannot keep the encoding fixed. The encoding can be anything.