Handling Double byte characters for anytype of encoding in b

virendrachandak
Forum Newbie
Posts: 2
Joined: Tue Aug 26, 2008 6:45 am

Handling Double byte characters for anytype of encoding in b

Post by virendrachandak »

Hello All,

We are working on a translation project using PHP, MySQL and Apache, and are having problems in IE when displaying non-English and double-byte characters.
The interface should be able to save English, non-English and double-byte characters entered in the browser. When we save from Mozilla Firefox, it converts those characters to their hex codes and saves them in the database; when displaying, we convert the hex codes back into characters, so they display correctly. But when we save non-English and double-byte characters from IE, they are stored in the database in some binary format (not as their hex codes), so they are not displayed correctly in any browser (neither IE nor Firefox).

As I understand it, the problem is in saving to the database in the correct format, and I am not sure whether there is any way to display the garbled characters already saved from IE.

Please suggest a solution if you have come across or worked on this type of issue.

Please note: we cannot keep the character encoding fixed to UTF-8 in the browser. The encoding can be anything the user sets.

Example

Input | Language | Saved in database in this format | Browser Name
купите бездисковый лицензионный пакет | Russian | купите бездисковый лицензионный пакет | Mozilla Firefox
купите бездисковый лицензионный пакет | Russian | êóïèòå áåçäèñêîâûé ëèöåíçèîííûé ïàêåò | IE
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: Handling Double byte characters for anytype of encoding in b

Post by dml »

That's interesting: when Firefox submits a form containing characters that aren't in the page's character set, it sends them as HTML numeric character references. For example, if I paste the Cyrillic к into the form below, it is submitted as the seven characters &#1082;. You might test this in IE, because I bet it's doing something different: it looks like it's submitting the letter encoded as 0xea in cp1251, which shows up as ê for you because that 0xea is then assumed to be cp1252.
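
To make the 0xea guess concrete, here is a minimal sketch of how the byte mix-up would happen and how it might be undone. It assumes the stored text really is cp1251 bytes that were later read as cp1252, which has not been verified against the actual database; `repair_mojibake` is a hypothetical helper name, not something from the original project.

```php
<?php
// The Cyrillic letter "к" (U+043A), written as its UTF-8 bytes.
$utf8 = "\xD0\xBA";

// What a browser would submit for "к" if it encoded the form as windows-1251.
$cp1251 = iconv("UTF-8", "CP1251", $utf8);
echo bin2hex($cp1251), "\n";                    // ea

// Reading that same byte as windows-1252 yields "ê" (U+00EA).
$misread = iconv("CP1252", "UTF-8", $cp1251);
echo bin2hex($misread), "\n";                   // c3aa

// Hypothetical repair: turn the misread text back into its original bytes
// via cp1252, then decode those bytes as cp1251.
function repair_mojibake($text) {
    $bytes = iconv("UTF-8", "CP1252", $text);
    return $bytes === false ? false : iconv("CP1251", "UTF-8", $bytes);
}

echo bin2hex(repair_mojibake($misread)), "\n";  // d0ba
```

If the column holds the raw cp1251 bytes rather than text already converted from cp1252, the first step of the repair is unnecessary and a single `iconv("CP1251", "UTF-8", ...)` should be enough.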

Given that you can't keep the page encoding fixed, I can think of a couple of things, neither of which I've tried in production, so I don't know the practicalities of getting them to work. The first is to check the request headers for an indication of the encoding: nothing promising shows up in the Firefox headers, but maybe IE is helpful enough to indicate what encoding it's sending the data back in. Another trick I've heard of but not tried is a hidden field that acts as an encoding fingerprint: for example, if you put that Cyrillic к in the field and got it back as 0xea you'd infer cp1251, if you got 0xd0ba you'd infer UTF-8, and so on.

Code:

 
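<?php
// Serve the page in a single-byte Western charset, so that characters outside
// it (e.g. Cyrillic) reveal how the browser encodes them on submission.
header("Content-Type: text/html; charset=iso-8859-15");
// Empty string on the initial GET request, before the form has been posted.
$t = isset($_POST["t"]) ? $_POST["t"] : "";
?>
<html>
<body>
<?php
var_dump($t);  // the value exactly as PHP received it
// Escaped copy, so a submitted reference such as &#1082; is shown literally.
echo " (", htmlentities($t, ENT_QUOTES, "iso-8859-15"), ")";
echo " hex: ", bin2hex($t);  // the raw bytes, to tell encodings apart
?>
<br/>
<form method="POST">
<input type="text" name="t"/>
</form>
</body>
</html>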
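
The hidden-field fingerprint idea could be sketched roughly as follows. This is untested in production and rests on assumptions: the function names, the candidate charset list and the `$pageCharset` parameter are all made up for illustration, and the probe is assumed to be a hidden field whose original value was the single letter к.

```php
<?php
// Hypothetical helper: compare the bytes that came back in the probe field
// with how "к" would be encoded in each candidate charset.
function guess_charset_from_probe($probe, $candidates) {
    $expected = "\xD0\xBA";  // "к" as UTF-8
    foreach ($candidates as $charset) {
        // iconv() returns false if the charset cannot represent "к".
        $encoded = @iconv("UTF-8", $charset, $expected);
        if ($encoded !== false && $encoded === $probe) {
            return $charset;
        }
    }
    return null;
}

// Hypothetical helper: normalize a submitted value to UTF-8 before storing it.
function submitted_to_utf8($value, $probe, $pageCharset, $candidates) {
    $charset = guess_charset_from_probe($probe, $candidates);
    if ($charset !== null) {
        return iconv($charset, "UTF-8", $value);
    }
    // No match: assume the browser kept the page charset and sent anything
    // outside it as numeric character references such as "&#1082;".
    $utf8 = iconv($pageCharset, "UTF-8", $value);
    return html_entity_decode($utf8, ENT_QUOTES, "UTF-8");
}

$candidates = array("UTF-8", "CP1251", "KOI8-R", "ISO-8859-5");

// Probe came back as 0xea, so the value is treated as cp1251.
echo bin2hex(submitted_to_utf8("\xEA\xF3", "\xEA", "ISO-8859-15", $candidates)), "\n";

// Probe came back as a numeric character reference.
echo bin2hex(submitted_to_utf8("&#1082;", "&#1082;", "ISO-8859-15", $candidates)), "\n";
```

One caveat with the fallback branch: `html_entity_decode` cannot tell a reference generated by the browser from one the user typed literally, so text like `&amp;` entered by hand would be decoded as well. It may also be worth testing whether the browsers in question fill in a hidden field named `_charset_` with the submission encoding, which would make the probe unnecessary.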
 
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Handling Double byte characters for anytype of encoding in b

Post by Weirdan »

Couldn't you just switch to UTF-8? In my experience it greatly simplifies supporting mixed-language websites (Cyrillic included).
virendrachandak
Forum Newbie
Posts: 2
Joined: Tue Aug 26, 2008 6:45 am

Re: Handling Double byte characters for anytype of encoding in b

Post by virendrachandak »

Yes, UTF-8 simplifies the task, but our requirements do not allow us to keep the encoding fixed. The encoding can be anything.