[SOLVED] form data encoding problem...


newmember
Forum Contributor
Posts: 252
Joined: Fri Apr 02, 2004 12:36 pm

form data encoding problem...

Post by newmember »

My system: WinXP + Apache 1.3 + PHP 4.3.5 + IE + Firefox

I'm receiving form data that might contain characters from different languages and saving it to a file...
Recently I ran into a problem...

My form has: accept-charset="utf-8"

I'll take two completely different character sets as an example:
Hebrew and Russian...

First case: I write ONLY Hebrew in the form
- When I check the file, I see that these characters became Russian characters... (not good :? )

Second case: I write Hebrew and Russian in the form
- When I check the file, I see both Russian and Hebrew characters (that is, everything is as it should be)

I did the same test with Firefox, and in both cases the file looks like it should... the Hebrew is there alongside the Russian.

(It is probably related to how IE encodes form data before sending it to the server...
If that's the cause, I don't know how to solve this :? )

Can anyone please help me with this?
Thanks
newmember

Post by newmember »

I still don't know how to solve this... :?
Here is what I know so far:

I did additional checks under the same conditions:

I enter exactly the same text (in Hebrew)...
* If I use IE to submit the form, the file size is 25 bytes, which is exactly one byte per character, so it is not saved as UTF-8.
* If I use Firefox to submit the form, the file size is 36 bytes, and when I open it I see Hebrew text...

So I think that when IE sees only Hebrew and English text in the form, it encodes it in a single-byte Hebrew charset (ISO-8859-8 or similar)... but Firefox always encodes the data as UTF-8...
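
To make the size difference concrete, here is a minimal sketch that compares byte counts for one Hebrew word in UTF-8 and in a single-byte Hebrew charset. It assumes the iconv extension is available and uses ISO-8859-8 as a stand-in for whatever charset IE actually picked; the sample word is made up for illustration.

```php
<?php
// The Hebrew word "shalom" written as explicit UTF-8 bytes, so the example
// does not depend on the encoding of this source file.
$utf8 = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// Assumption: ISO-8859-8 stands in for the single-byte Hebrew charset.
$singleByte = iconv('UTF-8', 'ISO-8859-8', $utf8);

echo strlen($utf8), " bytes as UTF-8\n";            // two bytes per Hebrew letter
echo strlen($singleByte), " bytes as ISO-8859-8\n"; // one byte per Hebrew letter

// The same single-byte values fall on Cyrillic letters in windows-1251,
// which may be why a Hebrew-only file shows up as Russian in a viewer
// that assumes a Cyrillic code page.
$misread = iconv('CP1251', 'UTF-8', $singleByte);
echo $misread, "\n";
```

If that reading is right, the "Russian" text in the Hebrew-only file is not a conversion done by PHP at all, just single-byte Hebrew data displayed with the wrong code page.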

I tried running utf8_encode() on the input, thinking that maybe PHP holds the string in a Hebrew ISO-8859 encoding, but then things go completely wrong...
Actually, utf8_encode() can handle only ISO-8859-1.
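
Since utf8_encode() assumes ISO-8859-1 input, converting from a Hebrew charset needs a function that takes the source encoding as a parameter. A minimal sketch with iconv(), written for a current PHP version and assuming the extension is loaded and the incoming bytes really are ISO-8859-8 (the helper name is hypothetical):

```php
<?php
// Hypothetical helper: convert single-byte Hebrew input to UTF-8.
// Assumption: the input is ISO-8859-8; a wrong guess yields garbage.
function hebrew_to_utf8(string $bytes): string
{
    $converted = iconv('ISO-8859-8', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Input is not valid ISO-8859-8');
    }
    return $converted;
}

// 0xF9 0xEC 0xE5 0xED spells a four-letter Hebrew word in ISO-8859-8.
echo bin2hex(hebrew_to_utf8("\xF9\xEC\xE5\xED")), "\n";
```

This only helps when the source charset is known in advance, which is exactly what is unclear here, since the browser picks it.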

And another test...
I went to the php.net manual and looked through the user comments. There I found a function, seems_utf8(), which checks whether a string is UTF-8 or not...
So I ran this function on the form input that IE sends... the results were:
* If I write Hebrew and Russian, seems_utf8() returns true... so that means IE sent UTF-8 encoded data.
* If I write ONLY Hebrew, seems_utf8() returns false, meaning that the data arrived not as UTF-8 but in some other encoding.

Firefox, meanwhile, ALWAYS sends UTF-8 encoded data...
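
The seems_utf8() function from the manual comments is not reproduced here. As a rough stand-in, a validity check can be sketched with the PCRE /u modifier, which makes preg_match() fail on byte sequences that are not well-formed UTF-8. The function name below is hypothetical.

```php
<?php
// Hypothetical stand-in for a seems_utf8()-style check.
// With the /u modifier, preg_match() returns false (not 0) when the
// subject is not valid UTF-8, so an empty pattern works as a validator.
function looks_like_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(looks_like_utf8("\xD7\xA9\xD7\x9C")); // UTF-8 bytes for two Hebrew letters
var_dump(looks_like_utf8("\xF9\xEC"));         // the same letters as single bytes
```

Note that plain ASCII text also passes this check, because every ASCII string is valid UTF-8; the check only says something useful when non-ASCII characters are present.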

So I'm really lost here... :?

It looks like the PHP script is at the browser's mercy...!!!

Also, I asked on a PHP channel in mIRC, and someone there said that it's practically impossible to make multilingual pages with PHP... but I'm not asking for much... I only need UTF-8 support.

Meanwhile I thought of two solutions:
* The first is to force the browser to return the data as UTF-8... but I'm not sure that is really possible...

* The second is to put a hidden input element containing Hebrew and Russian characters inside the form (but with this approach I will have to enter a character for each language).
I haven't tested the second solution, but I think it will almost certainly work...
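
A rough sketch of the second idea, untested against IE just like the idea itself: a hidden field carries one Hebrew and one Cyrillic letter, and the receiving script checks whether they came back as the expected UTF-8 bytes. The field name and both helper names are hypothetical.

```php
<?php
// Hypothetical field name, for illustration only.
const PROBE_FIELD = '_charset_probe';
// One Hebrew letter (U+05E9) and one Cyrillic letter (U+0449) as UTF-8 bytes.
const PROBE_UTF8 = "\xD7\xA9\xD1\x89";

// Render the hidden input. Numeric character references keep the markup
// independent of the encoding the page itself is served in.
function probe_input(): string
{
    return '<input type="hidden" name="' . PROBE_FIELD . '" value="&#1513;&#1097;">';
}

// True when the submitted probe came back as the expected UTF-8 bytes.
function probe_is_utf8(array $post): bool
{
    return isset($post[PROBE_FIELD]) && $post[PROBE_FIELD] === PROBE_UTF8;
}
```

On the receiving side this would be called as probe_is_utf8($_POST). Even if the browser does switch to UTF-8 because of the probe, the check at least reports when the submission arrived in some other encoding.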

But these are all workarounds...

So maybe someone has encountered similar difficulties and knows how to overcome this problem?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

maybe if you were in a different character set, it'd use unicode entities... I know whenever I paste characters outside my character set, I get unicode entities..
newmember

Post by newmember »

PHP's htmlentities() doesn't help either (it was the first function I tried).
htmlentities() doesn't support the Hebrew code page, as you can check in the manual.
And as I described in my earlier posts, IE sends the data encoded in a Hebrew code page from the start.
(I even printed the translation table with get_html_translation_table() just to see what is in there... and no trace of Hebrew, of course.)
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

MSDN wrote: Syntax

HTML <FORM ACCEPTCHARSET = sChar... >
Scripting FORM.acceptCharset(v) [ = sChar ]

Possible Values

sChar
String that specifies or receives a space- and/or comma-delimited list of charset values.

UTF-8
If the user enters characters that are not in the character set of the document containing the form, the UTF-8 character set will be used. UTF-8 is the preferred format for multilingual text.

Remarks

If this attribute is not specified, the form will be submitted in the character encoding specified for the document. If the form includes characters outside the character set specified for the document, Microsoft Internet Explorer will attempt to determine an appropriate character set. If an appropriate character set cannot be determined, then the characters outside of the character set will be encoded as an HTML numeric character reference. For more information on character sets and numerical character references, see HTML Character Sets.

Try setting the encoding of the document to UTF-8 (e.g. via <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />)
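
As a minimal sketch of that suggestion in PHP, the page below declares UTF-8 in the meta tag and keeps accept-charset on the form. The helper name, the action URL and the field name are placeholders, not taken from the original script.

```php
<?php
// Hypothetical helper that builds a form page declaring UTF-8 both in
// the meta tag and on the form itself.
function render_form_page(string $action): string
{
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    return <<<HTML
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
<title>Form</title>
</head>
<body>
<form method="post" action="{$action}" accept-charset="utf-8">
<textarea name="text"></textarea>
<input type="submit" />
</form>
</body>
</html>
HTML;
}

// Sending the same charset in the HTTP header keeps it consistent with
// the meta tag; header() has to run before any output.
// header('Content-Type: text/html; charset=UTF-8');
echo render_form_page('save.php');
```

With the document itself declared as UTF-8, Hebrew-only input is no longer outside the document's character set, so per the MSDN remarks above IE has no reason to pick another encoding.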
feyd

Post by feyd »

heh, I guessed right :P
newmember

Post by newmember »

I will check right now.
The fact is that I don't specify a character encoding for the document; I thought I'd leave these details to the end.
I really hope it is a real solution :)
newmember

Post by newmember »

:D
This works n :D w
So simple, and basically my fault... at least now I know why setting the encoding for the document is important...

Thank you very m :D ch
Weirdan

Post by Weirdan »

you're welcome.

btw, msdn.microsoft.com is a great site where many IE quirks are documented. You should consider bookmarking it if you're serious about developing for IE.
newmember

Post by newmember »

I've had it bookmarked for quite some time... :D
