multi language page

Posted: Thu Nov 04, 2010 5:57 am
by donki
I'm new to PHP and currently learning it by myself.
I need to build a site for posting news updates. However, the site is required to support multiple languages. This means posts on a page could be in various languages (including Chinese, Japanese, Korean, ...). As far as I've learned, I need to store my database content with the utf8 character set, but to display a language the page needs to be encoded in that language, or else it'll show junk characters. With more than one language on a page, it's impossible to switch the encoding each time just to read it.
I wonder if I could get some help here.
Thanks in advance.

Re: multi language page

Posted: Thu Nov 04, 2010 7:21 am
by Apollo
There's no such thing as 'encoding in a language'. A language is not (and does not define) a way to encode text, i.e. to represent it as bytes.

Regarding utf8, this is a Unicode encoding, meaning it can hold text in ANY language, including Japanese, Chinese, Korean, and Klingon. Multiple languages combined in one page are also no problem as long as you stick to utf8.
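
As a rough illustration of the point above, the following sketch puts Chinese, Japanese and Korean text into one PHP string and checks that the whole thing is valid UTF-8. It assumes the mbstring extension is loaded and that the source file itself is saved as UTF-8; the sample strings and variable names are made up for the example.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is available.
$posts = [
    'chinese'  => '你好',
    'japanese' => 'こんにちは',
    'korean'   => '마음을',
];

// Join all posts into one "page"; no per-language encoding is involved.
$page = implode(' / ', $posts);

// mb_check_encoding() returns true when the bytes are valid for the given encoding.
$isValidUtf8 = mb_check_encoding($page, 'UTF-8');

// Character count vs. byte count: CJK characters take several bytes each in UTF-8.
$charCount = mb_strlen($page, 'UTF-8');
$byteCount = strlen($page);

echo $page, "\n";
echo 'valid UTF-8: ', var_export($isValidUtf8, true), "\n";
echo "characters: $charCount, bytes: $byteCount\n";
```

The three scripts sit side by side in one string, and nothing in the code needs to know which language each post is in.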

Re: multi language page

Posted: Thu Nov 04, 2010 7:34 am
by donki
That's not what I meant.
For example, I have 2 posts on a page: one is Chinese, one is Japanese.
When I open it in a browser, it does not automatically recognize the language of each post, so it displays junk characters. In order to read it, I need to tell the browser to read the page in a specific language (View -> Encoding in IE).
I don't know how to make the browser recognize the language within.

Re: multi language page

Posted: Thu Nov 04, 2010 7:58 am
by Apollo
donki wrote: For example, I have 2 posts on a page: one is Chinese, one is Japanese.
When I open it in a browser, it does not automatically recognize the language of each post, so it displays junk characters.
This only happens if you encode the text using an encoding suitable for 1 language (for example iso-2022-jp or euc-kr), which means the other language can't even be expressed in that encoding (thus becomes junk if you try it anyway).

If you use unicode (where utf8 is the most obvious encoding choice) this problem does not occur.
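
A small sketch of why a single-language encoding breaks down: it converts Japanese and Korean text to EUC-JP and back, and compares the result with the original. The `roundTrip()` helper and the sample strings are hypothetical, and the code assumes mbstring is available and the file is saved as UTF-8.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$japanese = '日本語';
$korean   = '마음을';

// Hypothetical helper: convert UTF-8 text to a target encoding and back again.
function roundTrip(string $text, string $encoding): string
{
    $legacy = mb_convert_encoding($text, $encoding, 'UTF-8');
    return mb_convert_encoding($legacy, 'UTF-8', $encoding);
}

// Japanese text can be expressed in EUC-JP, so it comes back unchanged.
$japaneseSurvives = roundTrip($japanese, 'EUC-JP') === $japanese;

// Korean text has no representation in EUC-JP, so characters are lost on the way.
$koreanSurvives = roundTrip($korean, 'EUC-JP') === $korean;

echo 'Japanese survives EUC-JP: ', var_export($japaneseSurvives, true), "\n";
echo 'Korean survives EUC-JP: ', var_export($koreanSurvives, true), "\n";
```

With UTF-8 as the target encoding, both strings would come back unchanged, which is why it suits a mixed-language page.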
donki wrote: In order to read it, I need to tell the browser to read the page in a specific language (which is View -> Encoding in IE).
I don't know how to make the browser recognize the language within.
Make sure to clearly specify the encoding you're using in your HTML header, e.g. use

Code:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
somewhere in your HTML's <head> section.
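
If the page is generated by PHP, the charset can also be sent in the HTTP `Content-Type` header, which browsers give priority over the meta tag when both are present. The sketch below is one possible way to do both; `renderPage()` and the sample posts are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper that builds a complete UTF-8 page from a list of posts.
function renderPage(array $posts): string
{
    $items = '';
    foreach ($posts as $post) {
        // Escape markup characters; the third argument tells PHP the text is UTF-8.
        $items .= '<p>' . htmlspecialchars($post, ENT_QUOTES, 'UTF-8') . "</p>\n";
    }

    return "<!DOCTYPE html>\n<html><head>\n"
        . "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>\n"
        . "</head><body>\n" . $items . "</body></html>\n";
}

// Send the same charset in the HTTP header, if headers can still be sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$html = renderPage(['你好', 'こんにちは', '마음을']);
echo $html;
```

Note that `header()` has to run before any output is sent, which is why it comes before the `echo`.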

Re: multi language page

Posted: Thu Nov 04, 2010 10:13 am
by donki
I tried to make a sample file with Korean characters like this

Code:

<head>
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
</head>
<body>
<br>
<b>Name: </b>ㄹㅈㄷㄱㅈㄷ<br>
<b>Location: </b>마음을<br>
<b>Email: </b>ㅈㄷㄱㅁㅈㄷㄻ<br>
<b>URL: </b>ㅊ<br>
<b>Comments: </b>ㅁㄹㅈㄷㄹㄴㅇㄹㄴㅇㄹㅈ<br>
<br>
<br>
<h2><a href="sign.php">Sign in my Guestbook</a></h2>
</body>
The page is already declared as Unicode, but somehow all the Korean characters become junk. Unless I force the browser to read the page as Korean, these characters stay junk.

Re: multi language page

Posted: Thu Nov 04, 2010 11:29 am
by Apollo
donki wrote: I tried to make a sample file with Korean characters like this
The page is already encoded in Unicode, but somehow all Korean characters become junk. Unless I forced it to read the page in Korean, these characters would still be junk like that.
Your page does indeed contain a header that specifies Unicode (utf8), but is the data itself actually utf8-encoded? That is, do you know how the editor in which you edited that particular .html file saves its content?
(the fact that the characters are displayed correctly here on this forum page, doesn't say much about that)

To verify, can you rename it to .php and change it into this:

Code:

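<?php
// Sample text to inspect (the same characters as in the Name field of your page)
$s = "ㄹㅈㄷㄱㅈㄷ";
// Append the raw bytes of the string as hex, to see how the file was really saved
$s .= " = ".bin2hex($s);
print("<html><head>
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
</head><body>
Name: $s
</body></html>");
?>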
Now it should print "Name: <characters> = <hex>". What is the hex output? (If it starts with e384b9e38588...etc then your data is correct; otherwise your editor saved the file in some other encoding.)
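
The same check can be done without reading hex by eye. The sketch below is a minimal example: `looksLikeUtf8()` is a hypothetical helper, and the two-byte `$legacy` string is assumed sample data standing in for text saved in a legacy double-byte encoding.

```php
<?php
// Hypothetical helper: true when $bytes form a valid UTF-8 sequence.
// An empty pattern with the /u modifier makes PCRE validate the subject as UTF-8;
// preg_match() returns false (not 0) when that validation fails.
function looksLikeUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// U+3139 written as a Unicode escape (PHP 7+), so the bytes do not depend on
// how the editor saved this file.
$utf8 = "\u{3139}";

// Assumed sample of legacy double-byte data, for illustration only.
$legacy = "\xA4\xA9";

echo bin2hex($utf8), ' ', var_export(looksLikeUtf8($utf8), true), "\n";
echo bin2hex($legacy), ' ', var_export(looksLikeUtf8($legacy), true), "\n";
```

Running a check like this on text before it goes into the database would catch badly encoded input early.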

Re: multi language page

Posted: Thu Nov 04, 2010 11:45 am
by Zyxist
If you are going to store Chinese, Japanese and Korean texts, I would recommend UTF-16. UTF-8 takes 3 bytes for most characters in these languages, versus 2 bytes in UTF-16.
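
To put numbers on the size question, here is a small sketch that measures the same text in both encodings, once on its own and once wrapped in HTML. `byteSizes()` is a hypothetical helper; the code assumes mbstring is available and the file is saved as UTF-8.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$text   = '마음을';                               // three Korean characters
$markup = '<b>Location: </b>' . $text . '<br>';   // the same text inside HTML

// Hypothetical helper: byte length of a UTF-8 string in UTF-8 and in UTF-16.
function byteSizes(string $utf8): array
{
    return [
        'utf8'  => strlen($utf8),
        'utf16' => strlen(mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8')),
    ];
}

$textSizes   = byteSizes($text);
$markupSizes = byteSizes($markup);

echo "text only:   {$textSizes['utf8']} bytes UTF-8, {$textSizes['utf16']} bytes UTF-16\n";
echo "with markup: {$markupSizes['utf8']} bytes UTF-8, {$markupSizes['utf16']} bytes UTF-16\n";
```

The bare text is smaller in UTF-16, but ASCII markup costs 2 bytes per character there instead of 1, so which encoding is smaller overall depends on how much markup surrounds the text.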