Convert string to Unicode
-
phpnewbieperson
- Forum Newbie
- Posts: 22
- Joined: Wed Mar 26, 2008 8:25 am
Convert string to Unicode
Hey,
What's the easiest way to convert Japanese characters to a Unicode string to insert into a database? I am having issues getting the UTF-8 charset to behave consistently across browsers, MySQL and PHP, and I can't get it to work in all the major browsers.
I would just like to convert the text a user enters into a form into Unicode and then insert it into the database.
For example, '確定' is inserted into the database as '&#30906;&#23450;'.
Any guiding help would be great.
Re: Convert string to Unicode
Unicode is a really broad term.
If your webpages and database tables are in the same encoding then you shouldn't have much work to do, if any; it tends to all work itself out.
If your stuff is in different encodings then it can be a pain.
If you entity-encode everything (which you seem to be doing right now) then that will work too, but the entities are specific to HTML and related formats. As in, you can't write that stuff to a text file and expect it to appear fine in (e.g.) Notepad.
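To make the "different encodings" pain concrete, here is a minimal PHP sketch. It assumes the script file is saved as UTF-8 and that the iconv extension is available; the variable names are only for illustration. It deliberately misreads the UTF-8 bytes of '確定' as ISO-8859-1, which is roughly what happens when one layer of the stack assumes Latin-1.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal holds UTF-8 bytes.
$original = '確定';

// Misread the UTF-8 bytes as Latin-1 and re-encode them as UTF-8.
// This mimics a page or connection that assumes the wrong charset.
$garbled = iconv('ISO-8859-1', 'UTF-8', $original);

// Undoing the mistaken conversion recovers the original bytes.
$recovered = iconv('UTF-8', 'ISO-8859-1', $garbled);

echo bin2hex($original), "\n";        // e7a2bae5ae9a (two kanji, three bytes each)
echo strlen($garbled), "\n";          // 12: every original byte became a 2-byte sequence
var_dump($recovered === $original);   // bool(true)
```

If stored data shows this doubling pattern, it probably went through one wrong conversion somewhere between the form and the table.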
-
phpnewbieperson
Re: Convert string to Unicode
Thanks for the info. I want to store the entity reference only in the database. Also, when a search is performed on indexes, I would need to convert the data to the correct entities first and then search.
I'm using PHP5 if that helps.
I understand it should be simple using UTF-8 for the browser encoding, and setting PHP and MySQL to UTF-8 by default for pretty much everything, but I continue to have issues with little squares, question marks and general gibberish replacing what should be kanji or other Japanese characters! It's doing my head in and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
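When squares and question marks show up, it can help to look at the raw bytes at each stage (form input, before the insert, after the select). The following is only a sketch of such a check; `describeBytes` is a made-up helper name, and it assumes the script file is saved as UTF-8.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8 and show its bytes.
function describeBytes($value)
{
    return array(
        // With the /u modifier, preg_match() returns false on invalid UTF-8.
        'valid_utf8' => preg_match('//u', $value) === 1,
        'hex'        => bin2hex($value),
    );
}

// A correctly encoded value: two kanji, three bytes each.
print_r(describeBytes('確定'));

// A truncated sequence, as a byte-oriented substr() might produce.
print_r(describeBytes(substr('確定', 0, 4)));
```

Comparing the hex output at each stage should show which layer changes the bytes.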
Re: Convert string to Unicode
phpnewbieperson wrote: and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
It will work for search, store, display and comparison, but would it work for sorting?
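A small PHP sketch of why sorting is the weak spot, assuming a plain byte-wise comparison (MySQL collations apply their own rules, so treat this as an analogy only). The sample values are made up for the demonstration.

```php
<?php
// The same two values stored as raw UTF-8 and as numeric entities.
$utf8     = array('確定', 'apple');
$entities = array('&#30906;&#23450;', 'apple');

// SORT_STRING compares the strings byte by byte.
sort($utf8, SORT_STRING);
sort($entities, SORT_STRING);

// 'apple' comes first: ASCII bytes are lower than the kanji's lead byte.
print_r($utf8);

// The entity string comes first: '&' (0x26) is lower than 'a' (0x61).
print_r($entities);
```

So with entities, every Japanese value sorts by the characters `&`, `#` and digits rather than by the text it represents.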
-
phpnewbieperson
Re: Convert string to Unicode
Weirdan wrote: It will work for search, store, display and comparison, but would it work for sorting?
I don't need to sort on the fields that contain that data. The system it is being used for is somewhat limited and does not require sorting on those fields.
Re: Convert string to Unicode
You are looking for the function html_entity_decode().
-
phpnewbieperson
Re: Convert string to Unicode
danielrs1 wrote: You are looking for the function html_entity_decode().
That looks good. Correct me if I'm wrong, but that's for converting the data stored in the database to the desired character in the browser? What about going from the form into the database in the html entity format?
Re: Convert string to Unicode
html_entity_decode() -> From HTML to db (plain text).
htmlentities() -> From db (plain text) to HTML.
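Both directions can be sketched in PHP as follows. This assumes the mbstring extension is loaded and that all strings are UTF-8; `toNumericEntities` and `fromEntities` are made-up helper names. One caveat: htmlentities() only produces named entities, and kanji have none, so numeric references like '&#30906;' need mb_encode_numericentity() instead.

```php
<?php
// Hypothetical helper: turn every non-ASCII character into a decimal entity.
function toNumericEntities($text)
{
    // Conversion map: code points 0x80..0x10FFFF, no offset, keep all 21 bits.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

// Hypothetical helper: turn entities back into UTF-8 text.
function fromEntities($text)
{
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

$input = '確定';

echo toNumericEntities($input), "\n";          // &#30906;&#23450;
echo fromEntities('&#30906;&#23450;'), "\n";   // 確定

// htmlentities() leaves the kanji untouched because no named entity exists for them.
var_dump(htmlentities($input, ENT_QUOTES, 'UTF-8') === $input);   // bool(true)
```

A search term would need to go through the same encoding step as the stored data before it is compared against the column.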
-
phpnewbieperson
Re: Convert string to Unicode
danielrs1 wrote: html_entity_decode() -> From HTML to db (plain text).
htmlentities() -> From db (plain text) to HTML.
Beauty, I'll give that a go!
Thanks for your help, much appreciated.
-
BornForCode
- Forum Contributor
- Posts: 147
- Joined: Mon Feb 11, 2008 1:56 am
Re: Convert string to Unicode
You should also check the collation of your table 
-
phpnewbieperson
Re: Convert string to Unicode
BornForCode wrote: You should also check the collation of your table
As I said, all that has been done. The database default collation, table collation and column collation are all set to utf8_general_ci.
PHP encoding is set to utf8. The browser has charset=utf8.
For whatever reason, behaviour differs between IE7, IE8, Firefox 3, Safari 4 and Chrome, all of which the system has to work with. I've spent about a week on this issue and I've been through hell and back.
Thanks for your input.
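For the browser side, here is a sketch of declaring the charset in both the HTTP header and the form, written in PHP. The helper names, the action URL and the field name are placeholders. Note that the registered charset name is "UTF-8" with a hyphen; whether every older browser accepts the unhyphenated "utf8" is something I would treat as uncertain, so the sketch uses the hyphenated form throughout.

```php
<?php
// Hypothetical helper: send the charset in the HTTP header.
// Does nothing if output has already started.
function sendUtf8Header()
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }
}

// Hypothetical helper: build a minimal page whose form submits as UTF-8.
function renderSearchForm($action)
{
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    return '<!DOCTYPE html>' . "\n"
        . '<html><head>' . "\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">' . "\n"
        . '</head><body>' . "\n"
        . '<form method="post" action="' . $safeAction . '" accept-charset="UTF-8">' . "\n"
        . '<input type="text" name="q">' . "\n"
        . '</form></body></html>' . "\n";
}

sendUtf8Header();
echo renderSearchForm('search.php');
```

The HTTP header takes precedence over the meta tag in browsers, so if the server sends a different charset there, the meta tag alone will not fix it.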
-
BornForCode
Re: Convert string to Unicode
If you are wondering why, despite all the UTF-8 settings, you still don't get non-ASCII characters right, it might be the case that:
1. You have created a database with character set latin1 (this is the default!) and not with character set utf8.
2. You have created a table with character set utf8.
And by the way, maybe utf8 is not solving all your problems.
Here are some other character sets:
| big5 | Big5 Traditional Chinese |
| cp932 | SJIS for Windows Japanese |
| eucjpms | UJIS for Windows Japanese |
| euckr | EUC-KR Korean |
| gb2312 | GB2312 Simplified Chinese |
| gbk | GBK Simplified Chinese |
| sjis | Shift-JIS Japanese |
| ujis | EUC-JP Japanese |
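One way to check what the server and the connection are actually using is sketched below in PHP. `buildDsn`, `mydb` and `mytable` are placeholders. The `charset` DSN option is only honoured by PDO from PHP 5.3.6 onwards; on older versions, running `SET NAMES utf8` right after connecting serves a similar purpose.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that names the connection charset.
function buildDsn($host, $database, $charset = 'utf8')
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $database, $charset);
}

// Statements that report which character sets and collations are in effect.
// `mytable` is a placeholder table name.
$diagnostics = array(
    "SHOW VARIABLES LIKE 'character_set%'",
    "SHOW VARIABLES LIKE 'collation%'",
    'SHOW CREATE TABLE mytable',
);

echo buildDsn('localhost', 'mydb'), "\n";
foreach ($diagnostics as $sql) {
    echo $sql, "\n";
}
```

If `character_set_client` or `character_set_connection` comes back as latin1 while the tables are utf8, the connection is the likely culprit rather than the table definitions.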
-
phpnewbieperson
Re: Convert string to Unicode
Thanks for the info. I'm still of the opinion that entity references are the way to go, as the system needs to handle a combination of English and Japanese in many tables and columns and across multiple browsers.
I'll take all your info into account, thanks