Convert string to Unicode
Posted: Mon Jun 29, 2009 1:19 am
by phpnewbieperson
Hey,
What's the easiest way to convert Japanese characters to a Unicode string to insert into a database? I'm having issues using the UTF-8 charset across browsers, MySQL and PHP encoding, and I can't get it to work across the major browsers.
I would just like to convert the text a user enters into a form into unicode and then insert it into the database.
For example, '確定' is inserted into the database as '&#30906;&#23450;'.
Any guiding help would be great.
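The entity form in the example is one decimal numeric character reference (`&#N;`, where N is the character's Unicode code point) per character. The thread is about PHP, but here is a minimal Python sketch of the same conversion for illustration only; `to_ncr` is a hypothetical helper name, and it assumes ASCII characters should be left as they are:

```python
def to_ncr(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a decimal
    numeric character reference such as &#30906;; ASCII is left untouched."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# 確 is U+78BA (decimal 30906) and 定 is U+5B9A (decimal 23450).
print(to_ncr("確定"))        # &#30906;&#23450;
print(to_ncr("Order 確定"))  # Order &#30906;&#23450;
```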
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 3:21 am
by requinix
Unicode is a really broad term.
If your web pages and database tables use the same encoding then you shouldn't have much work to do, if any; it tends to just work out.
If your stuff is in different encodings then it can be a pain.
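To make the "different encodings" case concrete, here is a small Python sketch (an illustration, not PHP or MySQL code) of bytes written in one encoding being read back in another; Latin-1 is just an assumed example of a mismatched charset:

```python
text = "確定"

# Same encoding on both sides: UTF-8 bytes decoded as UTF-8 round-trip cleanly.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                  # b'\xe7\xa2\xba\xe5\xae\x9a'
print(utf8_bytes.decode("utf-8"))  # 確定

# Mismatched encodings: the same six bytes read as Latin-1 become gibberish.
garbled = utf8_bytes.decode("latin-1")
print(repr(garbled))

# The damage is only reversible if you know exactly which mismatch happened.
print(garbled.encode("latin-1").decode("utf-8"))  # 確定
```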
If you entity-encode everything (which you seem to be doing right now) then that will work too, but the entities are specific to HTML and related formats. That is, you can't write them to a plain text file and expect them to display correctly in, e.g., Notepad.
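A quick Python illustration of that point, using an assumed stored value: something that parses HTML resolves the references, while anything treating the value as plain text just sees ASCII punctuation and digits.

```python
import html

# Assumed example of a value stored with entity encoding.
stored = "&#30906;&#23450;"

# An HTML-aware consumer (a browser, or Python's html.unescape) resolves the references...
print(html.unescape(stored))  # 確定

# ...but a plain-text consumer sees 16 ASCII characters, not 2 kanji.
print(len(stored), len(html.unescape(stored)))  # 16 2
```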
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 5:15 pm
by phpnewbieperson
Thanks for the info. I want to store only the entity references in the database. Also, when a search is performed on indexes, I would need to convert the search input to the matching entities first and then search.
I'm using PHP5 if that helps.
I understand it should be simple to use UTF-8 for the browser encoding and set PHP and MySQL to UTF-8 by default for pretty much everything, but I keep getting little squares, question marks and general gibberish in place of kanji and other Japanese characters! It's doing my head in and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
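On the search point: if values are stored as entities, the search term has to be encoded the same way before matching. This hedged Python sketch uses an in-memory dict as a stand-in for the database (`to_ncr`, `records` and `search` are hypothetical names) and also shows a side effect of substring search on encoded text: a digit can match inside an entity.

```python
def to_ncr(text: str) -> str:
    """Hypothetical helper: non-ASCII characters become decimal references."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# A dict standing in for a table whose text column holds entity-encoded values.
records = {1: to_ncr("確定"), 2: to_ncr("Room 3")}


def search(term: str):
    """Encode the search term the same way as the stored data, then substring-match."""
    needle = to_ncr(term)
    return [rid for rid, value in records.items() if needle in value]


print(search("確定"))  # [1]
print(search("3"))     # [1, 2]: record 1 matches because "3" occurs inside "&#30906;"
```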
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 5:56 pm
by Weirdan
phpnewbieperson wrote: and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
It will work for search, storage, display and comparison, but would it work for sorting?
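The sorting concern is easy to demonstrate. In this Python sketch, plain `sorted()` compares code points, which is only a rough stand-in for a MySQL collation, but it shows that stored entity strings are ordered by `&`, `#` and digits rather than by the characters they represent; `to_ncr` is a hypothetical helper.

```python
import html


def to_ncr(text: str) -> str:
    """Hypothetical helper: non-ASCII characters become decimal references."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


words = ["zebra", "確定", "apple"]
stored = [to_ncr(w) for w in words]

# Sorting the real characters: ASCII letters first, then the kanji.
print(sorted(words))                               # ['apple', 'zebra', '確定']
# Sorting the stored strings: '&' (0x26) sorts before every letter, so the kanji jump ahead.
print([html.unescape(s) for s in sorted(stored)])  # ['確定', 'apple', 'zebra']
```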
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 6:09 pm
by phpnewbieperson
Weirdan wrote: phpnewbieperson wrote: and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
It will work for search, storage, display and comparison, but would it work for sorting?
I don't need to sort on the fields that contain that data. The system it is being used for is somewhat limited and does not require sorting on those fields.
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 6:12 pm
by danielrs1
You are looking for the function
html_entity_decode().
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 6:19 pm
by phpnewbieperson
That looks good. Correct me if I'm wrong, but that's for converting the data stored in the database to the desired characters in the browser? What about going from the form into the database in HTML entity format?
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 6:21 pm
by danielrs1
html_entity_decode() -> From HTML to db (plain text).
htmlentities() -> From db (plain text) to HTML.
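A caveat worth checking on the PHP side: the manual describes htmlentities() as translating only characters that have named HTML entity equivalents, so kanji would likely pass through unchanged; mb_encode_numericentity() is the PHP function aimed at numeric references like &#30906;. The round trip itself can be sketched in Python, where `str.encode(..., "xmlcharrefreplace")` produces decimal references and `html.unescape()` plays the role of html_entity_decode(). The function names below are hypothetical:

```python
import html


def encode_for_db(text: str) -> str:
    """Hypothetical: escape HTML-special characters, then turn every
    non-ASCII character into a decimal numeric character reference."""
    escaped = html.escape(text)  # '&' -> '&amp;' etc., so a typed-in '&#...;' stays literal
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def decode_from_db(stored: str) -> str:
    """Hypothetical: resolve named and numeric references back to characters."""
    return html.unescape(stored)


original = "Tom & Jerry 確定"
stored = encode_for_db(original)
print(stored)                              # Tom &amp; Jerry &#30906;&#23450;
print(decode_from_db(stored) == original)  # True
```

Escaping `&` first is a design choice: without it, a user who literally types `&#30906;` would be indistinguishable from one who typed 確.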
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 6:29 pm
by phpnewbieperson
Beauty, I'll give that a go!
Thanks for your help, much appreciated.
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 7:11 pm
by BornForCode
You should also check the collation of your table

Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 7:21 pm
by phpnewbieperson
BornForCode wrote: You should also check the collation of your table

As I said, all that has been done. The database default collation, table collation and column collation are all set to utf8_general_ci.
PHP encoding is set to utf8. The browser has charset=utf8.
For whatever reason, differences exist between IE7, IE8, Firefox 3, Safari 4 and Chrome, all of which the system has to work with. I've spent about a week on this issue and been through hell and back, and entity references are the answer (IMHO).
Thanks for your input.
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 7:28 pm
by BornForCode
If you are wondering why, despite all the UTF-8 settings, you still don't get non-ASCII characters right, it might be the case that:
1. You created the database with character set latin1 (this is the default!) and not with character set utf8.
2. You created the table with character set utf8.
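The question marks mentioned earlier in the thread are what a lossy conversion into a Latin-1 character set tends to look like: latin1 has no code for kanji, so a substitute is stored instead. Python's codecs show the same effect; this is an analogy, not MySQL itself, and whether MySQL substitutes or rejects such values can depend on its SQL mode:

```python
text = "確定"

# latin1 has no code points for kanji; errors="replace" substitutes b'?' for each one.
stored = text.encode("latin-1", errors="replace")
print(stored)                    # b'??'

# Reading it back cannot recover the original: the information is gone.
print(stored.decode("latin-1"))  # ??
```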
And by the way, utf8 may not solve all your problems. Here are some of the East Asian character sets MySQL supports:
| Charset | Description |
| big5 | Big5 Traditional Chinese |
| cp932 | SJIS for Windows Japanese |
| eucjpms | UJIS for Windows Japanese |
| euckr | EUC-KR Korean |
| gb2312 | GB2312 Simplified Chinese |
| gbk | GBK Simplified Chinese |
| sjis | Shift-JIS Japanese |
| ujis | EUC-JP Japanese |
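One practical difference between these character sets and utf8 is storage size. A quick Python check (Python's codec names `shift_jis`, `cp932` and `euc_jp` are assumed to correspond roughly to MySQL's sjis, cp932 and ujis) shows the two kanji from the first post take 2 bytes each in the Japanese encodings versus 3 bytes each in UTF-8:

```python
text = "確定"

for codec in ("utf-8", "shift_jis", "cp932", "euc_jp"):
    data = text.encode(codec)
    # Every encoding here can represent both kanji, so decoding restores the text.
    assert data.decode(codec) == text
    print(f"{codec:<10} {len(data)} bytes: {data.hex()}")
```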
Re: Convert string to Unicode
Posted: Mon Jun 29, 2009 7:35 pm
by phpnewbieperson
Thanks for the info. I'm still of the opinion that entity references are the way to go, as the system needs to handle a combination of English and Japanese in many tables and columns and across multiple browsers.
I'll take all your info into account, thanks.
