Convert string to Unicode

phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Convert string to Unicode

Post by phpnewbieperson »

Hey,

What's the easiest way to convert Japanese characters to a Unicode string to insert into a database? I am having issues using the UTF-8 charset across browsers, MySQL and PHP - I can't get it to work across the major browsers.

I would just like to convert the text a user enters into a form into Unicode and then insert it into the database.

For example, this '確定' is inserted into the database as '&#30906;&#23450;'

Any guiding help would be great.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Convert string to Unicode

Post by requinix »

Unicode is a really broad term.

If your webpages and database tables are in the same encoding then you shouldn't have much work to do, if any; it tends to work itself out.
If your stuff is in different encodings then it can be a pain.

If you entity-encode everything (which you seem to be doing right now) then that will work too, but the entities are specific to HTML and related formats. As in, you can't write that stuff to a text file and expect it to appear fine in (e.g.) Notepad.
phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Re: Convert string to Unicode

Post by phpnewbieperson »

Thanks for the info. I want to store the entity reference only in the database. Also, when a search is performed on indexes, I would need to convert the data to the correct entities first and then search.

I'm using PHP5 if that helps.

I understand it should be simple using UTF-8 for the browser encoding, and setting PHP and MySQL to UTF-8 by default for pretty much everything, but I continue to have issues with little squares, question marks and general gibberish replacing what should be kanji or other Japanese characters! It's doing my head in and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
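The squares, question marks and gibberish described above are typical of bytes written in one encoding and read back in another. Here is a minimal sketch of that effect, assuming the mbstring extension is loaded and the script file itself is saved as UTF-8. The variable names are only illustrative, and the Latin-1 misreading is one possible cause, not a diagnosis of this particular system.

```php
<?php
// UTF-8 text for the two kanji from the first post.
$original = '確定';

// Simulate a layer that wrongly treats those bytes as ISO-8859-1 (Latin-1)
// and re-encodes them as UTF-8: every original byte becomes its own character.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Count characters before and after the misreading.
$originalLength = mb_strlen($original, 'UTF-8');
$garbledLength  = mb_strlen($garbled, 'UTF-8');

// The damage is reversible as long as no byte was dropped along the way.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

printf(
    "%d chars -> %d chars, repaired: %s\n",
    $originalLength,
    $garbledLength,
    $repaired === $original ? 'yes' : 'no'
);
```

If the stored data looks like the garbled value, that usually points at one layer (page, connection or table) disagreeing with the others about the encoding.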
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Convert string to Unicode

Post by Weirdan »

phpnewbieperson wrote: and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
It will work for search, store, display and comparison, but would it work for sorting?
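To make the sorting concern concrete, here is a small sketch that sorts the same two values once as plain UTF-8 and once as entity references. The sample values are hypothetical, and a plain byte-wise sort stands in for whatever collation the database would actually apply.

```php
<?php
// Hypothetical column values: one English word, one Japanese word.
$plain    = ['abc', '確定'];
$entities = ['abc', '&#30906;&#23450;'];

// Byte-wise string sort, roughly what a binary collation would do.
sort($plain, SORT_STRING);
sort($entities, SORT_STRING);

// $plain:    'abc' comes first, because the UTF-8 lead byte 0xE7 sorts after 'a'.
// $entities: the Japanese word comes first, because '&' (0x26) sorts before 'a' (0x61).
print_r($plain);
print_r($entities);
```

References with different digit counts are compared as text too, so '&#9999;' sorts after '&#12354;' even though its code point is lower.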
phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Re: Convert string to Unicode

Post by phpnewbieperson »

Weirdan wrote:
phpnewbieperson wrote: and the entity reference seems to work for pretty much everything so it's a good solution in my opinion.
It will work for search, store, display and comparison, but would it work for sorting?
I don't need to sort on the fields that contain that data. The system it is being used for is somewhat limited and does not require sorting on those fields.
danielrs1
Forum Commoner
Posts: 29
Joined: Wed Jun 24, 2009 5:30 pm

Re: Convert string to Unicode

Post by danielrs1 »

You are looking for the function html_entity_decode().
phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Re: Convert string to Unicode

Post by phpnewbieperson »

danielrs1 wrote: You are looking for the function html_entity_decode().
That looks good. Correct me if I'm wrong, but that's for converting the data stored in the database to the desired character in the browser? What about going from the form into the database in the HTML entity format?
danielrs1
Forum Commoner
Posts: 29
Joined: Wed Jun 24, 2009 5:30 pm

Re: Convert string to Unicode

Post by danielrs1 »

html_entity_decode() -> From HTML to db (plain text).
htmlentities() -> From db (plain text) to HTML.
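A rough sketch of both directions for the '確定' example from the first post, assuming UTF-8 input and the mbstring extension. One caveat: with UTF-8 input, htmlentities() only replaces characters that have a named entity, so kanji pass through unchanged; producing numeric references needs something like mb_encode_numericentity(). The helper names to_numeric_entities() and from_entities() are made up for this example.

```php
<?php
// Hypothetical helper: turn every non-ASCII character into a decimal
// numeric character reference such as &#30906;.
function to_numeric_entities(string $text): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// Hypothetical helper: turn references back into UTF-8 text.
function from_entities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$stored  = to_numeric_entities('確定');  // form input -> database value
$display = from_entities($stored);       // database value -> plain UTF-8 text

// htmlentities() only knows named entities, so the kanji are left as they are.
$named = htmlentities('確定', ENT_QUOTES, 'UTF-8');

echo $stored, "\n", $display, "\n", $named, "\n";
```

A search term would have to go through the same to_numeric_entities() step before being compared with the stored values. Note that this helper leaves ASCII characters such as & alone, so it is not a substitute for normal escaping.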
phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Re: Convert string to Unicode

Post by phpnewbieperson »

danielrs1 wrote: html_entity_decode() -> From HTML to db (plain text).
htmlentities() -> From db (plain text) to HTML.
Beauty, I'll give that a go!

Thanks for your help, much appreciated.
BornForCode
Forum Contributor
Posts: 147
Joined: Mon Feb 11, 2008 1:56 am

Re: Convert string to Unicode

Post by BornForCode »

You should also check the collation of your table :)
phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Re: Convert string to Unicode

Post by phpnewbieperson »

BornForCode wrote: You should also check the collation of your table :)
As I said, all that has been done. The database default collation, table collation and column collation are all set to utf8_general_ci.

PHP encoding is set to utf8. The browser has charset=utf8.

For whatever reason, differences exist between IE7, IE8, Firefox3, Safari4 and Chrome, which the system has to work with. I've spent about a week on this issue - I've been through hell and back :) and the entity references are the answer :) (IMHO)

Thanks for your input.
BornForCode
Forum Contributor
Posts: 147
Joined: Mon Feb 11, 2008 1:56 am

Re: Convert string to Unicode

Post by BornForCode »

If you are wondering why, despite all the UTF-8 settings, you still don't get non-ASCII characters right, it might be the case that:

1. You created the database with character set latin1 (this is the default!) rather than utf8.
2. You created the table with character set utf8.

And by the way, maybe utf8 is not solving all your problems :) Here are some other character sets:

| big5 | Big5 Traditional Chinese |
| cp932 | SJIS for Windows Japanese |
| eucjpms | UJIS for Windows Japanese |
| euckr | EUC-KR Korean |
| gb2312 | GB2312 Simplified Chinese |
| gbk | GBK Simplified Chinese |
| sjis | Shift-JIS Japanese |
| ujis | EUC-JP Japanese |
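Besides the database and table character sets, the connection between PHP and MySQL has a character set of its own, and the page sent to the browser has to declare one too. A hedged sketch of pinning both to UTF-8 follows; the helper names, host, database name and credentials are placeholders, and whether this matches the setup in this thread is an assumption.

```php
<?php
// Hypothetical helper: build a PDO DSN that pins the connection character set.
// 'utf8' matches the utf8_general_ci collation mentioned in this thread.
function utf8_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: header line telling the browser the page is UTF-8.
function utf8_content_type(): string
{
    return 'Content-Type: text/html; charset=UTF-8';
}

// Usage (commented out because it needs a running MySQL server):
// header(utf8_content_type());
// $pdo = new PDO(utf8_dsn('localhost', 'example_db'), 'example_user', 'example_password');

echo utf8_dsn('localhost', 'example_db'), "\n";
```

With mysqli, the equivalent step would be calling set_charset('utf8') on the connection right after connecting.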
phpnewbieperson
Forum Newbie
Posts: 22
Joined: Wed Mar 26, 2008 8:25 am

Re: Convert string to Unicode

Post by phpnewbieperson »

Thanks for the info. I'm still of the opinion that entity references are the way to go, as the system needs to handle a combination of English and Japanese in many tables and columns and across multiple browsers.

I'll take all your info into account, thanks :)