Unicode conversion



pdo100
Forum Newbie
Posts: 2
Joined: Thu Apr 09, 2009 6:20 am


Post by pdo100

Hi,
I am new to this forum, and I can see there have been plenty of posts about Unicode problems.
Before posting this thread, I searched the forum for a topic that would help me resolve my issue.
Unfortunately I didn't find one, so I am writing.

I am creating a site where users can post text containing a mix of different scripts (e.g. Japanese). The pages must be UTF-8 encoded. Every comment a user submits is stored in the db. The problem starts when saving the text to the db. I want the East Asian characters saved as Unicode numeric character references (starting with &#...) but Latin characters in their plain ASCII form (note: both must be stored in the same field). When I change the page encoding to ISO-8859-1, everything looks fine: in the database my text is exactly as I need it, Latin in ASCII and Japanese as numeric references. But when I change the page to UTF-8, everything is saved as Latin plus extended Latin characters.
I have written a PHP function that converts all characters to numeric references. But then I have a problem, because Latin text (say, English) is converted as well.
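PHP can convert only selected code-point ranges instead of every character. The following is a minimal sketch, assuming the mbstring extension is available and the submitted text is UTF-8: `mb_encode_numericentity()` takes a conversion map of ranges, so only the Japanese ranges become `&#NNNN;` references and Latin text is left alone. The function name `encode_japanese_as_ncr` and the exact ranges chosen are illustrative assumptions, not a complete list of CJK blocks.

```php
<?php
// Hypothetical helper: turn only Japanese code points into &#NNNN; references
// and leave ASCII and accented Latin text as-is. Requires ext-mbstring and
// assumes $text is valid UTF-8 (e.g. submitted from a UTF-8 page).
function encode_japanese_as_ncr(string $text): string
{
    // Each group of four is [first code point, last code point, offset, mask];
    // offset 0 and mask 0x1FFFFF keep the code point unchanged.
    $convmap = [
        0x3000, 0x303F, 0, 0x1FFFFF, // CJK symbols and punctuation
        0x3040, 0x309F, 0, 0x1FFFFF, // Hiragana
        0x30A0, 0x30FF, 0, 0x1FFFFF, // Katakana
        0x4E00, 0x9FFF, 0, 0x1FFFFF, // CJK Unified Ideographs (common kanji)
        0xFF00, 0xFFEF, 0, 0x1FFFFF, // Half-width and full-width forms
    ];

    // Produces decimal references (the optional $hex argument defaults to false).
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

echo encode_japanese_as_ncr('Hello ひらがな!'), "\n";
// prints: Hello &#12402;&#12425;&#12364;&#12394;!
```

The ranges are only a starting point: Japanese text can contain characters outside them (for example CJK Extension A, U+3400 to U+4DBF), which this sketch would store unconverted.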
On my pages I apply a regular expression (in PHP) that highlights the East Asian words: when the expression finds numeric character references (&#...;), it changes their color. That is why I need numeric references only for the East Asian words in the text. With ISO-8859-1 encoding I can do this, but I can't use that encoding. Is there a way to do it on UTF-8 encoded pages? Has anyone had this issue before and can share their experience to get it sorted?
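The highlighting step described here can be sketched as a single `preg_replace()` that wraps each run of consecutive numeric references in a `<span>`. This is a minimal sketch under assumptions: the function name `highlight_ncr_runs` and the CSS class `jp` are hypothetical, and both decimal (`&#12402;`) and hexadecimal (`&#x3072;`) forms are matched.

```php
<?php
// Hypothetical helper: wrap each run of numeric character references
// (e.g. "&#12402;&#12425;") in a <span> so CSS can colour it.
function highlight_ncr_runs(string $html): string
{
    // One or more consecutive decimal (&#123;) or hex (&#x7B;) references;
    // $0 in the replacement is the whole matched run.
    return preg_replace(
        '/(?:&#(?:\d+|x[0-9A-Fa-f]+);)+/',
        '<span class="jp">$0</span>',
        $html
    );
}

echo highlight_ncr_runs('Hello &#12402;&#12425;&#12364;&#12394;!'), "\n";
// prints: Hello <span class="jp">&#12402;&#12425;&#12364;&#12394;</span>!
```

A drawback of this design is that it highlights every numeric reference, not just Japanese ones; for example, the `&#039;` that `htmlspecialchars()` can emit for an apostrophe would be highlighted too.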
Please let me know; I would really appreciate it.
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Unicode conversion

Post by Apollo

It seems you're confused about some concepts. Are you sure you fully understand the difference between Unicode, UTF-8, and &#xxx; HTML codes?

As for a solution, very briefly: if you use UTF-8 consistently everywhere and do it correctly, everything will work fine, including HTML pages, the database, and user input.
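To illustrate this point: with UTF-8 used throughout, the highlighting can run on the Japanese characters themselves, so no `&#...;` conversion is needed before storing. A minimal sketch, assuming the stored text is valid UTF-8 and PHP's bundled PCRE (which supports Unicode script properties such as `\p{Han}`); the function name `highlight_japanese` and the CSS class `jp` are hypothetical.

```php
<?php
// Hypothetical helper: highlight runs of Japanese script directly in UTF-8
// text, so the database can store plain UTF-8 instead of &#NNNN; references.
function highlight_japanese(string $text): string
{
    // Escape for HTML first; ENT_SUBSTITUTE replaces invalid UTF-8 with U+FFFD
    // so the /u regex below never sees malformed input.
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // /u: pattern and subject are UTF-8. \p{Hiragana}, \p{Katakana} and \p{Han}
    // are Unicode script properties; ー (U+30FC) is listed explicitly because
    // its Script property is Common rather than Katakana.
    return preg_replace(
        '/[\p{Hiragana}\p{Katakana}\p{Han}ー]+/u',
        '<span class="jp">$0</span>',
        $escaped
    );
}

echo highlight_japanese('Hello ひらがな & 日本語!'), "\n";
// prints: Hello <span class="jp">ひらがな</span> &amp; <span class="jp">日本語</span>!
```

Because the pattern matches scripts rather than `&#...;` sequences, Latin text (including accented letters such as é) is never highlighted.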
pdo100
Forum Newbie
Posts: 2
Joined: Thu Apr 09, 2009 6:20 am

Re: Unicode conversion

Post by pdo100

Apollo wrote: It seems you're confused about some concepts. Are you sure you fully understand the difference between Unicode, utf-8, and &#xxx; html codes?

As for a solution, very briefly: if you consistently use utf-8 everywhere and do it correct, everything will work fine - html pages, database, and user input.
Thanks for the reply.
I do understand the difference between UTF-8 and Unicode. I have built websites in several languages and have never had a display issue. As I said in my post, the issue is not display but saving to the db. Whether the text is saved in the db as '&#12402;&#12425;&#12364;&#12394;' or as 'ひらがな', the right characters are displayed anyway; I can control that. However, I use regular expressions in PHP to find and mark Japanese characters, and that becomes a problem when I use UTF-8 to write to the database, because the Japanese characters are saved as extended Latin (ひらがな shows up as a string of accented Latin characters). I cannot compose an expression for that; it would be very difficult. So I need all Japanese characters saved as Unicode numeric references.
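The "extended Latin" symptom is what UTF-8 bytes look like when something in the chain reads them as a single-byte Latin charset. A minimal sketch, assuming (this is not confirmed in the thread) that the pages send UTF-8 while the MySQL connection or column is set to latin1: each of the three UTF-8 bytes of a kana is misread as one Latin character. MySQL's latin1 is actually Windows-1252, so a MySQL tool would typically show ら as `ã‚‰`; the sketch uses ISO-8859-1 because it maps every byte one-to-one.

```php
<?php
// Illustration of the likely "extended Latin" symptom: UTF-8 bytes that are
// read back as ISO-8859-1 (Latin-1) become accented Latin characters.
// Requires the mbstring extension.
$original = 'ら'; // U+3089, stored in UTF-8 as the bytes E3 82 89

// Each byte is misread as one Latin-1 character (U+00E3, U+0082, U+0089)
// and then re-encoded as UTF-8: the "double encoding" seen in the db.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reversing the wrong conversion recovers the original bytes, which shows
// the data was not lost, only labelled with the wrong charset.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

printf(
    "%s -> %s (%d chars) -> %s\n",
    bin2hex($original),
    bin2hex($garbled),
    mb_strlen($garbled, 'UTF-8'),
    $repaired
);
// prints: e38289 -> c3a3c282c289 (3 chars) -> ら
```

If this is the cause, declaring UTF-8 on the database connection as well as on the pages would keep the bytes and their label in agreement.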
As far as I know (and please correct me if I am wrong), the form in which text is saved in the db depends on the encoding of the page that sends the query to the db.
Here is a screenshot:
[Screenshot not preserved]