
[56k Warn] Modify existing news system for UTF-8 compatibility

Posted: Thu Jun 30, 2005 7:38 am
by derkarsten
Jcart | If you include images, please put [56k Warn] in the title.

Hi there,

Some time ago I built a site with content in English and German. At that time the existing "newswriter" script (http://www.newswriter.info) seemed a suitable solution for the news system.

Now the website is to be extended with two more languages, Chinese and Japanese. I converted all content to UTF-8 and sent the correct headers, and everything worked very well except the news system (which I didn't write).

If I log in to the system and write a new article (simply by copying and pasting some content from a Chinese website directly out of a browser window), everything seems fine and looks like this:

[Screenshot 1: Chinese text pasted into the admin form, displayed correctly]

I have already saved admin.php (the main file) and all included template parts as UTF-8 and added the accept-charset="utf-8" attribute to the HTML forms. In addition, admin.php sends a UTF-8 Content-Type header via PHP, and the included header.php file contains the correct http-equiv meta tag.
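For reference, a minimal sketch of that setup, assuming admin.php builds its pages in PHP. The function name render_article_form and the field name article_text are hypothetical, not taken from newswriter. The point is that the header, the meta tag, the form's accept-charset and any escaping of re-displayed text all name the same charset (the "\u{...}" escape needs PHP 7 or later).

```php
<?php
// Send the charset in the HTTP header before any output is produced.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: builds the page head and a form that asks the
// browser to submit its fields as UTF-8.
function render_article_form(string $action, string $text = ''): string
{
    // Escape with an explicit UTF-8 charset so multibyte text is left intact.
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    $safeText   = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>'
        . '<form method="post" action="' . $safeAction . '" accept-charset="utf-8">'
        . '<textarea name="article_text">' . $safeText . '</textarea>'
        . '<input type="submit" value="&gt;&gt; go on">'
        . '</form></body></html>';
}

// Example: re-display two Chinese characters in the form.
echo render_article_form('admin.php', "\u{4E2D}\u{6587}");
```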

But if I click ">> go on" a few more times (which displays further menus, all within the admin.php file mentioned above) to finally reach the article preview, I get this:

[Screenshot 2: the article preview, where the Chinese text no longer displays correctly]

But this isn't UTF-8, is it? I don't know what it is or why the content is displayed this way...

Does anybody have an idea?
I would be so grateful!

Greetings,
Karsten

Re: modify existing news system to UTF-8 compatibility

Posted: Thu Jun 30, 2005 7:47 am
by Roja
derkarsten wrote: But this isn't UTF-8, is it? I don't know what it is or why the content is displayed this way...
Can't really be sure. You've essentially described each step as having the correct items you need, and obviously, somewhere along the way something is going wrong.

If you have links we could look at, or at least an output page, we might be able to figure out more. Without the code, links to look at, or anything more, all we can say is that you've described everything that is needed to get UTF-8 right.

Posted: Thu Jun 30, 2005 9:56 am
by onion2k
Are you storing the news in MySQL? If so, you'll need MySQL 4.1 or later, and you'll need to set the tables and their collation to store UTF-8. You may also need to tell MySQL to return UTF-8 data, which is done by calling mysql_query("SET NAMES 'utf8'"); after connecting and before any query whose results should come back as UTF-8.

Also note that many PHP string functions need to be replaced with their mb_ (multibyte) equivalents if you're doing anything other than just echoing the data.
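A small illustration of why, assuming the mbstring extension is installed (PHP 7 or later for the "\u{...}" escapes): the plain string functions count and cut bytes, while the mb_ versions count and cut characters once they are told the encoding.

```php
<?php
// Three Chinese characters (U+4E2D U+6587 U+5B57); each takes 3 bytes in UTF-8.
$text = "\u{4E2D}\u{6587}\u{5B57}";

// Byte-oriented functions count and cut bytes, not characters.
$byteLength = strlen($text);          // 9
$byteCut    = substr($text, 0, 4);    // ends in the middle of the second character

// The mb_ equivalents work on characters when given the encoding.
$charLength = mb_strlen($text, 'UTF-8');        // 3
$charCut    = mb_substr($text, 0, 2, 'UTF-8');  // first two whole characters

printf("strlen: %d, mb_strlen: %d\n", $byteLength, $charLength);
printf("substr result valid UTF-8? %s\n", mb_check_encoding($byteCut, 'UTF-8') ? 'yes' : 'no');
printf("mb_substr result valid UTF-8? %s\n", mb_check_encoding($charCut, 'UTF-8') ? 'yes' : 'no');
```

Any byte-based truncation, padding or wrapping in the script can therefore leave broken half-characters behind, which browsers show as garbage.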

Posted: Thu Jun 30, 2005 12:03 pm
by derkarsten
The script stores all article information in plain text files. But I think storing the information "physically" is only the second part of the problem.

At the moment everything mentioned above takes place in a single file named "admin.php". I paste some Chinese characters into a form (with the accept-charset="utf-8" attribute, see the first screenshot) and click "go on" four times to get to the article preview (second screenshot).

Up to this point, all data seems to be held only in the $_POST array; only when I click "Publish this article" is the article data written to a text file.
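For the text-file part, a sketch under the assumption that the article body arrives as a single posted string (the helper names save_article and load_article and the field name article_text are hypothetical). PHP strings are plain bytes, so writing UTF-8 to a file needs no conversion; checking validity right before saving at least tells you whether the damage happened earlier.

```php
<?php
// Hypothetical helper: store one article's text in a plain text file.
// file_put_contents() writes the bytes unchanged, so valid UTF-8 stays valid.
function save_article(string $path, string $text): void
{
    // Refuse text that is no longer valid UTF-8; if this fires, the damage
    // happened somewhere earlier in admin.php, not in the file storage.
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new InvalidArgumentException('Article text is not valid UTF-8');
    }
    if (file_put_contents($path, $text, LOCK_EX) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// Hypothetical helper: read the stored bytes back unchanged.
function load_article(string $path): string
{
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $text;
}

// Usage: in admin.php the text would come from something like
// $_POST['article_text'] (field name assumed).
$file = tempnam(sys_get_temp_dir(), 'article');
save_article($file, "\u{4E2D}\u{6587}");
$roundTrip = load_article($file);
unlink($file);
```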

So the question is: what could turn well-formed Chinese Unicode text, copied and pasted straight into my UTF-8 HTML form, into output like the second screenshot? Double encoding? Some PHP text-wrapping function? I don't know :(
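Without seeing the screenshot we can only guess, but the usual suspects can each be reproduced in isolation. The sketch below (assuming PHP 7+ with mbstring; dump_bytes is a hypothetical helper) shows how a double conversion, htmlentities() called with a Latin-1 charset (its default before PHP 5.4), and byte-based wordwrap() each damage UTF-8 text. Calling a helper like dump_bytes() on the posted value at every ">> go on" step shows where the bytes first change.

```php
<?php
// Hypothetical debugging helper: print the raw bytes of a value so you can
// compare it at each step of admin.php (e.g. dump_bytes('step 2', $_POST['article_text'])).
function dump_bytes(string $label, string $value): void
{
    printf("%-10s %s (valid UTF-8: %s)\n", $label, bin2hex($value),
        mb_check_encoding($value, 'UTF-8') ? 'yes' : 'no');
}

$original = "\u{4E2D}\u{6587}"; // two Chinese characters, 6 bytes of UTF-8

// 1. Double encoding: treating the UTF-8 bytes as ISO-8859-1 and converting
//    them "to UTF-8" again turns every byte into its own Latin-1 character.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// 2. htmlentities() with a Latin-1 charset turns each byte >= 0xA0 into a
//    named entity such as &auml;, so the browser shows Latin-1 characters.
$entitiesLatin1 = htmlentities($original, ENT_QUOTES, 'ISO-8859-1');
$entitiesUtf8   = htmlentities($original, ENT_QUOTES, 'UTF-8'); // left intact

// 3. wordwrap() counts bytes, so forcing a cut can split a character.
$wrapped = wordwrap($original, 4, "\n", true);

dump_bytes('original', $original);
dump_bytes('double', $double);
dump_bytes('wrapped', $wrapped);
```

If the preview looks like accented Latin letters (ä, æ and similar), cases 1 and 2 are the likelier culprits; a stray line break or replacement characters in the middle of the text points more towards case 3.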

Do you (or anybody else) have an idea?
Thanks a lot for your time!