unicode problem

nigaki
Forum Newbie
Posts: 2
Joined: Fri Mar 21, 2008 2:01 pm

unicode problem

Post by nigaki

Dear forum,

A long time ago I wrote a small PHP script that generates a vocabulary quiz. It reads an XML file whose nodes each have a "reference language" and a "target language" element. Both can contain non-ASCII characters (e.g. Polish and German). Since I did not have an HTTP server, I ran

Code:

php quiz.php > output.html

and then opened output.html in my browser.

Everything worked fine.

A few months ago I switched my OS from Windows XP to Ubuntu. Now, without any changes to the code, the same command produces broken output. Here is a screenshot with the bad output on the left and the good output (generated back then) on the right. The different colors are expected (the JavaScript sets them), and so are the different question numbers, since the question order is random.
[Screenshot: bad output (left), good output (right)]

I see the same thing when I open the HTML file in gedit. Every special character seems to be replaced by two (!) other special characters. My guess is that the information that both bytes should be read as one character is getting lost.
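The "one character becomes two" symptom matches what happens when text that is already UTF-8 gets encoded as UTF-8 a second time: each of its bytes is read as a separate ISO-8859-1 (Latin-1) character and converted again. The thread does not confirm that this is the cause here; the sketch below only reproduces the effect. `latin1ToUtf8` and `hexBytes` are hypothetical helpers; the first performs the same byte mapping as PHP's `iconv('ISO-8859-1', 'UTF-8', ...)`, written out by hand so each step is visible.

```php
<?php
// Sketch: reproduce UTF-8 double encoding.

// Hypothetical helper: treat every byte as one ISO-8859-1 character and
// encode it as UTF-8 (same mapping as iconv('ISO-8859-1', 'UTF-8', $bytes)).
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        // 0x00-0x7F stay single bytes; 0x80-0xFF become two-byte sequences.
        $out .= $b < 0x80 ? chr($b) : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical helper: space-separated hex dump of the raw bytes, e.g. "c3 b6".
function hexBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$original = "\u{F6}";                 // "ö" (U+00F6), two bytes in UTF-8
$double   = latin1ToUtf8($original);  // those two bytes encoded a second time

echo $original, ' -> ', hexBytes($original), PHP_EOL; // ö -> c3 b6
echo $double,   ' -> ', hexBytes($double),   PHP_EOL; // Ã¶ -> c3 83 c2 b6
```

Read as UTF-8, the second string is the two characters "Ã¶", which is the kind of pair an editor or browser would display in place of a single "ö".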

By the way, replacing the special characters with HTML entities (numeric character references) is not an option, because the JavaScript has to compare the student's answer with the solution stored in the file.

I would appreciate any help.

The relevant part of the PHP script is this:

Code:

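// HTML template for one quiz item; printf() fills the %d/%02d/%s
// placeholders from the argument list below, in order.
$currq = '
    <div class="quizelement" id="e%02d">
        <table>
        <tr><td colspan="2" class="header">Question %d</td></tr>
        <tr><td><img src="flags/%s_small.png"/></td>
            <td><b>%s</b> - <i>%s</i></td></tr>
        <tr><td><img src="flags/%s_small.png"/></td>
            <td><input size="54" id="a%d" type="text" onChange="checkSolution(this.value,\'%s\',\'%02d\')"/></td></tr>
        <tr><td></td>
            <td align="center">&nbsp;<span id="s%02d" class="solution"><b>Wrong </b><span class="tip">(solution: <i>%s</i>)</span></span>
                <span id="c%02d" class="correct">Correct!</span></td></tr>
        </table>
    </div><br/>
    ';
// $j: question number; $myquiz / $currquestion: quiz data parsed from the XML file (parsing code not shown)
printf($currq,
    $j, $j,                                    // id="e%02d", "Question %d"
    $myquiz->referencelanguage,                // reference-language flag
    $currquestion->referenceword, $currquestion->comment,
    $myquiz->targetlanguage,                   // target-language flag
    $j,                                        // input id="a%d"
    $currquestion->targetword, $j,             // checkSolution() arguments
    $j, $currquestion->targetword,             // "Wrong" span id and solution tip
    $j);                                       // "Correct!" span id

Thank you for your time,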
Nikos
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: unicode problem

Post by Weirdan

It seems you need to declare the character encoding in a <meta> tag. In your case, the correct one appears to be:

Code:

<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>

nigaki

Re: unicode problem

Post by nigaki

I tried this, but it had no effect. As far as I understand, the tag only saves you from having to pick the encoding manually in the browser, and in my case, even before adding it, selecting UTF-8 in Firefox gave the same result.

I think the problem is that when the output passes through standard output, the information that "some characters are encoded with 2 bytes" gets lost. I thought I could pipe it through some other program before redirecting it into the file with "> output.html", but I didn't manage to.
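One way to check where the bytes go wrong: a shell redirection such as `> output.html` copies the bytes PHP writes to standard output into the file without converting them, so the raw bytes of output.html show what the script actually printed. The sketch below is a hypothetical diagnostic (`nonAsciiRuns` and the script name `dump_nonascii.php` are made up for illustration) that lists each run of non-ASCII bytes in hex. A correctly encoded "ö" appears as `c3 b6`; if it appears as `c3 83 c2 b6`, the text was already double-encoded before it reached the shell.

```php
<?php
// Hypothetical diagnostic: show every run of non-ASCII bytes in a file as hex.
// Usage (assumed script name): php dump_nonascii.php output.html

function nonAsciiRuns(string $bytes): array
{
    // Without the /u modifier PCRE matches raw bytes; 0x80-0xFF never occur in ASCII.
    preg_match_all('/[\x80-\xFF]+/', $bytes, $m);
    $runs = [];
    foreach ($m[0] as $run) {
        $runs[] = implode(' ', str_split(bin2hex($run), 2));
    }
    return $runs;
}

if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $bytes = file_get_contents($argv[1]);
    if ($bytes === false) {
        fwrite(STDERR, "Cannot read {$argv[1]}\n");
        exit(1);
    }
    // Print each distinct byte sequence once, followed by how often it occurs.
    foreach (array_count_values(nonAsciiRuns($bytes)) as $hex => $count) {
        echo str_pad((string) $hex, 24), $count, PHP_EOL;
    }
}
```

For a Polish "ł" the expected UTF-8 bytes are `c5 82`; a double-encoded one would start with `c3 85` instead.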

Any ideas? Any suggestions on what to change in the code?
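If a byte dump does show double encoding, the real fix is to find the step that converts the already-UTF-8 text a second time. Possible suspects (assumptions; the thread does not show the XML-reading code) are an explicit `utf8_encode()` call, or an XML declaration whose `encoding` attribute names a different encoding than the one the file is actually saved in. As a stopgap, one layer of Latin-1-style double encoding can be undone per string. The hypothetical helper `undoDoubleUtf8` below reverts a string only if every character lies in U+0000 to U+00FF and the recovered bytes form valid UTF-8, so correctly encoded text, including Polish letters outside that range, comes back unchanged (a string that legitimately contains pairs like "Ã¶" would be changed too).

```php
<?php
// Hypothetical stopgap: undo ONE layer of UTF-8 double encoding that went
// through ISO-8859-1 (a Windows-1252 misreading would need a different table).

function undoDoubleUtf8(string $s): string
{
    // Split into code points; preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all('/./su', $s, $m) === false) {
        return $s;
    }
    $bytes = '';
    foreach ($m[0] as $char) {
        if (strlen($char) === 1) {
            $bytes .= $char;  // ASCII: unchanged
        } elseif (strlen($char) === 2 && ($char[0] === "\xC2" || $char[0] === "\xC3")) {
            // U+0080..U+00FF: turn the code point back into the single byte it came from
            $bytes .= chr(((ord($char[0]) & 0x1F) << 6) | (ord($char[1]) & 0x3F));
        } else {
            return $s;        // outside Latin-1: not this kind of double encoding
        }
    }
    // Accept the result only if the recovered bytes are valid UTF-8 again.
    return preg_match('//u', $bytes) === 1 ? $bytes : $s;
}

echo undoDoubleUtf8("Sch\u{C3}\u{B6}n"), PHP_EOL; // Schön
```

In the quiz script, each value could be passed through it before the printf() call, e.g. `undoDoubleUtf8((string) $currquestion->targetword)`; removing the extra conversion at its source would make the helper unnecessary.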

Thank you for taking the time to help,
Nikos