Problem with Greek characters in PHP



timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Problem with Greek characters in PHP

Post by timski72 »

I want to write a script that will transcribe a Greek word into the Latin alphabet, so users who can't read the Greek alphabet will have some idea of how the word is pronounced.

For example, a user might enter the Greek word "Καλημέρα" into a form, click on the "Transcribe" button and it will return "kalee mera".

Putting it rather too simply, but so you understand what I'm trying to achieve, most sounds of the Greek alphabet map to a sound of the Latin alphabet. E.g. "β" sounds like "v", so I map accordingly; "π" sounds like "p", "ξ" sounds like "kse", etc.
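A minimal sketch of that letter-by-letter mapping in modern PHP (7+), assuming the script file itself is saved as UTF-8. The table is a small hypothetical excerpt, not a complete transcription scheme: it ignores digraphs and syllable breaks (so "Καλημέρα" comes out as "kaleemera", not "kalee mera"). `strtr()` with an array argument replaces whole keys, trying longer keys first, so multi-byte UTF-8 keys are matched as complete characters.

```php
<?php
// Hypothetical excerpt of a Greek-to-Latin table; a real one needs every
// letter (upper and lower case, with and without accents) plus digraphs.
function transcribe(string $greek): string
{
    static $map = [
        'α' => 'a', 'β' => 'v', 'γ' => 'g', 'δ' => 'd', 'ε' => 'e',
        'έ' => 'e', 'η' => 'ee', 'ι' => 'i', 'κ' => 'k', 'Κ' => 'k',
        'λ' => 'l', 'μ' => 'm', 'ν' => 'n', 'ξ' => 'kse', 'ο' => 'o',
        'π' => 'p', 'ρ' => 'r', 'σ' => 's', 'ς' => 's', 'τ' => 't',
    ];
    // strtr() never re-scans text it has already replaced, and characters
    // missing from the table are passed through unchanged.
    return strtr($greek, $map);
}

echo transcribe('Καλημέρα'), "\n"; // kaleemera
```

Adding a digraph later (for example a two-letter key) needs no reordering, because `strtr()` always prefers the longest matching key.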

I did this before in C# Express, but when I tried to reproduce the script in PHP I ran into a problem: I was getting gibberish from the word entered in the form, i.e. in $_POST['greek']. PHP obviously doesn't like Unicode. I had a look on the web and there is a lot of discussion about PHP not fully supporting Unicode until version 6. Being a novice, I found some of the discussion a bit overwhelming, so before embarking on what might be an impossible mission, I thought I'd ask for some advice first.


So, first and foremost, is it possible, or am I wasting my time? Second, is anyone able to give me any pointers?

My current host has PHP 5.2.5 running.

I have tried adding the following to the HTML, as one post suggested this might solve the problem (at least I think that's what it was suggesting ;-)):
<meta http-equiv=”Content-Language” content=”bn”>
<meta http-equiv=”Content-Type” content=”text/html; charset=utf-8″ >

Any advice would be much appreciated :-)
Cheers,
Tim.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Problem with Greek characters in PHP

Post by Chris Corbyn »

Unless you're doing things like taking substrings of Unicode strings, PHP will handle it fine. It's more likely you've got a charset mismatch somewhere between the "charset" in your web page and the charset of the tables in the database (or of the files you're editing in your text editor).
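To make the substring caveat concrete: PHP's core string functions count bytes, and every Greek letter is two bytes in UTF-8. A small sketch, assuming the file is saved as UTF-8; the `/u` modifier tells PCRE to treat the subject as UTF-8 characters rather than bytes.

```php
<?php
// "σπιτι" is 5 Greek letters but 10 bytes in UTF-8 (2 bytes per letter).
$word = 'σπιτι';

// Byte-oriented functions count and cut bytes, not characters.
echo strlen($word), "\n";                // 10
echo bin2hex(substr($word, 0, 1)), "\n"; // cf: only half of "σ"

// With /u, PCRE splits on character boundaries instead.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n";                // 5
echo $chars[0], "\n";                    // σ
```

Echoing, storing and comparing whole strings is unaffected; only code that indexes, slices or measures strings needs to care.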
jimthunderbird
Forum Contributor
Posts: 147
Joined: Tue Jul 04, 2006 3:59 am
Location: San Francisco, CA

Re: Problem with Greek characters in PHP

Post by jimthunderbird »

Interesting post; I quickly coded this for fun on my website.

http://www.pamground.com/main/work/gree ... /index.php

What this will do is:

You input characters, be it Greek, Chinese, Spanish or whatever, and it will tell you the corresponding HTML entity code for each one.

For example:

β is &#946
π is &#960
η is &#951
...

By converting the word to HTML entity codes, you can then create a map from entity code to whatever output characters you want, for example:

β looks like B, π looks like N (whatever)

"&#946"=>"B",
"&#960"=>"N"

I think this solves your problem.

Also, here's the code I used; credit goes to: http://us3.php.net/manual/en/function.u ... .php#75941

Note: the code below is just a demo, not a full application meeting your requirements.

Code:

 
<?php
 
  /////// Logic Layer ////////
  
    /**
     * Simple UTF-8 to HTML conversion: every multi-byte UTF-8 sequence
     * is replaced by its numeric entity code, e.g. "&#946".
     * (The original used the /e modifier, which was removed in PHP 7.)
     */
    function utf8_to_html($data){
      return preg_replace_callback('/[\xC0-\xF7][\x80-\xBF]+/', '_utf8_to_html_cb', $data);
    }

    function _utf8_to_html_cb($matches){
      return _utf8_to_html($matches[0]);
    }
 
    function _utf8_to_html($data){
      $ret = 0;
      // Strip the length-marker bits from the lead byte, then add up the
      // 6 payload bits of each byte, starting from the last one.
      foreach(str_split(strrev(chr((ord($data[0]) % 252 % 248 % 240 % 224 % 192) + 128) . substr($data, 1))) as $k => $v)
          $ret += (ord($v) % 128) * pow(64, $k);
      return "&#".$ret;
    }
  
    $html_code = '';
    $filtered_html_code = '';
    $op = isset($_POST['op']) ? trim($_POST['op']) : '';
    if($op == 'start_transcribe' && isset($_POST['word'])){
      // Escape ASCII markup first so the input can't inject HTML.
      $word = htmlspecialchars(trim($_POST['word']), ENT_QUOTES, 'UTF-8');
      
      $html_code = utf8_to_html($word);
      
      // Escape "&" so the browser shows the entity codes instead of rendering them.
      $filtered_html_code = str_replace("&","&amp;",$html_code);
    }
?>
 
 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Transcribe Greek Characters</title>
</head>
<body>
  <div>
    You entered <?php echo $html_code; ?> and the corresponding HTML entity code is <?php echo $filtered_html_code; ?>
  </div>
 
  <p>Enter a word to see its HTML entity codes.</p>
  <form method="post" action="index.php">
    <input type="text" name="word"/>
    <input type="submit" value="Transcribe" />
    <input type="hidden" name="op" value="start_transcribe"/>
  </form>
</body>
</html>
 
 
 
timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Re: Problem with Greek characters in PHP

Post by timski72 »

Thanks Guys,

jimthunderbird, your suggestion looks interesting and merits some investigation; however, at this stage I think it is a little bit ahead of me. My initial concern is to check that I have got everything set up properly, as Chris suggested in his post.

So before I go on to look at your suggestion, I want to be sure that the value I am getting from the form is correct. As a little test, I did the following to see what values I would get from a form.

I created this simple form:

Code:

 
<html>
<meta http-equiv=”Content-Language” content=”bn”>
<meta http-equiv=”Content-Type” content=”text/html; charset=utf-8? >
<head>Transliterator Beta 1</head>
<body>
<form method="post" action="transliterator.php" >
Enter the word you wish to transliterate: <input type="text" name="greek" size="30">
<input type="submit" value="send">
</form>
</body>
</html>
 
And this simple script:

Code:

 
<?php
 
$input = $_POST['greek'];
 
if ($input == "σπιτι")
{
    echo "you said: <i>$input</i>";
}
else
{
    echo "you said something else";
}
echo  "<br> $input<br>" ;
?>
 
I then type "σπιτι" into the form and press submit, but it returns the following:
you said something else
?????
As $input correctly contains "σπιτι" (it has been echoed correctly), I can't understand why if ($input == "σπιτι") doesn't seem to work. Instead of executing the if branch, as I would expect, it goes into the else.

Thanks,
Tim.
jimthunderbird
Forum Contributor
Posts: 147
Joined: Tue Jul 04, 2006 3:59 am
Location: San Francisco, CA

Re: Problem with Greek characters in PHP

Post by jimthunderbird »

I ran into your problem in 2004 when dealing with some traditional Chinese characters. I used almost exactly the same if statement technique to compare the word, but it simply wouldn't work. I then found out that I was using GB2312 and my friend who entered the characters was using BIG5. I guess it may be the encoding header that is causing this problem.

I don't know about Greek's charset, but hopefully this gives you some hints.

I think my Unicode solution is a kind of "portable" way of doing it, although maybe a little bit "heavy".
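One quick way to spot this kind of mismatch in PHP is to check whether the incoming bytes are valid UTF-8 at all. A sketch, assuming the page is meant to be UTF-8; `is_valid_utf8` is a hypothetical helper name, and it relies on `preg_match()` returning `false` (not `0`) when a `/u` pattern meets malformed UTF-8. For Greek, the single-byte code page is Windows-1253 (cp1253), in which σπιτι is the five bytes F3 F0 E9 F4 E9, and that sequence is not valid UTF-8.

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$utf8   = "\xCF\x83\xCF\x80\xCE\xB9\xCF\x84\xCE\xB9"; // "σπιτι" in UTF-8
$cp1253 = "\xF3\xF0\xE9\xF4\xE9";                     // "σπιτι" in Windows-1253

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($cp1253)); // bool(false)
var_dump($utf8 == $cp1253);       // bool(false): same word, different bytes
```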

Best Regards,
Jim
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Problem with Greek characters in PHP

Post by Weirdan »

timski72 wrote: As $input correctly contains "σπιτι" - it has been echoed correctly - I can't understand why the if ($input == "σπιτι") doesn't seem to be working. Instead of executing the if, as I would expect, it's going into the else?
The reason could be that your php file is saved in cp1253 and your data is coming in utf8. Here's a simple way to check:

Code:

 
echo 'Data from post: ' . bin2hex($_POST['greek']) . '<br>';
echo 'Data in file: ' . bin2hex("σπιτι");
 
If the two strings are the same, you've got everything right. Otherwise there's a character set mismatch.
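To interpret that output, it helps to know what each encoding should produce for the test word. A small sketch, assuming the word is σπιτι: in UTF-8 each of these letters takes two bytes, while a single-byte Greek code page such as cp1253 uses one. The `\u{...}` escapes (PHP 7+) build the UTF-8 string from code points, so the result does not depend on how the editor saved the file.

```php
<?php
// Build "σπιτι" from code points, independent of this file's encoding.
$utf8 = "\u{3C3}\u{3C0}\u{3B9}\u{3C4}\u{3B9}";
echo 'UTF-8:  ', bin2hex($utf8), "\n";   // cf83cf80ceb9cf84ceb9

// The same word in Windows-1253, one byte per letter.
$cp1253 = "\xF3\xF0\xE9\xF4\xE9";
echo 'cp1253: ', bin2hex($cp1253), "\n"; // f3f0e9f4e9

// A literal typed into the script only equals the UTF-8 form when the
// script file itself is saved as UTF-8.
var_dump($utf8 === 'σπιτι');
```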
jimthunderbird
Forum Contributor
Posts: 147
Joined: Tue Jul 04, 2006 3:59 am
Location: San Francisco, CA

Re: Problem with Greek characters in PHP

Post by jimthunderbird »

Hi Weirdan, I agree with your approach, and I have to admit you are the PHP guru.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Problem with Greek characters in PHP

Post by Chris Corbyn »

From the code you posted:

Code:

<meta http-equiv=”Content-Type” content=”text/html; charset=utf-8? >
Are those proper double quotes? They don't look like they're from the us-ascii character set.

Also, you have a ? after utf-8 instead of a closing double quote.

What text editor are you using to edit your code?
timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Re: Problem with Greek characters in PHP

Post by timski72 »

Ah lots of food for thought :-) Someone also sent me this link http://webcollab.sourceforge.net/unicode.html which re-iterates what some of you have said. I will look into all of your points and update the post if I am successful. Thanks for your help so far!
Tim.
timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Re: Problem with Greek characters in PHP

Post by timski72 »

OK, following your suggestions I've done and found the following.

1) I corrected the error in the meta tag of my form so the form html now reads:

Code:

 
<html>
<head>
<meta http-equiv="Content-Language" content="el">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>Transliterator Beta 1</title>
</head>
<body>
<form method="post" action="transliterator.php" >
Enter the word you wish to transliterate: <input type="text" name="greek" size="30">
<input type="submit" value="send">
</form>
</body>
</html>
 
2) To my test script I added the code to check whether there is a character set mismatch, as suggested by Weirdan, and indeed there does seem to be a character set mismatch.

My script now reads:

Code:

 
<?php
 
$input = $_POST['greek'];
 
if ($input == "σπιτι")
{
    echo "you said: <i>$input</i> \n";
}
else
{
    echo "you said something else";
}
echo  "<br> $input<br>" ;
echo "<br>Data from post: <br>" . bin2hex($_POST['greek']);
echo "<br>Data in file: <br>" . bin2hex("σπιτι");
?>
 
Entering "σπιτι" in the form now yields:
you said something else
σπιτι

Data from post:
cf83cf80ceb9cf84ceb9
Data in file:
73703f743f
From the results it is clear there is a character set mismatch, but I don't know how to resolve this.

Interestingly too, now that the html has been corrected the $input that is echoed is returning σπιτι, whereas before it contained "?????".

Something else I have noticed is that when I close my text editor (phpDesigner 2008) and open the script again, the word "σπιτι" in the script has been replaced by "sp?t?", so it looks like my editor doesn't like Greek script. Perhaps this is the cause of the character set mismatch?

Any ideas?

Thanks,
Tim.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Problem with Greek characters in PHP

Post by Weirdan »

Data in file:
73703f743f
Obviously your file is saved in a one-byte encoding (most probably cp1253, since it's Greek). Check your editor's settings; there must be an option to save files in UTF-8.
timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Re: Problem with Greek characters in PHP

Post by timski72 »

Ah, that did the trick. There was an option to set the file encoding to UTF-8, and it's working correctly now. Now that I know my setup is correct, I can move on to manipulating the strings. No doubt there are lots of pitfalls awaiting me there. I've got mbstring installed and, as far as I can tell from what I've read on the web, that should enable manipulation of Unicode strings. Time to read the manual properly and find out :-)
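A few mbstring calls that are useful for this kind of work, as a sketch assuming the mbstring extension is enabled and all strings are UTF-8. Passing `'UTF-8'` explicitly avoids depending on the configured internal encoding. `greek_letters` is a hypothetical helper for walking a word one character at a time, which a transcription loop would need.

```php
<?php
// Split a UTF-8 string into characters with mb_substr(); plain substr()
// would cut the two-byte Greek letters in half.
function greek_letters(string $word): array
{
    $letters = [];
    $length = mb_strlen($word, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        $letters[] = mb_substr($word, $i, 1, 'UTF-8');
    }
    return $letters;
}

$word = 'Καλημέρα';
echo mb_strlen($word, 'UTF-8'), "\n";            // 8 (strlen() would give 16)
echo mb_strtolower($word, 'UTF-8'), "\n";        // καλημέρα
echo implode(' ', greek_letters('σπιτι')), "\n"; // σ π ι τ ι
```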

Thanks for your help.
Tim.
devendra-m
Forum Contributor
Posts: 111
Joined: Wed Sep 12, 2007 3:16 am

Re: Problem with Greek characters in PHP

Post by devendra-m »

Did you include the following line at the top of your PHP code?
ini_set('default_charset', 'UTF-8');
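For reference, a minimal sketch of that suggestion. It assumes the call runs before the script produces any output; with `default_charset` set, PHP appends the charset to the default `Content-Type: text/html` header it sends. (Since PHP 5.6 the built-in default is already UTF-8, but a host's php.ini can override it.)

```php
<?php
// Must run before any output, because it affects the headers PHP sends.
ini_set('default_charset', 'UTF-8');

// Confirm the setting took effect.
echo ini_get('default_charset'), "\n"; // UTF-8
```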
timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Re: Problem with Greek characters in PHP

Post by timski72 »

Just after reading this last post, I ran my script and the Greek text displayed incorrectly. I checked the browser encoding and for some reason it had switched to Western European, despite being set to UTF-8 in the HTML. This seemed to happen only intermittently, but adding the

Code:

ini_set('default_charset', 'UTF-8');
as suggested above seems to have resolved this. Thanks, Tim.
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Problem with Greek characters in PHP

Post by VladSun »

timski72 wrote:I checked the browser encoding and for some reason it had switched to Western European , despite being set to UTF-8 in the html. This seemed to happen only intermittently, but adding the

Code:

ini_set('default_charset', 'UTF-8');
as suggested above seems to have resolved this. Thanks, Tim.
The HTML meta tag cannot set the charset if it has already been set by the HTTP headers. I didn't know that

Code:

ini_set('default_charset', 'UTF-8');
would send headers for me ... ( thanks, Tim :) )

I usually do it by sending my own headers:

Code:

header('Content-Type: text/html; charset=utf-8');
I always send such headers because I don't know what default charset the hosting web server will send (e.g. via Apache's AddDefaultCharset directive), so I advise you always to do so (with header() or ini_set()).
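A sketch of a `transliterator.php` that follows this advice, assuming the form posts a field named `greek` as in the earlier examples. The header is sent explicitly before any output, and the echoed input goes through `htmlspecialchars()` with the charset given, which also stops the form from being used to inject markup. `echo_back` is a hypothetical helper name; scalar type declarations need PHP 7+.

```php
<?php
// Send the charset ourselves instead of relying on the server default
// (for example Apache's AddDefaultCharset). This must come before any output.
header('Content-Type: text/html; charset=utf-8');

// Escape user input before echoing it; the 'UTF-8' argument tells
// htmlspecialchars() how to interpret multi-byte input.
function echo_back(string $input): string
{
    return 'you said: <i>' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '</i>';
}

if (isset($_POST['greek'])) {
    echo echo_back($_POST['greek']);
}
```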
There are 10 types of people in this world, those who understand binary and those who don't