Problem with mb_substr and unicode

timski72
Forum Newbie
Posts: 15
Joined: Sun Jan 13, 2008 6:19 am

Problem with mb_substr and unicode

Post by timski72 »

Hello there,

I'm trying to write a script that transcribes Greek words into the Latin alphabet, so users who can't read the Greek alphabet will have some idea of how the word is pronounced.

I'm using the mbstring functions as these can handle Unicode. I'm having trouble, however, with mb_substr.

The script is as follows:

Code:

 
<?php
ini_set('default_charset', 'UTF-8');
// $input = $_POST['greek'];
// $input comes from form but for testing in editor set it here
$input = "ευχαριστω";
$string = "?";
$original = $input;
$NextLetter = "";
$pos = mb_strpos($input, "ευ");
// mb_strpos() returns false when nothing is found, so compare strictly
if ($pos !== false)
{
    $NextLetter = mb_substr($input, $pos + 2, 1);
}
if ($NextLetter ==  "χ")
{
    $input = mb_ereg_replace("ευ", "ef", $input);
} 
else
{
    $input = mb_ereg_replace("ευ", "ev", $input);
}
echo "Your transliteration is: <i>$input</i> \n";
// Display items below for comparison and testing
echo "<br>NextLetter contains $NextLetter during script execution<br>";
echo "<br>Hardcoded in file (ευχαριστω ): <br>" . bin2hex("ευχαριστω");
echo "<br>Data from post ($original): <br>" . bin2hex($_POST['greek']);
?>
When I step through the code in debug mode in my PHP editor (phpDesigner 2008) the following piece of code is being executed:

Code:

{
    $NextLetter = mb_substr($input, $pos + 2, 1);
}
 
... as expected. When I call the script from the browser on localhost, however, I get the following results:
Your transliteration is: evχαριστω
NextLetter contains ? during script execution
Hardcoded in file (ευχαριστω):
ceb5cf85cf87ceb1cf81ceb9cf83cf84cf89
Data from post (ευχαριστω):
ceb5cf85cf87ceb1cf81ceb9cf83cf84cf89

As you can see, $NextLetter contains ? when run from the browser, but "O?" in the script editor. Because of this, the script doesn't execute the code that replaces "ευ" with "ef"; it goes into the else branch and replaces "ευ" with "ev" instead.

I have created a screenshot so you can see the values I am getting in my variables, when I run the script in debug mode in my PHP editor. You can see the screenshot here: http://www.languageaddicts.com/Debugger.bmp.


Any ideas anyone?
Thanks,
Tim.
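
For anyone attempting the same kind of transcription, here is a rough sketch of how the "ευ" rule might be generalised. It assumes the usual convention that "ευ" is written "ef" before the voiceless consonants θ, κ, ξ, π, σ, τ, φ, χ, ψ and "ev" otherwise. The function name transliterateEu is made up for illustration and is not part of the script above. It uses preg_replace with the /u modifier, so the pattern itself is treated as UTF-8 regardless of the mbstring settings.

```php
<?php
// Hypothetical helper (not from the original post): transcribes only "ευ".
function transliterateEu(string $word): string
{
    // Assumed rule: "ευ" directly before a voiceless consonant becomes "ef".
    $word = preg_replace('/ευ(?=[θκξπστφχψ])/u', 'ef', $word);
    // Any "ευ" still left is followed by something else and becomes "ev".
    return preg_replace('/ευ/u', 'ev', $word);
}

echo transliterateEu("ευχαριστω"), "\n"; // efχαριστω
echo transliterateEu("ευαγγελιο"), "\n"; // evαγγελιο
```

The lookahead keeps the following consonant in place, so only the two letters of "ευ" are replaced.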

Re: Problem with mb_substr and unicode

Post by timski72 »

Hiya,

I seem to have solved this issue myself. By adding the following line of code

Code:

mb_internal_encoding("UTF-8");
just below

Code:

ini_set('default_charset', 'UTF-8');
in the script shown in the original post, the script now works in the browser as well as in the script editor debugger.

For my own understanding, though, I'd be interested to know what the difference is between the two snippets of code, as I thought they were just two ways of doing the same thing.

Thanks,
Tim.
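
A possible explanation, as far as I know: on the PHP 5 releases of that time, default_charset only controlled the charset announced in the Content-Type header, while the mbstring functions used their own internal encoding (set with mb_internal_encoding() or the mbstring.internal_encoding ini setting). Since PHP 5.6, mbstring falls back to default_charset when no internal encoding has been set. The sketch below imitates the two situations by switching the internal encoding explicitly; the variable names are made up for illustration.

```php
<?php
// Same word as in the original script (bytes: ce b5 cf 85 cf 87 ...).
$word = "ευχαριστω";

// Imitate an environment where mbstring assumes a single-byte encoding.
mb_internal_encoding('ISO-8859-1');
$asBytes = mb_substr($word, 2, 1);   // position 2 is counted in bytes
echo bin2hex($asBytes), "\n";        // cf  (first half of "χ", not a valid character)

// Tell mbstring that strings are UTF-8.
mb_internal_encoding('UTF-8');
$asChars = mb_substr($word, 2, 1);   // position 2 is counted in characters
echo bin2hex($asChars), "\n";        // cf87  (the complete "χ")
```

A lone cf byte is not valid UTF-8, which would fit the browser showing a replacement character for $NextLetter.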