Problem with mb_substr and Unicode
Posted: Fri Jan 18, 2008 4:41 pm
Hello there,
I'm trying to write a script that transliterates Greek words into the Latin alphabet, so that users who can't read the Greek alphabet will have some idea of how each word is pronounced.
I'm using the mbstring functions because they can handle Unicode. I'm having trouble, however, with mb_substr.
The script is as follows:
Code: Select all
<?php
ini_set('default_charset', 'UTF-8');
// $input = $_POST['greek'];
// $input normally comes from the form; for testing in the editor it is set here
$input = "ευχαριστω";
$original = $input;
$NextLetter = ""; // initialised so the echo below never reads an undefined variable
$pos = mb_strpos($input, "ευ");
// mb_strpos() returns false (not -1) when there is no match, and false >= 0 is
// true in PHP, so compare against false explicitly
if ($pos !== false)
{
    $NextLetter = mb_substr($input, $pos + 2, 1);
}
if ($NextLetter == "χ")
{
    $input = mb_ereg_replace("ευ", "ef", $input);
}
else
{
    $input = mb_ereg_replace("ευ", "ev", $input);
}
echo "Your transliteration is: <i>$input</i> \n";
// Display items below for comparison and testing
echo "<br>NextLetter contains $NextLetter during script execution<br>";
echo "<br>Hardcoded in file (ευχαριστω): <br>" . bin2hex("ευχαριστω");
echo "<br>Data from post ($original): <br>" . bin2hex(isset($_POST['greek']) ? $_POST['greek'] : '');
?>
When I step through the code in debug mode in my PHP editor (phpDesigner 2008), the following piece of code is executed:
Code: Select all
{
    $NextLetter = mb_substr($input, $pos + 2, 1);
}
This works as expected. When I call the script from the browser on localhost, however, I get the following results:
Your transliteration is: evχαριστω
NextLetter contains ? during script execution
Hardcoded in file (ευχαριστω):
ceb5cf85cf87ceb1cf81ceb9cf83cf84cf89
Data from post (ευχαριστω):
ceb5cf85cf87ceb1cf81ceb9cf83cf84cf89
As you can see, $NextLetter contains ? when run from the browser, but "O?" in the script editor. Because of this, the script doesn't execute the code that replaces "ευ" with "ef"; instead it goes into the else branch and replaces "ευ" with "ev".
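One way to make the script independent of whatever encoding the server defaults to is to call mb_internal_encoding('UTF-8') (and mb_regex_encoding('UTF-8') for the mb_ereg_replace() calls) at the top of the script, or to pass 'UTF-8' to every mb_* call. The sketch below takes the second route and also decides "ef" or "ev" separately for each ευ in a word, whereas the original script decides once for the whole word. The helper name transliterateEu and the voiceless-consonant list (θ κ ξ π σ ς τ φ χ ψ, following the usual Modern Greek rule that ευ is read "ef" before them and "ev" before vowels and voiced consonants) are assumptions for illustration, not part of the original script; the sketch simply uses "ev" in every non-voiceless case.

```php
<?php
// Sketch (hypothetical helper, not the original script): transliterate every
// "ευ" in a UTF-8 word, choosing "ef" or "ev" from the letter that follows it.
// Every mb_* call names 'UTF-8' explicitly, so mb_internal_encoding() is irrelevant.
function transliterateEu($word)
{
    // Assumed voiceless consonants after which ευ is read "ef".
    $voiceless = array('θ', 'κ', 'ξ', 'π', 'σ', 'ς', 'τ', 'φ', 'χ', 'ψ');

    $result = '';
    $offset = 0; // character offset of the text not yet copied to $result
    while (($pos = mb_strpos($word, 'ευ', $offset, 'UTF-8')) !== false) {
        // Copy the text before this "ευ" unchanged.
        $result .= mb_substr($word, $offset, $pos - $offset, 'UTF-8');

        // Look at the character after the digraph ('' at the end of the word).
        $next = mb_substr($word, $pos + 2, 1, 'UTF-8');
        $result .= in_array($next, $voiceless, true) ? 'ef' : 'ev';

        $offset = $pos + 2; // continue after the digraph
    }

    // Append whatever follows the last "ευ".
    return $result . mb_substr($word, $offset, mb_strlen($word, 'UTF-8') - $offset, 'UTF-8');
}

echo transliterateEu('ευχαριστω'), "\n"; // efχαριστω
echo transliterateEu('ευαγγελιο'), "\n"; // evαγγελιο
```

The remaining Greek letters are left untouched here; a full transliterator would map them in a later pass.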
I have created a screenshot showing the values of my variables when I run the script in debug mode in my PHP editor. You can see the screenshot here: http://www.languageaddicts.com/Debugger.bmp
Any ideas anyone?
Thanks,
Tim.