replacing umlauts (special chars) in strings

Posted: Mon Aug 07, 2006 1:33 pm
by maha_x
Hey boys and girls, I need a hand please!

I'm working on my cousin's web pages in my spare time, and have mostly been successful. The pages are almost done, except for a minor bugger: the Scandinavian umlauts (ä and ö) turn out corrupted. I believe the browser is responsible, cos when I look at the HTML source (in Notepad) the umlauts show up just fine. So I thought the safest way would be to replace the umlauts with their HTML codes, like &auml; and &ouml;. So I lifted some code from php.net:

Code:

$trans = array('ä' => '&auml;', 'Ä' => '&Auml;', 'ö' => '&ouml;', 'Ö' => '&Ouml;');
$ctmp = strtr($_POST['comment'], $trans);
But this doesn't appear to do anything; the letters still appear in their original form. I also tried a variation:

Code:

$trans = array("ä" => '&auml;', "Ä" => '&Auml;', "ö" => '&ouml;', "Ö" => '&Ouml;');
$ctmp = str_replace(array_keys($trans), $trans, $_POST['comment']);
Without success. Maybe the problem is obvious... Maybe not? I came over from writing C and never really tried to use Finnish with my programs before...

Oh, and just for some background: I pick up the data from a form and simply write the entries into a txt file. By looking directly at this file I can verify that the umlauts were not changed (I also checked that the browser doesn't translate HTML codes when viewing txt files).

Help much appreciated!

Posted: Mon Aug 07, 2006 2:14 pm
by MarK (CZ)
I would suggest using Unicode (UTF-8). It makes working with "non-standard" languages easier.

Posted: Mon Aug 07, 2006 2:39 pm
by volka
I second that.

And it might already be the "problem". If the form data is sent as UTF-8 but the script is saved as e.g. ISO 8859-1, the 'ä' in the script will not match an ä in the form data. But then there's no need to replace the characters with Latin-1 entities anyway.
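
The mismatch described above can be sketched in a few lines of PHP. This is only an illustration, not the original poster's code: the byte escapes stand in for a script saved as ISO 8859-1 and a form value submitted as UTF-8, and the variable names are made up for the example.

```php
<?php
// 'ä' as a single byte, as it would appear in a script saved as ISO 8859-1.
$latin1_a = "\xE4";
// 'ä' as two bytes, as a browser would submit it from a UTF-8 page.
$utf8_a = "\xC3\xA4";

// Hypothetical form input: the Finnish word "päivää", encoded as UTF-8.
$comment = "p" . $utf8_a . "iv" . $utf8_a . $utf8_a;

// Translation table keyed on the ISO 8859-1 byte: nothing matches.
$unchanged = strtr($comment, array($latin1_a => '&auml;'));

// Translation table keyed on the UTF-8 bytes: every ä is replaced.
$replaced = strtr($comment, array($utf8_a => '&auml;'));

var_dump($unchanged === $comment); // true, the input came back untouched
echo $replaced, "\n";              // p&auml;iv&auml;&auml;
```

Because strtr() and str_replace() compare raw bytes, the replacement only works when the script file and the incoming data use the same encoding.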

Posted: Tue Aug 08, 2006 8:42 am
by maha_x
Now googling "html unicode" produces something useful, like this:

Code:

<meta http-equiv="content-type" content="text/html; charset=utf-8">
I knew there had to be a way to change the encoding, I just couldn't google it up... Thanks guys!
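
For pages generated by PHP, the same declaration can also be sent as an HTTP header, which browsers give priority over the meta tag. A rough sketch, assuming the stored comments are UTF-8; the function names render_comment_page and send_comment_page are invented for this example:

```php
<?php
// Build a small HTML page around one comment (assumed to be UTF-8 text).
function render_comment_page($comment)
{
    // Escape markup characters only; on a UTF-8 page, ä and ö need no entities.
    $safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

    return "<html><head>\n"
         . "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">\n"
         . "</head><body><p>" . $safe . "</p></body></html>\n";
}

// Send the charset in the HTTP header as well, then print the page.
function send_comment_page($comment)
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    echo render_comment_page($comment);
}
```

Calling send_comment_page() with a line read from the txt file would then output the umlauts as plain UTF-8 bytes, with no entity replacement step.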

Posted: Tue Aug 08, 2006 8:58 am
by CoderGoblin
UTF-8 can cause problems with forms as well ($_POST/$_GET variables). You may also need to use utf8_decode()/utf8_encode().

Another solution is to use ISO-8859-1 (or ISO-8859-15, which adds the € symbol).

Regards
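
If the form data and the stored text do end up in different encodings, the conversion mentioned above can be sketched with iconv(), which covers the same UTF-8/ISO-8859-1 pair as utf8_decode()/utf8_encode() for characters that exist in both. This assumes the iconv extension is available; the helper names to_latin1 and to_utf8 are hypothetical.

```php
<?php
// Convert UTF-8 text (e.g. a submitted form value) to ISO-8859-1.
// iconv() typically fails (returns false) when a character has no
// ISO-8859-1 equivalent, so callers should check the result.
function to_latin1($utf8_text)
{
    return iconv('UTF-8', 'ISO-8859-1', $utf8_text);
}

// Convert ISO-8859-1 text back to UTF-8.
function to_utf8($latin1_text)
{
    return iconv('ISO-8859-1', 'UTF-8', $latin1_text);
}

$utf8   = "\xC3\xA4\xC3\xB6";   // "äö" in UTF-8 (4 bytes)
$latin1 = to_latin1($utf8);     // "äö" in ISO-8859-1 (2 bytes)

var_dump(strlen($utf8), strlen($latin1));
var_dump(to_utf8($latin1) === $utf8);
```

Converting at one well-defined point (for example, right after reading $_POST) keeps the rest of the script working in a single encoding.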