Pretty UTF-8 urls

Posted: Tue Jan 23, 2007 10:51 am
by CoderGoblin
The problem: /site/About in English becomes /site/ÜberUns in German as a website page (OK, it may not be a good translation, but it will serve for now). When a person clicks on a link, the status bar shows the URL correctly (http://www.website.de/ÜberUns). When the page loads, the browser shows "http://www.website.de/%C3%9CberUns" as the URL. Can I make the URL display nicely somehow? Is this a browser (client) thing, or can I do something (e.g. with headers) so the browser shows the URL as expected? At present I am testing with Firefox.

The background: Using Zend Framework 0.7, we set the routing to English by default. The site can support additional languages. When the language changes, a different routing is used (to reflect the current language). The translated route names may contain characters outside the standard A-Z range. At present everything works fine: the URL is processed correctly, but it does not look nice for the user. We also have a future requirement for Polish.

Hopefully you can understand what I am asking...

EDIT: My current understanding is that this is not possible, but if it is, I would like to know how. At the moment it would be up to the translators to use UeberUns instead of ÜberUns.

Posted: Tue Jan 23, 2007 1:17 pm
by Kieran Huggins
Unfortunately, you are correct - it's not possible to have Unicode characters in a URL.

You can use mb_convert_encoding() to translate your Unicode URL to a URL-safe string for the link.

Posted: Wed Jan 24, 2007 4:13 pm
by Ambush Commander
No, you can have Unicode characters in your URL. They just won't be pretty (they'll be the percent-encoded things you've seen).
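To make the percent-encoding concrete, here is a minimal sketch of the round trip for the path from the first post. It uses Python's standard library purely for illustration (the thread itself is about PHP), and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Path taken from the original question.
path = "/ÜberUns"

# quote() encodes the text as UTF-8 and percent-escapes every byte that is
# not allowed literally in a URL path; "/" is kept as-is by default.
encoded = quote(path)

# unquote() reverses the escaping and decodes the bytes as UTF-8 again.
decoded = unquote(encoded)

print(encoded)  # /%C3%9CberUns
print(decoded)  # /ÜberUns
```

The escaped and unescaped forms name the same resource, so whether the user sees one or the other is a display decision made by the browser, not something the server changes by sending the link differently.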

Posted: Wed Jan 24, 2007 6:07 pm
by Kieran Huggins
Ambush Commander wrote:No, you can have Unicode characters in your URL. They just won't be pretty (they'll be the percent-encoded things you've seen)
Sorry, yes - AC is correct. You "can" have Unicode characters in a URL, but they will be escaped.

You can use mb_convert_encoding() to translate your Unicode URL to an easy-to-read string for the link.

Posted: Wed Jan 24, 2007 10:46 pm
by feyd

Posted: Wed Jan 24, 2007 10:49 pm
by Ambush Commander
Hmm... I'd say Punycode is kind of ugly too, and it is designed for domain names. For readability, I think I would go with transliterated URIs. However, Wikipedia doesn't seem to mind oodles of percent-encoded URIs all over the place.
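A rough sketch of what transliterating a route name could look like, again in Python for illustration only. The function name and the replacement table are hypothetical; the table covers the usual German spellings (Ü to Ue and so on) plus the Polish Ł, which has no Unicode decomposition, and a real site would want a table agreed with its translators for each language.

```python
import unicodedata

# Hypothetical replacement table: German umlauts and eszett use their
# conventional ASCII spellings; the Polish stroked L must be mapped by hand.
REPLACEMENTS = {
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
    "Ł": "L", "ł": "l",
}

def transliterate_slug(text):
    """Return an ASCII-only approximation of text for use in a URL path."""
    # Apply the explicit replacements first.
    replaced = "".join(REPLACEMENTS.get(ch, ch) for ch in text)
    # Split remaining accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop everything that is not ASCII (combining marks and any character
    # without a mapping are silently discarded).
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(transliterate_slug("ÜberUns"))  # UeberUns
print(transliterate_slug("Łódź"))     # Lodz
```

Note that the German rules are language-specific: applying them to another language that also uses ü or ö would give spellings its readers do not expect, which is one reason to leave the final choice to the translators.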

Posted: Thu Jan 25, 2007 2:38 am
by CoderGoblin
Kieran Huggins wrote:You can use mb_convert_encoding() to translate your Unicode URL to an easy-to-read string for the link.
Thanks for everybody's replies. As stated originally, I thought this was the case. The actual link already shows correctly (in the status bar) without using mb_convert_encoding(). I guess it is up to the translators. After all, the URL display is not a major thing; most sites I know still retain the default-language URL name, regardless of the language you set the site to.