Problem with Unicode characters in URL

meathook
Forum Newbie
Posts: 3
Joined: Sun Apr 01, 2012 5:32 pm

Problem with Unicode characters in URL

Post by meathook »

I have a page which parses the GET query string, but whenever I call it with Unicode characters like this:
דרמה
the page doesn't load and never parses the string.

When I use regular ASCII characters it works.

How can I get it to work?
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Problem with Unicode characters in URL

Post by requinix »

Are you UTF-8-encoding the data? It needs to be.
meathook
Forum Newbie
Posts: 3
Joined: Sun Apr 01, 2012 5:32 pm

Re: Problem with Unicode characters in URL

Post by meathook »

Yes I am, with this at the top:
header('Content-Type:text/html; charset=UTF-8');
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: Problem with Unicode characters in URL

Post by tr0gd0rr »

I believe you need to run `rawurlencode` on the Hebrew characters in the GET query string before passing the link to the browser. So instead of `http://example.com/?param=דרמה` you'll need `http://example.com/?param=%D7%93%D7%A8%D7%9E%D7%94`.
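
A minimal sketch of that encoding step, assuming the parameter is named `param` as in the URL above and that the PHP source file is itself saved as UTF-8 (the variable names are only for illustration):

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is 8 bytes long.
$value = 'דרמה';

// rawurlencode() percent-encodes every non-unreserved byte of the string.
$encoded = rawurlencode($value);

// Build the link and escape it for output inside an HTML attribute.
$url = 'http://example.com/?param=' . $encoded;
echo '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">link</a>', "\n";

// $encoded is now: %D7%93%D7%A8%D7%9E%D7%94
```

Each Hebrew letter becomes two `%XX` groups because it is two bytes in UTF-8; nothing is squeezed into a single byte.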
meathook
Forum Newbie
Posts: 3
Joined: Sun Apr 01, 2012 5:32 pm

Re: Problem with Unicode characters in URL

Post by meathook »

But doesn't [b]rawurlencode[/b] convert characters to 1-byte characters?
I'm working with UTF-8, in which Hebrew chars occupy 2 bytes per char.

Also, I want to mention that I pass these values directly in the URL, but the page still doesn't get parsed (ignore the spaces):
& #1491; & #1512; & #1502; & #1492;
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: Problem with Unicode characters in URL

Post by tr0gd0rr »

No, `rawurlencode` works fine with UTF-8. I just tried it: `echo $_GET['param'];` displays the Hebrew just fine.

I don't know the technical terms to describe UTF-8, but here is my understanding. When the high bit of a byte is on, that byte is part of a multi-byte sequence, and the leading bits of the first byte tell the decoder how many bytes belong to the character. In other words, UTF-8 is not two bytes per character, it is one or more bytes per character. The current Unicode set uses up to 4 bytes in UTF-8. So in your example, if you append a letter A to the Hebrew string, it would be a 9-byte string, not a 10-byte string. If you used 4 common Chinese characters you would have a 12-byte string in UTF-8, since most of them take 3 bytes each; only code points above U+FFFF need 4 bytes.
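
The byte counts can be checked with `strlen()`, which counts bytes rather than characters. This is only a sketch: it assumes the file is saved as UTF-8 and uses the PHP 7+ `\u{...}` escape. The sample strings are mine, not from the thread; note that common Chinese characters come out at 3 bytes each, and 4-byte sequences only appear for code points above U+FFFF.

```php
<?php
// Assumption: this file is saved as UTF-8.

// Four Hebrew letters, 2 bytes each.
$hebrew = 'דרמה';
echo strlen($hebrew), "\n";        // 8

// Appending one ASCII letter adds a single byte.
$withA = $hebrew . 'A';
echo strlen($withA), "\n";         // 9

// Four common Chinese characters (illustrative sample), 3 bytes each.
$chinese = '中文字符';
echo strlen($chinese), "\n";       // 12

// A code point above U+FFFF (U+20000) needs 4 bytes.
$supplementary = "\u{20000}";
echo strlen($supplementary), "\n"; // 4

// The raw bytes of the first Hebrew letter, as hex.
echo bin2hex(substr($hebrew, 0, 2)), "\n"; // d793
```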

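PHP automatically percent-decodes the request data when it populates $_GET/$_POST/etc. It does not interpret the result as UTF-8; it simply hands back the raw bytes, which are UTF-8 if that is how the URL was encoded.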
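
A rough way to see this without a web server is `parse_str()`, which parses a string as if it were a URL query string. The query below is the encoded example from earlier in the thread; `$params` is just an illustrative variable name.

```php
<?php
// The percent-encoded query string, as the browser would send it.
$query = 'param=%D7%93%D7%A8%D7%9E%D7%94';

// parse_str() decodes the string the way a URL query string is decoded.
parse_str($query, $params);

// The value is the original sequence of UTF-8 bytes.
echo $params['param'], "\n";
echo strlen($params['param']), "\n"; // 8
```

For the Hebrew to display correctly when echoed, the page still has to declare UTF-8, as with the `header()` call shown earlier.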