I have a page which parses the GET string, but whenever I call it with Unicode characters like this:
דרמה
The page doesn't load and never parses the string.
When I use regular ASCII characters it works.
How can I get it to work?
Problem with Unicode characters in URL
Re: Problem with Unicode characters in URL
Are you UTF-8-encoding the data? It needs to be.
Re: Problem with Unicode characters in URL
Yes I am, with this at the top:
header('Content-Type:text/html; charset=UTF-8');
Re: Problem with Unicode characters in URL
I believe you need to run `rawurlencode` on the Hebrew characters in the GET string before passing the link to the browser. So instead of `http://example.com/?param=דרמה` you'll need `http://example.com/?param=%D7%93%D7%A8%D7%9E%D7%94`.
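Something like this is what I have in mind. It is only a minimal sketch: the `$value` and `$base` variables are hypothetical names for illustration, and it assumes the PHP file itself is saved as UTF-8.

```php
<?php
// Hypothetical value and base URL, used only for illustration.
// Assumes this source file is saved as UTF-8.
$value = 'דרמה';
$base  = 'http://example.com/';

// rawurlencode() percent-encodes every non-ASCII byte of the string,
// so each 2-byte Hebrew letter becomes two %XX escapes.
$link = $base . '?param=' . rawurlencode($value);

echo $link, PHP_EOL;
// http://example.com/?param=%D7%93%D7%A8%D7%9E%D7%94
```

If the link is printed inside HTML, it would normally also go through `htmlspecialchars` before being placed in an `href` attribute.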
Re: Problem with Unicode characters in URL
But doesn't `rawurlencode` convert characters to 1-byte characters?
I'm working with UTF-8 in which Hebrew chars occupy 2 bytes per char.
Also I want to mention that I pass these values directly in the URL but the page still doesn't get parsed (ignore the spaces) :
& #1491; & #1512; & #1502; & #1492;
Re: Problem with Unicode characters in URL
No, `rawurlencode` works fine with UTF-8. I just tried it: `echo $_GET['param'];` displays the Hebrew just fine.
I don't know all the technical terms for UTF-8, but here is my understanding. When the high (most significant) bit of a byte is set, that byte is part of a multi-byte sequence, and the first byte of the sequence says how many bytes make up the character. In other words, UTF-8 is not two bytes per character, it is one to four bytes per character. So in your example, if you append a letter A to the Hebrew string, it would be a 9-byte string, not a 10-byte string. If you used 4 common Chinese characters, which take 3 bytes each in UTF-8, you would have a 12-byte string; only characters outside the Basic Multilingual Plane take 4 bytes.
PHP automatically URL-decodes the request data when it populates $_GET/$_POST/etc., so the UTF-8 bytes arrive intact.
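Here is a rough way to check the byte counts and the decoding yourself. It is a sketch, not a definitive test: it assumes the file is saved as UTF-8, and it uses `parse_str` to stand in for what PHP does when it fills `$_GET`; the `$hebrew` and `$query` names are just for illustration.

```php
<?php
// Assumes this source file is saved as UTF-8.
$hebrew = 'דרמה';

// strlen() counts bytes, not characters.
echo strlen($hebrew), PHP_EOL;       // 8 (4 letters x 2 bytes each)
echo strlen($hebrew . 'A'), PHP_EOL; // 9 (the ASCII "A" adds 1 byte)

// parse_str() percent-decodes a query string, much like PHP does
// for the real request before it populates $_GET.
parse_str('param=%D7%93%D7%A8%D7%9E%D7%94', $query);

var_dump($query['param'] === $hebrew); // bool(true)
```

So the percent-encoded form carries exactly the same 8 bytes; nothing is squeezed into 1 byte per character.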