
Hebrew strings

Posted: Mon Feb 08, 2010 7:38 am
by shaharh
Hi,
if I have

Code:

$string = 'abc';
$string[0] == 'a';
But,

Code:

$string = '???';
$string[0] == some character; I have no idea what it is or how to handle it.
Any suggestions?

Thanks :D

Re: Hebrew strings

Posted: Mon Feb 08, 2010 8:15 am
by Apollo
You say
shaharh wrote:

Code:

$string = '???';
But this is meaningless without specifying how this is encoded.

Assuming you saved the above source code to some php file, what encoding does your editor use?

Re: Hebrew strings

Posted: Mon Feb 08, 2010 8:27 am
by shaharh
Sorry, UTF-8

Re: Hebrew strings

Posted: Mon Feb 08, 2010 8:40 am
by Apollo
Then $string[0] will probably be '×' (that is, chr(0xD7) or "\xD7"), the first byte of your Hebrew text in UTF-8 encoding. Indexing a PHP string with [] returns a single byte, not a whole character.
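
To make that concrete, here is a minimal sketch. The sample word (shalom) is only an assumption for illustration, since the original text did not survive in the post; it is written with explicit byte escapes so the result does not depend on the editor's encoding.

```php
<?php
// Hypothetical sample: the Hebrew word "shalom" as explicit UTF-8 bytes.
$string = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// strlen() and [] operate on bytes, not on characters.
$byteCount  = strlen($string);                 // 8 bytes for 4 letters
$firstByte  = $string[0];                      // "\xD7": only half of the first letter
$firstBytes = bin2hex(substr($string, 0, 2));  // "d7a9": the complete first letter (shin)

echo $byteCount, ' ', bin2hex($firstByte), ' ', $firstBytes, "\n";
```

Each Hebrew letter takes two bytes in UTF-8, so the first character is the first two bytes taken together.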

You mentioned you don't know 'how to handle it'. How exactly do you handle the 'a' in 'abc'? :)

Re: Hebrew strings

Posted: Mon Feb 08, 2010 8:47 am
by shaharh
:D

right, I managed to get to '\xD7'...

What I'm trying to do is list an array of strings alphabetically -
I check the letter of the alphabet I'm on against the string's first character, at least that's what worked in English.
How can I get the whole 'first character' of the string so I can use it in an if() ?

Thanks!

Re: Hebrew strings

Posted: Mon Feb 08, 2010 9:01 am
by Apollo
In that case you have to extract the entire Unicode character code (or 'code point') of the first character, which may be encoded as multiple bytes.

But it's not obvious how this should be sorted. For example, what do you consider to be the correct alphabetical order of these characters?
и (Russian) , 剑 (Chinese) , ij (Dutch) , ह (Hindi) , א (Hebrew) , ∫ (Math)

(in case your browser doesn't show this correctly, I mean these characters)

Re: Hebrew strings

Posted: Mon Feb 08, 2010 9:05 am
by shaharh
I'm only dealing with Hebrew letters, so that much at least is clear :)

How do I extract the entire Unicode character code?

Re: Hebrew strings

Posted: Mon Feb 08, 2010 9:20 am
by Apollo
I guess there are plenty of example functions or libraries out there that can do so, for example "UTF-8 to Code Point Array Converter" seems to do just that.
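
As a rough idea of what such a converter does, here is a minimal sketch. It is not the linked library: the function name utf8ToCodePoints is made up for this example, and it assumes the input is well-formed UTF-8 (no validation is done).

```php
<?php
// Hypothetical helper: decode a UTF-8 string into an array of code points.
// Assumes well-formed input; malformed sequences are not detected.
function utf8ToCodePoints($s)
{
    $points = array();
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $byte = ord($s[$i]);
        if ($byte < 0x80) {                    // 1-byte sequence (ASCII)
            $cp = $byte;
            $extra = 0;
        } elseif (($byte & 0xE0) === 0xC0) {   // 2-byte sequence (e.g. Hebrew)
            $cp = $byte & 0x1F;
            $extra = 1;
        } elseif (($byte & 0xF0) === 0xE0) {   // 3-byte sequence
            $cp = $byte & 0x0F;
            $extra = 2;
        } else {                               // 4-byte sequence (assumed valid)
            $cp = $byte & 0x07;
            $extra = 3;
        }
        // Each continuation byte contributes its low 6 bits.
        for ($j = 1; $j <= $extra; $j++) {
            $cp = ($cp << 6) | (ord($s[$i + $j]) & 0x3F);
        }
        $points[] = $cp;
        $i += $extra + 1;
    }
    return $points;
}

// 'a' followed by the Hebrew letter alef (U+05D0, bytes D7 90).
$points = utf8ToCodePoints("a\xD7\x90");
```

For the sample above, $points[0] is 0x61 and $points[1] is 0x5D0, so the first character of a Hebrew string can be compared as a single integer.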

Re: Hebrew strings

Posted: Mon Feb 08, 2010 9:30 am
by shaharh
Thank you so much for your help :D

Re: Hebrew strings

Posted: Mon Feb 08, 2010 9:44 am
by shaharh
Tested it, and using the package you suggested it's working perfectly.

Thanks again!

Re: Hebrew strings

Posted: Mon Feb 08, 2010 9:47 am
by Eran

Code:

$firstLetter = mb_substr($string, 0, 1, 'UTF-8');
http://php.net/manual/en/function.mb-substr.php
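
For the alphabetical list asked about earlier in the thread, this could be used roughly as follows. It is only a sketch: the word list is a made-up example written as byte escapes, and it assumes the mbstring extension is enabled. It also relies on the fact that byte-wise sorting of UTF-8 matches code point order, which for unpointed Hebrew is close to alphabetical order (final letter forms sort just before their regular forms).

```php
<?php
// Hypothetical word list, written as UTF-8 byte escapes.
$words = array(
    "\xD7\x91\xD7\x99\xD7\xAA",  // bet, yod, tav
    "\xD7\x90\xD7\x91",          // alef, bet
    "\xD7\x90\xD7\x9D",          // alef, final mem
);

// Byte-wise string sort; in UTF-8 this is the same as code point order.
sort($words, SORT_STRING);

// Group the sorted words under their first letter.
$groups = array();
foreach ($words as $word) {
    $firstLetter = mb_substr($word, 0, 1, 'UTF-8');
    $groups[$firstLetter][] = $word;
}

foreach ($groups as $letter => $list) {
    echo $letter, ': ', implode(', ', $list), "\n";
}
```

Proper linguistic collation (for text with vowel points, or mixed scripts) would need something like the intl extension's Collator class instead of a plain byte sort.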

Re: Hebrew strings

Posted: Mon Feb 08, 2010 10:39 am
by shaharh
Now that's a good solution. Guess you gotta know what to search for!

Thanks pytrin