Hebrew strings

shaharh
Forum Newbie
Posts: 7
Joined: Mon Feb 08, 2010 7:29 am
Location: Jerusalem

Hebrew strings

Post by shaharh »

Hi,
if I have

$string = 'abc';
$string[0] == 'a';
But,

$string = '???';
$string[0]; // some character; I have no idea what it is or how to handle it
Any suggestions?

Thanks :D
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Hebrew strings

Post by Apollo »

You say
shaharh wrote:

$string = '???';
But this is meaningless without specifying how this is encoded.

Assuming you saved the above source code to some php file, what encoding does your editor use?

Post by shaharh »

Sorry, UTF-8

Post by Apollo »

Then $string[0] will probably be '×' (that is, chr(0xD7) or "\xD7"), the first byte of your Hebrew text in UTF-8 encoding.
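A minimal sketch of what that looks like in practice. The word used here (alef, bet, gimel) is only an assumed stand-in for the original text, and it is written as byte escapes so the result does not depend on the editor's encoding:

```php
<?php
// Assumed example text: alef, bet, gimel, spelled out as UTF-8 bytes.
$string = "\xD7\x90\xD7\x91\xD7\x92";

// Indexing a PHP string returns a single byte, not a whole letter.
$firstByte = $string[0];
echo bin2hex($firstByte), "\n";            // d7

// strlen() counts bytes: three Hebrew letters take two bytes each.
echo strlen($string), "\n";                // 6

// The first letter is the first two bytes taken together.
$firstLetter = substr($string, 0, 2);
echo bin2hex($firstLetter), "\n";          // d790
```

Taking a fixed two bytes only works here because every basic Hebrew letter happens to need two bytes in UTF-8; it is not a general way to read one character.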

You mentioned you don't know 'how to handle it'. How exactly do you handle the 'a' in 'abc'? :)

Post by shaharh »

:D

Right, I managed to get to '\xD7'...

What I'm trying to do is list an array of strings alphabetically.
I check the letter of the alphabet I'm on against each string's first character; at least that's what worked in English.
How can I get the whole 'first character' of the string so I can use it in an if()?

Thanks!

Post by Apollo »

In that case you have to extract the entire Unicode character code (or 'code point') of the first character, which may consist of multiple bytes.

But deciding how you want to sort this is not trivial. For example, what do you consider to be the correct alphabetical order of these characters?
и (Russian) , 剑 (Chinese) , ij (Dutch) , ह (Hindi) , א (Hebrew) , ∫ (Math)

(in case your browser doesn't show this correctly, I mean these characters)
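For the narrower case of plain Hebrew letters only, a bytewise sort may already be enough: UTF-8 preserves code point order under byte comparison, and the Hebrew letters are encoded in alphabetical order. A rough sketch, assuming the file is saved as UTF-8 and using made-up sample names:

```php
<?php
// Assumed sample data; this file must be saved as UTF-8.
$names = ['שלום', 'דוד', 'אברהם', 'כרמל'];

// SORT_STRING compares the strings byte by byte, which for UTF-8
// gives the same order as comparing code points.
sort($names, SORT_STRING);

print_r($names); // אברהם, דוד, כרמל, שלום
```

This is not full alphabetical collation. Final letter forms sort just before their regular forms, and vowel points or mixed scripts are not handled; a locale-aware collator, such as the Collator class from the intl extension, would be the thing to look at for those cases.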

Post by shaharh »

I'm only dealing with Hebrew letters, so that much at least is clear :)

How do I extract the entire Unicode character code?

Post by Apollo »

I guess there are plenty of example functions or libraries out there that can do so, for example "UTF-8 to Code Point Array Converter" seems to do just that.
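As a rough idea of what such a function does, here is a hand-written sketch that decodes only the first character. The name firstCodePoint is hypothetical and not taken from any library, and the code does not reject overlong encodings or surrogates, so treat it as an illustration rather than a validator:

```php
<?php
// Hypothetical helper: returns the code point of the first UTF-8
// character in $s, or null for an empty or malformed start.
function firstCodePoint($s)
{
    if ($s === '') {
        return null;
    }
    $lead = ord($s[0]);
    if ($lead < 0x80) {
        return $lead;                       // single-byte (ASCII) character
    }
    if (($lead & 0xE0) === 0xC0) {
        $length = 2; $cp = $lead & 0x1F;    // 110xxxxx: two-byte sequence
    } elseif (($lead & 0xF0) === 0xE0) {
        $length = 3; $cp = $lead & 0x0F;    // 1110xxxx: three-byte sequence
    } elseif (($lead & 0xF8) === 0xF0) {
        $length = 4; $cp = $lead & 0x07;    // 11110xxx: four-byte sequence
    } else {
        return null;                        // stray continuation or invalid lead byte
    }
    if (strlen($s) < $length) {
        return null;                        // sequence is cut off
    }
    for ($i = 1; $i < $length; $i++) {
        $byte = ord($s[$i]);
        if (($byte & 0xC0) !== 0x80) {
            return null;                    // not a 10xxxxxx continuation byte
        }
        $cp = ($cp << 6) | ($byte & 0x3F);  // append the 6 payload bits
    }
    return $cp;
}

// "\xD7\x90" is the UTF-8 encoding of the Hebrew letter alef (U+05D0).
echo dechex(firstCodePoint("\xD7\x90\xD7\x91")), "\n"; // 5d0
```

On PHP 7.2 or later with the mbstring extension, mb_ord() covers the same ground without hand-written bit arithmetic.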

Post by shaharh »

Thank you so much for your help :D

Post by shaharh »

Tested it, and using the package you suggested it's working perfectly.

Thanks again!
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

$firstLetter = mb_substr($string, 0, 1, 'UTF-8');
http://php.net/manual/en/function.mb-substr.php
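Applied to the alphabetical listing asked about earlier in the thread, that call could be used roughly like this. It is only a sketch: it assumes the mbstring extension is loaded and the file is saved as UTF-8, and the word list and the $byLetter name are invented for illustration:

```php
<?php
// Assumed sample data; this file must be saved as UTF-8.
$words = ['גמל', 'אבא', 'בית', 'אור'];

// Group the words under their first letter (a whole character, not a byte).
$byLetter = [];
foreach ($words as $word) {
    $firstLetter = mb_substr($word, 0, 1, 'UTF-8');
    $byLetter[$firstLetter][] = $word;
}

// Order the groups by letter; bytewise order matches code point order in UTF-8.
ksort($byLetter, SORT_STRING);

foreach ($byLetter as $letter => $group) {
    echo $letter, ': ', implode(', ', $group), "\n";
}
```

The same $firstLetter value can also be compared directly in an if(), for example against the letter currently being listed.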

Post by shaharh »

Now that's a good solution. Guess you gotta know what to search for!

Thanks pytrin