STRPOS and SUBSTR with UTF-8 flawed?

Posted: Mon Mar 26, 2007 12:05 pm
by voltrader
I'd like to locate the position of a substring within a UTF-8 string and then extract the text that follows it.

echo substr("日本最大級のポータルサイト。検索、オークション","検索", 5);
When I tried this line of code, I received:

日�

I was expecting the five characters after "検索", which should be "、オークシ"

Can STRPOS or SUBSTR be used with UTF-8 strings?

Posted: Mon Mar 26, 2007 12:12 pm
by onion2k
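No, not reliably: strpos() and substr() count bytes, not characters, so on UTF-8 text they can cut a character in half. You need to use their multibyte equivalents: mb_strpos() and mb_substr(). These need the mbstring extension, which PHP itself does not enable by default, although many hosting setups ship with it.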
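
To illustrate the byte versus character difference, here is a minimal sketch. It assumes the mbstring extension is loaded and that the script file is saved as UTF-8; the sample string is just the first five characters of the one in the question.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
$s = "日本最大級";

$bytes = strlen($s);              // counts bytes
$chars = mb_strlen($s, "UTF-8");  // counts characters

// Each of these characters takes 3 bytes in UTF-8, so taking 5 bytes yields
// one whole character plus a fragment of the next one.
$cut = substr($s, 0, 5);
$ok  = mb_check_encoding($cut, "UTF-8");  // false: the fragment is not valid UTF-8

echo $bytes, " bytes, ", $chars, " characters\n";  // 15 bytes, 5 characters
echo mb_substr($s, 0, 2, "UTF-8"), "\n";           // 日本
```

The broken fragment at the end of $cut is what a browser or terminal shows as "�".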

Posted: Mon Mar 26, 2007 12:33 pm
by voltrader

echo mb_substr("日本最大級のポータルサイト。検索、オークション","サイト", 5);
It doesn't seem to work as I would expect. For instance, this should return the five characters after "サイト", but what I receive is "日本最大級", which is the first five characters of the string.

It seems the needle "サイト" is not found.

::confused::

Posted: Mon Mar 26, 2007 1:12 pm
by onion2k
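It might not work as you expect, but it certainly works as the manual states. mb_substr() doesn't expect a needle: its second argument is an integer start offset. Your string "サイト" gets converted to 0, so you get the first five characters. Find the offset with mb_strpos() first, then pass that number to mb_substr().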
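
Something along these lines should do it. This is only a sketch: chars_after() is a made-up helper name, not a built-in, and it assumes mbstring is loaded and the strings are UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): returns up to $length characters
// following the first occurrence of $needle, or "" if $needle is not found.
function chars_after($haystack, $needle, $length, $encoding = "UTF-8")
{
    $pos = mb_strpos($haystack, $needle, 0, $encoding);
    if ($pos === false) {  // strict comparison: a match at offset 0 is falsy too
        return "";
    }
    // Skip past the needle itself, measured in characters rather than bytes.
    $start = $pos + mb_strlen($needle, $encoding);
    return mb_substr($haystack, $start, $length, $encoding);
}

$s = "日本最大級のポータルサイト。検索、オークション";
echo chars_after($s, "検索", 5), "\n";    // 、オークシ
echo chars_after($s, "サイト", 5), "\n";  // 。検索、オ
```

Passing the encoding explicitly avoids depending on whatever the internal encoding happens to be set to on the server.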

Posted: Mon Mar 26, 2007 2:10 pm
by voltrader
Ah, I see my mistake now.

Thanks!