STRPOS and SUBSTR with UTF-8 flawed?

voltrader
Forum Contributor
Posts: 223
Joined: Wed Jul 07, 2004 12:44 pm
Location: SF Bay Area

Post by voltrader »

I'd like to locate the position of a substring within a UTF-8 string.

echo substr("日本最大級のポータルサイト。検索、オークション","検索", 5);
When I tried this line of code, I received:

日�

I was expecting the five characters after "検索", which should be "、オークシ".

Can STRPOS or SUBSTR be used with UTF-8 strings?
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

No, not reliably: strpos() and substr() count bytes, not characters. You need their multibyte equivalents, mb_strpos() and mb_substr(). These require the mbstring extension, which PHP does not enable by default but which is available on most installations.
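
To illustrate the difference, here is a minimal sketch using the string from the first post. It assumes the source file is saved as UTF-8 and that mbstring is loaded; the variable names are only for illustration.

```php
<?php
// Sample text taken from the first post (assumed to be UTF-8 encoded).
$text = "日本最大級のポータルサイト。検索、オークション";

// strlen() and substr() work on bytes. Every character in this string
// takes 3 bytes in UTF-8, so cutting at 5 bytes splits the second character.
$byteLength = strlen($text);                    // 69
$firstFiveBytes = substr($text, 0, 5);          // "日" plus 2 stray bytes

// mb_strlen() and mb_substr() work on characters when told the encoding.
$charLength = mb_strlen($text, 'UTF-8');        // 23
$firstFiveChars = mb_substr($text, 0, 5, 'UTF-8'); // "日本最大級"

echo $byteLength, " bytes, ", $charLength, " characters\n";
echo $firstFiveChars, "\n";
```

The truncated byte sequence in $firstFiveBytes is what shows up as "日�" in the browser.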

Post by voltrader »

echo mb_substr("日本最大級のポータルサイト。検索、オークション","サイト", 5);
It doesn't seem to work as I would expect. For instance, this should return the five characters after "サイト", but what I receive is "日本最大級", which is the first five characters of the string.

It seems the needle "サイト" is not found.

::confused::

Post by onion2k »

It might not work as you expect, but it certainly works as the manual states. mb_substr() doesn't expect a needle: its second argument is a numeric start offset, not a string to search for.
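
One way to get the text after a needle is to find its position with mb_strpos() first and then pass that offset to mb_substr(). The sketch below assumes UTF-8 input; text_after_needle() is a hypothetical helper name, not a built-in function.

```php
<?php
// Hypothetical helper: returns up to $length characters that follow the
// first occurrence of $needle in $haystack, or null if it is not found.
function text_after_needle($haystack, $needle, $length, $encoding = 'UTF-8')
{
    // mb_strpos() returns a character offset, or false when nothing matches.
    $pos = mb_strpos($haystack, $needle, 0, $encoding);
    if ($pos === false) {
        return null;
    }

    // Skip past the needle itself, then take $length characters.
    $start = $pos + mb_strlen($needle, $encoding);
    return mb_substr($haystack, $start, $length, $encoding);
}

$text = "日本最大級のポータルサイト。検索、オークション";

echo text_after_needle($text, "検索", 5), "\n";   // 、オークシ
echo text_after_needle($text, "サイト", 5), "\n"; // 。検索、オ
```

Note the strict `=== false` check: a needle at the very start of the string gives offset 0, which a loose comparison would mistake for "not found".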

Post by voltrader »

Ah, I see my mistake now.

Thanks!