Identifying if a string contains Japanese


huhuhu
Forum Newbie
Posts: 1
Joined: Tue Mar 02, 2010 9:32 am

Post by huhuhu »

Hello everyone, I'm having some problems with regex and am hoping to get some help. I want to identify whether a string contains Japanese characters. As far as I'm aware, Japanese is divided into three written forms: Hiragana, Katakana and Kanji. I have yet to find out how to implement such a search with a regex. Any ideas?

Thanks in advance.
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: Identifying if a string contains Japanese

Post by tr0gd0rr »

If the string is UTF-8 you can check by Unicode ranges:

U+4E00–U+9FFF Kanji (CJK Unified Ideographs block)
U+3040–U+309F Hiragana
U+30A0–U+30FF Katakana
(from: http://en.wikipedia.org/wiki/Japanese_writing_system)

Code Example:
http://php.net/manual/en/reference.pcre ... .php#58409
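
The range check described above could be sketched roughly like this, assuming PHP 7 or later and a string that is already valid UTF-8. The function name containsJapanese is made up for illustration and is not from the linked example. The Kanji range here runs to U+9FFF, the end of the CJK Unified Ideographs block.

```php
<?php
// Hypothetical helper (not from the thread): returns true if $text, assumed
// to be valid UTF-8, contains at least one Hiragana, Katakana or Kanji character.
function containsJapanese(string $text): bool
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8,
    // which is what allows \x{...} escapes above U+00FF.
    $pattern = '/[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FFF}]/u';

    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for instance when $text is not valid UTF-8), so compare against 1.
    return preg_match($pattern, $text) === 1;
}

var_dump(containsJapanese('Hello world'));    // bool(false)
var_dump(containsJapanese('こんにちは'));       // bool(true)
var_dump(containsJapanese('カタカナ test'));    // bool(true)
```

Note that the Kanji range is shared with Chinese, so a string made up only of Kanji will match even if it is actually Chinese text. A match on Hiragana or Katakana is the stronger hint that the text is Japanese.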

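If the string may be in a legacy Japanese charset instead (Shift_JIS or EUC-JP, for example), you could use mb_detect_encoding() to guess the charset, then convert the string to UTF-8 with iconv() or mb_convert_encoding() before running the range check.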
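
A minimal sketch of the charset detection step, assuming PHP 7.1 or later with the mbstring extension loaded. The function name toUtf8 and the list of candidate encodings are assumptions for illustration, not something from the thread. Detection is a heuristic guess, so short or ambiguous input can be misidentified.

```php
<?php
// Hypothetical helper: guess the encoding of $raw from a candidate list and
// return the text converted to UTF-8, or null if no candidate fits.
function toUtf8(string $raw): ?string
{
    // Candidate encodings are an assumption; adjust to what your input
    // can realistically contain. Order can influence the guess.
    $candidates = ['UTF-8', 'EUC-JP', 'SJIS', 'ISO-2022-JP'];

    // Strict mode (third argument) rejects encodings in which the
    // byte sequence is not valid.
    $encoding = mb_detect_encoding($raw, $candidates, true);
    if ($encoding === false) {
        return null;
    }

    return mb_convert_encoding($raw, 'UTF-8', $encoding);
}

// Usage: normalise the input first, then run the UTF-8 range check on it.
$utf8 = toUtf8('abc');
var_dump($utf8 !== null);
```

If the charset is already known (from an HTTP header or a database setting, say), it is more reliable to pass it to the conversion function directly than to detect it.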