Identifying if a string contains Japanese
Posted: Tue Mar 02, 2010 9:33 am
by huhuhu
Hello everyone, I'm having some problems with regex and am looking for some help. I want to identify whether a string contains Japanese characters. As far as I'm aware, Japanese is divided into 3 written forms: Hiragana, Katakana and Kanji. I have yet to find out how to implement such a search with regex. Any ideas?
Thanks in advance.
Re: Identifying if a string contains Japanese
Posted: Tue Mar 02, 2010 1:00 pm
by tr0gd0rr
If the string is UTF-8, you can check for characters in these Unicode ranges:
U+4E00–U+9FBF Kanji
U+3040–U+309F Hiragana
U+30A0–U+30FF Katakana
(from: http://en.wikipedia.org/wiki/Japanese_writing_system )
Code Example:
http://php.net/manual/en/reference.pcre ... .php#58409
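For what it's worth, the range check above could be sketched roughly like this. It assumes the input is already valid UTF-8 and that PCRE was built with UTF-8 support (needed for the /u modifier). The function name contains_japanese() is made up for illustration, and the ranges are just the three listed above. Keep in mind that the Kanji range is shared with Chinese, so a match on Kanji alone doesn't prove the text is Japanese.

```php
<?php
// Hypothetical helper, not taken from the linked example.
// Returns true if $text contains at least one character from the
// Hiragana, Katakana, or Kanji ranges listed in the post.
function contains_japanese(string $text): bool
{
    // The /u modifier makes PCRE treat the pattern and the subject as
    // UTF-8, which is what allows the \x{...} code point escapes.
    $pattern = '/[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FBF}]/u';

    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example, when $text is not valid UTF-8).
    return preg_match($pattern, $text) === 1;
}
```

You would call it as contains_japanese($input). Invalid UTF-8 makes preg_match() return false, so this sketch reports "no Japanese" in that case rather than raising an error.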
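If the string might be in a legacy Japanese charset (for example Shift_JIS or EUC-JP) rather than UTF-8, you could use mb_detect_encoding() to guess the charset and then iconv() to convert the string to UTF-8 before checking the ranges.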
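A rough sketch of that idea, assuming the mbstring extension is loaded. The helper name to_utf8() and the candidate list are assumptions for illustration, and it uses mb_convert_encoding() for the conversion step (iconv() would be an alternative). Detection is only a guess based on which encodings the bytes are valid in, so short or ambiguous input can be misidentified.

```php
<?php
// Hypothetical helper: returns $text converted to UTF-8, or null if
// none of the candidate encodings fit.
function to_utf8(string $text): ?string
{
    // Already valid UTF-8: return unchanged rather than guessing.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Assumed candidate list; adjust it to the encodings you expect.
    $candidates = ['EUC-JP', 'SJIS'];

    // Strict mode (third argument true) rejects encodings in which
    // the byte sequence is invalid.
    $encoding = mb_detect_encoding($text, $candidates, true);
    if ($encoding === false) {
        return null;
    }

    return mb_convert_encoding($text, 'UTF-8', $encoding);
}
```

Once the string is in UTF-8, the Unicode range check can be applied to the result. The mb_check_encoding() shortcut is there so that input which is already UTF-8 never goes through the guessing step.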