I am using the PHP preg_match function to validate Hebrew input to a Web form. Since the target match is a Hebrew word, the simplest regex would be:
preg_match("/^\p{Hebrew}+$/u",$var);
This is not always sufficient, because Hebrew words can include two characters (׳ and ״) that have the Unicode punctuation property. For example, the abbreviation for "United States" is ארה״ב (the double-quote-like character does not have the \p{Hebrew} property).
I expected that I could make a kind of user-defined character class by combining a Unicode property with a character class.
preg_match("/^(\p{Hebrew}|[׳״])+$/u",$var);
The above construction, however, fails to match ארה״ב, while ארהב and ״ on their own both match.
Could anyone help me understand why it does not work?
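When a pattern rejects input that looks valid, it can help to check which code points the form actually submitted, since a geresh or gershayim is easily confused with an ASCII quote. The following is only a rough diagnostic sketch: `codePoints` is a made-up helper name, and it assumes PHP 7.4+ with the mbstring extension.

```php
<?php
// Hypothetical helper (not from the original post): lists the code point
// of every character in a UTF-8 string, e.g. "U+05F4" for gershayim.
// Assumes PHP 7.4+ and the mbstring extension (for mb_ord).
function codePoints(string $s): array
{
    // Split into characters; the /u modifier makes PCRE treat $s as UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // preg_split fails on invalid UTF-8 input
    }
    return array_map(
        fn(string $ch): string => sprintf('U+%04X', mb_ord($ch, 'UTF-8')),
        $chars
    );
}

// An ASCII double quote and a Hebrew gershayim look alike but differ:
print_r(codePoints('"'));        // U+0022
print_r(codePoints("\u{05F4}")); // U+05F4
```

If the dump shows U+0022 or U+0027 where U+05F4 or U+05F3 was expected, the pattern is not at fault; the input simply contains a different character.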
Unicode Property and Character Class
-
veleshanas
- Forum Newbie
- Posts: 10
- Joined: Sat May 24, 2008 10:58 pm
Re: Unicode Property and Character Class
You are telling the regex to either match one or more characters in the Hebrew Unicode property, or one or more punctuation marks.
Combine both properties in a character class and things should be fixed.
/^[\p{Hebrew}׳״]+$/u
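As a minimal sketch of how the suggested character class might be wrapped for form validation: `isHebrewWord` is a hypothetical name, and the two punctuation marks are written as `\x{05F3}` (geresh) and `\x{05F4}` (gershayim) escapes so that no right-to-left characters need to be pasted into the pattern.

```php
<?php
// Hypothetical helper (not from the thread): returns true when $word
// consists only of Hebrew-script characters plus geresh (U+05F3) and
// gershayim (U+05F4). Requires the /u modifier so input is read as UTF-8.
function isHebrewWord(string $word): bool
{
    return preg_match('/^[\p{Hebrew}\x{05F3}\x{05F4}]+$/u', $word) === 1;
}

// "\u{...}" string escapes need PHP 7.0 or later.
var_dump(isHebrewWord("\u{05D0}\u{05E8}\u{05D4}\u{05F4}\u{05D1}")); // ארה״ב: bool(true)
var_dump(isHebrewWord("\u{05D0}\u{05E8}\u{05D4}\u{05D1}"));         // ארהב: bool(true)
var_dump(isHebrewWord('USA'));                                      // bool(false)
```

Comparing against `=== 1` keeps a `false` return (for example on invalid UTF-8 input) from being mistaken for a successful match.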
-
veleshanas
Re: Unicode Property and Character Class
Hello GeertDD,
GeertDD wrote: Combine both properties in a character class and things should be fixed.
/^[\p{Hebrew}׳״]+$/u
I didn't know that I could write a Unicode property inside a character class. Good to know that not everything within [ ] is literal.
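To illustrate that last point, here is a small sketch of what stays special inside [ ] and what does not. The variable names are made up for the example; the behaviour shown is standard PCRE as exposed by preg_match.

```php
<?php
// Illustrative patterns only; the variable names are invented for this sketch.

// Escapes such as \p{..} and \d keep their meaning inside a class.
$upperDigits = '/^[\p{Lu}\d_]+$/u';
var_dump(preg_match($upperDigits, 'ABC_123') === 1); // bool(true)
var_dump(preg_match($upperDigits, 'abc') === 1);     // bool(false)

// A leading ^ negates the class, properties included.
$noHebrew = '/^[^\p{Hebrew}]+$/u';
var_dump(preg_match($noHebrew, 'abc') === 1);        // bool(true)
var_dump(preg_match($noHebrew, "\u{05D0}") === 1);   // bool(false)

// Most metacharacters, such as . and +, are literal inside a class.
$dotsAndPluses = '/^[.+]+$/';
var_dump(preg_match($dotsAndPluses, '+.+') === 1);   // bool(true)
var_dump(preg_match($dotsAndPluses, 'a') === 1);     // bool(false)
```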