Matching any char from any language

pedrotuga
Forum Contributor
Posts: 249
Joined: Tue Dec 13, 2005 11:08 pm

Post by pedrotuga »

Code: Select all

if (preg_match('/\^[p{Letter}\s]+$/u', $tagstring) == 0) {
    $this->validation->set_message('_check_valid_tags', 'Tags must contain only letters from any language');
    return FALSE;
}
I'm trying to match strings that contain only letters and spaces. The letters can be from any language.

What's wrong with my regex? Or could the problem come from somewhere else?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Matching any char from any language

Post by prometheuzz »

A couple of things:
- Long property names such as \p{Letter} are not supported by PHP's regex engine. You can use \p{L} instead.
- You don't need to escape the start-of-string anchor ^. If you do, it just matches a literal '^' character.
- If the strings can become large, you can speed things up by making the greedy '+' possessive: put an extra '+' after it.

So, that would make:

Code: Select all

'/^[\p{L}\s]++$/u'
the final regex.
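
As a rough sketch of how the corrected pattern could be used for the tag check: the helper name is_valid_tag_string is hypothetical and not from the thread, and the u modifier is kept from the original question on the assumption that the tags arrive as UTF-8.

```php
<?php
// Hypothetical helper (name not from the thread): returns true when
// $tags consists only of Unicode letters and whitespace.
function is_valid_tag_string(string $tags): bool
{
    // ^ and $   anchor the match to the whole string
    // \p{L}     any Unicode letter (short property name)
    // ++        possessive quantifier, no backtracking into the run
    // u         treat pattern and subject as UTF-8
    return preg_match('/^[\p{L}\s]++$/u', $tags) === 1;
}

var_dump(is_valid_tag_string('tags мир 日本語')); // bool(true)
var_dump(is_valid_tag_string('php5 regex'));      // bool(false), contains a digit
var_dump(is_valid_tag_string(''));                // bool(false), + needs at least one character
```

Comparing against 1 rather than testing for 0 also treats a false return (pattern error or invalid UTF-8 in the subject) as a failed validation.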

HTH.
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: Matching any char from any language

Post by GeertDD »

Note that $ allows for a final newline to be included in the string you are testing. Add a D modifier to prevent it.
Also see: http://blog.php-security.org/archives/7 ... lters.html

Just for your information, here is another alternative syntax which is supported and a bit shorter: \pL
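
A small sketch of the difference the D modifier makes, using the shorter \pL syntax. The function names are hypothetical. The character class here uses a literal space instead of \s on purpose: \s itself matches a newline, so with \s in the class a trailing newline would be accepted with or without D.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the thread.
function matches_without_d(string $s): bool
{
    // Without D, $ may also match just before a single trailing newline.
    return preg_match('/^[\pL ]++$/u', $s) === 1;
}

function matches_with_d(string $s): bool
{
    // D (PCRE_DOLLAR_ENDONLY): $ matches only at the very end of the subject.
    return preg_match('/^[\pL ]++$/uD', $s) === 1;
}

var_dump(matches_without_d("abc\n")); // bool(true)
var_dump(matches_with_d("abc\n"));    // bool(false)
var_dump(matches_with_d("abc"));      // bool(true)
```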
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Matching any char from any language

Post by prometheuzz »

GeertDD wrote:Note that $ allows for a final newline to be included in the string you are testing. Add a D modifier to prevent it.
Also see: http://blog.php-security.org/archives/7 ... lters.html

Just for your information, here is another alternative syntax which is supported and a bit shorter: \pL
All of this was also new to me!
Thanks Geert!
pedrotuga
Forum Contributor
Posts: 249
Joined: Tue Dec 13, 2005 11:08 pm

Re: Matching any char from any language

Post by pedrotuga »

Oops, the escaping backslash at the beginning was a typo. I meant

/^[\p{Letter}\s]+$/u

Thank you for the reply. I wasn't aware that PHP's PCRE library lacks support for the long property-name syntax, nor of the speed issue.

Thank you for all the other tips too.