Matching any char from any language

Posted: Tue Aug 26, 2008 7:44 pm
by pedrotuga

Code:

if ( preg_match('/\^[p{Letter}\s]+$/u', $tagstring) == 0 ){
    $this->validation->set_message('_check_valid_tags', 'Tags must contain only letters from any language');
    return FALSE;
}
I'm trying to match a string that contains only spaces and letters. The letters can be from any language.

What's wrong with my regex? Or is the problem somewhere else?

Re: Matching any char from any language

Posted: Wed Aug 27, 2008 12:42 am
by prometheuzz
A couple of things:
- Long property names such as \p{Letter} are not supported by PHP's regex engine. You can use \p{L} instead.
- You don't need to escape the start-of-string anchor ^. If you do, it just matches a literal '^' character.
- If the strings can become large, you could speed things up by making the greedy '+' possessive: put an extra '+' after it.

So, that would make:

Code:

'/^[\p{L}\s]++$/u'
the final regex.
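
For reference, here is a minimal sketch of how the corrected pattern might sit in a check like the one in the question. The helper name is_valid_tag_string is made up for illustration, and the sketch assumes PHP 7 or later with a PCRE build that has Unicode property support. It keeps the u modifier from the original question so that multibyte UTF-8 letters are treated as single characters.

```php
<?php
// Hypothetical helper: the name is an assumption, not from the original post.
function is_valid_tag_string(string $tagstring): bool
{
    // \p{L}  any Unicode letter (short property name)
    // \s     whitespace
    // ++     possessive quantifier: no backtracking into the matched run
    // u      treat pattern and subject as UTF-8
    // preg_match() returns 1 on a match, 0 on no match, false on error
    // (for example invalid UTF-8), so only 1 counts as valid here.
    return preg_match('/^[\p{L}\s]++$/u', $tagstring) === 1;
}

var_dump(is_valid_tag_string('música 日本語')); // bool(true): letters and a space
var_dump(is_valid_tag_string('tag_1'));          // bool(false): underscore and digit
var_dump(is_valid_tag_string('^abc'));           // bool(false): '^' is not a letter
```

Comparing against 1 with === keeps the "no match" and "error" cases on the rejecting side without mixing them up with a loose == 0 check.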

HTH.

Re: Matching any char from any language

Posted: Wed Aug 27, 2008 2:36 am
by GeertDD
Note that $ allows for a final newline to be included in the string you are testing. Add a D modifier to prevent it.
Also see: http://blog.php-security.org/archives/7 ... lters.html

Just for your information, here is another alternative syntax which is supported and a bit shorter: \pL
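
A quick way to see the effect of D is the sketch below. It is only an illustration: the class uses a literal space rather than \s, because \s itself matches a newline and would hide the difference, and the variable names are made up.

```php
<?php
// Subject with a trailing newline, as might arrive from a form field.
$subject = "tags\n";

// Without D, $ may match just before a final newline, so this matches.
$withoutD = preg_match('/^[\pL ]++$/u', $subject);

// With D (dollar-end-only), $ matches only at the very end of the subject.
$withD = preg_match('/^[\pL ]++$/uD', $subject);

var_dump($withoutD); // int(1)
var_dump($withD);    // int(0)
```

Note that as long as the class keeps \s, a trailing newline passes either way. D only matters once newlines are meant to be rejected.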

Re: Matching any char from any language

Posted: Wed Aug 27, 2008 2:40 am
by prometheuzz
GeertDD wrote:Note that $ allows for a final newline to be included in the string you are testing. Add a D modifier to prevent it.
Also see: http://blog.php-security.org/archives/7 ... lters.html

Just for your information, here is another alternative syntax which is supported and a bit shorter: \pL
All of this was also new to me!
Thanks Geert!

Re: Matching any char from any language

Posted: Wed Aug 27, 2008 7:12 am
by pedrotuga
Oops, the escaping backslash at the beginning was a typo. I meant

/^[\p{Letter}\s]+$/u

Thank you for the reply. I wasn't aware that PHP's PCRE library doesn't support the long syntax for character properties, nor of the speed issue.

Thank you for all other tips too.