filter_var and regex (pass Unicode characters only)

spamyboy
Forum Contributor
Posts: 266
Joined: Sun Nov 06, 2005 11:29 am
Location: Lithuania, Vilnius

Post by spamyboy »

I am trying to create input validation that accepts only letters and whitespace.
This is what I have so far. I can't use [a-zA-Z] because the input might also be Cyrillic, from a Baltic alphabet (e.g. ĄČĘĖĮŠŲŪŽ), Asian, and so on.
Any ideas?

Code:

var_dump(filter_var('ABC???', FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"/P{L}/u"))));
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: filter_var and regex (pass Unicode characters only)

Post by prometheuzz »

Note that a Unicode property class like this needs a leading backslash:

Code:

\P{L}
But a capital P negates the class, so \P{L} matches any character other than a letter. You probably want:

Code:

\p{L}
which matches a single letter in pretty much any language AFAIK.
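
To make the difference between the two classes concrete, here is a minimal sketch using preg_match. The helper names is_letter and is_non_letter and the sample characters are made up for illustration, and the file is assumed to be saved as UTF-8 so the /u modifier can read the multi-byte characters.

```php
<?php
// Hypothetical helpers, not part of PHP or of the original post.

// True when the string contains at least one letter from any script.
function is_letter(string $char): bool
{
    return preg_match('/\p{L}/u', $char) === 1;
}

// True when the string contains at least one character that is not a letter.
function is_non_letter(string $char): bool
{
    return preg_match('/\P{L}/u', $char) === 1;
}

// Illustrative single-character samples: Latin, Baltic, Cyrillic, digit, symbol.
foreach (['A', 'Ž', 'я', '7', '&'] as $char) {
    printf(
        "%s letter=%s non-letter=%s\n",
        $char,
        var_export(is_letter($char), true),
        var_export(is_non_letter($char), true)
    );
}
```

For a single character exactly one of the two helpers returns true, which is why swapping \p for \P inverts the check.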

Post by spamyboy »

This still passes 'ABC??&*' as a valid string, even though it contains '&*'.

Code:

filter_var('ABC??&*', FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"/\p{L}/u")));

Post by prometheuzz »

spamyboy wrote: This still passes 'ABC??&*' as a valid string, even though it contains '&*'.

Code:

filter_var('ABC??&*', FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"/\p{L}/u")));
I've never used filter_var(...), but with the normal preg functions, '/\p{L}/' matches any string that contains at least one letter. To match a string only when it consists solely of letters, use '/^\p{L}+$/'.
^ denotes the start of the string, $ denotes the end of the string, and + means "one or more".

HTH
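
For reference, here is how the anchored pattern might be plugged into filter_var. This is only a sketch: the helper name is_letters_and_whitespace is made up, and \s has been added to the character class on the assumption that whitespace should pass too, as the original question asked. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper, not part of PHP or of the original posts.
// The pattern is the anchored form from above with \s added to allow whitespace.
function is_letters_and_whitespace(string $input): bool
{
    $options = ['options' => ['regexp' => '/^[\p{L}\s]+$/u']];

    // FILTER_VALIDATE_REGEXP returns the input string on a match, false otherwise.
    return filter_var($input, FILTER_VALIDATE_REGEXP, $options) !== false;
}

var_dump(is_letters_and_whitespace('ABC ŠŽ')); // letters and a space only
var_dump(is_letters_and_whitespace('ABC&*'));  // contains '&*'
```

Two caveats. By default $ also matches just before a trailing newline, so with the letters-only pattern '/^\p{L}+$/u' a string like "ABC\n" would still pass; adding the D modifier or using \z instead of $ avoids that. Also, \p{L} does not cover combining marks, so scripts that rely on them may need \p{M} added to the class.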

Post by spamyboy »

Thank you, that helped me a lot.

Post by prometheuzz »

spamyboy wrote: Thank you, that helped me a lot.
You're welcome.