validate some input

amir
Forum Contributor
Posts: 287
Joined: Sat Oct 07, 2006 4:28 pm

Post by amir »

I have to validate some input for some scientific text. I would like to filter out any dangerous characters but allow certain symbols and am not sure how to do this.

I currently have this regexp:

"/^[a-z'\s]*$/i"
but I would like to do the following:

a. Allow numerical input
b. Allow certain other symbols, such as those used in chemical formulae, e.g.:

H<sub>2</sub>O
I have been trying to learn enough regexp to put this together, and I am trying to build two different versions for a and b above. My main concern is security.
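
For point a, one minimal sketch (assuming the whitelist approach of the original pattern is kept) is simply to add 0-9 to the character class. The function name is_allowed_text() is hypothetical, for illustration only. Note that it still rejects the <sub> markup from point b, which needs a pattern of its own.

```php
<?php
// Hypothetical helper: the original whitelist /^[a-z'\s]*$/i with 0-9 added.
function is_allowed_text(string $input): bool
{
    // ^...$ anchor the class to the whole string, * permits empty input,
    // and the i modifier lets a-z cover A-Z as well.
    return preg_match("/^[a-z0-9'\s]*$/i", $input) === 1;
}

var_dump(is_allowed_text("Water boils at 100 degrees")); // bool(true)
var_dump(is_allowed_text("H<sub>2</sub>O"));             // bool(false): "<", ">" and "/" are not listed
```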
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

If memory serves, the format is actually <number><symbol> in scientific circles, but for laymen it is in the style you described.

At any rate, here's one that's fairly restrictive:

[feyd@home]>php -r "$test = 'H<sub>2</sub>O'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(2) {
  [0]=>
  string(14) "H<sub>2</sub>O"
  [1]=>
  string(14) "H<sub>2</sub>O"
}

[feyd@home]>php -r "$test = 'H<sub>2</sub>O<sub>015</sub>'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(0) {
}

[feyd@home]>php -r "$test = 'H<sub>2</sub>O<sub>15</sub>'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(2) {
  [0]=>
  string(27) "H<sub>2</sub>O<sub>15</sub>"
  [1]=>
  string(27) "H<sub>2</sub>O<sub>15</sub>"
}
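
The one-liners above can be wrapped in a reusable check. This sketch reuses the same pattern unchanged; is_formula() is a hypothetical name. It only checks the shape (capital letter, optional lowercase letter, optional <sub>count</sub> with no leading zero); it does not check that a symbol is a real element, so something like "Xy" would also pass.

```php
<?php
// Hypothetical wrapper around the pattern from the session above.
function is_formula(string $input): bool
{
    // [A-Z](?:[a-z])?               : an element-like symbol, e.g. H, O, Na
    // (?i:<sub>[1-9][0-9]*</sub>)?  : optional count with no leading zero;
    //                                 the tag name is matched case-insensitively
    // \s* at either end             : tolerates surrounding whitespace
    $pattern = '#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#';
    return preg_match($pattern, $input) === 1;
}

foreach (['H<sub>2</sub>O', 'H<sub>2</sub>O<sub>015</sub>', 'NaCl', 'H<sub>2</sub>0'] as $t) {
    printf("%-30s %s\n", $t, is_formula($t) ? 'valid' : 'invalid');
}
```

Of the four sample strings, the first and third pass; the second fails on the leading zero, and the last fails because it ends in the digit 0 rather than the letter O.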

Post by amir »

Thanks for the quick response!

regexp="/^[a-z'\s]*$/i"
Can I simply change this to

regexp='#[^a-z0-9<>]#i'
in order to allow all alphanumeric characters? (Obviously this will not allow the <sub> HTML.)

Thanks once again.

Post by feyd »

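With "^" as the first character in your character class, the class is negated: it matches any single character that is not one of the listed ones. So your regex will now look for symbols, whitespace and other special characters (excluding "<" and ">"). It also has no ^...$ anchors, so it will find such a character anywhere in the input.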
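
To see what the negated class actually picks up, a small sketch can list every character it matches in a string. offending_chars() is a hypothetical helper name. The first call shows why the <sub> markup would be flagged (the "/" in "</sub>"), and the second shows that the apostrophe and whitespace allowed by the original /^[a-z'\s]*$/i are now flagged too.

```php
<?php
// Hypothetical helper: list every character matched by [^a-z0-9<>] (with /i).
// The class covers a-z, A-Z, 0-9, "<" and ">"; the leading "^" inverts it,
// so anything else (spaces, apostrophes, "/", ";" ...) is a match.
function offending_chars(string $input): array
{
    preg_match_all('#[^a-z0-9<>]#i', $input, $m);
    return $m[0]; // all full-pattern matches, in order
}

print_r(offending_chars('H<sub>2</sub>O')); // only the "/" in </sub>
print_r(offending_chars("it's 5 mg"));     // the apostrophe and two spaces
```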

Post by amir »

Thanks. Can I just clarify what you are saying:

If I use regexp='#[^a-z0-9<>]#i', it will not allow any characters EXCEPT alphanumerics and also "<" and ">"?

Many thanks!

Post by feyd »

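No, it won't match alphanumerics, "<" or ">"; it matches every other character. So a match means the input contains something outside that set, and that is the input you would reject.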
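
Put differently, "allowing" input with the negated pattern means getting no match. A sketch comparing that to an anchored whitelist, assuming you want to accept only letters, digits, "<" and ">" (both function names are hypothetical):

```php
<?php
// 1. Negated-class check: search for any character outside the set.
//    preg_match() returns 1 on a match, which means the input is rejected.
function passes_negated(string $input): bool
{
    return preg_match('#[^a-z0-9<>]#i', $input) === 0;
}

// 2. Anchored whitelist: require the whole string to consist of the set.
//    The D modifier stops $ from also matching before a trailing "\n".
function passes_anchored(string $input): bool
{
    return preg_match('#^[a-z0-9<>]*$#iD', $input) === 1;
}

foreach (['H2O', 'H<sub>2</sub>O', "H2O\n", 'abc def'] as $t) {
    var_dump(passes_negated($t), passes_anchored($t));
}
```

The D modifier matters here: without it, $ also matches just before a final newline, so the anchored check would accept "H2O\n" while the negated check rejects it.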