validate some input
Posted: Wed Dec 27, 2006 7:08 am
by amir
I have to validate some input for some scientific text. I would like to filter out any dangerous characters but allow certain symbols and am not sure how to do this.
I have a regexp string here
but would like to do the following:
a. Allow numerical input
b. Allow certain other symbols, such as those used in chemical formulae, e.g.
I have been trying to learn enough regexp to put this together, and I am trying to build two separate versions for a and b above. My main concern is security.
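For requirement a, a minimal PHP sketch of an anchored whitelist check might look like this. The function name `is_numeric_input` is hypothetical, the definition of "numerical" (optional sign, optional decimal part, optional exponent) is an assumption, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper for requirement (a): accept only plain numbers.
// Assumption: "numerical input" means an optional sign, digits, an
// optional decimal part and an optional exponent (e.g. 6.02e23).
function is_numeric_input(string $input): bool
{
    // Anchored whitelist: the WHOLE string must match, not just part of it.
    // \z (rather than $) also refuses a trailing newline.
    return preg_match('#^[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\z#', $input) === 1;
}

var_dump(is_numeric_input('42'));        // bool(true)
var_dump(is_numeric_input('-6.02e23'));  // bool(true)
var_dump(is_numeric_input('42; DROP'));  // bool(false)
```

Anchoring at both ends is what makes this a whitelist: an unanchored pattern would accept any string that merely contains a number somewhere.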
Posted: Wed Dec 27, 2006 7:48 am
by feyd
If memory serves correctly, the format in scientific circles is actually <number><symbol>, but for laymen it is in the style you described.
At any rate, here's one that's fairly restrictive:
[feyd@home]>php -r "$test = 'H<sub>2</sub>O'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(2) {
[0]=>
string(14) "H<sub>2</sub>O"
[1]=>
string(14) "H<sub>2</sub>O"
}
[feyd@home]>php -r "$test = 'H<sub>2</sub>O<sub>015</sub>'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(0) {
}
[feyd@home]>php -r "$test = 'H<sub>2</sub>O<sub>15</sub>'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(2) {
[0]=>
string(27) "H<sub>2</sub>O<sub>15</sub>"
[1]=>
string(27) "H<sub>2</sub>O<sub>15</sub>"
}
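The same pattern can be wrapped in a reusable function. This is only a sketch: `match_formula` is a hypothetical name, the `?string` return type assumes PHP 7.1 or later, and the pattern itself is copied unchanged from the post above.

```php
<?php
// Hypothetical wrapper around feyd's pattern.
// Returns the formula (without surrounding whitespace) on success, or null.
function match_formula(string $input): ?string
{
    // One or more elements: a capital letter, an optional lowercase letter,
    // and an optional <sub>N</sub> where N has no leading zero.
    // (?i:...) makes only the <sub> part case-insensitive.
    $pattern = '#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#';
    return preg_match($pattern, $input, $match) === 1 ? $match[1] : null;
}

var_dump(match_formula('H<sub>2</sub>O'));               // string(14) "H<sub>2</sub>O"
var_dump(match_formula('H<sub>2</sub>O<sub>015</sub>')); // NULL
var_dump(match_formula('  NaCl '));                      // string(4) "NaCl"
```

Returning the captured group rather than the raw input drops the surrounding whitespace that the `\s*` parts of the pattern allow.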
Posted: Wed Dec 27, 2006 9:52 am
by amir
Thanks for the quick response!
Can I simply change this to
in order to allow all alphanumeric characters? (Obviously this will not allow the <sub> HTML.)
Thanks once again.
Posted: Wed Dec 27, 2006 11:21 am
by feyd
With "^" as the first character in your character class, the class matches any character that is not one of the listed characters. So your regex will now look for symbols and special characters (excluding "<" and ">").
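A short PHP sketch of what a negated class actually matches; the sample strings are made up for illustration.

```php
<?php
// [^a-z0-9] means "any ONE character that is NOT a-z or 0-9";
// the i modifier makes it ignore case, so A-Z is excluded too.
preg_match_all('#[^a-z0-9]#i', 'pH=7.4!', $symbols);
echo implode(' ', $symbols[0]), "\n"; // prints "= . !"

// Adding < and > to the class stops those two from matching,
// but the "/" in a closing tag such as </sub> is still matched.
preg_match_all('#[^a-z0-9<>]#i', 'H<sub>2</sub>O', $leftovers);
echo implode(' ', $leftovers[0]), "\n"; // prints "/"
```

So a class that only excludes "<" and ">" would still flag the <sub> markup because of the slash in the closing tag.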
Posted: Wed Dec 27, 2006 11:30 am
by amir
Thanks. Can I just clarify what you are saying: if I use regexp='#[^a-z0-9<>]#i', it will not allow any characters EXCEPT alphanumerics, "<" and ">"?
Many thanks!
Posted: Wed Dec 27, 2006 11:32 am
by feyd
No, it won't match alphanumerics, "<" or ">"; it matches any single character that is not one of those.
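In other words, the negated class detects bad characters, so the input is acceptable only when `preg_match()` finds nothing. A sketch under that reading, with hypothetical function names and PHP 7+ type declarations, next to the equivalent anchored whitelist:

```php
<?php
// Hypothetical helper: reject the input if the negated class finds ANY
// character outside a-z, 0-9, < and >. preg_match() returns 1 on a match,
// 0 on no match and false on error, so only an explicit 0 means "clean".
function only_allowed_chars(string $input): bool
{
    return preg_match('#[^a-z0-9<>]#i', $input) === 0;
}

// The same rule written the other way round: the whole string, from the
// start (^) to the absolute end (\z), must consist of allowed characters.
function only_allowed_chars_anchored(string $input): bool
{
    return preg_match('#^[a-z0-9<>]*\z#i', $input) === 1;
}

var_dump(only_allowed_chars('H2O'));          // bool(true)
var_dump(only_allowed_chars('H2O; DROP'));    // bool(false)
var_dump(only_allowed_chars_anchored('H2O')); // bool(true)
```

Comparing with `=== 0` rather than writing `!preg_match(...)` keeps an error return (`false`) from being treated as clean input.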