amir
Forum Contributor
Posts: 287 Joined: Sat Oct 07, 2006 4:28 pm
by amir » Wed Dec 27, 2006 7:08 am
I have to validate some input for some scientific text. I would like to filter out any dangerous characters but allow certain symbols and am not sure how to do this.
I have a regexp string here
but would like to do the following
a. Allow numerical input
b. Allow certain other symbols such as those used in chemical formulae eg.
I have been trying to learn regexp enough to put this together and am trying to put two different versions together for a and b above. My main concern is security.
feyd
Neighborhood Spidermoddy
Posts: 31559 Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA
by feyd » Wed Dec 27, 2006 7:48 am
If memory serves, the format used in scientific circles is actually <number><symbol>, but for laymen it is written in the style you described.
At any rate, here's one that's fairly restrictive:
Code: Select all
[feyd@home]>php -r "$test = 'H<sub>2</sub>O'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(2) {
[0]=>
string(14) "H<sub>2</sub>O"
[1]=>
string(14) "H<sub>2</sub>O"
}
[feyd@home]>php -r "$test = 'H<sub>2</sub>O<sub>015</sub>'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(0) {
}
[feyd@home]>php -r "$test = 'H<sub>2</sub>O<sub>15</sub>'; preg_match('#^\s*((?:[A-Z](?:[a-z])?(?i:<sub>[1-9][0-9]*</sub>)?)+)\s*$#', $test, $match); var_dump($match);"
array(2) {
[0]=>
string(27) "H<sub>2</sub>O<sub>15</sub>"
[1]=>
string(27) "H<sub>2</sub>O<sub>15</sub>"
}
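To make the pieces of that pattern easier to follow, here is a rough sketch of the same regex written with the x modifier so each part can carry a comment. The function name isFormula() is only an illustrative name, and the delimiter is switched to "~" because "#" starts a comment in extended mode.

```php
<?php
// Hypothetical helper: wraps the pattern from the post above.
function isFormula(string $input): bool
{
    $pattern = '~
        ^\s*                              # optional leading whitespace
        (                                 # capture the whole formula
          (?:
            [A-Z][a-z]?                   # element symbol: one capital, optional lowercase letter
            (?i:<sub>[1-9][0-9]*</sub>)?  # optional subscript, tag is case-insensitive, no leading zero
          )+                              # one or more element groups
        )
        \s*$                              # optional trailing whitespace
    ~x';

    // preg_match returns 1 on a match, 0 on no match
    return preg_match($pattern, $input) === 1;
}

var_dump(isFormula('H<sub>2</sub>O'));               // bool(true)
var_dump(isFormula('H<sub>2</sub>O<sub>015</sub>')); // bool(false), leading zero in the subscript
var_dump(isFormula('NaCl'));                         // bool(true)
var_dump(isFormula('<script>'));                     // bool(false), must start with an element symbol
```

Because the pattern is anchored at both ends, anything outside the element/subscript shape makes the whole input fail, which is what makes it restrictive.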
by amir » Wed Dec 27, 2006 9:52 am
Thanks for the quick response!
Can I simply change this to
in order to allow all alphanumeric characters? (obviously this will not allow the <sub> html).
Thanks once again.
by feyd » Wed Dec 27, 2006 11:21 am
With "^" as the first character in your character class, the class is negated: it matches any character that is not one of those listed. So your regex will now look for symbols and special characters (excluding "<" and ">").
by amir » Wed Dec 27, 2006 11:30 am
Thanks. Can I just clarify what you are saying:
If I use regexp='#[^a-z0-9<>]#i', will it not allow any characters EXCEPT alphanumerics and also "<" and ">"?
Many thanks!
by feyd » Wed Dec 27, 2006 11:32 am
No. It won't match alphanumerics, "<" or ">"; it matches any other character.
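A minimal sketch of how that negated class could be used for filtering, assuming the goal is to reject or strip anything outside letters, digits, "<" and ">". The helper names hasDisallowedChars() and stripDisallowedChars() are made up for illustration.

```php
<?php
// Hypothetical helper: true if the input contains at least one character
// that is NOT a letter, a digit, "<" or ">".
function hasDisallowedChars(string $input): bool
{
    return preg_match('#[^a-z0-9<>]#i', $input) === 1;
}

// Hypothetical helper: removes every character the negated class matches.
function stripDisallowedChars(string $input): string
{
    // preg_replace returns null on error, so fall back to an empty string
    return preg_replace('#[^a-z0-9<>]#i', '', $input) ?? '';
}

var_dump(hasDisallowedChars('H2O'));             // bool(false)
var_dump(hasDisallowedChars('H2O; DROP'));       // bool(true), ";" and the space are matched
var_dump(hasDisallowedChars('H<sub>2</sub>O'));  // bool(true), the "/" in </sub> is matched
var_dump(stripDisallowedChars('H2O!'));          // string(3) "H2O"
```

So a match here means "found something I do not allow", which is the opposite sense from a pattern that describes the allowed input. Note that the "/" of a closing tag is not in the class, so it counts as disallowed.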