Filtering form input
Posted: Mon Aug 26, 2002 12:57 pm
by Xelmepa
I'm sure I've seen this, probably around here, but I can't find it again...
I need to find out whether a string contains a specific character, for security reasons: I'm filtering user input.
So, for example, I want to find out whether a string contains the character ";" or the sequence "..".
Anyone?
Posted: Mon Aug 26, 2002 1:16 pm
by phpPete
Code:
$target = ";";
$string = "I am an invalid ; string";
if (preg_match("/$target/", $string))
{
    echo "Warning!! Warning!! The sky is falling!!!!";
}
Relevant PHP manual page:
http://www.php.net/manual/en/function.preg-match.php
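A side note on the snippet above, since the original question also mentions "..": the target is dropped straight into the pattern, which is fine for ";" but not for "..", because an unescaped "." matches any character. A minimal sketch using PHP's built-in preg_quote() (the helper name contains_literal is made up for illustration):

```php
<?php
// Hypothetical helper: true if $string contains $target as literal text.
// preg_quote() escapes regex metacharacters such as ".", and its second
// argument also escapes the "/" delimiter used in the pattern.
function contains_literal(string $string, string $target): bool
{
    return preg_match('/' . preg_quote($target, '/') . '/', $string) === 1;
}

var_dump(preg_match('/../', 'abc'));                         // int(1): unquoted ".." matches any two characters
var_dump(contains_literal('abc', '..'));                     // bool(false)
var_dump(contains_literal('a/../b', '..'));                  // bool(true)
var_dump(contains_literal('I am an invalid ; string', ';')); // bool(true)
```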
Posted: Mon Aug 26, 2002 1:17 pm
by hob_goblin
Code:
// strpos() returns 0 for a match at the very start, so compare with !== false
if (strpos($mystring, ';') !== false) { echo "a ';' was found"; }
Posted: Mon Aug 26, 2002 1:19 pm
by hob_goblin
I am pretty sure strpos and strstr are faster.
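One caution with the strpos() test: strpos() returns the offset of the match, which is 0 when the character is the first one in the string, and 0 is falsy. A sketch that compares against false explicitly and checks several needles in turn (the helper name contains_any is hypothetical):

```php
<?php
// Hypothetical helper: true if $haystack contains any of the strings in $needles.
// strpos() returns an int offset (possibly 0) or false, hence the !== false test.
function contains_any(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$input = ';rm -rf';
var_dump((bool) strpos($input, ';'));                   // bool(false): match at offset 0 looks falsy
var_dump(contains_any($input, [';', '..']));            // bool(true)
var_dump(contains_any('docs/readme.txt', [';', '..'])); // bool(false)
```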
Posted: Mon Aug 26, 2002 1:22 pm
by phpPete
They are, and that occurred to me as soon as I saw your post... I kind of got in the habit of using preg and forgot all about the str* functions...
Posted: Mon Aug 26, 2002 1:24 pm
by nielsene
The str* functions tend to be faster than ereg*/preg*. However, if you want to check for both ";" and ".." as asked, a single call to a regexp function may be faster than two calls to string functions.
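To illustrate the single-call idea, one pattern with alternation can look for either sequence. This is only a sketch (has_forbidden_sequence is a made-up name), and whether it actually beats two strpos() calls would need measuring:

```php
<?php
// Hypothetical helper: one preg_match() call that looks for ";" or "..".
// The dots are escaped because an unescaped "." matches any character.
function has_forbidden_sequence(string $input): bool
{
    return preg_match('/;|\.\./', $input) === 1;
}

var_dump(has_forbidden_sequence('../etc/passwd'));   // bool(true)
var_dump(has_forbidden_sequence('a;b'));             // bool(true)
var_dump(has_forbidden_sequence('images/logo.png')); // bool(false)
```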
Most sites advise that it's better to ensure that the input is valid, instead of testing for the various things that make it invalid. It's very easy to miss a naughty character. You almost always need the power of regexps to do validation.
Posted: Mon Aug 26, 2002 1:32 pm
by Xelmepa
And how can you possibly check whether an input is valid, other than by checking whether it is invalid?
I am going to check for at least 2 characters; should I use the preg function instead?
Posted: Mon Aug 26, 2002 1:48 pm
by nielsene
As an example of checking validity, let's say we wanted to test that a username contains only alphanumeric characters, hyphens or underscores, and starts with a letter. (This could be done more simply using character classes, I know...)
Code:
ereg('^[a-zA-Z][-a-zA-Z0-9_]*$', $username);
So if any illegal characters are present, the match will fail. We don't have to check explicitly for ';' or other characters.
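For anyone on a current PHP install: the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so here is a sketch of the same whitelist using preg_match(). The function name is_valid_username is just for illustration. It uses \z rather than $ because in PCRE a $ also matches just before a trailing newline:

```php
<?php
// Whitelist check: first character a letter, then only letters, digits,
// hyphens or underscores. \z anchors at the very end of the string.
function is_valid_username(string $username): bool
{
    return preg_match('/^[a-zA-Z][-a-zA-Z0-9_]*\z/', $username) === 1;
}

var_dump(is_valid_username('xelmepa_2002')); // bool(true)
var_dump(is_valid_username('2fast'));        // bool(false): starts with a digit
var_dump(is_valid_username('bob;drop'));     // bool(false): ';' is not in the whitelist
```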
Now some people would test for invalidity like this:
Code:
ereg('^.*[`~!@#$%^&*()=+{};:",<>].*$', $username)
Don't laugh; I've seen people do this, and it leads to very buggy code. Maybe they left out a few characters, maybe they forgot to escape some (as I did in the example, on purpose).
Slightly better would be:
Code:
ereg('^.*[^-a-zA-Z0-9_].*$', $username)
This wouldn't enforce that the first character must be a letter, but it will match any character that isn't alphanumeric, a hyphen, or an underscore. Using '^' to negate the set of valid characters is almost as strong as only accepting legal input, but it often gets very complicated (which is why I didn't write the regexp for the first-character-must-be-a-letter rule in this example).
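The negated-class idea translated to preg_match(), as a sketch (has_illegal_char is a hypothetical name). Since preg_match() searches anywhere in the string, the leading '^.*' and trailing '.*$' aren't needed:

```php
<?php
// Reports whether $username contains at least one character outside
// [-a-zA-Z0-9_]. It does not enforce "first character must be a letter".
function has_illegal_char(string $username): bool
{
    return preg_match('/[^-a-zA-Z0-9_]/', $username) === 1;
}

var_dump(has_illegal_char('bob;drop'));  // bool(true)
var_dump(has_illegal_char('bob_smith')); // bool(false)
var_dump(has_illegal_char('2fast'));     // bool(false): passes despite starting with a digit
```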
Oh yeah, preg is better than ereg... but I still develop on a very old machine sometimes, so I'm used to using ereg on an old PHP install. If you need more than strpos can do, use preg, not ereg.

Posted: Mon Aug 26, 2002 2:09 pm
by Xelmepa
A detail I didn't mention is that I'm dealing with a filename, which sometimes includes directories. So I need to allow the slash and a single full stop...
Posted: Mon Aug 26, 2002 2:25 pm
by Takuma
^[\.\/A-Za-z0-9]*$
That's it.
Posted: Mon Aug 26, 2002 2:45 pm
by nielsene
Sadly, that won't work: it would allow a "..", which the original poster wanted to exclude. Also, hyphens, underscores, tildes and hashes are normally allowed in filenames (actually a lot more are, I believe, but you might want to disallow them for portability reasons).
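A quick way to see both problems is to run the first pattern against a few names. This sketch only wraps Takuma's pattern in the "/" delimiters preg_match() needs; the helper name and sample file names are made up:

```php
<?php
// Takuma's first pattern, unchanged apart from the delimiters.
// It restricts which characters may appear, not how they are combined.
function takuma_first_accepts(string $name): bool
{
    return preg_match('/^[\.\/A-Za-z0-9]*$/', $name) === 1;
}

var_dump(takuma_first_accepts('docs/readme.txt'));  // bool(true)
var_dump(takuma_first_accepts('../../etc/passwd')); // bool(true): traversal gets through
var_dump(takuma_first_accepts('my-file.txt'));      // bool(false): hyphen not allowed
```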
File names are complicated enough that I would probably search online for a regexp that someone else wrote. (Same idea as the note at php.net: don't write a new email-matching regexp, use the example they link to at O'Reilly.)
Alternatively, I would use Takuma's first regexp and then test for the bad cases that I know slip through a simple regexp. That way you will at least have filtered out most of the bad stuff.
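A sketch of that two-step approach. The widened character class (hyphen and underscore added to Takuma's) and the name is_safe_filename are assumptions for illustration; note it rejects any name containing "..", even a harmless one like "notes..txt":

```php
<?php
// Step 1: a simple character whitelist. Step 2: an explicit check for "..",
// the case the whitelist lets through. The class contents are an assumption.
function is_safe_filename(string $name): bool
{
    if (preg_match('#^[-_./A-Za-z0-9]+\z#', $name) !== 1) {
        return false;                      // a character outside the whitelist
    }
    return strpos($name, '..') === false;  // strpos() may return 0, so compare strictly
}

var_dump(is_safe_filename('docs/readme.txt')); // bool(true)
var_dump(is_safe_filename('../etc/passwd'));   // bool(false)
var_dump(is_safe_filename('a;b'));             // bool(false)
```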
As an example, I think I would do something like:
'^\.?/?([-a-zA-Z0-9_]+/)*\.?[-a-zA-Z0-9_]+(\.[-a-zA-Z0-9_]+)*$'
Not perfect, I know, but here's the idea:
start with an optional '.' and/or '/' (relative or absolute paths), then a chain of directories (each alphanumeric, plus hyphen/underscore, and at least one character long), followed by the file name. The file name may be preceded by a single dot, then at least one character, followed by any number of extensions, so long as each extension has at least one character (to avoid '..' occurring).
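The same pattern written for preg_match(), as a sketch (is_valid_path is a made-up name). Here the literal dots are escaped, "#" is the delimiter so the slashes need no escaping, and \z anchors the end so nothing can trail after the last extension:

```php
<?php
// Optional "." and/or "/", then directory segments ending in "/",
// then a file name with an optional leading dot and any number of extensions.
function is_valid_path(string $path): bool
{
    $pattern = '#^\.?/?([-a-zA-Z0-9_]+/)*\.?[-a-zA-Z0-9_]+(\.[-a-zA-Z0-9_]+)*\z#';
    return preg_match($pattern, $path) === 1;
}

var_dump(is_valid_path('./docs/readme.txt'));  // bool(true)
var_dump(is_valid_path('/var/www/.htaccess')); // bool(true)
var_dump(is_valid_path('archive.tar.gz'));     // bool(true)
var_dump(is_valid_path('docs/../etc/passwd')); // bool(false)
```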
While this is getting complicated it does show that it is possible to assume that data is bad until proven clean.
Posted: Mon Aug 26, 2002 3:02 pm
by Takuma
I think this is better though...
^[\.|/]*([a-zA-Z0-9_-]+[/]?)*$
Starts with . or /, followed by alphanumeric characters, underscores or dashes. There can be a / at the end.
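A sketch that runs this pattern against a few sample names (takuma_second_accepts is a made-up name). Inside a character class "|" is an ordinary character, not "or", so [\.|/] also accepts a literal "|":

```php
<?php
// Takuma's second pattern as posted, with "#" delimiters so "/" needs no escaping.
function takuma_second_accepts(string $name): bool
{
    return preg_match('#^[\.|/]*([a-zA-Z0-9_-]+[/]?)*$#', $name) === 1;
}

var_dump(takuma_second_accepts('../../etc/passwd')); // bool(true): leading ".." allowed
var_dump(takuma_second_accepts('file.txt'));         // bool(false): no extensions
var_dump(takuma_second_accepts('|etc'));             // bool(true): "|" is literal in the class
```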
Posted: Mon Aug 26, 2002 3:15 pm
by nielsene
Well, that fails for any file with an extension, and also for any nested directories/files. That might be a good thing or it might not; it depends on the needs of the developer.
Posted: Mon Aug 26, 2002 3:31 pm
by Takuma
I think this will work with nested directories; to allow an extension, add on this code:
^[\.|/]*([a-zA-Z0-9_-]+[/]?)*[\.]+[A-Za-z0-9]*$
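To make the format easier to read, here is this pattern broken into pieces. It is a sketch: the last piece is written as the character class [A-Za-z0-9], and PCRE's x modifier is used so whitespace is ignored and each piece can carry a # comment (takuma_ext_accepts is a hypothetical name):

```php
<?php
// Under the x modifier, whitespace outside character classes is ignored
// and "#" starts a comment that runs to the end of the line.
function takuma_ext_accepts(string $name): bool
{
    $pattern = '~
        ^
        [\.|/]*                  # any number of leading "." or "/" ("|" is a literal here)
        ( [a-zA-Z0-9_-]+ [/]? )* # name segments, each optionally followed by "/"
        [\.]+                    # one or more dots (required, so a name needs an extension)
        [A-Za-z0-9]*             # the extension letters and digits (may be empty)
        $                        # end of the string
    ~x';
    return preg_match($pattern, $name) === 1;
}

var_dump(takuma_ext_accepts('docs/readme.txt'));   // bool(true)
var_dump(takuma_ext_accepts('readme'));            // bool(false): no dot
var_dump(takuma_ext_accepts('../etc/passwd.bak')); // bool(true): ".." prefix still allowed
```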
Posted: Tue Aug 27, 2002 4:31 am
by Xelmepa
This is getting weirder/harder than I thought.

I'll try Takuma's last option; it looks more complete, though I need someone to explain the format of this thing to me, because I'm not sure I understand why you typed it like this...
In particular, I don't understand why you make it start with either a . or a /.
Why would a file or directory/file start with a . ?