Filtering form input


Xelmepa
Forum Commoner
Posts: 41
Joined: Sat Aug 24, 2002 3:02 pm
Location: Athens, Greece

Post by Xelmepa »

I'm sure I've seen this, probably around here, but I can't find it again...
I need to find out whether a string contains a specific character, for security reasons, to filter user input.

So for example I want to find out whether a string contains the character ";" or the sequence "..".
Anyone?
phpPete
Forum Commoner
Posts: 97
Joined: Sun Aug 18, 2002 4:40 pm
Location: New Jersey

Post by phpPete »

Code: Select all

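$target = ";";
$string = "I am an invalid ;  string";
// preg_quote() escapes regex metacharacters, so a target such as ".." is matched literally
if (preg_match("/" . preg_quote($target, "/") . "/", $string))
{
    echo "Warning!! Warning!!  The sky is falling!!!!";
}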
relevant PHP manual page: http://www.php.net/manual/en/function.preg-match.php
Last edited by phpPete on Mon Aug 26, 2002 1:17 pm, edited 1 time in total.
hob_goblin
Forum Regular
Posts: 978
Joined: Sun Apr 28, 2002 9:53 pm

Post by hob_goblin »

Code: Select all

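// strpos() returns 0 for a match at the start of the string, so compare against false strictly
if (strpos($mystring, ';') !== false) { echo "a ';' was found"; }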
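
A minimal sketch of the same idea covering both ";" and "..", as asked in the first post. The helper name `contains_forbidden` and its default list are made up for illustration and are not from this thread. It compares the result of strpos() to false with `!==`, since a match at offset 0 would otherwise be treated as "not found".

```php
<?php
// Hypothetical helper (not from the thread): returns true if $input
// contains any of the forbidden substrings.
function contains_forbidden(string $input, array $forbidden = [';', '..']): bool
{
    foreach ($forbidden as $needle) {
        // strpos() returns an integer offset or false; offset 0 is a real match,
        // so the comparison has to be strict.
        if (strpos($input, $needle) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_forbidden(';rm -rf'));          // leading ';' is still detected
var_dump(contains_forbidden('docs/readme.txt'));  // a single '.' is not forbidden
```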
hob_goblin
Forum Regular
Posts: 978
Joined: Sun Apr 28, 2002 9:53 pm

Post by hob_goblin »

I am pretty sure strpos and strstr are faster.
phpPete
Forum Commoner
Posts: 97
Joined: Sun Aug 18, 2002 4:40 pm
Location: New Jersey

Post by phpPete »

They are, and that occurred to me as soon as I saw your post... kinda got in the habit of using preg... forgot all about str...
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

str* functions tend to be faster than ereg/preg*. However, if you want to check for ";" and ".." as asked, a single call to a regexp function may be faster than two calls to string functions.

Most sites advise that it's better to ensure that the input is valid, instead of testing for various things that make it invalid. It's very easy to miss a naughty character. You almost always need the power of regexps to do validation.
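
As a rough illustration of the "single regexp call" option, here is a sketch that looks for either forbidden sequence in one preg_match() call. The sample inputs are invented for the example. Whether this is actually faster than two strpos() calls would need to be measured.

```php
<?php
// Sketch: one regular expression that looks for either forbidden sequence.
// The dots are escaped so that '\.\.' means two literal dots, not "any two characters".
$pattern = '/;|\.\./';

$samples = ['report.txt', '../secret', 'a;b'];  // illustrative inputs only
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match, 0 on no match
    $bad = preg_match($pattern, $sample) === 1;
    echo $sample, ' => ', $bad ? 'rejected' : 'ok', "\n";
}
```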
Xelmepa
Forum Commoner
Posts: 41
Joined: Sat Aug 24, 2002 3:02 pm
Location: Athens, Greece

Post by Xelmepa »

And how can you possibly check that an input is valid, if not by checking whether it is invalid?

I am going to check for at least 2 characters, should I use the preg function instead?
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

As an example of checking validity, let's say we wanted to test that a username contained only alphanumeric characters, hyphens, or underscores and started with a letter. (It could be done more simply by using character classes, I know...)

Code: Select all

ereg('^[a-zA-Z][-a-zA-Z0-9_]*$',$username);
so if any illegal characters are present the match will fail. We don't have to check explicitly for ';' or other characters.
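
For readers on PHP 7 or later, where the ereg functions no longer exist, the same whitelist check might look like the sketch below with preg_match(). The function name `is_valid_username` is hypothetical, and the `\z` anchor is a choice made for this example rather than something stated in the post.

```php
<?php
// Sketch of the whitelist check using preg_match(); the function name is hypothetical.
function is_valid_username(string $username): bool
{
    // First character: a letter. Remaining characters: letters, digits, '-' or '_'.
    // \z anchors at the very end of the string ('$' would also accept a trailing newline).
    return preg_match('/^[a-zA-Z][-a-zA-Z0-9_]*\z/', $username) === 1;
}

var_dump(is_valid_username('xelmepa_41'));  // starts with a letter, only allowed characters
var_dump(is_valid_username('bob;drop'));    // ';' is simply not in the allowed set
```

Nothing in the pattern mentions ';' at all; the input is rejected because it falls outside what is explicitly allowed.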

Now some people would test for invalidity as

Code: Select all

ereg('^.*[`~!@#$%^&*()=+{};:",<>].*$',$username)
Don't laugh, I've seen people do this and it leads to very buggy code: maybe they left out a few characters, maybe they forgot to escape some (as I did in the example on purpose).

Slightly better would be

Code: Select all

ereg('^.*[^-a-zA-Z0-9_].*$',$username)
This wouldn't enforce that the first character must be a letter, but it will match any character that is not alphanumeric, a hyphen, or an underscore. Using '^' to negate a valid set is almost as strong as only accepting legal input. But often it gets very complicated (which is why I didn't write the regexp for requiring that the first character be a letter in this example).
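
A sketch of the negated-set check using preg_match(), again assuming a PHP version without ereg. The function name `has_illegal_char` is invented for the example. As noted above, this variant says nothing about the first character.

```php
<?php
// Sketch: flag the input as soon as one character falls outside the allowed set.
function has_illegal_char(string $username): bool
{
    // [^...] matches any single character NOT listed. No ^.* or .*$ padding is
    // needed because preg_match() searches the whole string for a match.
    return preg_match('/[^-a-zA-Z0-9_]/', $username) === 1;
}

var_dump(has_illegal_char('a;b'));   // ';' is outside the set
var_dump(has_illegal_char('9abc'));  // passes this check even though it starts with a digit
```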

Oh yeah, preg is better than ereg... but I still develop on a very old machine sometimes, so I'm used to using ereg on an old PHP install... If you need more than strpos can do, use preg, not ereg :)
Xelmepa
Forum Commoner
Posts: 41
Joined: Sat Aug 24, 2002 3:02 pm
Location: Athens, Greece

Post by Xelmepa »

A detail I didn't mention was that I am dealing with a filename. That sometimes includes directories. So I need to allow the slash and a single full stop...
Takuma
Forum Regular
Posts: 931
Joined: Sun Aug 04, 2002 10:24 am
Location: UK

Post by Takuma »

^[\.\/A-Za-z0-9]*$

That's it.
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

Sadly that won't work; it would allow a "..", which the original poster wanted to exclude. Also, hyphens, underscores, tildes and hashes are normally allowed in filenames. (Actually a lot more are, I believe, but you might want to disallow them for portability reasons.)

File names are complicated enough that I would probably search online for a regexp that someone else wrote. (Like the note at php.net: don't write a new email-matching regexp, use the example they link to at O'Reilly.)

Alternatively, I would use Takuma's regexp first and then test for the bad cases that I know slip through a simple regexp. You will have at least stripped out most of the bad stuff.

As an example I think I would do something like
'^\.?/?([-a-zA-Z0-9_]+/)*\.?[-a-zA-Z0-9_]+(\.[-a-zA-Z0-9_]+)*$'

Not perfect I know, but here's the idea:
start with an optional '.' and/or '/' (relative or absolute paths), then a chain of directories (alphanumeric plus hyphen/underscore, each at least one character long), followed by the file name. The file name may be preceded by a single dot and then some positive number of characters, followed by any number of extensions so long as each extension has at least one character (to avoid '..' occurring).

While this is getting complicated it does show that it is possible to assume that data is bad until proven clean.
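
To make the "bad until proven clean" idea concrete, here is one possible sketch in PHP. It is a variation on the regexp above, not the same pattern: it applies the single-dot rule to directory names as well, and anchors the end with `\z`. The function name `is_safe_path` and the allowed character set are assumptions for the example. It only checks the shape of the string; it does not look at the filesystem.

```php
<?php
// Hypothetical helper: accept only paths built from "safe" segments.
function is_safe_path(string $path): bool
{
    // A name is one or more letters, digits, hyphens or underscores.
    $name = '[-a-zA-Z0-9_]+';
    // A segment is an optional leading dot, then names joined by single dots,
    // so two dots can never be adjacent.
    $segment = "\\.?{$name}(?:\\.{$name})*";
    // Optional leading "./" or "/", then segments separated by single slashes.
    $pattern = "#^(?:\\.?/)?{$segment}(?:/{$segment})*\\z#";

    return preg_match($pattern, $path) === 1;
}

var_dump(is_safe_path('docs/readme.txt'));  // plain relative path
var_dump(is_safe_path('../etc/passwd'));    // contains '..'
```

Anything the pattern does not describe is rejected by default, so there is no separate list of "bad" characters to keep up to date.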
Takuma
Forum Regular
Posts: 931
Joined: Sun Aug 04, 2002 10:24 am
Location: UK

Post by Takuma »

I think this is better though...

^[\.|/]*([a-zA-Z0-9_-]+[/]?)*$

It starts with . or /, followed by alphanumeric characters, underscores or dashes. There can be a / at the end.
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

Well that fails for any file with an extension and also for any nested directory/files. That might be a good thing or it might not. It depends on the need of the developer.
Takuma
Forum Regular
Posts: 931
Joined: Sun Aug 04, 2002 10:24 am
Location: UK

Post by Takuma »

I think this will work with nested directories, and to allow an extension, add this on to the code:

^[\.|/]*([a-zA-Z0-9_-]+[/]?)*[\.]+[A-Za-z0-9]*$
Xelmepa
Forum Commoner
Posts: 41
Joined: Sat Aug 24, 2002 3:02 pm
Location: Athens, Greece

Post by Xelmepa »

This is getting weirder/harder than I thought :P
I'll try Takuma's last option, it looks more complete, though I need someone to explain the format of this thing to me, cuz I'm not sure I understand why you typed it like this...

In particular, I don't understand why you make it start with either a . or a /
Why would a file or directory/file start with . ?