Filtering contact form input
Posted: Sun Oct 21, 2007 11:00 pm
by gr8dane
Processing forms was so much easier before I started worrying about security!! I'm seeing a lot of warnings that I should filter input, but not a lot of details about how to do that. Since I don't know what the "bad" input might be, I'm not sure how to filter it out.
For example, what is the best regex pattern to use to validate legitimate names without causing security problems in an email form? When possible, I want to allow perfectly legitimate names such as "O'Reilly", "Mary & Joseph", "Mary Smith-Jones", "Tom Jones, Jr." or 'Tom ("Bud") Jones'. I'm thinking of using /^[a-zA-Z\'\ \&\-\,\.\"\(\)]+$/ to validate the name. Would this cause any security issues when using the name in mail()?
Also, what kinds of input should I be filtering out of the message? Since I don't know what the expected input would be in that field, I would need to know what not to allow. Any good regex for that?
Posted: Mon Oct 22, 2007 12:30 am
by Kieran Huggins
I've heard there's something called HTML Purifier... might be what you're looking for?
Posted: Mon Oct 22, 2007 1:31 am
by gr8dane
I have no idea! It looks to me like it validates HTML, which is not what I'm looking for. Their website doesn't really make it clear (at least not that I could find) what exactly it does or how to use it. Besides, I wasn't really looking for software to do the job for me.
Posted: Mon Oct 22, 2007 8:09 am
by aaronhall
I wouldn't worry about filtering someone's given name... who are you to say what a valid name is?
If you plan on inserting user input into MySQL, sanitize with mysql_real_escape_string(). If you want to output user input onto one of your pages, and that input is not supposed to contain HTML, use htmlspecialchars(). If you want to output HTML, use HTML Purifier.
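As a rough illustration of the htmlspecialchars() case (a minimal sketch, not code from this thread; the sample input and the UTF-8 charset are assumptions):

```php
<?php
// Hypothetical input; in a real form this would come from $_POST['name'].
$name = 'Tom ("Bud") Jones <script>alert(1)</script>';

// Escape for an HTML context. ENT_QUOTES also converts single quotes,
// and the charset is passed explicitly (UTF-8 is assumed here).
$safeForHtml = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo $safeForHtml, "\n";
// Tom (&quot;Bud&quot;) Jones &lt;script&gt;alert(1)&lt;/script&gt;
```

The name itself is left intact; only the characters that are special in HTML are converted, which is why this is applied at output time rather than when the form is received.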
Posted: Mon Oct 22, 2007 1:03 pm
by gr8dane
aaronhall wrote:I wouldn't worry about filtering someone's given name... who are you to say what a valid name is?
I thought this was the Security forum. Check out Kieran Huggins' post at viewtopic.php?t=72721 for a great illustration (literally) of why. We can't assume that the input in a name field is actually a name, and not malicious code. I've at least learned that much, so far.
aaronhall wrote:If you plan on inserting user input into MySQL, sanitize with mysql_real_escape_string(). If you want to output user input onto one of your pages, and that input is not supposed to contain HTML, use htmlspecialchars().
My question was actually about using the input in an email, where HTML entities don't get translated.
Posted: Mon Oct 22, 2007 1:27 pm
by Christopher
You can validate or you can filter:
Code: Select all
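// Remove every character that is not on the whitelist
// (the ^ must come directly after [ to negate the class).
$name = preg_replace('/[^a-zA-Z\'\ \&\-\,\.\"\(\)]/', '', $_POST['name']);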
I tend to filter and escape for security and validate for correctness, but you can do it either way.
You always need to use the appropriate escape function to escape the output.
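To make the validate/filter distinction concrete, here is a minimal sketch using the same character whitelist. The function names isValidName() and filterName() and the sample input are hypothetical, and the D modifier is an addition not found in the thread: without it, $ in PCRE also matches just before a trailing newline.

```php
<?php
// Validate: report whether the whole string uses only whitelisted characters.
// The D modifier keeps $ from matching before a trailing newline.
function isValidName(string $name): bool
{
    return preg_match('/^[a-zA-Z\'\ \&\-\,\.\"\(\)]+$/D', $name) === 1;
}

// Filter: drop every character that is not on the whitelist.
function filterName(string $name): string
{
    return preg_replace('/[^a-zA-Z\'\ \&\-\,\.\"\(\)]/', '', $name);
}

// Hypothetical input that tries to smuggle in an extra header line.
$input = "Tom (\"Bud\") Jones\r\nBcc: someone@example.com";

var_dump(isValidName($input));  // bool(false)
echo filterName($input), "\n";  // Tom ("Bud") JonesBcc someoneexample.com
```

Validation rejects the submission outright, while filtering quietly changes it. Note that this whitelist only allows ASCII letters, so a name containing an accented letter would be rejected by the first function and altered by the second.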
Posted: Mon Oct 22, 2007 1:43 pm
by gr8dane
Sorry, I guess I said "validate" when I meant "filter" in one spot. I'm not sure what the difference is. In any case, that brings me back to my original question: Would using that regex to filter a name cause any security issues when using the name in mail()? In other words, am I being too generous in what I allow? And what is "the appropriate escape function" when input is being emailed?
Posted: Mon Oct 22, 2007 1:52 pm
by Christopher
The filter I showed above is a character whitelist. That is one thing to do and it can reduce the amount of validation you need to follow up with. You still want to validate that text you are getting is in the format expected. Escaping depends on what it is and where it is going. You would not want to escape email addresses, for example, because they must contain all valid characters, but you would escape the subject or body of the email.
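For mail(), one common reading of "escaping depends on where it is going" is that values placed in headers, such as the subject, have carriage returns and line feeds removed, since a line break is what would let a submitted value begin a new header line. The sketch below works under that assumption and is not spelled out in this thread; stripLineBreaks() and the example.com addresses are hypothetical, and the message body is not addressed here.

```php
<?php
// Hypothetical helper: remove CR and LF so a value cannot start a new header line.
function stripLineBreaks(string $value): string
{
    return str_replace(["\r", "\n"], '', $value);
}

// Hypothetical submission that tries to add a Bcc header through the name field.
$name    = stripLineBreaks("Tom Jones\r\nBcc: victim@example.com");
$subject = 'Contact form message from ' . $name;

echo $subject, "\n";
// Contact form message from Tom JonesBcc: victim@example.com

// The cleaned value could then be passed to mail(), for example:
// mail('owner@example.com', $subject, $message);
```

The injected text is still visible in the subject, but it stays on a single line and so remains part of the subject rather than becoming a header of its own.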
Posted: Mon Oct 22, 2007 2:47 pm
by gr8dane
arborint wrote:you would escape the subject or body of the email.
I tried addslashes() on the body, but the slashes showed up in the email. Was I doing something wrong?
Posted: Mon Oct 22, 2007 8:03 pm
by shiflett
gr8dane wrote:I said "validate" when I meant "filter" in one spot. I'm not sure what the difference is.
Validating is a subset of filtering.
To validate is to determine whether something is valid. For example:
Code: Select all
$isValid = ctype_alnum($_POST['username']);
Filtering adds to this by preventing invalid data:
Code: Select all
if (ctype_alnum($_POST['username'])) {
    /* Continue */
} else {
    /* Abort */
}
Hope that helps.
Posted: Wed Nov 07, 2007 3:21 pm
by alpha2zee
Might be relevant: htmLawed, a highly customizable, 45 KB, single-file, non-OOP PHP script to filter and purify HTML. Besides restricting tags/elements, attributes and URL protocols to your specification, balancing HTML tags, and ensuring valid tag nesting and well-formedness, it also has good anti-XSS and anti-spam measures.