Page 1 of 3
Lugubriousness
Posted: Mon Jul 06, 2009 4:47 pm
by kaisellgren
It's so sad when people try to defend themselves without succeeding. Everyday I got 10-20 emails asking to evaluate code, and most of them are insecure. However, while ago I was evaluating a script that was pretty well secured, but I figured out something that is quite misleading (at least to less experienced programmers):
Code: Select all
$filtered = filter_var($_GET['url'], FILTER_SANITIZE_URL);
echo <<<HTML
<html>
<head>
<meta http-equiv="refresh" content="1;url=$filtered">
</head>
<body>
<h1>You are being redirected...</h1>
</body>
</html>
HTML;
There was piece of code similar to that, and I found it a bit bemusing how much the filter library sucks. One might think the code above is secure (what's the point of the URL filter if it wouldn't filter non-URLs?), but it's no where near secure. I found many ways to circumvent it the easiest way probably being a simple encoding process:
Code: Select all
$_GET['url'] = 'data:text/html;base64,PHNjcmlwdD5hbGVydCgnSXRcJ3MgbWUsIEthaSA6KScpPC9zY3JpcHQ+';
The filter library barely does anything... it filters out basic stuff like invisible characters, but that's pretty much it. It will accept the above piece of content and embrace XSS attacks which I find a bit scary because the purpose of it has not been made very clear in the documentation and that might mislead people which is never a good thing. I mean, PHP offering a library that cleanses non-URLs, but in reality it barely does. I have never looked into the library, I have always dealt with problems myself, until now I decided to have a look at it... and it seems to me that it was made by a non-security conscious person.
Seriously. The documentation for filter sucks, too. Maybe I should write a new library for PHP.

Re: Lugubriousness
Posted: Mon Jul 06, 2009 5:48 pm
by Christopher
kaisellgren wrote:Seriously. The documentation for filter sucks, too. Maybe I should write a new library for PHP.

Or better, what are the best practices and the combinations of escaping and filtering necessary to be more secure.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 2:25 am
by matthijs
kaisellgren wrote:Seriously. The documentation for filter sucks, too. Maybe I should write a new library for PHP.
hmmm, what sucks more, the filter library itself or the documentation? That's quite a difference.
I certainly agree the documentation could be improved a lot. I do like the fact that PHP offers native filtering functions for common things. But of course the lack of good documentation prevents me from using the filters fully
arborint wrote:Or better, what are the best practices and the combinations of escaping and filtering necessary to be more secure.
Agree! There is tons of information out there, but most of it is either very bad or lacking in debt. There are some questions about filtering and escaping that I have had for years now, and I still can't find good answers for them.
The point with filtering and escaping is that it only gets "interesting" when you look at problems more in debt. I think someone authoritative on the subject could easily fill a complete book only with explanations and examples of filtering and escaping of data
Re: Lugubriousness
Posted: Tue Jul 07, 2009 5:11 am
by kaisellgren
matthijs wrote:There are some questions about filtering and escaping that I have had for years now, and I still can't find good answers for them.
Strike up a conversation then.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 5:36 am
by onion2k
I'm failing to understand the problem. "data:text/html;base64,PHNjcmlwdD5hbGVydCgnSXRcJ3MgbWUsIEthaSA6KScpPC9zY3JpcHQ+" is a well formed, valid URL. Why would you expect that to be rejected?
filter_var() is a function for ensuring data matches a specified format. That's all it does. If there is a security implication if you allow certain valid URLs (like in this case) you need to do more checking. Eg, checking the protocol is allowed (an http or https URL for example).
Re: Lugubriousness
Posted: Tue Jul 07, 2009 5:43 am
by kaisellgren
There are no problems, it's just that it's a bit unclear to many people.
"data:text/html;base64,PHNjcmlwdD5hbGVydCgnSXRcJ3MgbWUsIEthaSA6KScpPC9zY3JpcHQ+"
Is not a valid URL, it should be form of resource_type://domain:port/filepathname?query_string#anchor
I don't really mind what the function does, but it should be more clear what it does. All it does it checks for valid URL characters.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 5:51 am
by onion2k
kaisellgren wrote:Is not a valid URL, it should be form of resource_type://domain:port/filepathname?query_string#anchor
I'm afraid you're wrong. Internet protocols usually fit that format, but
URLs (which is what the filter is checking for) don't have to. In the case of the example you case, the URL is in the "data" format which is not an internet protocol. From the spec:
Those schemes which refer to internet protocols mostly have a common syntax for the rest of the object name. This starts with a double slash "//" to indicate its presence, and continues until the following slash "/".
http://www.w3.org/Addressing/URL/url-spec.txt
Re: Lugubriousness
Posted: Tue Jul 07, 2009 6:04 am
by kaisellgren
onion2k wrote:In the case of the example you case, the URL is in the "data" format which is not an internet protocol.
That's exactly the "problem". The reason why people use the filter library is to strengthen their security. That is what ultimately should happen (whether it's strictly a standard URL or not).
This is what Wikipedia says:
Every URL is made up of some combination of the following: the scheme name, followed by a colon, then, depending on scheme, a hostname (alternatively, IP address), a port number, the pathname of the file to be fetched or the program to be run, then a query string[4][5], and with HTML files, an anchor (optional) for where the page should start to be displayed.[6]
The combined syntax looks like:
resource_type://domain:port/filepathname?query_string#anchor
Based on the documentation, there are only 2 sentences explaining the meaning of the given tools:
Validation is used to validate or check if the data meets certain qualifications. For example, passing in FILTER_VALIDATE_EMAIL will determine if the data is a valid email address, but will not change the data itself.
Sanitization will sanitize the data, so it may alter it by removing undesired characters. For example, passing in FILTER_SANITIZE_EMAIL will remove characters that are inappropriate for an email address to contain. That said, it does not validate the data.
That's pretty clear to me. Sanitization filters out non-valid characters whereas validation checks if the data meets the certain qualifications. The problem is that this is not enough. Every tenth developer misuses the library and it should be more clear how to use Filter properly. It's not clear what Filter does and what it is best for and there are barely any examples in the documentation at all.
If you use FILTER_VALIDATE_URL, then "data:text/html;base64,PHNjcmlwdD5hbGVydCgnSXRcJ3MgbWUsIEthaSA6KScpPC9zY3JpcHQ+" is rejected. So, Filter library thinks it is not a valid URL.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 6:32 am
by onion2k
Right. I see your point. Yeah, I think I'd expect the "sanitize" filter to fail if the string failed against the "validate" filter. You shouldn't be able to sanitize an invalid string.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 6:53 am
by kaisellgren
Here's another piece of code I evaluated three weeks ago:
Code: Select all
$email = "'onmouseover='alert('Kai was here')";
$email = filter_var($email, FILTER_SANITIZE_EMAIL);
echo <<<HTML
<html>
<head>
</head>
<body>
<a href='mailto:$email'>Email him</a>
</body>
</html>
HTML;
The developer simply believed since he is using a "standard" security library that bundles in PHP, he would be safe. In reality though, many kinds of XSS attempts will succeed. It's very unclear (to developers in general) where one could possibly use the Filter library. In the above example, using VALIDATE would return false, but I remember there was a way to circumvent that, too. The documentation could at least try to explain something. A security library should explain what it is capable of doing.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 7:31 am
by onion2k
kaisellgren wrote:The developer simply believed since he is using a "standard" security library that bundles in PHP, he would be safe.
To be fair though, the developer was using the Filter library incorrectly. He was assuming too much (that the sanitize filter also validated things). No "security library" will protect you if you don't take the time to learn how to use it properly.
Ultimately the lesson to take away from this is that you need to
test for security issues
regardless of your code.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 8:04 am
by kaisellgren
onion2k wrote:No "security library" will protect you if you don't take the time to learn how to use it properly.
Indeed. If you filter something, it is filtered for a specific context. In case of Filter, it is not entirely clear for which kind of purposes you may use them. You need to test and find out and the documentation is not going to help you. They just throw different filters at you without giving any kinds of examples or tips (besides the basic email filter example). That misleads many people. It's lugubrious. I have to constantly tell people that their Filter "protected" code is vulnerable to something (XSS, double encoding, response-splitting, etc).
Re: Lugubriousness
Posted: Tue Jul 07, 2009 1:07 pm
by Christopher
kaisellgren wrote:onion2k wrote:No "security library" will protect you if you don't take the time to learn how to use it properly.
Indeed. If you filter something, it is filtered for a specific context. In case of Filter, it is not entirely clear for which kind of purposes you may use them. You need to test and find out and the documentation is not going to help you. They just throw different filters at you without giving any kinds of examples or tips (besides the basic email filter example). That misleads many people. It's lugubrious. I have to constantly tell people that their Filter "protected" code is vulnerable to something (XSS, double encoding, response-splitting, etc).
It is fine to say that no security library will well if you don't use it correctly, or that you have to do different things in different contexts. I think everyone agrees with those sentiments.
As far as not using libraries properly, the point of the OP if this thread was that the design of the library can effect how easily and often it will be used properly. Kai seems to indicate that the PHP filter extension does not lead one to do the right thing.
If as onion says security libraries will not work well unless learned properly, and as Kai says the libraries' designs and documentation do not make it clear how to use them properly, then the solution seem to be to create a clearer design and documentation. However an immediate red flag goes up for me when I hear the word "clear". Sometimes (not always) what is clear to one programmer in not clear at all to another. So I think we would first need to see some examples of unclear to see why people are confused by them.
But more specific to the security discussion, I think the thing that is often left out is that the number of "contexts" commonly confronted is not infinite. I think there is a fairly small set of common "contexts", user supplied URLs that will be displayed are an example of one. I think we could probably build a nice list of common things that need to be dealt with in slightly different ways, such as: dates, people/place names, file names, text without HTML, text with HTML, etc. From there, best security practices could be described -- with examples and code samples.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 1:27 pm
by kaisellgren
Man, that's what I've been planning (get out of my head

). Creating a library that comes with real life code examples that tell you exactly how, when and where to use what. From credit cards to allowance of CSS, everything even slightly general would be handled by a single library. Something similar to ESAPI perhaps.
Re: Lugubriousness
Posted: Tue Jul 07, 2009 2:34 pm
by Weirdan
onion2k wrote:He was assuming too much (that the sanitize filter also validated things).
I guess I'm not following... in my book output of sanitize_blah() is supposed to pass validate_blah() successfully - that is, produce valid blah. If, given the input, valid blah cannot be produced, it should either throw an exception or return some error code. Otherwise it's implemented incorrectly.
Not that sanitize_* has much use to me — I believe incorrect input should be rejected and not attempted to deal with (GIGO principle will bite you sooner or later, and I don't like to be bitten by that GO part).