onion2k wrote:In the case of the example you gave, the URL is in the "data" format which is not an internet protocol.
That's exactly the "problem". The reason why people use the filter library is to strengthen their security. That is what ultimately should happen (whether it's strictly a standard URL or not).
This is what Wikipedia says:
Every URL is made up of some combination of the following: the scheme name, followed by a colon, then, depending on scheme, a hostname (alternatively, IP address), a port number, the pathname of the file to be fetched or the program to be run, then a query string, and with HTML files, an anchor (optional) for where the page should start to be displayed.
The combined syntax looks like:
resource_type://domain:port/filepathname?query_string#anchor
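To make the pieces of that syntax concrete, here is a minimal sketch that splits a URL into those parts with PHP's parse_url(). The URL itself is a made-up example, not something from this thread.

```php
<?php
// Hypothetical URL, used only to show each component of the syntax above.
$url = 'http://example.com:8080/docs/page.html?lang=en#intro';

// parse_url() returns an associative array of the components it finds.
$parts = parse_url($url);

foreach ($parts as $name => $value) {
    echo $name, ' = ', $value, PHP_EOL;
}
```

Note that parse_url() only splits the string. It does not judge whether the URL is safe or even valid.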
The documentation offers only two short explanations of what these tools do:
Validation is used to validate or check if the data meets certain qualifications. For example, passing in FILTER_VALIDATE_EMAIL will determine if the data is a valid email address, but will not change the data itself.
Sanitization will sanitize the data, so it may alter it by removing undesired characters. For example, passing in FILTER_SANITIZE_EMAIL will remove characters that are inappropriate for an email address to contain. That said, it does not validate the data.
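The difference between the two is easier to see side by side. The following is a rough sketch using filter_var(); the input strings are invented for illustration.

```php
<?php
// Hypothetical input values, chosen only for illustration.
$messy = 'john(.doe)@exa//mple.com';

// Validation checks the value and leaves it unchanged: it returns the
// original string on success and false on failure.
$validRaw = filter_var($messy, FILTER_VALIDATE_EMAIL);      // false

// Sanitization strips characters that may not appear in an address,
// but makes no promise that what is left is a valid address.
$clean      = filter_var($messy, FILTER_SANITIZE_EMAIL);    // 'john.doe@example.com'
$validClean = filter_var($clean, FILTER_VALIDATE_EMAIL);    // 'john.doe@example.com'

// Sanitized output can still fail validation.
$junk      = filter_var('not an email', FILTER_SANITIZE_EMAIL); // 'notanemail'
$validJunk = filter_var($junk, FILTER_VALIDATE_EMAIL);          // false
```

So sanitizing alone is not a check: if you need to know that the value is acceptable, you still have to validate it afterwards.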
That's pretty clear to me. Sanitization filters out invalid characters, whereas validation checks whether the data meets certain qualifications. The problem is that this is not enough. In my experience a lot of developers misuse the library, and it should be clearer how to use Filter properly. It's not clear what Filter does or what it is best suited for, and there are barely any examples in the documentation at all.
If you use FILTER_VALIDATE_URL, then "data:text/html;base64,PHNjcmlwdD5hbGVydCgnSXRcJ3MgbWUsIEthaSA6KScpPC9zY3JpcHQ+" is rejected. So the Filter library does not consider it a valid URL.
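Here is a small sketch that reproduces this. The http URL is a made-up comparison value, and the explanation in the comments (that the validator wants a host part for most schemes) is my assumption, not something the quoted documentation states.

```php
<?php
$dataUri = 'data:text/html;base64,PHNjcmlwdD5hbGVydCgnSXRcJ3MgbWUsIEthaSA6KScpPC9zY3JpcHQ+';
$httpUrl = 'http://example.com/index.php?id=1'; // hypothetical comparison URL

// filter_var() returns the value on success and false on failure.
$dataResult = filter_var($dataUri, FILTER_VALIDATE_URL); // false
$httpResult = filter_var($httpUrl, FILTER_VALIDATE_URL); // the URL string

// Assumption: the rejection is related to the missing host component.
// parse_url() finds a scheme in the data URI, but no host.
$parts   = parse_url($dataUri);
$hasHost = is_array($parts) && isset($parts['host']);

var_dump($dataResult, $httpResult, $hasHost);
```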