Avoiding HTML code to ruin your site
I have made a site where users can post HTML tags and use them to display pictures and links.
I want to make sure they don't misuse it. Is there any sort of code or technique I can use to prevent this?
Please help.
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA
This subject was raised a few weeks ago (can't think of the thread at the moment), and what it boils down to is that allowing HTML to be added to your site without some seriously careful cleansing is very risky. The hard part is knowing all of the attributes of each HTML tag. For example, someone can insert an image with an onMouseover attribute and tie malicious JavaScript to the onMouseover event of that tag. That is bad stuff.
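To make the onMouseover risk concrete, here is a minimal sketch. The sample input string is made up for illustration. It shows what such a submission looks like and what the safest default, escaping everything with htmlspecialchars(), does to it. Note that escaping disables all markup, so it only fits places where users don't need working tags.

```php
<?php
// Hypothetical user submission: an image tag carrying an event handler.
$userInput = '<img src="cat.jpg" onmouseover="alert(document.cookie)">';

// Escaping turns the markup into inert text instead of live HTML,
// so the browser displays the tag rather than executing it.
$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;img src=&quot;cat.jpg&quot; onmouseover=&quot;alert(document.cookie)&quot;&gt;
```

If users must be able to post working tags, escaping alone is not enough and the markup has to be filtered instead.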
- daedalus__
- DevNet Resident
- Posts: 1925
- Joined: Thu Feb 09, 2006 4:52 pm
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA
Yes, I agree with the whitelist idea. It will take much less time to determine the allowable tags than the disallowable ones. For example, you can use a regular expression to match everything within these tags:
Code: Select all
<a></a>
<p></p>
<b></b>
<i></i>
<strong></strong>
<em></em>
With a regex something like this (might not be perfect, I'm not great at regular expressions):
Code: Select all
<(a|p|b|i|strong|em)>.*?</\1>
I wrote this in a moment of regex madness. It removes all tags and attributes that are not allowed. It tries to find any hidden JavaScript too.
Code: Select all
function CleanUp($input)
{
    // list of allowed tags (guarded so a second call does not redefine the constant)
    if(!defined('__HTML__'))
    {
        define('__HTML__', 'a|b|br|i|img|p|span');
    }
    // list of allowed attributes
    if(!defined('__ATTRIBUTES__'))
    {
        define('__ATTRIBUTES__', 'src|alt|href|title|class|id');
    }
    if(!function_exists('DisallowedTagsCallback'))
    {
        // strips the tags from a disallowed chunk and escapes what is left
        function DisallowedTagsCallback($input)
        {
            $input[0] = strip_tags($input[0]);
            return htmlentities($input[0]);
        }
    }
    if(!function_exists('DisallowedAttributesCallback'))
    {
        // removes every quoted attribute whose name is not on the allowed list
        function DisallowedAttributesCallback($input)
        {
            $regex = '/\s*\b(?!(?:'.__ATTRIBUTES__.'))[a-z]+\b\s*[=]\s*([\'"])'.
                     '(((?!\1).)|((?<=[\\\])\1))*\1/is';
            return preg_replace($regex, '', $input[0]);
        }
    }
    // strip any javascript: script blocks, on* handlers, javascript in href
    $regex = array('/\<script\b[^>]*\>((?!\<\/script\b[^>]*\>).)*\<\/script\b[^>]*\>/is',
                   '/\s*on[a-z]+\s*[=]\s*(["\'])(((?!\1).)|((?<=[\\\])\1))*\1/i',
                   '/href+\s*[=]\s*(["\'])((?!\1).)*javascript((?!\1).)*\1/is');
    $replace = array('', '', 'href="#"');
    $input = preg_replace($regex, $replace, $input);
    // strip disallowed tags
    $regex = "@(((?<=^)|(?<=[>]))(?![<]/?(".__HTML__.")\b[^>]*[>])".
             "([^<]|((?![<]/?(".__HTML__.")\b[^>]*[>])[<]))+)@i";
    $input = preg_replace_callback($regex, 'DisallowedTagsCallback', $input);
    // strip disallowed attributes
    $regex = '/(?<=[<])[^>]+(?=[>])/';
    return preg_replace_callback($regex, 'DisallowedAttributesCallback', $input);
}
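For comparison, the same whitelist idea can be sketched with PHP's DOM extension instead of regular expressions, so the parser does the work of finding tags and attributes. This is only a rough sketch under a few assumptions: the names ALLOWED, isSafeUrl, sanitizeNode and sanitizeHtml are made up for this example, the tag and attribute lists are illustrative, and only http(s) URLs are accepted in href and src. It is not a substitute for a maintained sanitizer library.

```php
<?php
// Hypothetical whitelist: tag name => attributes allowed on that tag.
const ALLOWED = [
    'a'   => ['href', 'title'],
    'img' => ['src', 'alt'],
    'p' => [], 'b' => [], 'i' => [], 'strong' => [], 'em' => [], 'br' => [],
];

// Assumption for this sketch: only absolute http(s) URLs are acceptable.
function isSafeUrl(string $url): bool
{
    return preg_match('#^https?://#i', trim($url)) === 1;
}

function sanitizeNode(DOMNode $node): void
{
    // Copy the child list first, because the tree is modified while walking it.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMElement) {
            $tag = strtolower($child->nodeName);
            if (!array_key_exists($tag, ALLOWED)) {
                // Disallowed element: keep only its text content.
                $text = $child->ownerDocument->createTextNode($child->textContent);
                $node->replaceChild($text, $child);
                continue;
            }
            foreach (iterator_to_array($child->attributes, false) as $attr) {
                $name = strtolower($attr->nodeName);
                $isUrl = in_array($name, ['href', 'src'], true);
                if (!in_array($name, ALLOWED[$tag], true)
                    || ($isUrl && !isSafeUrl($attr->nodeValue))) {
                    $child->removeAttribute($attr->nodeName);
                }
            }
            sanitizeNode($child);
        } elseif (!($child instanceof DOMText)) {
            // Comments, processing instructions and the like are dropped.
            $node->removeChild($child);
        }
    }
}

function sanitizeHtml(string $html): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    // The wrapper div gives the fragment a single known parent; anything that
    // escapes the wrapper (e.g. a stray closing div) is simply discarded.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    sanitizeNode($root);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage: the onclick handler, the javascript: link and the script tag are removed.
$dirty = '<p onclick="evil()">Hi <a href="javascript:alert(1)" title="t">link</a></p>'
       . '<script>alert(1)</script>';
echo sanitizeHtml($dirty), "\n";
```

The point of going through a parser is that attribute quoting tricks and odd spacing are normalized before the whitelist check runs, which is the part that is hard to get right with regex alone.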