Also, of course I use htmlentities when outputting data back to HTML, so I'm reasonably safe in that respect. It's just that, from a usability viewpoint, I would like to warn people who use HTML that it is not allowed. For example, some people might assume that tags like <b> can be used. Then, when they view their submitted entry, they see either a) their tags stripped or b) entity-encoded code.
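To make those two outcomes concrete, here is a small sketch of what such a user would end up seeing. The sample input is made up for illustration; strip_tags and htmlentities are the standard PHP functions.

```php
<?php
// Hypothetical user input containing a tag that is not allowed.
$input = '<b>bold</b> text';

// a) the tags are silently stripped
$stripped = strip_tags($input);

// b) the tags are shown literally as entity-encoded text
$encoded = htmlentities($input);

echo $stripped, "\n"; // bold text
echo $encoded, "\n";  // &lt;b&gt;bold&lt;/b&gt; text
```

Either way the entry does not look the way the user intended, which is why I would rather warn them up front.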
Feyd showed some regex with which to strip tags:
Code:
function megaStripTags($source)
{
    $p = array(
        // convert <script> and <style> containers to a single space
        '#<\s*(style|script)[^>]*>.*?<\s*/\s*\\1[^>]*>#si' => ' ',
        // convert all remaining tags to a space
        '#<(?:\s*/)?\s*[a-z]+(\s*[a-z]+\s*=\s*(["\']?)(.*?)\\2)*[^>]*>#si' => ' ',
        // convert &nbsp; to a space
        '#&nbsp;#i' => ' ',
    );
    $source = preg_replace(array_keys($p), array_values($p), $source);
    // convert &#0; through &#255; into the character literal
    // (the original used the /e modifier, which PHP 7 removed;
    // preg_replace_callback does the same job)
    $source = preg_replace_callback(
        '/&#(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]);/',
        function ($m) { return chr(intval($m[1])); },
        $source
    );
    // convert the &amp; entity into the literal &
    return str_replace('&amp;', '&', $source);
}
$test = '<TD WIDTH="14%" BACKGROUND="images.jpg">
<A HREF="http://something.xxx">
<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1"
ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>';
var_dump(strip_tags($test), megaStripTags($test));

I came up with this:
Code:
<?php
function dump($array) {
    echo '<pre>';
    print_r($array);
    echo '</pre>';
}
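// Feyd's patterns, used here only for detection (no replacements needed)
$p = array(
    '#<\s*(style|script)[^>]*>.*?<\s*/\s*\\1[^>]*>#si',
    '#<(?:\s*/)?\s*[a-z]+(\s*[a-z]+\s*=\s*(["\']?)(.*?)\\2)*[^>]*>#si',
    '#&nbsp;#i',
    // /e is only meaningful for preg_replace (and was removed in PHP 7),
    // so it is dropped here
    '/&#(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]);/',
    '/&amp;/',
);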
$test = '<TD WIDTH="14%" BACKGROUND="images.jpg">
<A HREF="http://something.xxx"><IMG SRC="image.gif" BORDER="0"
ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>';
//$test = '<b>test'; // result: HTML found
//$test = '<script> this'; // result: HTML found
//$test = '<a href="somelinke">some</a>'; // result: HTML found
//$test = '<h 2>'; // result HTML found
//$test = 'And this is > then this or < then that'; // no HTML found
$test = 'And this is < then this or > then that'; // HTML found
echo 'The teststring is: ' . htmlentities($test) . '<br>';
foreach ($p as $pattern) {
    if (preg_match($pattern, $test, $matches)) {
        // $matches[0] is the full match, the rest are the captured groups
        foreach ($matches as $match) {
            echo '<br>HTML found: <br>';
            dump(htmlentities($match));
        }
    }
}
?>

Also, does anyone know of other regexes I could use?
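The last test string above gets flagged even though it contains no tag, because the second pattern allows whitespace right after the "<". As a rough sketch of one possible direction (looksLikeHtml is a made-up name, not something from Feyd's code), a stricter check could require a letter directly after the "<" or "</". This assumes that is how browsers recognise a tag, and it is only meant for warning the user, not for sanitising anything:

```php
<?php
// Hypothetical helper: returns true when $text appears to contain an HTML tag.
// A tag must start with "<", an optional "/", then a letter with no whitespace
// in between, and must be closed by a ">" further on.
function looksLikeHtml($text)
{
    return preg_match('#</?[a-z][^<>]*>#i', $text) === 1;
}

$samples = array(
    '<b>test',
    '<script> this',
    '<a href="somelinke">some</a>',
    '<h 2>',
    'And this is > then this or < then that',
    'And this is < then this or > then that',
);

foreach ($samples as $sample) {
    echo htmlentities($sample), ': ',
         looksLikeHtml($sample) ? 'HTML found' : 'no HTML found', "\n";
}
```

With this check the first four samples are reported as HTML and the last two are not. It still gives a false positive for text such as 'a<b and c>d', so it is not a complete answer.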