Disallow htmltags

Terriator
Forum Commoner
Posts: 60
Joined: Mon Jul 04, 2005 12:46 pm

Post by Terriator »

Hey,

I currently have a kind of "shoutbox" function based on a MySQL table, but it often gets messed up when users type in HTML code, so I was wondering: how do I prevent users from submitting any kind of "code" into the database?

Thanks,
Mathias
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

You can use strip_tags() and/or htmlentities(), depending on the exact situation.

Code:

<?php
$newstring = strip_tags($string);
echo htmlentities($newstring, ENT_QUOTES, 'UTF-8');
?>
strip_tags() removes all HTML tags completely, but it does not encode ampersands. If you use only htmlentities(), characters like < and > are encoded so that they are no longer interpreted as HTML. Check the manual for a better explanation and/or the finer details.
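To make the difference concrete, here is a minimal sketch comparing the two functions on one input. The sample message and the variable names are made up for illustration; only strip_tags() and htmlentities() come from the post above.

```php
<?php
// Hypothetical shoutbox message containing markup and an ampersand.
$message = 'Tom & Jerry <b>rock</b>';

// strip_tags() removes the tags but leaves the ampersand unencoded.
$stripped = strip_tags($message);

// htmlentities() keeps everything, but encodes the special characters
// so the browser shows the tags as text instead of interpreting them.
$encoded = htmlentities($message, ENT_QUOTES, 'UTF-8');

// Both combined: tags removed, remaining special characters encoded.
$both = htmlentities(strip_tags($message), ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Tom & Jerry rock
echo $encoded, "\n";  // Tom &amp; Jerry &lt;b&gt;rock&lt;/b&gt;
echo $both, "\n";     // Tom &amp; Jerry rock
```

Which variant fits depends on whether the markup should disappear or stay visible as plain text.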
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

something of note: strip_tags() is fairly, shall we say, dumb in how it strips tags, as noted in the first link I posted.

Although since writing that, I have written a newer version of a smarter strip_tags():

Code:

function megaStripTags($source)
{
	$p = array(
		'#<\s*(style|script)[^>]*>.*?<\s*/\s*\\1[^>]*>#si' 						=> ' ',							//	convert <script> and <style> containers to a single space
		'#<(?:\s*/)?\s*[^>]+([a-z]+\s*=\s*(["\']?)([^>]*?)\\2)*[^>]*>#si' 	=> ' ',							//	convert all remaining tags to a space
		'#&nbsp;#i' 																			=> ' ',							//	convert &nbsp; to a space
		'/&#(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]);/e' 				=> 'chr(intval("\\1"))',	//	convert &#0-255 into the character literal
		'/&amp;/' 																			=> '&',							//	convert &amp; entity into the literal &
	);
	return preg_replace(array_keys($p),array_values($p),$source);
}
Although this function does a bit more than just strip tags, you can comment out the various lines depending on what you want to remove or not.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Feyd, could you explain what megaStripTags does compared to the regular strip_tags? The regex is still quite hard for me to understand. I read the other thread and made a simple test case, and so far yours seems a bit more intelligent.

Code:

$string = "Let's test this <script and</h2> see";

$newstring = strip_tags($string); // returns Let's test this
$newstring2 = megaStripTags($string); // returns Let's test this see
Obviously, from a usability viewpoint the second result is preferable. But isn't megaStripTags too liberal? How could I test whether it's safe to use?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Since I didn't do extensive testing, my code had a slight error. Here's a decent test, though (test string from the first link I posted):

Code:

<?php

function megaStripTags($source)
{
	$p = array(
		'#<\s*(style|script)[^>]*>.*?<\s*/\s*\\1[^>]*>#si'					=> ' ',						//    convert <script> and <style> containers to a single space
		'#<(?:\s*/)?\s*[a-z]+(\s*[a-z]+\s*=\s*(["\']?)(.*?)\\2)*[^>]*>#si'	=> ' ',						//    convert all remaining tags to a space
		'#&nbsp;#i'															=> ' ',						//    convert &nbsp; to a space
		'/&#(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]);/e'			=> 'chr(intval("\\1"))',	//    convert &#0-255 into the character literal
		'/&amp;/'															=> '&',						//    convert &amp; entity into the literal &
	);
	return preg_replace(array_keys($p),array_values($p),$source);
}

$test = '<TD WIDTH="14%" BACKGROUND="images.jpg"><A HREF="http://something.xxx">
<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>';

var_dump(strip_tags($test),megaStripTags($test));

?>
outputs

Code:

string(76) "
50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah"
string(22) "
 testestets>blah  "
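One caveat for anyone running this on a newer PHP: the /e modifier used for the numeric-entity rule was deprecated in PHP 5.5 and removed in PHP 7, so the function above will not work there as written. Below is a rough sketch of how the same steps might look with preg_replace_callback() instead. The name megaStripTagsModern is made up, the tag patterns are copied unchanged from the corrected function, and the last step assumes the final rule is meant to decode &amp; into a literal &. Treat it as a starting point, not a vetted sanitizer.

```php
<?php
// Hypothetical PHP 5.3+ variant of megaStripTags() without the /e modifier.
function megaStripTagsModern($source)
{
    // Same patterns as the function above, applied in the same order;
    // each match is replaced with a single space.
    $patterns = array(
        '#<\s*(style|script)[^>]*>.*?<\s*/\s*\\1[^>]*>#si',                 // <script> and <style> containers
        '#<(?:\s*/)?\s*[a-z]+(\s*[a-z]+\s*=\s*(["\']?)(.*?)\\2)*[^>]*>#si', // all remaining tags
        '#&nbsp;#i',                                                        // non-breaking space entity
    );
    $source = preg_replace($patterns, ' ', $source);

    // Stand-in for the /e rule: convert &#0; to &#255; into the character literal.
    $source = preg_replace_callback(
        '/&#(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]);/',
        function ($m) {
            return chr((int) $m[1]);
        },
        $source
    );

    // Assumption: the last rule decodes the &amp; entity into a literal &.
    return str_replace('&amp;', '&', $source);
}

var_dump(megaStripTagsModern('<b>hi</b>'));     // string(4) " hi "
var_dump(megaStripTagsModern('a&#65;b&amp;c')); // string(5) "aAb&c"
```

Because the function decodes entities at the end, its result still needs htmlentities() before being echoed into a page.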
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

That's pretty cool. Thanks.

I just asked because I'm always a bit suspicious about any code nowadays. Certainly when I don't understand the regex completely :)