What is 'harmful' HTML?
Moderator: General Moderators
But people could still slip bad stuff into other things.
Like, in a website field, enter: 'http://goodwebsite.com" onclick="window.location=http://badwebsite.com"'
Then, the user thinks they are visiting goodwebsite.com, and it sends them to badwebsite.com. Just one of many examples.
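To make that concrete, here is a minimal sketch of how such a value breaks out of the attribute, and how escaping plus a scheme check keeps it contained. The helper names renderLinkUnsafe() and renderLinkSafe() are made up for illustration, and the input is a slightly tidied variation of the example above.

```php
<?php
// Hypothetical helpers for illustration; the names are not from this thread.

// Naive version: the value is dropped into the attribute as-is.
function renderLinkUnsafe(string $url): string
{
    return '<a href="' . $url . '">website</a>';
}

// Safer version: accept only http(s) URLs, then escape for an attribute context.
function renderLinkSafe(string $url): ?string
{
    if (preg_match('#^https?://#i', $url) !== 1) {
        return null; // reject javascript:, data:, and anything else unexpected
    }
    // ENT_QUOTES encodes both " and ', so the value cannot close the attribute.
    $escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $escaped . '">website</a>';
}

$input = 'http://goodwebsite.com" onclick="window.location=\'http://badwebsite.com\'';

echo renderLinkUnsafe($input), "\n"; // the onclick attribute becomes live markup
echo renderLinkSafe($input), "\n";   // the quotes are encoded, so everything stays inside href
```

The escaped link still points at a nonsense URL, but the browser no longer sees a second attribute.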
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
- Contact:
mabufo wrote: Couldn't you just ban the use of greater than and less than? As far as I know that would eliminate any possibility of something malicious.
bdlang wrote: Well, you could simply use htmlentities() on all user input/output, but giving some flexibility as to formatting their entries is the goal.
It is generally easier to use your own formatting markup, such as BBCode, to avoid this whole issue.
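A rough sketch of the BBCode idea, assuming only three formatting tags are wanted: escape everything first, then re-introduce a fixed set of attribute-free tags. bbcodeToHtml() is a hypothetical helper, not a library function, and real BBCode parsers handle nesting and more tags than this.

```php
<?php
// Minimal sketch: escape all input, then translate a small set of BBCode tags.
// bbcodeToHtml() is a made-up name for illustration.
function bbcodeToHtml(string $text): string
{
    // After this call no user-supplied <, >, " or ' survives as markup.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Only these attribute-free tags are ever emitted.
    foreach (['b', 'i', 'u'] as $tag) {
        $html = preg_replace(
            '#\[' . $tag . '\](.*?)\[/' . $tag . '\]#is',
            '<' . $tag . '>$1</' . $tag . '>',
            $html
        );
    }
    return $html;
}

echo bbcodeToHtml('[b]bold[/b] <b onmouseover="alert(1)">x</b>'), "\n";
// <b>bold</b> &lt;b onmouseover=&quot;alert(1)&quot;&gt;x&lt;/b&gt;
```

Because the HTML is generated by your own code, there is no place for a user to attach an event attribute.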
The point of this post is to allow some HTML to go through. Eliminating all HTML tags with htmlentities would defeat that purpose. But isn't there a regex that could find all of the HTML tags by looking for strings starting with a "<" and ending with a ">", and then run them through a whitelist?
Bigun wrote: But isn't there a regex string that could find all of the html tags by looking for string starting with a "<" and ending with a ">" then running it through a whitelist?
There is the strip_tags function. It does exactly that.
Reading up on that function, it seems that if you set the 'allowable_tags' parameter you can specify only certain tags to be allowed.
Code: Select all
strip_tags ( string str [, string allowable_tags] )
example:
Code: Select all
$string = strip_tags($string, '<a><b><i><u>');
Perfect... so yeah... we can start whitelisting with ease...
Not got a parser at hand... does that disallow any tags that have events?
e.g.:
Code: Select all
<b onmouseover="alert('boo!');">Text..</b>
Nothing is harmless.
There is no harmless HTML because all tags can be bound to events, which is what causes problems.
If you are going to allow HTML rather than use a version of BBCode, you will have to filter very carefully all input. This is why BBCode was created in the first place, it's far, far easier.
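To illustrate the point about events: strip_tags() filters by tag name only, so attributes on an allowed tag pass straight through. The sketch below shows that, then one possible way to drop every attribute afterwards. allowBareFormatting() is a hypothetical helper, and the regex approach is an assumption for the sake of a short example; a real filter would want an HTML parser or a dedicated sanitizing library.

```php
<?php
$input = '<b onmouseover="alert(\'boo!\');">Text..</b>';

// strip_tags() checks tag names only; the onmouseover attribute is still present.
echo strip_tags($input, '<a><b><i><u>'), "\n";

// Hypothetical helper: allow only bare <b>, <i>, <u> and drop every attribute.
function allowBareFormatting(string $html): string
{
    $html = strip_tags($html, '<b><i><u>');
    // Rewrite each surviving tag to its attribute-free form,
    // e.g. <b onclick="..."> becomes <b>.
    return preg_replace('#<(/?)([biu])\b[^>]*>#i', '<$1$2>', $html);
}

echo allowBareFormatting($input), "\n"; // <b>Text..</b>
```

Links are left out on purpose here, since an href needs its own validation.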
I agree. There are ways to bypass htmlentities. There are some times that you need to do:
Code: Select all
$doc = str_replace("\xC0\xBC", "<", $doc);
And I think I can remember a list of known XSS attack types. Can't remember the source though.
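For context, "\xC0\xBC" is an overlong two-byte encoding of "<", which is why the line above maps it back to a real "<" before filtering. An alternative, sketched here as an assumption rather than something suggested in the thread, is to reject malformed UTF-8 outright before doing any escaping. isValidUtf8() is a made-up helper name.

```php
<?php
// Hypothetical helper: accept only well-formed UTF-8 before escaping or filtering.
function isValidUtf8(string $s): bool
{
    // With the /u modifier PCRE validates the subject string,
    // and preg_match() returns false on malformed UTF-8.
    return preg_match('//u', $s) === 1;
}

$overlong = "\xC0\xBC"; // overlong encoding of "<" (U+003C)

var_dump(isValidUtf8('<b>ok</b>')); // bool(true)
var_dump(isValidUtf8($overlong));   // bool(false)
```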
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA
The Ninja Space Goat wrote: I would love to use bbcode, but I like using the wysiwyg editor, tinyMCE. So, I have to allow some html.
You know, there are only a few tags in TinyMCE that allow event attributes to be added to them through the WYSIWYG feature. You could probably create a list of allowed complete tags in an array, then use that array to scan the posted value of the textarea. If the posted text contains tags that are not exactly to your specification, deny the post.
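Something along these lines, as a rough sketch. The contents of $allowedTags and the postIsAcceptable() helper are assumptions for illustration; the actual list would have to match whatever your TinyMCE configuration emits.

```php
<?php
// Hypothetical list: every tag in the post must match one of these strings exactly.
$allowedTags = ['<b>', '</b>', '<i>', '</i>', '<u>', '</u>', '<p>', '</p>', '<br />'];

// Returns true only if every tag-like sequence in $html is on the list.
function postIsAcceptable(string $html, array $allowedTags): bool
{
    // Grab everything from a "<" up to the next ">" (or end of input if unclosed).
    preg_match_all('/<[^>]*>?/', $html, $matches);
    foreach ($matches[0] as $tag) {
        if (!in_array($tag, $allowedTags, true)) {
            return false; // unknown tag, extra attribute, or stray "<": deny the post
        }
    }
    return true;
}

var_dump(postIsAcceptable('<p><b>hi</b></p>', $allowedTags));                 // bool(true)
var_dump(postIsAcceptable('<b onmouseover="alert(1)">hi</b>', $allowedTags)); // bool(false)
```

Since the comparison is exact, a tag with any attribute at all, or with different spacing or case, gets the whole post rejected. That is strict, but it is the behaviour described above. This still assumes the input has already been checked for valid encoding.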