What is 'harmful' HTML?

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

mabufo
Forum Commoner
Posts: 81
Joined: Thu Jul 10, 2003 11:11 pm
Location: Orland Park, IL

Post by mabufo »

Couldn't you just ban the greater-than and less-than characters? As far as I know, that would eliminate any possibility of something malicious.
Nathaniel
Forum Contributor
Posts: 396
Joined: Wed Aug 31, 2005 5:58 pm
Location: Arkansas, USA

Post by Nathaniel »

But people could still slip malicious code in without those characters.

Like, in a website field, enter:

http://goodwebsite.com" onclick="window.location='http://badwebsite.com'

If the site prints that value inside an <a href="..."> tag, the user thinks they are visiting goodwebsite.com, but clicking the link sends them to badwebsite.com. That is just one of many examples, and it doesn't use < or > at all.
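A quick sketch of that attack, using the payload from the post above (with the URL in quotes so the JavaScript is valid). The variable names are made up for the example. It shows the double quote breaking out of the href attribute, and htmlspecialchars() with ENT_QUOTES keeping the whole value inside it:

```php
<?php
// Attacker-supplied "website" value: no < or >, just a double quote.
$website = 'http://goodwebsite.com" onclick="window.location=\'http://badwebsite.com\'';

// Naive output: the quote in the input closes href early, so onclick
// becomes a real attribute of the link.
$unsafe = '<a href="' . $website . '">My site</a>';

// Escaping quotes too (ENT_QUOTES) keeps the whole value inside href.
$safe = '<a href="' . htmlspecialchars($website, ENT_QUOTES, 'UTF-8') . '">My site</a>';

echo $unsafe, "\n", $safe, "\n";
```

Escaping stops the attribute break-out, but a value such as javascript:... would still be a live link, so URL fields also need a scheme check (for example, only allowing http:// and https://).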
bdlang
Forum Contributor
Posts: 395
Joined: Tue May 16, 2006 8:46 pm
Location: Ventura, CA US

Post by bdlang »

mabufo wrote:Couldn't you just ban the greater-than and less-than characters? As far as I know, that would eliminate any possibility of something malicious.
Well, you could simply run htmlentities() on all user input/output, but the goal is to give users some flexibility in formatting their entries.
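To illustrate the trade-off (the sample post is invented): htmlentities() makes everything safe, but it also turns harmless formatting into literal text.

```php
<?php
// htmlentities() neutralizes all markup, including harmless formatting.
$post = 'This is <b>bold</b> and "quoted"';
$out = htmlentities($post, ENT_QUOTES, 'UTF-8');
echo $out; // This is &lt;b&gt;bold&lt;/b&gt; and &quot;quoted&quot;
```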
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

bdlang wrote:
mabufo wrote:Couldn't you just ban the greater-than and less-than characters? As far as I know, that would eliminate any possibility of something malicious.
Well, you could simply run htmlentities() on all user input/output, but the goal is to give users some flexibility in formatting their entries.
It is generally easier to use your own formatting markup, such as BBCode, to avoid this whole issue.
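A minimal sketch of the BBCode idea, assuming only [b], [i] and [u] are supported (the function name bbcodeToHtml() is hypothetical). All input is escaped first, and only a fixed set of attribute-free tags is turned back into HTML afterwards:

```php
<?php
// Escape ALL user input first, then translate a small, fixed set of BBCode
// tags into HTML. Because the input was escaped, no user-supplied HTML tag
// or attribute can survive.
function bbcodeToHtml(string $text): string
{
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Only these exact, attribute-free tags are translated.
    $map = [
        '[b]' => '<b>', '[/b]' => '</b>',
        '[i]' => '<i>', '[/i]' => '</i>',
        '[u]' => '<u>', '[/u]' => '</u>',
    ];
    return strtr($html, $map);
}

echo bbcodeToHtml('[b]Hi[/b] <script>alert(1)</script>');
// <b>Hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

This sketch does not balance tags, so an unclosed [b] leaves the rest of the page bold; a real parser would track open tags.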
Bigun
Forum Contributor
Posts: 237
Joined: Tue Jun 13, 2006 10:50 am

Post by Bigun »

The point of this post is to allow some HTML through. Eliminating all HTML tags with htmlentities() would defeat that purpose. But isn't there a regex that could find all of the HTML tags, by looking for strings that start with "<" and end with ">", and then run them through a whitelist?
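A rough sketch of the regex-plus-whitelist idea (filterTags() is a hypothetical name). One assumption goes beyond the question: whitelisted tags are rebuilt from scratch with no attributes at all, and everything else, text included, is escaped:

```php
<?php
// Every "<...>" chunk is checked against a whitelist of tag names.
// Allowed tags are re-emitted WITHOUT attributes; plain text and
// disallowed tags are escaped.
function filterTags(string $input, array $allowed): string
{
    // Split on anything that looks like a tag, keeping the tags themselves.
    $parts = preg_split('/(<[^>]*>)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $part) {
        if (preg_match('/^<(\/?)([a-z][a-z0-9]*)\b[^>]*>$/i', $part, $m)
            && in_array(strtolower($m[2]), $allowed, true)) {
            // Whitelisted tag: rebuild it, dropping any attributes.
            $out .= '<' . $m[1] . strtolower($m[2]) . '>';
        } else {
            // Plain text or a disallowed tag: neutralize it.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        }
    }
    return $out;
}

echo filterTags('<b onmouseover="alert(1)">hi</b><script>x</script>', ['b', 'i', 'u']);
// <b>hi</b>&lt;script&gt;x&lt;/script&gt;
```

Dropping every attribute only works for tags like b, i and u; a link tag would lose its href, and the sketch does not check that tags are balanced.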
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Bigun wrote:But isn't there a regex that could find all of the HTML tags, by looking for strings that start with "<" and end with ">", and then run them through a whitelist?
There is the strip_tags() function. It does exactly that.
Bigun
Forum Contributor
Posts: 237
Joined: Tue Jun 13, 2006 10:50 am

Post by Bigun »

Reading up on that function, it seems that if you pass the allowable_tags parameter, only the tags you list are allowed through.

Code:

strip_tags ( string str [, string allowable_tags] )
example:

Code:

$string = strip_tags($string, '<a><b><i><u>');
Perfect... so yeah... we can start whitelisting with ease...
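A quick check of what that whitelist call does with tags that are not on the list (the sample input is made up). Disallowed tags are removed, but the text between them is kept:

```php
<?php
// Tags not in the whitelist are stripped; their contents survive as text.
$input = '<p>Hello <b>world</b><script>alert(1)</script></p>';
$out = strip_tags($input, '<a><b><i><u>');
echo $out; // Hello <b>world</b>alert(1)
```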
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

I haven't got a parser at hand... does that disallow tags that have event attributes?

e.g.:

Code:

<b onmouseover="alert('boo!');">Text..</b>
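Running that example through the whitelist call from the earlier post answers the question: <b> is on the list, so strip_tags() keeps the tag exactly as written, event handler included. The PHP manual warns that strip_tags() does not modify attributes on allowed tags.

```php
<?php
// <b> is whitelisted, so the whole tag, onmouseover included, passes through.
$input = '<b onmouseover="alert(\'boo!\');">Text..</b>';
$out = strip_tags($input, '<a><b><i><u>');
var_dump($out === $input); // bool(true)
```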
Bigun
Forum Contributor
Posts: 237
Joined: Tue Jun 13, 2006 10:50 am

Post by Bigun »

Are tables harmless?
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

Nothing is harmless.

There is no harmless HTML, because any tag can carry an event-handler attribute (onclick, onmouseover and so on), and that is what causes the problems.

If you are going to allow HTML rather than use a version of BBCode, you will have to filter all input very carefully. This is why BBCode was created in the first place: it's far, far easier.
basdog22
Forum Contributor
Posts: 158
Joined: Sun Nov 30, 2003 3:03 pm
Location: Greece

Post by basdog22 »

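I agree. There are ways to bypass htmlentities(). It only escapes the literal "<" byte, while "\xC0\xBC" is an overlong (invalid) UTF-8 encoding of "<" that some lenient decoders have turned back into "<". So there are times when you need to do: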

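Code:

// Turn the overlong UTF-8 form of "<" into a real "<", so that
// htmlentities(), run afterwards, escapes it like any other "<".
$doc = str_replace("\xC0\xBC", "<", $doc);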
And I think I remember a list of known XSS attack types, but I can't remember the source. :roll:
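An alternative to mapping individual byte sequences is to reject input that is not valid UTF-8 at all. This sketch assumes the site works in UTF-8; isValidUtf8() is a hypothetical helper built on preg_match() with the /u modifier, which fails on malformed UTF-8. It also shows that htmlspecialchars() with an explicit UTF-8 charset returns an empty string for such input (unless ENT_IGNORE or ENT_SUBSTITUTE is given):

```php
<?php
// "\xC0\xBC" and "\xC0\xBE" are overlong (invalid) UTF-8 forms of "<" and ">".
function isValidUtf8(string $s): bool
{
    // With the /u modifier, preg_match() returns false on malformed UTF-8.
    return preg_match('//u', $s) === 1;
}

$payload = "\xC0\xBCscript\xC0\xBEalert(1)";

var_dump(isValidUtf8($payload));    // bool(false)
var_dump(isValidUtf8('<b>ok</b>')); // bool(true)

// Invalid input in the declared charset yields an empty string here.
var_dump(htmlspecialchars($payload, ENT_QUOTES, 'UTF-8')); // string(0) ""
```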
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

I would love to use BBCode, but I like using the WYSIWYG editor TinyMCE. So I have to allow some HTML.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

Check over at phpclasses for some scripts. I recall seeing one that let you define a whitelist of tags along with their allowed attributes and such. It wouldn't be that hard to write one yourself, either.
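A rough sketch of writing one yourself, assuming PHP's DOM extension is available. The names sanitizeHtml() and cleanNode() and the example whitelist are hypothetical. The fragment is parsed, disallowed elements are dropped with their contents, disallowed attributes are removed, and href values must start with http:// or https://:

```php
<?php
// $allowed maps each permitted tag name to the attributes it may keep.
function sanitizeHtml(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    // Wrap the fragment in a <div> so it can be found again after parsing.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    cleanNode($root, $allowed);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

function cleanNode(DOMNode $node, array $allowed): void
{
    // Copy the child list first, because nodes are removed while looping.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is re-escaped on output by saveHTML()
        }
        $tag = $child instanceof DOMElement ? strtolower($child->tagName) : '';
        if (!isset($allowed[$tag])) {
            // Disallowed element, comment, etc.: drop it and its content.
            $node->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $keep = in_array($name, $allowed[$tag], true);
            // Only http(s) links are kept, so "javascript:" URLs are removed.
            if ($keep && $name === 'href' && !preg_match('#^https?://#i', trim($attr->value))) {
                $keep = false;
            }
            if (!$keep) {
                $child->removeAttribute($attr->name);
            }
        }
        cleanNode($child, $allowed);
    }
}

$clean = sanitizeHtml(
    '<b onmouseover="alert(1)">hi</b> <a href="javascript:alert(1)" title="t">x</a><script>bad()</script>',
    ['b' => [], 'a' => ['href', 'title'], 'p' => []]
);
echo $clean;
```

Parsing with a real HTML parser avoids the guesswork of matching tags with regexes, and anything that escapes the wrapper <div> (for example via a stray </div>) is simply not output.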
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

The Ninja Space Goat wrote:I would love to use bbcode, but I like using the wysiwyg editor, tinyMCE. So, I have to allow some html.
You know, there are only a few tags in TinyMCE that allow event attributes to be added to them through the WYSIWYG feature. You could probably create an array of allowed complete tags, then use that array to scan the posted value of the textarea. If the posted text contains tags that are not exactly to your specification, deny the post.
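A minimal sketch of that check. The function postIsAcceptable() and the tag list are hypothetical; a real list would have to match exactly what your TinyMCE configuration produces. Every tag must equal one of the allowed strings character for character, and a "<" that does not belong to a complete tag also causes rejection:

```php
<?php
// Reject the post unless every tag in it is one of the exact strings in
// $allowedTags (no extra attributes, no variations).
function postIsAcceptable(string $post, array $allowedTags): bool
{
    // Find everything that looks like a complete tag.
    preg_match_all('/<[^<>]*>/', $post, $matches);

    // A "<" that is not part of a complete tag (e.g. an unclosed
    // "<img onerror=..." at the end) cannot be checked, so reject it.
    if (substr_count($post, '<') !== count($matches[0])) {
        return false;
    }

    foreach ($matches[0] as $tag) {
        // Strict, character-for-character comparison.
        if (!in_array($tag, $allowedTags, true)) {
            return false;
        }
    }
    return true;
}

$allowed = ['<p>', '</p>', '<strong>', '</strong>', '<em>', '</em>'];

var_dump(postIsAcceptable('<p><strong>Hi</strong></p>', $allowed));     // bool(true)
var_dump(postIsAcceptable('<p onclick="alert(1)">Hi</p>', $allowed));   // bool(false)
var_dump(postIsAcceptable('Hi <img src=x onerror=alert(1)', $allowed)); // bool(false)
```

Because the check runs on the server, it does not matter whether the visitor used the editor or typed raw HTML with JavaScript disabled.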
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

What if a user disables JavaScript and just enters whatever they want in the text box?

EDIT: Scratch that question... I misread your post.