
What is 'harmful' HTML?

Posted: Mon Jul 03, 2006 1:10 pm
by Bigun
Safe HTML:

Code:

<b></b>
<br>
<i></i>
<font></font>
*EDIT*: I've put the list above.
Suffice it to say, I don't know the entire body of the HTML language.

But is there a list of unharmful HTML that can be allowed?

examples:

Code:

<b></b>
<font></font>
<br>
A section of my code will allow some HTML to go through for customization, but I need to know what can become potentially harmful.

Posted: Mon Jul 03, 2006 4:32 pm
by shiznatix

Code:

<iframe src="www.badsite9000.com"></iframe>
<a href="www.killyourcomputer.com">w000t</a>
<img src="www.miningtroll.com" />
...not to mention javascript

Posted: Mon Jul 03, 2006 4:48 pm
by MrPotatoes
I've never done any security work, so I don't know how I would stop something like that. How would I stop it?
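One simple approach, sketched here in PHP under the assumption that you only want a handful of attribute-free formatting tags: escape everything with htmlspecialchars(), then turn the exact tags you allow back into HTML. The function name allow_simple_tags() and its tag list are hypothetical.

```php
<?php
// Hypothetical helper: escape everything, then re-enable a few exact tags.
function allow_simple_tags(string $input): string
{
    // Step 1: turn every <, >, & and quote into an HTML entity.
    $safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: restore only attribute-free tags from this example list.
    foreach (['b', 'i', 'em', 'strong'] as $tag) {
        $safe = str_ireplace(
            ['&lt;' . $tag . '&gt;', '&lt;/' . $tag . '&gt;'],
            ['<' . $tag . '>', '</' . $tag . '>'],
            $safe
        );
    }
    return $safe;
}

echo allow_simple_tags('<b>bold</b> <iframe src="x"></iframe>');
// <b>bold</b> &lt;iframe src=&quot;x&quot;&gt;&lt;/iframe&gt;
```

Because any tag not on the list stays escaped, an <iframe> or an onclick= attribute simply shows up as text. This sketch does not check that tags are balanced, so an unclosed <b> could still bold the rest of the page.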

Posted: Mon Jul 03, 2006 5:07 pm
by Bigun
shiznatix wrote:

Code:

<iframe src="www.badsite9000.com"></iframe>
<a href="www.killyourcomputer.com">w000t</a>
<img src="www.miningtroll.com" />
...not to mention javascript
I'd like to allow links and images (<a href> and <img>), since people will be linking to other sites and posting images...

But yeah, block JavaScript and <iframe>...

Any other harmful HTML?
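Allowing <a href> and <img> more safely mostly comes down to checking the URL before it goes into the attribute. A minimal sketch, assuming only absolute http/https URLs should be accepted; is_safe_url() is a hypothetical helper name. It rejects javascript:, data: and scheme-less values.

```php
<?php
// Hypothetical helper: accept a URL for href/src only if it is an
// absolute http:// or https:// URL with a host.
function is_safe_url(string $url): bool
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // malformed, relative, or scheme-less (e.g. javascript:...)
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true);
}

var_dump(is_safe_url('javascript:alert(1)')); // bool(false)
```

This only checks the scheme; it cannot tell whether an ordinary http:// site is itself malicious. Scheme-less values such as www.badsite9000.com are rejected too, so users would have to include the http:// prefix.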

Posted: Mon Jul 03, 2006 7:01 pm
by Luke
This is something I'd be very interested in as well. I'm building an article management app, and I want to let users post just about anything they want without giving them the ability to destroy the website or server.

Posted: Mon Jul 03, 2006 7:33 pm
by Bigun
It doesn't seem anyone has a list...

So perhaps we are breaking new ground with this?

If so, which would be quicker... listing and allowing safe HTML?

Or filtering out bad HTML?

Posted: Mon Jul 03, 2006 8:11 pm
by hawleyjr
Bigun wrote: It doesn't seem anyone has a list...

So perhaps we are breaking new ground with this?

If so, which would be quicker... listing and allowing safe HTML?

Or filtering out bad HTML?
Always list the good. You never know what could be bad. :lol: :lol:
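Listing the good can be sketched with PHP's DOM extension: parse the posted HTML, keep only listed tags and attributes, and drop everything else. This assumes ext-dom is installed; ALLOWED, clean_node() and sanitize_html() are hypothetical names, and the tag list is only an example, not a vetted policy.

```php
<?php
// Hypothetical whitelist: tag => attributes that tag may keep.
const ALLOWED = [
    'b' => [], 'i' => [], 'em' => [], 'strong' => [], 'br' => [],
    'a' => ['href'], 'img' => ['src', 'alt'],
];

// Recursively remove every node that is not whitelisted.
function clean_node(DOMNode $node): void
{
    // Copy the child list first, because nodes are removed while looping.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is escaped again when saved
        }
        if (!($child instanceof DOMElement)
            || !array_key_exists(strtolower($child->tagName), ALLOWED)) {
            $node->removeChild($child); // drops the element AND its content
            continue;
        }
        $keep = ALLOWED[strtolower($child->tagName)];
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $badUrl = false;
            if (in_array($name, ['href', 'src'], true)) {
                // Only absolute http(s) URLs survive; javascript: etc. do not.
                $scheme = strtolower((string) parse_url(trim($attr->value), PHP_URL_SCHEME));
                $badUrl = !in_array($scheme, ['http', 'https'], true);
            }
            if (!in_array($name, $keep, true) || $badUrl) {
                $child->removeAttribute($attr->name);
            }
        }
        clean_node($child);
    }
}

function sanitize_html(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // user HTML is rarely valid; silence warnings
    // The xml-encoding prefix makes libxml read the input as UTF-8, and the
    // div wrapper gives a single root element to walk.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $root = $doc->getElementsByTagName('div')->item(0);
    clean_node($root);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo sanitize_html('<b onclick="x()">hi</b> <a href="javascript:alert(1)">link</a>');
```

Disallowed elements are removed together with their content. That is what you want for <script>, but it also discards the text inside, say, a <p>, so a real policy might unwrap harmless unknown tags instead.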

"The enemy you know is much better than the enemy you don't"

Posted: Tue Jul 04, 2006 4:02 am
by Jenk
Dynamic variable images can be just as harmful.

The first thing to filter would be JavaScript; don't allow any of it.

Code:

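<?php

// Blacklist heuristic: reject input containing a <script> tag, an inline
// event handler (onclick=, onerror=, ...) or a javascript: URL inside a tag.
// $input is assumed to hold the posted text.
if (preg_match('/<\s*script\b|<[^>]*\bon[a-z]+\s*=|<[^>]*javascript\s*:/i', $input)) {
    die('Dirty javascript, out out out!');
}

?>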

Posted: Tue Jul 04, 2006 8:07 am
by Bigun
Perhaps I can start a list of unharmful HTML, can-be-harmful HTML, and harmful HTML, along with ways to filter the last two.

*EDIT*

Can someone remove that dirty lil' programmer tag from my name? I'm nowhere near the skill level required to be called that.
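Bigun's three-tier list could be kept as plain data. The sketch below fills the tiers with tags mentioned in this thread; TAG_TIERS and classify_tag() are hypothetical names, and anything unlisted falls into the harmful tier by default, in line with the whitelist advice earlier in the thread.

```php
<?php
// Hypothetical tiers built from tags named in this thread.
const TAG_TIERS = [
    'safe'     => ['b', 'i', 'em', 'strong', 'br', 'font'],
    'can_harm' => ['a', 'img'], // acceptable only after checking href/src
    'harmful'  => ['script', 'iframe', 'object', 'embed'],
];

function classify_tag(string $tag): string
{
    // Accept "b", "<b>" or "</B>" and normalise to "b".
    $tag = strtolower(trim($tag, '<>/ '));
    foreach (TAG_TIERS as $tier => $tags) {
        if (in_array($tag, $tags, true)) {
            return $tier;
        }
    }
    return 'harmful'; // unknown tags, e.g. <thisismytag>, are rejected by default
}

echo classify_tag('iframe'); // harmful
```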

Posted: Tue Jul 04, 2006 8:21 am
by Roja
Bigun wrote: Can someone remove that dirty lil' programmer tag from my name? I'm nowhere near the skill level required to be called that.
It's automatic, based on the number of posts you've made in the forums here.

Posted: Tue Jul 04, 2006 10:44 am
by RobertGonzalez
I tried to post this yesterday, but the DevNet server was not responding. It was still open in my window, so I am posting it now.
Watch out for <iframe>, <object>, <embed>, and anything else that can reach out, grab content from another site, and execute it on yours. This is a very common way for attackers to take over a site in order to spread malicious code. Keep in mind that these tags can also be written by JavaScript as part of XSS attacks and the like, but not allowing the tags in your posted content is a step in a more secure direction.

EDIT | Wow, this thread really grew overnight! When it comes to harmful HTML, there are only a few tags that can truly cause you problems. I forgot to mention the <img> tag: since folks can now tie viruses to images, you may want to watch out for that. Basically, anything that has a 'src' attribute could be harmful, because nothing prevents that element from reaching outside your domain.
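To keep a src from reaching outside your domain, one option is to accept only root-relative paths or URLs whose host is your own. A minimal sketch; src_is_local() and its $ownHost parameter are hypothetical, and the check is deliberately strict (plain relative paths such as images/a.png are rejected as well).

```php
<?php
// Hypothetical helper: accept an image src only if it stays on $ownHost.
function src_is_local(string $src, string $ownHost): bool
{
    $src = trim($src);
    $parts = parse_url($src);
    if ($parts === false) {
        return false; // unparseable
    }
    if (isset($parts['scheme'])
        && !in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false; // javascript:, data:, ftp:, ...
    }
    if (!isset($parts['host'])) {
        // No host: accept only root-relative paths like "/images/a.png",
        // and reject a second slash or backslash right after the first "/".
        return (bool) preg_match('#^/(?![/\\\\])#', $src);
    }
    // "//host/x" and "http://host/x" both end up here; compare the host.
    return strcasecmp($parts['host'], $ownHost) === 0;
}

var_dump(src_is_local('http://www.miningtroll.com/x.png', 'example.com')); // bool(false)
```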


Posted: Tue Jul 04, 2006 12:32 pm
by bdlang
Bigun wrote: But is there a list of unharmful HTML that can be allowed?

examples:

Code:

<b></b>
<font></font>
<br>
A section of my code will allow some HTML to go through for customization, but I need to know what can become potentially harmful.
I'm no regular expressions guru, but it seems to me a better idea to whitelist the things you want to let through rather than try to eliminate the things you don't. If all you want is to let users use those specific elements, then let through only elements like <b>, <i>, <em>, <strong>, etc. (I disagree with allowing <br />, because you should control the output 'sizing' yourself rather than let someone create 100 new lines with 100 <br /> elements.)
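bdlang's whitelist maps onto PHP's built-in strip_tags(), which accepts a list of allowed tags. One caveat: strip_tags() does not touch the attributes of the tags you allow, so <b onmouseover="..."> would get through. The sketch below (whitelist_inline() is a hypothetical name) strips those attributes afterwards, since these tags need none.

```php
<?php
// Hypothetical helper: keep only b, i, em and strong, with no attributes.
function whitelist_inline(string $input): string
{
    // <br> is deliberately left out, as suggested above.
    $text = strip_tags($input, '<b><i><em><strong>');

    // Rewrite "<b anything>" to "<b>" (and likewise for the other tags).
    return preg_replace('#<(/?)(b|i|em|strong)\b[^>]*>#i', '<$1$2>', $text);
}

echo whitelist_inline('<b onmouseover="steal()">hi</b><br><br>'); // <b>hi</b>
```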

Posted: Tue Jul 04, 2006 7:51 pm
by RobertGonzalez
This is the list of (X)HTML tags. You are safer allowing a specific set of tags than disallowing others, because if you only check for disallowed tags, anyone can add just about any tag they want (for example, a <thisismytag> tag). Giving your script a list of allowed tags essentially eliminates custom tags and other attempts to sabotage your pages.
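A whitelist also catches tags nobody thought of. A minimal sketch that reports every tag name not on the allowed list, so a post containing <thisismytag> can be rejected outright; disallowed_tags() is a hypothetical helper, and it only looks at tag names, not attributes.

```php
<?php
// Hypothetical helper: return tag names in $html that are not in $allowed.
function disallowed_tags(string $html, array $allowed): array
{
    // A tag starts with "<" or "</" immediately followed by a letter.
    preg_match_all('#</?([a-z][a-z0-9-]*)#i', $html, $m);
    $found = array_unique(array_map('strtolower', $m[1]));
    return array_values(array_diff($found, $allowed));
}

print_r(disallowed_tags('<b>x</b><thisismytag>y</thisismytag>', ['b', 'i']));
```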

Posted: Tue Jul 04, 2006 8:18 pm
by bdlang
Everah wrote:This is the list of (X)HTML tags. You are safer allowing a certain group of tags rather than disallowing others....
So you're in agreement with using a whitelist as I proposed?

Posted: Tue Jul 04, 2006 8:22 pm
by RobertGonzalez
Yeah. I was thinking about it and concluded that a user could feasibly get really stupid and put any tag they want in there, so it makes more sense to provide a whitelist as opposed to a blacklist.

PS If I said otherwise before, then I am changing gears faster than a trucker who sees his wife with a biker.