strip_tags

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

strip_tags

Post by alex.barylski »

Just a concept idea:

If I were to run strip_tags() over all GPC data (assuming no input is meant to contain HTML), this would in theory prevent most XSS exploits (types 1 & 2), I believe? I guess DOM injection would still be possible, but that likely needs to be addressed on the client...

Anyway, I am aware of HTML_Purifier and its ability to filter HTML in accordance with UTF-8; however, calling HTML_Purifier on *every* incoming variable would be overkill and introduce a big performance hit.

So I figure strip_tags would likely suffice if all I wanted to do was remove any likelihood of HTML sneaking into the GPC data. My concern is localization. Is strip_tags() safe to use in all instances?
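For illustration, here is a minimal sketch of what running strip_tags() over GPC-style data could look like. The helper name strip_tags_deep() and the sample array are made up for this example; in a real application the same call would be applied to copies of $_GET, $_POST and $_COOKIE.

```php
<?php
// Hypothetical helper (not a PHP built-in): recursively applies strip_tags()
// to every string in a possibly nested request array.
function strip_tags_deep($value)
{
    if (is_array($value)) {
        // Request arrays can be nested, e.g. tags[]=a&tags[]=b
        return array_map('strip_tags_deep', $value);
    }
    // Leave non-string values (ints, null, ...) untouched.
    return is_string($value) ? strip_tags($value) : $value;
}

// Sample data standing in for $_POST; the field names are assumptions.
$input = [
    'name' => '<b>Alex</b>',
    'tags' => ['<i>php</i>', 'security'],
];

$clean = strip_tags_deep($input);
// $clean['name'] is now 'Alex'; $clean['tags'] is ['php', 'security']
```

This only removes tags. It does not make the values safe to drop into an HTML attribute or a script block.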

Are HTML tags always delimited by the ASCII < and >, or can some other Unicode code point be interpreted by a browser as a tag delimiter? I assume strip_tags() will only remove the ASCII versions, and while it makes sense that tags can only use ASCII characters, I cannot be certain, hence the security scare. :)

What say you? Assuming all I want to do is remove all HTML tags (no exceptions), is strip_tags a safe bet?

Cheers,
Alex
veridicus
Forum Commoner
Posts: 86
Joined: Fri Feb 23, 2007 9:16 am

Re: strip_tags

Post by veridicus »

strip_tags is binary safe in PHP 5, so UTF-8 strings should be ok.

I prefer to validate input, but filter output for my web application security. So I'll take whatever the user puts into a text area, for example, and store it. But on output I'll either strip_tags, htmlspecialchars, or Smarty escape depending on the situation. I don't think a global strip_tags filter is useful, especially if a special case comes up later where you do want to allow a tag you were stripping, but for just one field.
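A minimal sketch of the "filter output" half of that approach, assuming UTF-8 pages. The helper name e() is made up for this example; htmlspecialchars() and its ENT_QUOTES flag are standard PHP.

```php
<?php
// Hypothetical helper: escape a stored value at the moment it is written
// into HTML. ENT_QUOTES escapes both double and single quotes, and the
// charset is passed explicitly rather than relying on the default.
function e($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Stored exactly as the user typed it.
$stored = '<script>alert("xss")</script>';

// Escaped only on output, so the markup is displayed as text, not executed.
$html = '<p>' . e($stored) . '</p>';
// $html is '<p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>'
```

Because the raw value is kept in storage, a field that later needs to allow a tag can switch to a different output filter without touching the saved data.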
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: strip_tags

Post by alex.barylski »

It's not binary safety that concerns me. What concerns me is whether HTML tag delimiters (i.e. < or >) can be anything other than their ASCII values. Does any HTML parsing engine understand, say, some Farsi symbol that could be used to open and close HTML tags, or are HTML tags *always* delimited by the ASCII characters?
veridicus wrote: "I don't think a global strip_tags filter is useful, especially if a special case comes up later where you do want to allow a tag you were stripping, but for just one field."

That's what I'm doing only in concept... my design is a lot smarter than that and knows exactly which fields are never going to need HTML. ;)
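As a small experiment on the delimiter question, the sketch below feeds strip_tags() both ASCII angle brackets and their fullwidth look-alikes (U+FF1C and U+FF1E, written as raw UTF-8 bytes). The variable names are made up for this example. It assumes the page is served and decoded as UTF-8, in which case only the ASCII < (U+003C) opens a tag for an HTML parser.

```php
<?php
// ASCII delimiters: recognised by strip_tags() and by HTML parsers.
$ascii = '<b>bold</b>';

// Fullwidth look-alikes, spelled out as UTF-8 byte sequences.
$lt = "\xEF\xBC\x9C"; // U+FF1C FULLWIDTH LESS-THAN SIGN
$gt = "\xEF\xBC\x9E"; // U+FF1E FULLWIDTH GREATER-THAN SIGN
$fullwidth = $lt . 'b' . $gt . 'bold' . $lt . '/b' . $gt;

$asciiResult = strip_tags($ascii);         // 'bold'
$fullwidthResult = strip_tags($fullwidth); // unchanged: no 0x3C byte to match
```

strip_tags() leaves the fullwidth string alone because it only looks for the ASCII byte. That is harmless as long as the browser decodes the page with the same charset the server assumed, which is why declaring the charset explicitly matters.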
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: strip_tags

Post by alex.barylski »

I've Googled some more on the subject and found a few interesting articles: http://secunia.com/advisories/12064/

No mention about my concern though... :(

I did manage to find several other sources which suggest strip_tags alone is not enough:

http://isisblogs.poly.edu/2008/08/16/ph ... ainst-xss/
http://www.net-security.org/vuln.php?id=3570

Sounds like I will probably have to use htmlentities as well just to be safe...
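A minimal sketch of one case where strip_tags() alone falls short: a value echoed inside an HTML attribute. The payload and variable names are made up for this example.

```php
<?php
// The payload contains no tags at all, so strip_tags() has nothing to remove.
$input = '" onmouseover="alert(1)';
$stripped = strip_tags($input); // identical to $input

// Unsafe: the double quote closes the value attribute and injects a handler.
$unsafe = '<input value="' . $stripped . '">';
// $unsafe is '<input value="" onmouseover="alert(1)">'

// Safer: escaping quotes keeps the whole payload inside the attribute value.
$safe = '<input value="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '">';
// $safe is '<input value="&quot; onmouseover=&quot;alert(1)">'
```

htmlentities() called with the same flags and charset would neutralise the quotes in the same way; it additionally converts every character that has a named HTML entity.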