Allow Whitelisted HTML Class

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Nathaniel
Forum Contributor
Posts: 396
Joined: Wed Aug 31, 2005 5:58 pm
Location: Arkansas, USA

Post by Nathaniel »

'Ello,

I'm in need of a class which:
- allows only whitelisted HTML in a string
- replaces < and > with their HTML entities for tags that aren't whitelisted
- strips out attributes and the attributes' values if the attributes aren't whitelisted for that particular tag
- is preferably OO and unit tested

I searched through phpClasses, but didn't find one that met any of the criteria. Does anyone here know of one, or at least one that meets some of the requirements and that I could extend to cover the rest?

- Nathaniel
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

I've virtually no knowledge of what is available with regard to third-party code. But if you've done any DOM scripting you may find PHP's DOM functions useful. They are probably the best thing for parsing HTML.

I reckon you could probably do what you want in 50 lines or fewer with that.
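
A minimal sketch of that DOM idea, not a vetted sanitizer. The whitelist format (tag name => allowed attribute names) and the function names `esc`, `filterNode` and `sanitizeHtml` are made up for illustration. It rebuilds the output from the parsed tree, so non-whitelisted tags come back as escaped text with their attributes dropped, and attribute values are not validated (a `javascript:` URL in an `href` would pass through).

```php
<?php
// Escape text so it can only ever be rendered as text.
function esc(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Recursively rebuild one node. $whitelist is a hypothetical map of
// tag name => list of attribute names allowed on that tag.
function filterNode(DOMNode $node, array $whitelist): string
{
    if ($node instanceof DOMText) {
        return esc($node->nodeValue);
    }
    if (!($node instanceof DOMElement)) {
        return ''; // drop comments, processing instructions, etc.
    }

    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= filterNode($child, $whitelist);
    }

    $tag = strtolower($node->nodeName);
    if (!isset($whitelist[$tag])) {
        // Not whitelisted: show the tag as text, keep the filtered children.
        return esc("<$tag>") . $inner . esc("</$tag>");
    }

    $attrs = '';
    foreach ($node->attributes as $attr) {
        $name = strtolower($attr->nodeName);
        if (in_array($name, $whitelist[$tag], true)) {
            // NOTE: the value is escaped but not validated.
            $attrs .= ' ' . $name . '="' . esc($attr->nodeValue) . '"';
        }
    }
    return "<$tag$attrs>$inner</$tag>";
}

function sanitizeHtml(string $html, array $whitelist): string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    // Wrap the fragment so the parser has an explicit body and charset.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        // Anything the parser placed outside the body is simply dropped.
        foreach ($body->childNodes as $child) {
            $out .= filterNode($child, $whitelist);
        }
    }
    return $out;
}

echo sanitizeHtml(
    '<b onclick="x()">bold</b><script>alert(1)</script>',
    ['b' => [], 'i' => [], 'a' => ['href']]
), "\n";
```

Because the output is rebuilt from the tree rather than copied from the input, only tags and attributes named in the whitelist can appear unescaped. Void elements such as br would need special handling that this sketch leaves out.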

Post by Nathaniel »

Hmm, but (w|c)ouldn't those break when given malformed HTML (malformed HTML that might even end up working in some browsers, causing a security flaw)?

I think what I want to do is convert all < and > to &lt; and &gt;, and then convert &lt; and &gt; back to < and > for whitelisted tags.

Stripping out non-whitelisted attributes needs to be put somewhere in there, but I'm going to start with that and see where the unit tests take me.
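
A rough sketch of that starting point, assuming a hypothetical function name `allowTags` and a plain list of allowed tag names. It escapes everything first and then restores only bare whitelisted tags, so any tag carrying attributes stays escaped until attribute handling is added.

```php
<?php
// Escape the whole string, then turn &lt;tag&gt; and &lt;/tag&gt; back into
// real tags for names in $allowedTags. Tags with attributes are left escaped.
function allowTags(string $input, array $allowedTags): string
{
    // Step 1: everything becomes inert text.
    $escaped = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    if (count($allowedTags) === 0) {
        return $escaped;
    }

    // Step 2: build an alternation of the whitelisted names.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $allowedTags));

    // Only an exact "&lt;name&gt;" or "&lt;/name&gt;" is restored.
    return preg_replace('/&lt;(\/?)(' . $names . ')&gt;/i', '<$1$2>', $escaped);
}

echo allowTags('<b>hi</b><script>x</script>', ['b', 'i']), "\n";
// prints: <b>hi</b>&lt;script&gt;x&lt;/script&gt;
```

Since the default is "escaped", a tag the pattern fails to recognise ends up displayed as text rather than executed, which is the safer way for this kind of filter to fail.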

Post by Ollie Saunders »

Nathaniel wrote: Hmm, but (w|c)ouldn't those break when given malformed HTML (malformed HTML that might even end up working in some browsers, causing a security flaw)?
I honestly don't know. You'd think it would ignore malformed code. If this is the case you could write your code in a fashion that assumes everything is evil except what the DOM can parse and is set to be whitelisted. Try it.
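
One way to try it, as a small sketch: feed a malformed fragment to DOMDocument and look at both what the parser reports and what it rebuilt. The function name `inspectParse` and the sample input are made up for illustration; the exact messages and repaired markup depend on the libxml version, so none are shown here.

```php
<?php
// Parse a fragment and return the repaired body markup together with
// whatever errors libxml collected along the way.
function inspectParse(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect instead of warn
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $html . '</body></html>');

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return [
        'html'   => $body === null ? '' : (string) $doc->saveHTML($body),
        'errors' => $messages,
    ];
}

// Hypothetical malformed input: mis-nested tags and an unterminated tag.
print_r(inspectParse('<b><i>unclosed</b> <img src=x onerror=alert(1)'));
```

Comparing the 'html' entry with what a browser makes of the same input shows where the two parsers disagree, which is the gap the whitelist has to cover.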

Otherwise you may have to think about writing it yourself, either with character-by-character processing or with a series of preg_* calls. Both would probably be slower.