'Ello,
I'm in need of a class which:
- allows only whitelisted HTML in a string
- replaces < and > with their HTML entities (&lt; and &gt;) for tags that aren't whitelisted
- strips out attributes and the attributes' values if the attributes aren't whitelisted for that particular tag
- is preferably OO and unit tested
I searched through phpClasses, but didn't find one that met any of the criteria. Does anyone here know of one, or at least one which meets some of the requirements and I could build the other requirements into?
- Nathaniel
Allow Whitelisted HTML Class
- Ollie Saunders
- DevNet Master
- Posts: 3179
- Joined: Tue May 24, 2005 6:01 pm
- Location: UK
I've virtually no knowledge of what is available with regards to 3rd party code. But if you've done any DOM scripting you may find PHP's DOM functions useful. They are probably the best thing for parsing HTML.
I reckon you could probably do what you want in 50 lines or less with that.
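A rough sketch of the DOM idea, for illustration only. The function name `filterWithDom()` and the `$whitelist` format (lowercase tag name => list of allowed attribute names) are made up for this example, and it has not been checked against hostile input. It assumes PHP's DOM extension is available and that the input is ASCII or Latin-1, since `loadHTML()` does not assume UTF-8 by default.

```php
<?php
// Sketch only: filterWithDom() and the $whitelist format are hypothetical.
// $whitelist maps a lowercase tag name to the attribute names allowed on it.
function filterWithDom(string $html, array $whitelist): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // malformed input is expected
    $doc->loadHTML('<div>' . $html . '</div>');   // wrapper div gives a known root
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first div in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list first, because the loop modifies the tree.
    foreach (iterator_to_array($root->getElementsByTagName('*')) as $el) {
        $tag = strtolower($el->nodeName);
        if (!isset($whitelist[$tag])) {
            // Not whitelisted: swap the element for a text node holding its
            // markup, so it is entity-escaped when serialized.
            $el->parentNode->replaceChild(
                $doc->createTextNode($doc->saveHTML($el)),
                $el
            );
            continue;
        }
        // Whitelisted tag: drop every attribute that is not allowed on it.
        foreach (iterator_to_array($el->attributes) as $attr) {
            if (!in_array(strtolower($attr->nodeName), $whitelist[$tag], true)) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Something like `filterWithDom('<b onclick="x()">hi</b><script>alert(1)</script>', ['b' => []])` should keep the `b` element without its `onclick` and turn the script element into escaped text. One caveat of the wrapper trick: a stray `</div>` in the input closes the wrapper early, so whatever follows it is dropped from the output.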
Hmm, but (w|c)ouldn't those break when given malformed HTML (malformed HTML that might still work in some browsers, causing a security flaw)?
I think what I want to do is convert all < and > to &lt; and &gt;, and then convert &lt; and &gt; back to < and > for whitelisted tags.
Stripping out non-whitelisted attributes needs to be put somewhere in there, but I'm going to start with that and see where the unit tests take me.
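A minimal sketch of that escape-then-restore idea, covering only tags without attributes. The name `escapeThenRestore()` and the `$allowedTags` list are hypothetical; `htmlspecialchars()` and `preg_replace()` are standard PHP functions.

```php
<?php
// Sketch only: escapeThenRestore() is a hypothetical name.
// $allowedTags is a list of tag names, e.g. ['b', 'i', 'p'].
function escapeThenRestore(string $input, array $allowedTags): string
{
    // Step 1: turn every <, >, & and quote character into its entity.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Nothing whitelisted: everything stays escaped.
    if ($allowedTags === []) {
        return $escaped;
    }

    // Step 2: turn &lt;tag&gt; and &lt;/tag&gt; back into real tags, but only
    // for whitelisted names and only when the tag carries no attributes.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $allowedTags));

    return preg_replace('/&lt;(\/?)(' . $names . ')&gt;/i', '<$1$2>', $escaped);
}

// '<b>hi</b> <script>x</script>' with ['b'] whitelisted gives:
// '<b>hi</b> &lt;script&gt;x&lt;/script&gt;'
echo escapeThenRestore('<b>hi</b> <script>x</script>', ['b']);
```

Because `&` is escaped in step 1 as well, text that already contains a literal `&lt;b&gt;` becomes `&amp;lt;b&amp;gt;` and is not turned into a tag by step 2. Any tag that carries attributes simply stays escaped in this version, so attribute whitelisting would still need to be added.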
- Ollie Saunders
- DevNet Master
- Posts: 3179
- Joined: Tue May 24, 2005 6:01 pm
- Location: UK
I honestly don't know. You'd think it would ignore malformed code. If so, you could write your code so that it treats everything as evil except what the DOM can parse and what is whitelisted. Try it.
Otherwise you may have to think about writing it yourself, either with character-by-character processing or with a series of preg_* calls. Both would probably be slower.
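For the preg route, here is one possible shape, as a sketch under stated assumptions rather than a vetted filter. The name `filterWithPreg()` and the `$whitelist` format (lowercase tag name => allowed attribute names) are invented for the example. It only recognises tags whose attributes are all written as `name="value"` with double quotes; anything else is treated as plain text and has its < and > escaped.

```php
<?php
// Sketch only: filterWithPreg() and the $whitelist format are hypothetical.
function filterWithPreg(string $input, array $whitelist): string
{
    // Either a strictly formed tag (attributes as name="value" only),
    // or a single stray < or > that is not part of such a tag.
    $pattern = '/<(\/?)([a-zA-Z][a-zA-Z0-9]*)((?:\s+[a-zA-Z-]+="[^"<>]*")*)\s*>|[<>]/';

    return preg_replace_callback($pattern, function (array $m) use ($whitelist) {
        $tag = strtolower($m[2] ?? '');

        // Stray bracket or non-whitelisted tag: escape the whole match.
        if ($tag === '' || !isset($whitelist[$tag])) {
            return htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        }

        // Closing tags never keep attributes.
        if ($m[1] === '/') {
            return '</' . $tag . '>';
        }

        // Rebuild the opening tag from whitelisted attributes only.
        $kept = '';
        preg_match_all('/([a-zA-Z-]+)="([^"<>]*)"/', $m[3] ?? '', $attrs, PREG_SET_ORDER);
        foreach ($attrs as $a) {
            $name = strtolower($a[1]);
            if (in_array($name, $whitelist[$tag], true)) {
                $kept .= ' ' . $name . '="' . $a[2] . '"';
            }
        }
        return '<' . $tag . $kept . '>';
    }, $input);
}

// With ['a' => ['href']] whitelisted this prints:
// <a href="/x">go</a> &lt;script&gt;1 &lt; 2&lt;/script&gt;
echo filterWithPreg('<a href="/x" onclick="evil()">go</a> <script>1 < 2</script>', ['a' => ['href']]);
```

Note that this only whitelists attribute names. It does not look at values, so something like a `javascript:` URL in an allowed `href` would still get through and would need its own check.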