Remove most HTML Attributes

omniuni
Forum Regular
Posts: 738
Joined: Tue Jul 15, 2008 10:50 pm
Location: Carolina, USA

Post by omniuni »

Hi All!

I've got a script that interfaces with an ASP application, and the application is generating some pretty messy code. I've gotten most everything working, but one thing I need to deal with before inserting the results into my page is that it writes inline CSS and/or uses old HTML-style presentational attributes.

Anyone know a good way of removing these? I figure it will be easier to say "let's remove everything except src= alt= href= action= target= " than "let's remove [insert incredibly long list of things like cellspacing= here]". That said, if you can give me code to remove a specific HTML attribute, I will deal with it! A function that I can pass in an attribute or an array of attributes and have them removed would also be a godsend right now!

Well, I'm sorry for the brief post; I hope I was clear.

Thank you all for your time and help!

-OmniUni
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Remove most HTML Attributes

Post by requinix »

I'm quite sure you can find something in the user comments for the strip_tags manual page.

Re: Remove most HTML Attributes

Post by omniuni »

Thanks, but unfortunately, I don't think so. It's not the tags I want to strip, just the attributes. I'll check the user comments anyway, though!

-OmniUni

Re: Remove most HTML Attributes

Post by omniuni »

Actually, there may be some helpful stuff there, but I still think it's kind of the inverse of what I want. I need to keep most of the stuff and just remove any attributes that affect presentation, like <font face="" color=""> and cellspacing="" and border="" etc.

Re: Remove most HTML Attributes

Post by requinix »

If you make a couple modifications to the code from the latest comment (nauthiz693 - 12-Jun-2009) you can give a list of attributes to remove.

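Code:

function strip_tags_badattributes($string, $allowtags = null, $disallowattributes = null) {
    // Strip tags when a tag whitelist is given, or when no attribute list is given at all.
    if (!is_null($allowtags) || is_null($disallowattributes)) {
        $string = strip_tags($string, $allowtags);
    }
    if (!is_null($disallowattributes)) {
        // Accept either an array or a comma-separated string of attribute names.
        if (!is_array($disallowattributes)) {
            $disallowattributes = explode(",", $disallowattributes);
        }
        // Escape each name so it is safe inside the pattern, then join: style|border|...
        $names = array();
        foreach ($disallowattributes as $name) {
            if (trim($name) !== "") $names[] = preg_quote(trim($name), "/");
        }
        // An empty list leaves the attributes untouched.
        if ($names) {
            // Matches whole attribute names with quoted values only (not cellspacing=0).
            $pattern = '/\s+(?:' . implode("|", $names) . ')\s*=\s*("[^"]*"|\'[^\']*\')/i';
            // Closure instead of create_function(), which was removed in PHP 8.
            $string = preg_replace_callback("/<[^>]*>/", function ($matches) use ($pattern) {
                return preg_replace($pattern, "", $matches[0]);
            }, $string);
        }
    }
    return $string;
}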
(Another modification: if you pass null for $allowtags and a list for $disallowattributes, strip_tags is not called on the input at all. If both $allowtags and $disallowattributes are null, strip_tags is still called.)
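
For the whitelist the original post asks for (keep src, alt, href, action and target, drop everything else), a DOM-based pass is one possible alternative to a regex. The following is only a sketch under stated assumptions, not code from the thread or from the strip_tags comments: the function name keep_only_attributes and the wrapper id fragment-root are made up for illustration, PHP's DOM extension is assumed to be installed, and the input is assumed to be a UTF-8 HTML fragment that does not itself contain an element with that id.

```php
<?php
// Hypothetical helper: remove every attribute that is not on the whitelist.
function keep_only_attributes($html, array $keep = array('src', 'alt', 'href', 'action', 'target')) {
    $keep = array_map('strtolower', $keep);

    $doc = new DOMDocument();
    // Collect parser warnings about messy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common trick to make libxml read the input as UTF-8.
    // The wrapper div lets us find the fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);

    // Visit every element below the wrapper (the wrapper itself is skipped).
    foreach ($xpath->query('.//*', $root) as $element) {
        // Copy the names first: removing attributes while iterating the
        // live attribute map can skip entries.
        $names = array();
        foreach ($element->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $keep, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the wrapper's children, not the html/body added by the parser.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$messy = '<p style="color:red" align=center><a href="x.html" class="c" target="_blank">link</a> '
       . '<img src="a.png" border=0 alt="A"></p>';
$clean = keep_only_attributes($messy);
```

Because the parser normalizes the markup, unquoted values such as border=0 are handled as well, but the output may differ from the input in whitespace, quoting and tag repair, so it is worth checking the result against real output from the ASP application before relying on it.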

Re: Remove most HTML Attributes

Post by omniuni »

awesome, tasairis!!!! Thanks so much!