Remove most HTML Attributes

Posted: Wed Jun 17, 2009 12:40 pm
by omniuni
Hi All!

I've got a script that interfaces with an ASP application, and the application generates some pretty messy code. I've gotten almost everything working, but one thing I need to deal with before inserting the results into my page is that it writes inline CSS and/or uses old HTML-style attributes.

Anyone know a good way of removing these? I figure it will be easier to say "let's remove everything except src= alt= href= action= target= " than "let's remove [insert incredibly long list of things like cellspacing= here]". That said, if you can give me code to remove a specific HTML attribute, I will deal with it! A function that I can pass in an attribute or an array of attributes and have them removed would also be a godsend right now!

Well, I'm sorry for the brief post, I hope I was clear.

Thank you all for your time and help!

-OmniUni

Re: Remove most HTML Attributes

Posted: Wed Jun 17, 2009 12:46 pm
by requinix
I'm quite sure you can find something in the user comments for the strip_tags manual page.

Re: Remove most HTML Attributes

Posted: Wed Jun 17, 2009 1:42 pm
by omniuni
Thanks, but unfortunately, I don't think so. It's not the tags I want to strip, just the attributes. I'll check the user comments anyway, though!

-OmniUni

Re: Remove most HTML Attributes

Posted: Wed Jun 17, 2009 1:50 pm
by omniuni
Actually, there may be some helpful stuff there, but I still think it's kind of the inverse of what I want. I need to keep most of the stuff and just remove any attributes that affect presentation, like <font face="" color="">, cellspacing="", border="", etc.

Re: Remove most HTML Attributes

Posted: Wed Jun 17, 2009 3:13 pm
by requinix
If you make a couple modifications to the code from the latest comment (nauthiz693 - 12-Jun-2009) you can give a list of attributes to remove.

Code:

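function strip_tags_badattributes($string,$allowtags=NULL,$disallowattributes=NULL){
    // Call strip_tags only if a tag whitelist was given, or if no attribute list was given at all.
    if (!is_null($allowtags) || is_null($disallowattributes)) $string = strip_tags($string,$allowtags);
    if (!is_null($disallowattributes)) {
        // Accept either an array or a comma-separated string of attribute names.
        if(!is_array($disallowattributes))
            $disallowattributes = explode(",",$disallowattributes);
        // Join the names into a regex alternation such as "style|border".
        $disallowattributes = implode("|",array_map('trim',$disallowattributes));
        // Lookbehind: the attribute name must end in one of the listed names.
        if (strlen($disallowattributes) > 0)
            $disallowattributes = "(?<=".$disallowattributes.")";
        $pattern = "/ [^ =]*".$disallowattributes."=(\"[^\"]*\"|'[^']*')/i";
        // Apply the removal inside each tag only, so text between tags is left alone.
        $string = preg_replace_callback("/<[^>]*>/i",function($matches) use ($pattern){
            return preg_replace($pattern, "", $matches[0]);
        },$string);
    }
    return $string;
}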
(Another modification: passing null for $allowtags together with something for $disallowattributes skips the strip_tags call on the input. If both $allowtags and $disallowattributes are null, strip_tags is still called.)
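
If the whitelist approach from the first post ("remove everything except src= alt= href= action= target=") turns out to be easier, going through the DOM extension is one way to do it. What follows is only a rough sketch and not something taken from the manual comments: the function name keep_only_attributes and its default list are made up for illustration, and it assumes the input is a UTF-8 HTML fragment and that the DOM extension is installed.

```php
<?php
// Hypothetical helper (not from the manual comments): remove every attribute
// whose name is not in $keep. Assumes $html is a UTF-8 HTML fragment.
function keep_only_attributes($html, array $keep = array('src', 'alt', 'href', 'action', 'target')) {
    $keep = array_map('strtolower', $keep);

    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells the parser the input is UTF-8; the wrapper div lets us
    // pull exactly our own markup back out after parsing.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        // Copy the names first: removing attributes while walking the live
        // attribute map would skip entries.
        $names = array();
        foreach ($element->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $keep, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of the wrapper div.
    $wrapper = $doc->getElementsByTagName('body')->item(0)->firstChild;
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling keep_only_attributes($html) should leave src, alt, href, action and target alone and drop everything else, such as style, cellspacing and border. Because it goes through a parser instead of a regex, unquoted values like border=1 are handled as well, although the parser may normalize the markup a little on the way out.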

Re: Remove most HTML Attributes

Posted: Wed Jun 17, 2009 4:19 pm
by omniuni
awesome, tasairis!!!! Thanks so much!