Hi All!
I've got a script that interfaces with an ASP application, and the application is generating some pretty messy code. I've gotten almost everything working, but one thing I still have to deal with before inserting the results into my page is that it's writing inline CSS and/or using old-style HTML attributes.
Anyone know a good way of removing these? I figure it will be easier to say "let's remove everything except src=, alt=, href=, action= and target=" than "let's remove [insert incredibly long list of things like cellspacing= here]". That said, if you can give me code to remove a specific HTML attribute, I can work with that! A function that I can pass an attribute or an array of attributes to and have them removed would also be a godsend right now!
Well, I'm sorry for the brief post, I hope I was clear.
Thank you all for your time and help!
-OmniUni
Remove most HTML Attributes
Re: Remove most HTML Attributes
I'm quite sure you can find something in the user comments for the strip_tags manual page.
Re: Remove most HTML Attributes
Thanks, but unfortunately, I don't think so. It's not the tags I want to strip, just the attributes. I'll check the user comments anyway, though!
-OmniUni
Re: Remove most HTML Attributes
Actually, there may be some helpful stuff there, but I still think it's kind of the inverse of what I want. I need to keep most of the markup and just remove any attributes that affect presentation, like <font face="" color="">, cellspacing="", border="", etc.
Re: Remove most HTML Attributes
If you make a couple modifications to the code from the latest comment (nauthiz693 - 12-Jun-2009) you can give a list of attributes to remove.
(Another modification: pass null as $allowtags and a non-null $disallowattributes to skip the strip_tags call on the input. If both $allowtags and $disallowattributes are null, strip_tags is still called.)
Code:
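function strip_tags_badattributes($string, $allowtags = NULL, $disallowattributes = NULL) {
    // Run strip_tags() unless only an attribute list was passed.
    if (!is_null($allowtags) || is_null($disallowattributes)) {
        $string = strip_tags($string, $allowtags);
    }
    if (!is_null($disallowattributes)) {
        // Accept an array or a comma-separated string of attribute names.
        if (!is_array($disallowattributes)) {
            $disallowattributes = explode(",", $disallowattributes);
        }
        // Trim and regex-escape each name; an empty list matches every attribute, as before.
        $names = array_filter(array_map('trim', $disallowattributes), 'strlen');
        $names = array_map(function ($name) { return preg_quote($name, '/'); }, $names);
        $nameRegex = count($names) > 0 ? '(?:' . implode('|', $names) . ')' : '[^\s=]+';
        // Match the whole attribute name, so "color" no longer also removes "bgcolor".
        $pattern = '/\s+' . $nameRegex . '\s*=\s*("[^"]*"|\'[^\']*\')/i';
        // create_function() was removed in PHP 8.0; a closure does the same job.
        $string = preg_replace_callback('/<[^>]*>/', function ($matches) use ($pattern) {
            return preg_replace($pattern, '', $matches[0]);
        }, $string);
    }
    return $string;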
}
Re: Remove most HTML Attributes
awesome, tasairis!!!! Thanks so much!
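For anyone landing here who wants the whitelist version asked about in the first post ("remove everything except src= alt= href= action= target="), here is a minimal sketch of one way it could be done. The function name strip_attributes_except and its default list are made up for this example and are not from the manual comments. It is regex-based like the code above, so it is not a real HTML parser: it can be fooled by a stray < or > inside script blocks or comments, and it should not be relied on to sanitize untrusted input.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for illustration):
// keeps only the attributes listed in $allowed and drops every other one.
function strip_attributes_except(string $html, array $allowed = ['src', 'alt', 'href', 'action', 'target']): string
{
    // Lower-cased lookup table so attribute names compare case-insensitively.
    $allowed = array_flip(array_map('strtolower', $allowed));

    // One attribute: whitespace, a name, then an optional quoted or bare value.
    $attrPattern = '/\s+([^\s=\/>"\']+)(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>"\']+))?/';

    // Visit each opening tag: "<name", its attribute text, then ">".
    $result = preg_replace_callback(
        '/<([a-zA-Z][a-zA-Z0-9]*)((?:"[^"]*"|\'[^\']*\'|[^\'">])*)>/',
        function (array $tag) use ($allowed, $attrPattern): string {
            // Keep an attribute only when its name is in the whitelist.
            $attrs = preg_replace_callback(
                $attrPattern,
                function (array $attr) use ($allowed): string {
                    return isset($allowed[strtolower($attr[1])]) ? $attr[0] : '';
                },
                $tag[2]
            );
            // Fall back to the untouched attribute text if the regex engine errors out.
            return '<' . $tag[1] . ($attrs ?? $tag[2]) . '>';
        },
        $html
    );

    // preg_replace_callback() returns null on error; return the input unchanged then.
    return $result ?? $html;
}

$messy = '<table cellspacing="0" border="1"><tr><td style="color:red">'
       . '<a href="x.html" target="_blank" class="c">link</a></td></tr></table>';

echo strip_attributes_except($messy), "\n";
// prints: <table><tr><td><a href="x.html" target="_blank">link</a></td></tr></table>
```

Note that this only removes attributes. Presentational tags such as <font> are left in place (minus their attributes), so they would still need strip_tags or something similar.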