Hi,
I am looking for a nice Perl regular expression that matches on* attributes in all possible HTML tags (e.g. onclick, onmouseover, etc.), whatever case they're in.
Does anyone have it by chance? I tried to do my own but it is really not my cup of tea.
Thanks a lot.
Cyril
removing nested javascript from html tags with a Perl REGEX
lol. i recently made one.
how i created mine: wrote all the inlines i know.
noticed they all start with "on"
noticed they range between 4 and 9 characters after the "on"
they all have alpha characters
case is irrelevant
they must have "" around the value on the other side of the equals sign
therefore you need on\w{4,9} to start the pattern. but this isn't enough. what if new ones are made or i missed any? well it's obvious it starts with on, so start the pattern with (on\w+)
now what's next? you can have an optional space, must have an = and then another optional space (\s*=\s*)
and then there's the other side, where the boundaries are " and ". thus you need the first ", everything after it that isn't a ", and the second " ("[^"]*")
i don't like to just give the code without the person understanding what's behind it, that's why i explained it like i did. however, you now have the pattern using the perl shorthands. you should be able to modify it to posix if you desire.
and unlike giving you the code straight out, this should show you what you need by section.
also, if you're that lazy, i think it's in one of my previous posts... or maybe all i did was link to it so you missed it
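The three pieces described above can be put together roughly like this. This is only a minimal sketch in PHP, assuming the Perl-compatible preg_* functions; the sample $html string is made up for illustration and is not from the thread.

```php
<?php
// The three pieces from the explanation, joined into one pattern.
// The /i modifier makes the match case-insensitive.
$pattern = '/(on\w+)(\s*=\s*)("[^"]*")/i';

// Hypothetical sample input (an assumption, not from the original post).
$html = '<a href="#" onClick = "doSomething()">link</a>';

if (preg_match($pattern, $html, $m)) {
    echo $m[1], "\n"; // attribute name: onClick
    echo $m[3], "\n"; // quoted value:   "doSomething()"
}
```

One thing to watch: on\w+ on its own also matches the tail of ordinary attributes such as content="..." (it sees ontent="..."), so putting \b or \s in front of it is probably worth considering.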
Beware of loose html code!
I have just found out that attribute values do not even need to be in quotes (which can be single or double quotes, by the way) to be parsed. In IE5, for instance,
Code: Select all
<p onmouseover=alert('boo')>
is valid and the JavaScript runs, so this needs to be removed by the regex as well!
Here is the last regex:
Code: Select all
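<?php
// sample list, full one on the w3 web site
define("DISALLOWED_ATTRIBUTE_LIST", "onblur|onchange|onclick|ondblclick");
// testing $str (the HTML to check); the value may be "double quoted",
// 'single quoted', or unquoted up to the next space or >
if (preg_match_all('/(' . DISALLOWED_ATTRIBUTE_LIST . ')\s*=\s*("[^"]*"|\'[^\']*\'|[^ >]*)/i', $str, $aAttriMatch)) {
    $aError = &$aAttriMatch[0]; // every offending attribute found
    $bErr = 1;
}
?>
P.S. m3rajk, thank you for your help.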
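The snippet above only detects the attributes. Since the thread title is about removing them, here is a minimal sketch of how the same three value alternatives might be used with preg_replace. The function name strip_on_attributes and the sample strings are hypothetical, and the generic on\w+ from the earlier reply is assumed in place of the explicit list.

```php
<?php
// Hypothetical helper, not from the thread.
function strip_on_attributes($html)
{
    // \s+ requires whitespace before the attribute, so "content=" is not hit.
    // Value: "double quoted", 'single quoted', or unquoted up to whitespace or >.
    $pattern = '/\s+on\w+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]*)/i';
    return preg_replace($pattern, '', $html);
}

echo strip_on_attributes("<p onmouseover=alert('boo')>text</p>"), "\n";
// prints: <p>text</p>
echo strip_on_attributes('<a href="#" ONCLICK="x()" onblur=\'y()\'>link</a>'), "\n";
// prints: <a href="#">link</a>
```

Keep in mind that a regex like this does not know whether it is inside a tag, so ordinary text such as " once = x" between tags would be stripped too.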