Compose JavaScript Regular Expression
- JunbinDuan
- Forum Newbie
- Posts: 4
- Joined: Fri Aug 13, 2010 3:28 pm
Hi,
I have an HTML document that contains the following piece:
<THEAD>
<TR style="DISPLAY: none">
....
</TR></THEAD>
I need to get rid of this piece from the document. I wrote the following JavaScript (editorContent holds the HTML document):
var re = new RegExp("<THEAD>[.\s]*<TR style=\"DISPLAY: none\">[.\s]*</TR></THEAD>", "gi");
editorContent = editorContent.replace(re, ";");
But it does not seem to work. Does anybody know what is wrong with the expression?
Thank you very much,
Junbin
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Note that "does not seem to work" is a rather vague problem description.
But I think I can make an educated guess what you mean: inside a character class, the DOT does not match any character (except line breaks), but it matches the literal '.'
So, instead of [.\s]*, try [\S\s]*
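To illustrate the point about the dot inside a character class, here is a minimal sketch. The sample string and variable names are made up for illustration; they are not taken from the original poster's document.

```javascript
// Hypothetical sample input: an opening and closing tag with text and
// line breaks in between.
const sample = "<THEAD>\nabc\n</THEAD>";

// Inside [...] the dot is a literal '.', so [.\s]* can only consume
// dots and whitespace. It stops at the first letter.
const dotClass = /<THEAD>[.\s]*<\/THEAD>/i;

// [\S\s] is "non-whitespace or whitespace", i.e. any character at all,
// including line breaks.
const anyChar = /<THEAD>[\S\s]*<\/THEAD>/i;

console.log(dotClass.test(sample)); // false
console.log(anyChar.test(sample));  // true
```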
- JunbinDuan
Thanks for the reply. I have updated my script to the following:
var re = new RegExp(";[^;]*ResizeImage.this.;", "gi");
editorContent = editorContent.replace(re, ";");
re = new RegExp("<THEAD>[\S\s]*<TR style=\"DISPLAY: none\">[\S\s]*</TR></THEAD>", "gi");
editorContent = editorContent.replace(re, "");
The first expression works, but the second does not - it did not change the value of editorContent. Any ideas?
Thanks.
- prometheuzz
Yes: then there is no string in your input that matches your regex.... Any ideas?
- JunbinDuan
Of course the input has the string.
thx
- prometheuzz
"Of course the input has the string."
Well, if that were the case, there would be a match, right?
Again: saying "it doesn't work" tells us absolutely nothing about the problem you're facing! For people who do have the patience to try and help you, I suggest you post the actual input (or the relevant part of it) that you think should match but doesn't.
Best of luck.
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
One reason your regex is not working is that it is specified as a string. In a JavaScript string literal you must write two backslashes to get one backslash into the regex. A single backslash in front of anything other than a recognized escape character is simply discarded (i.e. the [\S\s] in your regex string is seen as [Ss] by the regex engine). Thus, your regex should work quite a bit better written like this:
Code: Select all
re = new RegExp("<THEAD>[\\S\\s]*<TR style=\"DISPLAY: none\">[\\S\\s]*</TR></THEAD>", "gi");
editorContent = editorContent.replace(re, "");
Or better yet, use a native JavaScript regex literal to specify your pattern like so:
Code: Select all
re = /<THEAD>[\S\s]*<TR style="DISPLAY: none">[\S\s]*<\/TR><\/THEAD>/gi;
editorContent = editorContent.replace(re, "");
With this syntax you don't need to escape the quotes or the backslashes (but you do need to escape the forward slashes, which are used as delimiters for JavaScript regex literals).
Another important point: your regex will run much faster and be more accurate if you use lazy quantifiers. The greedy /[\S\s]*/ matches everything all the way to the end of the document and must (slowly) backtrack one character at a time to get back. (And your regex has two of these, which can also lead to catastrophic backtracking if you have a malformed HTML document.) Worse, if you have more than one table in your HTML markup with these types of THEADs, the regex will erroneously delete everything between the first THEAD and the last THEAD (which is probably not what you want!). To fix this potential problem, add the lazy '?' modifier to both * quantifiers (there is also no need to match the closing TR tag):
Code: Select all
re = /<THEAD>[\S\s]*?<TR style="DISPLAY: none">[\S\s]*?<\/THEAD>/ig;
editorContent = editorContent.replace(re, "");
But this can be improved further. Assuming that the TR element immediately follows the THEAD (with some possible whitespace in between), we can replace the first [\S\s]* with simply \s*, and this will eliminate any possible catastrophic backtracking. Also, the TR start tag may have attributes other than just STYLE (and the style attribute value itself may have additional declarations other than just "DISPLAY: none"). The following regex is quite a bit more complex, but I think you'll find that it matches better and runs much faster:
Code: Select all
re = /<THEAD\b[^>]*>\s*<TR\b[^>]*?style="[^"]*?DISPLAY:\s*none[^"]*"[^>]*>[^<]*(?:(?!<\/THEAD>)<[^<]*)*<\/THEAD>/ig;
editorContent = editorContent.replace(re, "");
It also allows the THEAD tag to have attributes and uses the efficient "unrolling-the-loop" technique for matching the bulk of the stuff between the tags.
Hope this helps!
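The points made in this post can be checked with a small sketch. The sample markup and all variable names below are hypothetical, invented only for illustration; the original poster's actual document was never shown in the thread.

```javascript
// 1. In a string literal, "\S" and "\s" lose their backslashes before the
//    RegExp constructor ever sees them. The .source property shows what
//    the regex engine actually received.
const fromString = new RegExp("[\S\s]*");
const fromEscapedString = new RegExp("[\\S\\s]*");
console.log(fromString.source);        // [Ss]*
console.log(fromEscapedString.source); // [\S\s]*

// 2. Greedy vs. lazy quantifiers on made-up markup with two tables.
const html =
  '<TABLE><THEAD>\n<TR style="DISPLAY: none"><TH>a</TH></TR></THEAD>' +
  '<TBODY><TR><TD>keep me</TD></TR></TBODY></TABLE>' +
  '<TABLE><THEAD>\n<TR style="DISPLAY: none"><TH>b</TH></TR></THEAD></TABLE>';

const greedy = /<THEAD>[\S\s]*<TR style="DISPLAY: none">[\S\s]*<\/THEAD>/gi;
const lazy = /<THEAD>[\S\s]*?<TR style="DISPLAY: none">[\S\s]*?<\/THEAD>/gi;

// The greedy version swallows everything from the first <THEAD> to the
// last </THEAD>, including the TBODY in between.
const greedyResult = html.replace(greedy, "");
// The lazy version removes each hidden THEAD separately.
const lazyResult = html.replace(lazy, "");

console.log(greedyResult.includes("keep me")); // false
console.log(lazyResult.includes("keep me"));   // true

// 3. The final pattern from this post, tried on a hidden THEAD that has
//    extra attributes and on a visible THEAD that should be left alone.
const unrolled = /<THEAD\b[^>]*>\s*<TR\b[^>]*?style="[^"]*?DISPLAY:\s*none[^"]*"[^>]*>[^<]*(?:(?!<\/THEAD>)<[^<]*)*<\/THEAD>/ig;
const hidden =
  '<THEAD class="x">\n<TR id="r" style="color: red; DISPLAY: none"><TH>a</TH></TR></THEAD>';
const visible = '<THEAD><TR><TH>b</TH></TR></THEAD>';
const unrolledResult = (hidden + visible).replace(unrolled, "");

console.log(unrolledResult === visible); // true
```

Logging `re.source` as in step 1 is a quick way to see whether a pattern built from a string is really the pattern you meant to write.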
- JunbinDuan
Hi Ridgerunner,
Your suggestion just worked perfectly. Thank you very much for your patience and expertise. You are the star.
Junbin