A that's not between B and C.
How do I match a string that is NOT between two other strings? In other words, how do I match A when it is not between B and C? For example, let's say A is "foo", B is "{", and C is "}". In "food {foo bar}", my regular expression should match the "foo" in the word "food" at the beginning, but not the "foo" before "bar".
How could this be done in the JavaScript implementation of RegExp?
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: A that's not between B and C.
Do it in two steps:
1 - split on '{'
2 - for each element that split(...) returned, check if it matches: foo(?![^}]*})
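A minimal sketch of the two steps above, assuming the braces are balanced and never nested. The helper name countOutsideBraces is made up for illustration; the pattern is the one suggested above, with the closing brace escaped.

```javascript
// Hypothetical helper: counts occurrences of "foo" that are not inside {...}.
// Assumes braces are balanced and not nested.
function countOutsideBraces(text) {
  // "foo" not followed by a run of non-"}" characters and then a "}"
  const outside = /foo(?![^}]*\})/g;
  let count = 0;
  // Step 1: split on "{" so text before a block never sees that block's "}"
  for (const part of text.split("{")) {
    // Step 2: look for matches in each piece
    const matches = part.match(outside);
    if (matches) count += matches.length;
  }
  return count;
}

console.log(countOutsideBraces("food {foo bar}")); // 1
```

The split is what makes this work: run against the whole string, the lookahead would also reject the "foo" in "food", because a "}" appears later in the text.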
Re: A that's not between B and C.
Hmm, but what I'm doing is a split on /;(?:\s{2,}|\s*\n+\s*|$)/g, so how would I put it all back together?
Code: Select all
var parts = "foo; { foo; } foo;".split(/{/g);
for (var i in parts)
{
parts[i] = parts[i].split(/;(?:\s{2,}|\s*\n+\s*|$)/g);
}
// How would I put parts back together if each part is now an array? I couldn't use join because it converts to a string.
- prometheuzz
Re: A that's not between B and C.
Err, you were only talking about matching some substring. Apparently you want something else (transforming one string into another). Could you provide some examples of what you want? I urge you to post examples that look like your real data: people tend to over-simplify their examples and miss many of the small corner cases.
Re: A that's not between B and C.
I'm writing a JavaScript interpreter for this language I'm "inventing". It's not really a serious project right now; it's more experimental than anything. I hope that doesn't make it any less valid a thing to help me with.
So what I'm doing right now is "tokenizing" my language's code. I'm creating an array of all the statements, then I'll make each statement an array of tokens, or parts of the statement. Once I've done that I'll pass this tokenized array into the parsing engine, which will then read the tokens and perform specific things for specific tokens, etc. I'm not really sure how this parser is going to work yet, but I'll get to it.
Mainly right now I'm focusing on the tokenizer, the part that turns the code into special arrays. What I'm trying to do with the tokenizer is separate each statement of the language into its own array element. In my language the { and } define a separate scope, so I don't want the statements within these to be in separate array elements. Illustrations are better than words, so here you go.
Code: Select all
<textarea id="code">
method argument, {
method arg, arg;
};
method arg;
method arg;
</textarea>
I want this to turn into an array like:
Code: Select all
var code = document.getElementById("code").value;
var parts = code.split(/{/g);
for (var i in parts)
{
parts[i] = parts[i].split(/;(?:\s{2,}|\s*\n+\s*|$)/g);
}
//What I want is an array like this: ["method argument, {\n method arg, arg;\n}", "method arg", "method arg"]
This is basically it. I'm trying to get the result shown in the comment above.
Basically, I want to separate the string by something but ignore that something when it's within curly braces. I'll then have the tokenizer separate all the parts of each statement. In my language a "block" is considered a data type, so it is treated the same as any other data type in a statement. Since every block has scope in my language, I don't want the statements within the block hanging out in the array; rather, I want the block to be parsed later on at the statement level.
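One way to get that result without a regex is to walk the string once and keep a brace-depth counter, splitting only on semicolons seen at depth zero. This is a sketch under some assumptions, not a full tokenizer: splitStatements is a hypothetical name, every top-level ";" is treated as a statement terminator (the whitespace rule from the split pattern above is dropped), and braces inside strings or comments are not handled.

```javascript
// Hypothetical helper: split source text into statements on ";",
// ignoring any ";" that appears inside {...} (nesting is allowed).
function splitStatements(code) {
  const statements = [];
  let depth = 0;    // how many "{" are currently open
  let current = ""; // text of the statement being collected

  for (const ch of code) {
    if (ch === "{") depth++;
    else if (ch === "}") depth = Math.max(0, depth - 1);

    if (ch === ";" && depth === 0) {
      // End of a top-level statement: store it without the ";"
      const trimmed = current.trim();
      if (trimmed) statements.push(trimmed);
      current = "";
    } else {
      current += ch;
    }
  }

  // Keep any trailing text that was not terminated by ";"
  const rest = current.trim();
  if (rest) statements.push(rest);
  return statements;
}

const sample = "method argument, {\n method arg, arg;\n};\nmethod arg;\nmethod arg;\n";
console.log(splitStatements(sample));
```

Each block stays inside its statement as plain text, so it can be handed back to the same function later when the statement itself is parsed.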
- prometheuzz
Re: A that's not between B and C.
I see. I don't want to put you down, but I can't recommend a regex split/match solution to tokenize a language. What you could do is write a grammar and let some sort of tool like ANTLR or Yacc (there are many more) generate a lexer+parser for you.
Best of luck.
Re: A that's not between B and C.
Do ANTLR or Yacc create an interpreter in JavaScript? I want my language to be able to run in a web browser with optional compilation. If compiled, it'll compile to JavaScript; if not, it could be interpreted on the fly by a JavaScript interpreter.
- prometheuzz
Re: A that's not between B and C.
No, ...
Edit: I mean yes, ANTLR targets JavaScript as well. From the ANTLR website: Currently available with C, C#, ActionScript, JavaScript, and Java targets.