A that's not between B and C.

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

JellyFish
DevNet Resident
Posts: 1361
Joined: Tue Feb 14, 2006 7:18 pm
Location: San Diego, CA

Post by JellyFish »

How do I match a string that is NOT between two other strings? In other words, how do I match A when it is not between B and C? For example, say A is "foo", B is "{" and C is "}". In "food {foo bar}", my regular expression should match the "foo" in the word "food" at the beginning, but not the "foo" before "bar".

How could this be done in the JavaScript implementation of RegExp?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Post by prometheuzz »

Do it in two steps:
1 - split on '{'
2 - for each element that split(...) returned, check if it matches: foo(?![^}]*})
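
The two steps above can be sketched in JavaScript roughly as follows. This is only an illustration, assuming the braces are never nested; the helper name `matchOutsideBraces` is made up for the example, and the closing brace in the pattern is escaped as `\}` purely for readability.

```javascript
// Hypothetical helper (not from the thread): for each piece produced by
// splitting on "{", report whether it contains a "foo" that is NOT
// followed by a closing "}" later in the same piece.
// Assumption: braces are never nested.
function matchOutsideBraces(text) {
  var pattern = /foo(?![^}]*\})/;
  return text.split("{").map(function (piece) {
    return pattern.test(piece);
  });
}

var result = matchOutsideBraces("food {foo bar}");
// The pieces are "food " and "foo bar}", so result is [true, false].
```

The split matters: run against the whole string, the lookahead would see the later "}" and reject the "foo" in "food" as well.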

Post by JellyFish »

Hmm, but what I'm doing is a split on /;(?:\s{2,}|\s*\n+\s*|$)/g, so how would I put it all back together?

Code:

var parts = "foo;  { foo;  } foo;".split(/{/g);
// Use an index loop: for...in on an array also visits inherited enumerable properties.
for (var i = 0; i < parts.length; i++)
{
    parts[i] = parts[i].split(/;(?:\s{2,}|\s*\n+\s*|$)/g);
}
// How would I put parts back together now that each part is an array? I couldn't use join because it converts to a string.

Post by prometheuzz »

Err, you were only talking about matching some substring. Apparently you want something else (transforming one string into another). Could you provide some examples of what you want? I urge you to post examples that look like your real data: people tend to oversimplify their examples and miss many of the small corner cases.

Post by JellyFish »

I'm writing a JavaScript interpreter for a language I'm "inventing". It's not really a serious project right now; it's more experimental than anything. I hope that doesn't make it any less valid a thing to ask for help with.

So what I'm doing right now is "tokenizing" my language's code. I'm creating an array of all the statements, then I'll make each statement an array of tokens, or parts of the statement. Once I've done this I'll pass the tokenized array into the parsing engine, which will then read the tokens and perform specific things for specific tokens, etc. I'm not really sure how this parser is going to work yet, but I'll get to it.

Mainly right now I'm focusing on the tokenizer--the part that turns the code into special arrays. What I'm trying to do with the tokenizer is separate each statement of the language into its own array element. In my language the { and } define a separate scope, so I don't want the statements within these to be in separate array elements. Illustrations are better than words, so here you go.

Code:

 
<textarea id="code">
method argument, {
  method arg, arg;
};
method arg;
method arg;
</textarea>
 
I want this to turn into an array like:

Code:

var code = document.getElementById("code").value;
var parts = code.split(/{/g);
// Use an index loop: for...in on an array also visits inherited enumerable properties.
for (var i = 0; i < parts.length; i++)
{
    parts[i] = parts[i].split(/;(?:\s{2,}|\s*\n+\s*|$)/g);
}

// What I want is an array like this: ["method argument, {\n  method arg, arg;\n}", "method arg", "method arg"]

This is basically it. I'm trying to get the result as shown in the comment above.

Basically, I want to split the string on some delimiter but ignore that delimiter when it's within curly braces. I'll then have the tokenizer separate all the parts of each statement. In my language a "block" is considered a data type, so it is treated the same as any other data type in a statement. Since every block has its own scope in my language, I don't want the statements within the block hanging out in the array; rather, I want the block to be parsed later on at the statement level.
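
One way to get that kind of split without a regex is to scan the text character by character and keep a count of how deep inside braces you are. The sketch below is only an illustration under some assumptions: the name `splitStatements` is made up, every ";" outside braces ends a statement (the whitespace conditions of the original pattern are dropped), and ";", "{" and "}" never appear inside string literals or comments.

```javascript
// Hypothetical helper (not from the thread): splits source text into
// top-level statements, treating ";" as a separator only when it is
// outside all { } blocks.
// Assumptions: braces are balanced, and ";", "{", "}" never occur inside
// string literals or comments.
function splitStatements(code) {
  var statements = [];
  var depth = 0;     // current nesting level of { }
  var current = "";  // statement being accumulated

  for (var i = 0; i < code.length; i++) {
    var ch = code.charAt(i);
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
    }
    if (ch === ";" && depth === 0) {
      // End of a top-level statement: store it without the ";".
      if (current.trim() !== "") statements.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  // Keep any trailing text that was not terminated by ";".
  if (current.trim() !== "") statements.push(current.trim());
  return statements;
}

var sample = "method argument, {\n  method arg, arg;\n};\nmethod arg;\nmethod arg;\n";
var statements = splitStatements(sample);
// statements is:
// ["method argument, {\n  method arg, arg;\n}", "method arg", "method arg"]
```

Because the depth counter handles nesting, a block inside a block stays in one piece too, and each block's contents can be passed back through the same function later at the statement level.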

Post by prometheuzz »

I see. I don't want to put you down, but I can't recommend a regex split/match solution to tokenize a language. What you could do is write a grammar and let some sort of tool like ANTLR or Yacc (there are many more) generate a lexer+parser for you.

Best of luck.

Post by JellyFish »

Does ANTLR or Yacc create an interpreter in JavaScript? I want my language to be able to run in a web browser with optional compilation. If compiled, it'll compile to JavaScript, but if not compiled, it could be interpreted on the fly by a JavaScript interpreter.

Post by prometheuzz »

No, ...

Edit: I mean yes, ANTLR targets JavaScript as well. From the ANTLR website: Currently available with C, C#, ActionScript, JavaScript, and Java targets.