help with regular expression problem
Posted: Mon Mar 05, 2012 11:10 am
by shanbuv
Hi
i'm trying to build a regular expression that will look for a specific combination of strings but must not include a specific combination, and i'm having problems
what i need is to find all strings that:
1. contain the phrase "*contain*love*" (meaning, the sentence "contain love" is good, the sentence "contain big love" is good, the sentence "must contain great love" is good, the sentence "might contain very big love indeed" is good, etc.) - this part is easy and i don't have an issue with it.
2. DO NOT contain the exact combination "love child" (meaning, the sentence "might contain very big love sick child" is ok, but the sentence "might contain very big love child today" should not match) - this is where i'm stumped
can anyone help ?
Re: help with regular expression problem
Posted: Mon Mar 05, 2012 12:20 pm
by ragax
Hi Shanbuv,
This will do it:
It uses a lookaround (negative lookahead) to ensure "love child" is not present.
Please let me know if you have any questions.
For the record, you could also place the negative lookahead after love:
But although this would forbid "contain little love child", it would still allow "contain love child love".
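The two snippets referred to in this post are not present in this copy of the thread. As a rough reconstruction, assuming they resembled the pattern quoted later in the thread minus its lookbehind, the difference between the two lookahead placements can be checked in Python. Both patterns and the names `LOOKAHEAD_FIRST`, `LOOKAHEAD_AFTER_LOVE` and `matches` are assumptions for illustration, not the original code.

```python
import re

# Assumed reconstruction: the lookahead sits before "contain" and scans
# everything from that point to the end of the string for "love child".
LOOKAHEAD_FIRST = re.compile(r"(?!.*?love child)contain.*?love")

# Assumed reconstruction: the lookahead sits right after "love" and only
# checks the text immediately following the "love" that was just matched.
LOOKAHEAD_AFTER_LOVE = re.compile(r"contain.*?love(?! child)")


def matches(pattern, sentence):
    """Return True if the pattern is found anywhere in the sentence."""
    return pattern.search(sentence) is not None


if __name__ == "__main__":
    for sentence in (
        "might contain very big love sick child",
        "might contain very big love child today",
        "contain little love child",
        "contain love child love",
    ):
        print(
            sentence,
            matches(LOOKAHEAD_FIRST, sentence),
            matches(LOOKAHEAD_AFTER_LOVE, sentence),
        )
```

With the second placement, the lazy `.*?` is free to skip past a "love" that is followed by " child" and settle on a later "love", which is why "contain love child love" gets through.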
Re: help with regular expression problem
Posted: Wed Mar 07, 2012 8:03 am
by shanbuv
Works like a charm !
Many many thanks !!
Re: help with regular expression problem
Posted: Wed Mar 07, 2012 8:22 am
by shanbuv
OK, now i need to add another small twist to this:
1. contain the phrase "*contain*love*" (meaning, the sentence "contain love" is good, the sentence "contain big love" is good, the sentence "must contain great love" is good, the sentence "might contain very big love indeed" is good, etc.)
2. DO NOT contain the exact combination "love child" (meaning, the sentence "might contain very big love sick child" is ok, but the sentence "might contain very big love child today" should not match)
3. the word "might" cannot appear before "contain" - so "may contain love" is ok, "may contain crazy love tonight" is ok, but "might contain love" should not match, "might contain crazy love" should not match, "might contain very big love child today" should not match
Help?
Re: help with regular expression problem
Posted: Wed Mar 07, 2012 12:38 pm
by ragax
Hi Shanbuv,
Delighted that it works for you.
For the twist you are asking for, it's the same idea: we add a negative lookbehind.
You can stack several lookaheads and lookbehinds to specify what a string must look like; that's a common technique for password validation.
Code:
(?!.*?love child)(?<!might\s)contain.*?love
Please let me know if this is what you need.
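For anyone who wants to try it, here is a minimal Python sketch that runs the pattern above against the sentences from the question. It assumes Python's `re` module, which accepts this lookbehind because `might\s` has a fixed width; the names `NO_MIGHT` and `is_match` are made up for the example.

```python
import re

# Pattern copied from the post above.
# (?!.*?love child) : from the position of "contain", no "love child" may follow
# (?<!might\s)      : "contain" must not be directly preceded by "might" plus whitespace
NO_MIGHT = re.compile(r"(?!.*?love child)(?<!might\s)contain.*?love")


def is_match(sentence):
    """Return True if the pattern is found anywhere in the sentence."""
    return NO_MIGHT.search(sentence) is not None


if __name__ == "__main__":
    for sentence in (
        "may contain love",
        "may contain crazy love tonight",
        "might contain love",
        "might contain crazy love",
        "might contain very big love child today",
    ):
        print(sentence, is_match(sentence))
```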

Re: help with regular expression problem
Posted: Thu Mar 08, 2012 9:46 am
by shanbuv
awesome ! works perfect
will run this against all the sentences to make sure everything is covered
many thanks
Re: help with regular expression problem
Posted: Thu Mar 08, 2012 1:34 pm
by ragax
You're welcome, shanbuv, please don't hesitate to ask again.
Wishing you a fun day.
Re: help with regular expression problem
Posted: Sun Mar 11, 2012 4:51 am
by shanbuv
OK, found one "hole"...
the following sentence:
"contains bliss and might contain love rumors"
is identified by:
(?!.*?love child)(?<!might\s)contain.*?love
which is not what i wanted
the word "might" cannot appear before "contain" - so "may contain love" is ok, "may contain crazy love tonight" is ok, but "might contain love" should not match, "might contain crazy love" should not match, "might contain very big love child today" should not match
i guess this is because the first contains is matched with the last love ?
how do i go about solving this ?
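A quick way to see which part of the sentence is being picked up is to print the matched text. This is a small Python sketch, assuming the `re` module; the variable names are illustrative only.

```python
import re

# Same pattern as quoted in this post.
pattern = re.compile(r"(?!.*?love child)(?<!might\s)contain.*?love")
sentence = "contains bliss and might contain love rumors"

found = pattern.search(sentence)
# group(0) is the exact substring the regex matched.
matched_text = found.group(0) if found else None
print(matched_text)  # contains bliss and might contain love
```

The match starts at the "contain" inside "contains", which has no "might" in front of it, and then runs lazily to the first "love". The lookbehind is only ever checked against that first "contain", so the "might" before the second one is never examined.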
Re: help with regular expression problem
Posted: Sun Mar 11, 2012 2:17 pm
by ragax
Hi Shanbuv,
Let's see what happens if we change it to:
Code:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain)(?>.*?contain)(?>.*?love)
- For now read it without paying attention to the four "?>" in the expression, I added these atomic groups because the expression is getting heavy with dot-stars and the four "?>" will help it fail faster when it needs to fail.
- The expression now has three rules:
1. Cannot contain "love child"
2. Cannot contain "might contain"
3. Must contain "contain .... love"
This matches and rejects as expected for everything you have specified so far. But note that this will reject
"might contain bliss and does contain love"
Please confirm that this is what you intend, otherwise we'll tweak it again.
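The three rules can be tried out with a Python sketch along these lines. One assumption to flag loudly: the atomic groups are replaced by plain `.*?` here, because Python's `re` only gained `(?>...)` in version 3.11, so this is an approximation of the expression above rather than a copy of it. `THREE_RULES` and `passes_three_rules` are hypothetical names.

```python
import re

# Approximation of the expression above WITHOUT atomic groups (assumption:
# kept portable to Python versions older than 3.11).
THREE_RULES = re.compile(
    r"^(?!.*?love child)"      # rule 1: cannot contain "love child"
    r"(?!.*?might\scontain)"   # rule 2: cannot contain "might contain"
    r".*?contain.*?love"       # rule 3: must contain "contain ... love"
)


def passes_three_rules(sentence):
    """Return True if a single-line sentence satisfies all three rules."""
    return THREE_RULES.search(sentence) is not None


if __name__ == "__main__":
    for sentence in (
        "may contain love",
        "contains bliss and might contain love rumors",
        "might contain bliss and does contain love",
    ):
        print(sentence, passes_three_rules(sentence))
```

Because both negative lookaheads are anchored at `^`, each one scans the whole sentence once, so the order of "contain", "might" and "love" in the text no longer affects rules 1 and 2.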
Re: help with regular expression problem
Posted: Mon Mar 12, 2012 7:15 am
by shanbuv
Hi,
First, thanks for the efforts, my head is spinning just trying to understand what you're generating...
the sentence "contains bliss and might contain love rumors" is handled properly, but "contains bliss and might contain love rumors" is not matched - upon reading the rules you specified, i see the problem.
the rules i want are slightly different:
1. Must contain "contain .... love"
2. Cannot contain "love child"
3. Cannot contain "*might contain*love" - meaning "contains bliss and might contain love rumors" should not match, but "might contain bliss and contain love" should match (i don't care if "might" appears, just not before the "contain*love" section)
some more examples
"might contains bliss, affection or anything else but might contain love rumors" should not be matched
"might contains bliss, affection or anything else but contain love naturally" should be matched
"might contains bliss, affection or anything else but contain love child forever" should not be matched
many thanks
Re: help with regular expression problem
Posted: Mon Mar 12, 2012 3:19 pm
by ragax
Hi Shanbuv,
I am interpreting this:
3. Cannot contain "*might contain*love"
as:
Cannot contain "might contain[space]love"
because you later say
i don't care if "might" appears, just not before the "contain*love" section
If so, we just add space-love to the "might contain" negative lookahead in our previous regex:
Code:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain\slove)(?>.*?contain)(?>.*?love)
Let me know if that works for you.
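Here is a hedged Python sketch for checking this variant against the examples given so far. As an assumption for portability, the atomic groups are written as plain `.*?` (Python's `re` has no `(?>...)` before 3.11), and the names `MIGHT_CONTAIN_LOVE` and `passes` are invented for the example.

```python
import re

# Approximation of the expression above without atomic groups (assumption).
MIGHT_CONTAIN_LOVE = re.compile(
    r"^(?!.*?love child)"            # cannot contain "love child"
    r"(?!.*?might\scontain\slove)"   # cannot contain "might contain love"
    r".*?contain.*?love"             # must contain "contain ... love"
)


def passes(sentence):
    """Return True if a single-line sentence is accepted by the pattern."""
    return MIGHT_CONTAIN_LOVE.search(sentence) is not None


if __name__ == "__main__":
    prefix = "might contains bliss, affection or anything else but "
    for tail in ("might contain love rumors",
                 "contain love naturally",
                 "contain love child forever"):
        print(prefix + tail, passes(prefix + tail))
```

One caveat, offered as a reading of the regex rather than a tested result: in an engine that does support atomic groups, `(?>.*?might)` commits to the first "might" and cannot retry from a later one, so the two forms may disagree on sentences that repeat "might", such as the first example above.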
Re: help with regular expression problem
Posted: Tue Mar 13, 2012 10:25 am
by shanbuv
Hi
when i said, cannot contain "*might contain*love" , i meant cannot contain "might[space]contain[anything]love" , but in a non greedy way
for example
"might contains bliss, affection or anything else but might contain love rumors" should not be matched, since "might contain love" appears
"might contains bliss, affection or anything else but might contain big big love" should not be matched, since "might contain[anything]love" appears
the problematic example
"might contains bliss, affection or anything else but contain big big love" SHOULD match since the second "contain" part is ok.
the rule here, and i hope i explain correctly:
cannot contain "*might[space]contain[anything]love" unless there's another "contain" in the [anything] part, in which case, if other rules apply (contains[anything]love and does not have "love child") then it should match (=if you find another "contain" in the [anything], start checking again...)
do i make sense?
thanks
S.
Re: help with regular expression problem
Posted: Tue Mar 13, 2012 4:00 pm
by ragax
Code:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain(?:.(?!(?<!might )contain))+?love)(?>.*?contain)(?>.*?love)
Have fun with that.
It made sense to me a second ago, but don't ask me to explain it as there's a triple negative.
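Since the triple negative is hard to follow by eye, a small test harness may help. This is a Python sketch under two assumptions: the atomic groups are replaced with plain `.*?` so it runs on Python versions before 3.11, and the names `NESTED` and `accepted` are hypothetical.

```python
import re

# Approximation of the expression above without atomic groups (assumption).
NESTED = re.compile(
    r"^(?!.*?love child)"   # cannot contain "love child"
    # Cannot contain "might contain ... love" where the "..." part never
    # reaches a "contain" that lacks "might " in front of it:
    #   .(?!(?<!might )contain)  consumes one character, provided the text
    #   right after it is not a "contain" without a preceding "might "
    r"(?!.*?might\scontain(?:.(?!(?<!might )contain))+?love)"
    r".*?contain.*?love"    # must contain "contain ... love"
)


def accepted(sentence):
    """Return True if a single-line sentence is accepted by the pattern."""
    return NESTED.search(sentence) is not None


if __name__ == "__main__":
    prefix = "might contains bliss, affection or anything else but "
    for tail in ("might contain love rumors",
                 "might contain big big love",
                 "contain big big love"):
        print(prefix + tail, accepted(prefix + tail))
```

The third sentence is accepted because the walk from the first "might contain" towards "love" is stopped at the plain "contain" after "but", so the middle lookahead cannot complete.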

Re: help with regular expression problem
Posted: Tue Mar 13, 2012 4:05 pm
by ragax
By the way, the triple negative is a sign that the "say what you DON'T want" approach has maxed out on this regex.
At this stage, to refine the regex for readability, I might switch to a "say what you DO want" approach: match (expression without "might contain") OR (expression with "might contain" in a way that is acceptable).
In the meantime, the expression as it is should work. Let me know if you need further help on it.
Technically, though, the expression above is quite interesting (for someone learning regex) because it showcases the use of a lookaround within a lookaround (specifically, a negative lookbehind within a negative lookahead within a negative lookahead).
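For readability, the rules can also be spelled out in ordinary code instead of one regex. The Python sketch below is only one possible reading of the requirements discussed in this thread, not a translation of the expression above: it cuts the sentence into segments that each start at a "contain", and accepts the sentence when some segment without "might " in front of it holds "love" and no segment with "might " in front of it does. The function name `accepts_sentence` is made up.

```python
import re


def accepts_sentence(sentence):
    """One interpretation of the thread's rules, written procedurally."""
    # Rule: the exact combination "love child" is never allowed.
    if "love child" in sentence:
        return False

    starts = [m.start() for m in re.finditer("contain", sentence)]
    found_plain_love = False
    for i, start in enumerate(starts):
        # A segment runs from this "contain" up to the next one (or the end).
        end = starts[i + 1] if i + 1 < len(starts) else len(sentence)
        segment = sentence[start:end]
        hedged = sentence[:start].endswith("might ")
        if "love" in segment:
            if hedged:
                # "might contain ... love" with no fresh "contain" in between.
                return False
            found_plain_love = True
    return found_plain_love


if __name__ == "__main__":
    prefix = "might contains bliss, affection or anything else but "
    for tail in ("might contain love rumors", "contain big big love"):
        print(prefix + tail, accepts_sentence(prefix + tail))
```

The trade-off is length for clarity: each rule is a separate, named step, so a new twist means adding a line rather than nesting another lookaround.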
Re: help with regular expression problem
Posted: Thu Mar 15, 2012 6:56 am
by shanbuv
Good god.... how in god's name did you manage to come up with this ?
will test this and let you know
10x again
S