Lookaround in C#

Posted: Thu Jun 26, 2008 2:31 am
by yonidebest
I have the following regex, whose purpose is to remove the comma from a date pattern such as "4 of july, 1999":

(\\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\\d{1,4})

This is replaced with: "$1 of $2 $3".

The above regex works fine. My problem is that it can match this text too: "4 of july, 7 of august, 25 of april". The matches are:
1) "4 of july, 7"
2) "7 of august, 25"
i.e. it thinks that "7" and "25" are years. I tried to fix this by adding a negative lookahead that would ignore these cases:

(\\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\\d{1,4})(?! of (january|february|march|april|may|june|july|august|september|october|november|december))

But it still insists on making the match. What am I doing wrong? How can I avoid the two matches above?

Thanks,
Yoni

Re: Lookaround in C#

Posted: Thu Jun 26, 2008 2:59 am
by prometheuzz
I don't see your problem. When I use your regex:

(\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\d{1,4})(?! of (january|february|march|april|may|june|july|august|september|october|november|december))
on the text

4 of july, 7 of august, 2008 25 of april, 99
I get the following matches:

match 1 -> 7 of august, 2008
match 2 -> 25 of april, 99
But your regex is rather buggy. The following dates will go horribly wrong: "7 of august, 20008" ("7 of august, 2000" will be matched) and "7 of august, '99" (no match because of the single quote).
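
The two weak spots above can be reproduced with a short sketch. It is written in Python purely as a quick check, on the assumption that `\d{1,4}` and `(?!...)` behave the same way there as in the engines discussed in this thread. `GUARDED` is a hypothetical variant, not something posted here: it adds `(?!\d)` so the year cannot stop in the middle of a longer run of digits.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

# The pattern from the post above, with the month list factored out.
POSTED = re.compile(
    r"(\d{1,2}) of (" + MONTHS + r")?, (\d{1,4})"
    r"(?! of (" + MONTHS + r"))")

# Assumed variant: (?!\d) forbids a digit directly after the year.
GUARDED = re.compile(
    r"(\d{1,2}) of (" + MONTHS + r")?, (\d{1,4})(?!\d)"
    r"(?! of (?:" + MONTHS + r"))")


def matched_text(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(matched_text(POSTED, "7 of august, 20008"))   # 7 of august, 2000
print(matched_text(POSTED, "7 of august, '99"))     # None
print(matched_text(GUARDED, "7 of august, 20008"))  # None
print(matched_text(GUARDED, "7 of august, 2008"))   # 7 of august, 2008
```

The guard only addresses the five-digit case. Years written with an apostrophe would still need something like an optional `'` before the digits, which is left out of this sketch.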

Re: Lookaround in C#

Posted: Thu Jun 26, 2008 3:11 am
by yonidebest
did you run this in c#?

Re: Lookaround in C#

Posted: Thu Jun 26, 2008 3:16 am
by onion2k
yonidebest wrote:did you run this in c#?
This is a PHP forum, so I'd guess not...

Re: Lookaround in C#

Posted: Thu Jun 26, 2008 3:18 am
by prometheuzz
yonidebest wrote:
did you run this in c#?
No, but since C#'s regex engine is similar to Java's and PHP's (and this is a PHP forum), I didn't expect a difference: it worked fine with both PHP and Java.

Re: Lookaround in C#

Posted: Thu Jun 26, 2008 6:46 am
by prometheuzz
yonidebest wrote:
did you run this in c#?
I was curious whether there really was a difference, so I installed a C# SDK on a Windows box at work and compiled and ran the following snippet:

Console.WriteLine(Regex.Replace(
  "4 of july, 7 of august, 2008 25 of april, 99", 
  "(\\d{1,2}) of (january|february|march|april|may|june|"+
  "july|august|september|october|november|december)?, "+
  "(\\d{1,4})(?! of (january|february|march|april|may|"+
  "june|july|august|september|october|november|december))",
  "$1 of $2 $3"));
and got the following output:

4 of july, 7 of august 2008 25 of april 99
As you can see, the correct commas are removed.
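
One case the test text does not cover is the original string from the first post, "4 of july, 7 of august, 25 of april". There the year group can backtrack from "25" to "2": the text after "2" is "5 of april", which does not start with " of", so the negative lookahead passes. The sketch below checks this in Python, on the assumption that backtracking and lookahead behave the same way in .NET. `GUARDED` is a hypothetical variant with an extra `(?!\d)`, not a pattern from this thread.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

# The pattern used in the snippet above.
POSTED = re.compile(
    r"(\d{1,2}) of (" + MONTHS + r")?, (\d{1,4})"
    r"(?! of (" + MONTHS + r"))")

# Assumed variant: (?!\d) stops the year from shrinking to a prefix.
GUARDED = re.compile(
    r"(\d{1,2}) of (" + MONTHS + r")?, (\d{1,4})(?!\d)"
    r"(?! of (?:" + MONTHS + r"))")


def drop_commas(pattern, text):
    """Rewrite each match as 'day of month year' (no comma)."""
    return pattern.sub(r"\1 of \2 \3", text)


original = "4 of july, 7 of august, 25 of april"
test_text = "4 of july, 7 of august, 2008 25 of april, 99"

print([m.group(0) for m in POSTED.finditer(original)])
# ['7 of august, 2']
print(drop_commas(POSTED, original))
# 4 of july, 7 of august 25 of april
print(drop_commas(GUARDED, original))
# 4 of july, 7 of august, 25 of april
print(drop_commas(GUARDED, test_text))
# 4 of july, 7 of august 2008 25 of april 99
```

With the guard, the original string is left untouched and the test text still produces the same output as the C# run above.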