Lookaround in C#
Posted: Thu Jun 26, 2008 2:31 am
by yonidebest
I have the following regex, whose purpose is to remove the comma from a date pattern ("4 of july, 1999"):
(\\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\\d{1,4})
This is replaced with: "$1 of $2 $3".
The above regex works fine. My problem is that it can also match this text: "4 of july, 7 of august, 25 of april". The matches are:
1) "4 of july, 7"
2) "7 of august, 25"
i.e. it treats "7" and "25" as years. I tried to fix this by adding a negative lookahead that would reject these cases:
(\\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\\d{1,4})(?! of (january|february|march|april|may|june|july|august|september|october|november|december))
But it still insists on making the catch. What am I doing wrong? How can I avoid the above two catches?
Thanks,
Yoni
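One possible explanation for the behaviour described above, sketched in Python rather than C# (the `\d{1,4}` quantifier and `(?!...)` lookahead are written the same way in both engines, but this sketch was not run on .NET). When the lookahead rejects "25", the engine backtracks `\d{1,4}` to "2"; the next character is then "5" rather than " of", so the lookahead passes and a shorter match is reported. The names `MONTHS`, `DATE` and `LOOKAHEAD` are only illustrative.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

# The two patterns from the post, written with single backslashes.
DATE = r"(\d{1,2}) of (" + MONTHS + r")?, (\d{1,4})"
LOOKAHEAD = r"(?! of (" + MONTHS + r"))"

plain = re.compile(DATE)
guarded = re.compile(DATE + LOOKAHEAD)

text = "4 of july, 7 of august, 25 of april"

# finditer reports non-overlapping matches, so once "4 of july, 7" has
# consumed the "7", the overlapping "7 of august, 25" is not reported.
plain_hits = [m.group(0) for m in plain.finditer(text)]

# With the lookahead, "4 of july, 7" is rejected, but "7 of august, 25"
# is retried with a one-digit "year" and succeeds as "7 of august, 2".
guarded_hits = [m.group(0) for m in guarded.finditer(text)]

print(plain_hits)    # ['4 of july, 7']
print(guarded_hits)  # ['7 of august, 2']
```

If this is what happens in C# as well, the lookahead is working as written; it is the variable-length year that lets the match slip through.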
Re: Lookaround in C#
Posted: Thu Jun 26, 2008 2:59 am
by prometheuzz
I don't see your problem. When I use your regex:
(\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\d{1,4})(?! of (january|february|march|april|may|june|july|august|september|october|november|december))
on the text
4 of july, 7 of august, 2008 25 of april, 99
I get the following matches:
match 1 -> 7 of august, 2008
match 2 -> 25 of april, 99
But your regex is rather buggy. The following dates will go horribly wrong: "7 of august, 20008" ("7 of august, 2000" will be matched) and "7 of august, '99" (no match because of the single quote).
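A minimal sketch of one way to tighten the pattern for the two cases just mentioned, written in Python as an illustration rather than as the poster's code. It rests on three assumptions that go beyond the original regex: a `\b` after the year so that a longer number is never matched in part, an optional apostrophe in front of the year, and a mandatory month (the original makes the month group optional). `STRICT` and `strip_date_commas` are hypothetical names.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

STRICT = re.compile(
    r"\b(\d{1,2}) of (" + MONTHS + r"), "
    # Optional apostrophe, then 1-4 digits that must end at a word boundary,
    # so "20008" cannot be matched as "2000" and "25" cannot shrink to "2".
    r"('?\d{1,4})\b"
    # Reject a "year" that is really the day of the next date.
    r"(?! of (?:" + MONTHS + r"))"
)

def strip_date_commas(text):
    """Remove the comma between month and year; leave other text untouched."""
    return STRICT.sub(r"\1 of \2 \3", text)

for sample in ("4 of july, 1999",
               "4 of july, 7 of august, 25 of april",
               "7 of august, 20008",
               "7 of august, '99"):
    print(repr(sample), "->", repr(strip_date_commas(sample)))
```

With these assumptions a five-digit year is left alone instead of being cut to four digits; whether that is the desired outcome depends on the data.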
Re: Lookaround in C#
Posted: Thu Jun 26, 2008 3:11 am
by yonidebest
did you run this in c#?
Re: Lookaround in C#
Posted: Thu Jun 26, 2008 3:16 am
by onion2k
yonidebest wrote:did you run this in c#?
This is a PHP forum, so I'd guess not...
Re: Lookaround in C#
Posted: Thu Jun 26, 2008 3:18 am
by prometheuzz
yonidebest wrote:
did you run this in c#?
No, but since the C# regex engine is similar to Java's and PHP's (and this is a PHP forum), I didn't expect a problem: the regex worked fine in both PHP and Java.
Re: Lookaround in C#
Posted: Thu Jun 26, 2008 6:46 am
by prometheuzz
yonidebest wrote:
did you run this in c#?
I was curious whether there really was a difference, so I installed a C# SDK on a Windows box at work, then compiled and ran the following snippet:
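// Needs: using System; using System.Text.RegularExpressions;
Console.WriteLine(Regex.Replace(
    // input text
    "4 of july, 7 of august, 2008 25 of april, 99",
    // pattern: day, optional month, year, then the negative lookahead
    "(\\d{1,2}) of (january|february|march|april|may|june|"+
    "july|august|september|october|november|december)?, "+
    "(\\d{1,4})(?! of (january|february|march|april|may|"+
    "june|july|august|september|october|november|december))",
    // replacement: same date without the comma
    "$1 of $2 $3"));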
and got the following output:
4 of july, 7 of august 2008 25 of april 99
As you can see, the correct commas are removed.
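For comparison, a hedged Python sketch of the same replacement, run on both the test string used in this reply and the string from the original question. It is only a cross-check in another engine, not a claim about .NET output for the second string; `PATTERN`, `replace_commas`, `reply_input` and `question_input` are illustrative names.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

# Same pattern as in the snippet above, with single backslashes.
PATTERN = re.compile(
    r"(\d{1,2}) of (" + MONTHS + r")?, (\d{1,4})"
    r"(?! of (" + MONTHS + r"))"
)

def replace_commas(text):
    # Python writes replacement backreferences as \1 where .NET uses $1.
    return PATTERN.sub(r"\1 of \2 \3", text)

reply_input = "4 of july, 7 of august, 2008 25 of april, 99"
question_input = "4 of july, 7 of august, 25 of april"

print(replace_commas(reply_input))     # 4 of july, 7 of august 2008 25 of april 99
print(replace_commas(question_input))  # 4 of july, 7 of august 25 of april
```

The first line agrees with the output shown above. The second shows that, in Python at least, the string from the question still loses the comma after "august", so the two test inputs do not exercise the same case.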