Lookaround in C#

yonidebest
Forum Newbie
Posts: 2
Joined: Thu Jun 26, 2008 2:20 am

Lookaround in C#

Post by yonidebest »

I have the following regex, whose purpose is to remove the comma from a date pattern such as "4 of july, 1999":

(\\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\\d{1,4})

This is replaced with: "$1 of $2 $3".

The above regex works fine. My problem is that it can match this text too: "4 of july, 7 of august, 25 of april". The matches are:
1) "4 of july, 7"
2) "7 of august, 25"
i.e. it thinks that "7" and "25" are years. I tried to fix this by adding a negative lookahead that would ignore these cases:

(\\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\\d{1,4})(?! of (january|february|march|april|may|june|july|august|september|october|november|december))

But it still insists on making the match. What am I doing wrong? How can I avoid the two matches above?

Thanks,
Yoni
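
A possible explanation for the behaviour described above, offered as a sketch rather than a confirmed diagnosis: on "4 of july, 7 of august, 25 of april" the lookahead does reject "4 of july, 7", but for the second date the engine can backtrack. \d{1,4} first takes "25", the lookahead fails on " of april", so the engine gives back the "5" and retries with "2", where the lookahead no longer sees " of april". The class and member names below (DateCommaDemo, Original, Guarded, RemoveComma) are made up for illustration, and the Guarded pattern is an assumed fix, not something from the thread:

```csharp
using System;
using System.Text.RegularExpressions;

static class DateCommaDemo
{
    // Month alternation shared by both patterns.
    const string Months =
        "january|february|march|april|may|june|july|august|september|october|november|december";

    // The pattern from the post, with the negative lookahead after the year digits.
    public static readonly string Original =
        @"(\d{1,2}) of (" + Months + @")?, (\d{1,4})(?! of (" + Months + "))";

    // Hypothetical variant: the lookahead also rejects a following digit,
    // so \d{1,4} cannot give back the "5" and settle for "2".
    public static readonly string Guarded =
        @"(\d{1,2}) of (" + Months + @")?, (\d{1,4})(?!\d| of (?:" + Months + "))";

    // Applies the replacement string used in the post.
    public static string RemoveComma(string text, string pattern) =>
        Regex.Replace(text, pattern, "$1 of $2 $3");

    static void Main()
    {
        const string text = "4 of july, 7 of august, 25 of april";
        Console.WriteLine(RemoveComma(text, Original)); // 4 of july, 7 of august 25 of april
        Console.WriteLine(RemoveComma(text, Guarded));  // 4 of july, 7 of august, 25 of april
        Console.WriteLine(RemoveComma("4 of july, 1999", Guarded)); // 4 of july 1999
    }
}
```

With the extra \d alternative inside the lookahead, a shortened digit run is always followed by a digit and is therefore rejected, so the list of day/month pairs is left untouched while a real "day of month, year" date still loses its comma.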
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Lookaround in C#

Post by prometheuzz »

I don't see your problem. When I use your regex:

Code: Select all

(\d{1,2}) of (january|february|march|april|may|june|july|august|september|october|november|december)?, (\d{1,4})(?! of (january|february|march|april|may|june|july|august|september|october|november|december))
on the text

Code: Select all

4 of july, 7 of august, 2008 25 of april, 99
I get the following matches:

Code: Select all

match 1 -> 7 of august, 2008
match 2 -> 25 of april, 99
But your regex is rather buggy. The following dates will go wrong: "7 of august, 20008" (only "7 of august, 2000" is matched, so the trailing "8" is left behind) and "7 of august, '99" (no match, because of the single quote).
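
The two edge cases just mentioned can be checked with a small C# sketch. Posted is the pattern quoted in this post. Tightened is an assumed variant, not taken from the thread, that allows an optional apostrophe before the year and refuses to stop in the middle of a longer digit run. YearEdgeCases and FirstMatch are hypothetical names used only for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class YearEdgeCases
{
    const string Months =
        "january|february|march|april|may|june|july|august|september|october|november|december";

    // The pattern quoted in the post above.
    public static readonly Regex Posted = new Regex(
        @"(\d{1,2}) of (" + Months + @")?, (\d{1,4})(?! of (" + Months + "))");

    // Assumed variant: optional apostrophe before the year, and a lookahead
    // that also rejects a trailing digit.
    public static readonly Regex Tightened = new Regex(
        @"(\d{1,2}) of (" + Months + @")?, ('?\d{1,4})(?!\d| of (?:" + Months + "))");

    // Returns the first matched substring, or null when nothing matches.
    public static string FirstMatch(Regex regex, string text)
    {
        Match m = regex.Match(text);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        Console.WriteLine(FirstMatch(Posted, "7 of august, 20008"));              // 7 of august, 2000
        Console.WriteLine(FirstMatch(Posted, "7 of august, '99") ?? "none");      // none
        Console.WriteLine(FirstMatch(Tightened, "7 of august, 20008") ?? "none"); // none
        Console.WriteLine(FirstMatch(Tightened, "7 of august, '99"));             // 7 of august, '99
    }
}
```

Treating a five-digit "year" as no match at all is a design choice made for this sketch. Whether such input should be skipped or handled differently depends on the data being cleaned.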
yonidebest

Re: Lookaround in C#

Post by yonidebest »

Did you run this in C#?
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Lookaround in C#

Post by onion2k »

yonidebest wrote: Did you run this in C#?
This is a PHP forum, so I'd guess not...
prometheuzz

Re: Lookaround in C#

Post by prometheuzz »

yonidebest wrote:
Did you run this in C#?
No, but the C# regex engine is similar to Java's and PHP's (and this is a PHP forum). Since the pattern worked fine in both PHP and Java, I didn't expect a problem.
prometheuzz

Re: Lookaround in C#

Post by prometheuzz »

yonidebest wrote:
Did you run this in C#?
I was curious whether there really was a difference, so I installed a C# SDK on a Windows box at work, then compiled and ran the following snippet:

Code: Select all

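// Namespaces the snippet relies on.
using System;
using System.Text.RegularExpressions;

// Replace each "<day> of <month>, <year>" match with the comma removed.
Console.WriteLine(Regex.Replace(
  "4 of july, 7 of august, 2008 25 of april, 99",
  "(\\d{1,2}) of (january|february|march|april|may|june|"+
  "july|august|september|october|november|december)?, "+
  "(\\d{1,4})(?! of (january|february|march|april|may|"+
  "june|july|august|september|october|november|december))",
  "$1 of $2 $3"));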
and got the following output:

Code: Select all

4 of july, 7 of august 2008 25 of april 99
As you can see, the correct commas are removed.