Help for a Regex newcomer

Posted: Wed Aug 29, 2012 4:38 pm
by profhp
Hi:

I have a question that I am sure is simple for you (but hard for me — I am new to regex).

I have a pattern that matches all bracketed text in a paragraph, including the brackets (which I subsequently replace):

\(.*?\)

However, I would like to match only bracketed text that includes a number in the range 1900 through 2100.

That is, I would like to match: (Simon 1957) or (Simon 1957, Johnson 1927), or (Simon 1957 Johnson)

but not match: (Simon 1222) or (Simon) or (Simon Johnson).

Does anyone have any suggestions? Your help would be much appreciated.

Thanks very much.

Re: Help for a Regex newcomer

Posted: Wed Aug 29, 2012 9:05 pm
by requinix
Broken down a bit, what you want is:

a left parenthesis, some amount of text that isn't a right parenthesis, a number between 1900 and 2100, and some more text up to a right parenthesis.

The number is the most complex part, but the rest should be fairly easy to guess.

\([^)]*(19\d\d|20\d\d|2100).*?\)
That leaves an issue open. Consider the following:

(Simon 19575)
(Simon 51957)
(Simon 1957 (Johnson))
They all literally match the description above (read it carefully).
1. Is that text even possible to encounter?
2. Do you want to match it?
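
To see how the pattern above behaves on the examples in this thread, here is a minimal sketch. It assumes Python's `re` module, since the thread never says which regex engine is in use, and `matched_text` is just an illustrative helper name.

```python
import re

# Pattern quoted from the post above. Using Python's re module is an
# assumption; the thread does not name the regex engine.
PATTERN = re.compile(r"\([^)]*(19\d\d|20\d\d|2100).*?\)")

def matched_text(text):
    """Return the matched substring, or None if the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

samples = [
    "(Simon 1957)",                 # wanted
    "(Simon 1957, Johnson 1927)",   # wanted
    "(Simon 1957 Johnson)",         # wanted
    "(Simon 1222)",                 # not wanted
    "(Simon)",                      # not wanted
    "(Simon Johnson)",              # not wanted
    "(Simon 19575)",                # edge case: extra trailing digit
    "(Simon 51957)",                # edge case: extra leading digit
    "(Simon 1957 (Johnson))",       # edge case: nested parentheses
]

for s in samples:
    print(s, "->", matched_text(s))
```

The three wanted examples match in full and the three unwanted ones do not match at all. The edge cases all match too: `(Simon 19575)` and `(Simon 51957)` match in full because they contain `1957`, and for `(Simon 1957 (Johnson))` the match stops at the first closing parenthesis, so the final `)` is left behind.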

Re: Help for a Regex newcomer

Posted: Thu Aug 30, 2012 5:48 pm
by profhp
Thanks so much requinix! Your code is very helpful.

I have been playing with variations of your code for a while, and have one additional question.

How do I match (Simon 1957) but exclude (Simon 5195)? That is, the regex code looks for 19 anywhere and matches it. Is there a way to specify that the matched 19 needs to be preceded by a space?

Thanks again - I really appreciate your help!

Re: Help for a Regex newcomer

Posted: Thu Aug 30, 2012 6:43 pm
by requinix
profhp wrote: How do I match (Simon 1957) but exclude (Simon 5195)? That is, the regex code looks for 19 anywhere and matches it. Is there a way to specify that the matched 19 needs to be preceded by a space?
That won't be matched anyway, because the 19XX part needs two more digits after the 19, and in 5195 only one digit follows it.

If you want to require a space before the number anyway, that's fine:

\([^)]* (19\d\d|20\d\d|2100).*?\)
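
A quick way to check the space-prefixed pattern is sketched below, again assuming Python's `re` module. The second pattern, `STRICT`, is not from this thread: it is a hypothetical variant that uses lookarounds to reject a digit on either side of the year, which would also cover the stray-digit cases raised earlier. The names `WITH_SPACE`, `STRICT`, and `check` are illustrative only.

```python
import re

# Pattern from the reply above: a literal space must precede the year.
WITH_SPACE = re.compile(r"\([^)]* (19\d\d|20\d\d|2100).*?\)")

# Hypothetical variant, NOT from the thread: lookarounds require that the
# year is neither preceded nor followed by another digit.
STRICT = re.compile(r"\([^)]*(?<!\d)(19\d\d|20\d\d|2100)(?!\d).*?\)")

def check(pattern, text):
    """True if the pattern matches somewhere in text."""
    return pattern.search(text) is not None

for text in ["(Simon 1957)", "(Simon 5195)", "(Simon 51957)",
             "(Simon 19575)", "(1957)"]:
    print(text, check(WITH_SPACE, text), check(STRICT, text))
```

Both patterns accept `(Simon 1957)` and reject `(Simon 5195)` and `(Simon 51957)`. They differ on the last two inputs: requiring a space still accepts `(Simon 19575)` and rejects a bare `(1957)`, while the lookaround variant does the opposite. Which behavior is right depends on the text you expect to encounter.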

Re: Help for a Regex newcomer

Posted: Fri Aug 31, 2012 8:18 am
by profhp
requinix, thanks so much for your help!! Very much appreciated.