Help for a Regex newcomer

For any questions involving matching text strings to patterns (the pattern is called a "regular expression").

Moderator: General Moderators

profhp
Forum Newbie
Posts: 3
Joined: Wed Aug 29, 2012 4:34 pm

Help for a Regex newcomer

Post by profhp »

Hi:

I have a question that I am sure is simple for you (but hard for me, since I am new to regex).

I have a regex that matches all bracketed text in a paragraph, including the brackets (which I subsequently replace):

\(.*?\)
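
For readers who want to try the original pattern, here is a minimal sketch using Python's standard `re` module. The thread doesn't say which language the poster uses, so Python is an assumption, and the sample sentence and the empty-string replacement are made up for illustration. The lazy `.*?` stops at the first `)` it can, so each parenthesized group is matched separately rather than one big span from the first `(` to the last `)`.

```python
import re

# The poster's original pattern: a literal "(", then as few characters
# as possible, then the first ")" that follows.
ALL_PARENS = re.compile(r"\(.*?\)")

# Hypothetical sample text, for illustration only.
text = "As argued before (Simon 1957), bounded rationality (see notes) matters."

print(ALL_PARENS.findall(text))
# → ['(Simon 1957)', '(see notes)']

# Replacing with "" is an assumption; the poster didn't say what the
# replacement is. Note this leaves the surrounding spaces behind.
print(ALL_PARENS.sub("", text))
```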

However, I would like to match only bracketed text that includes a number in the range 1900 through 2100.

That is, I would like to match (Simon 1957), (Simon 1957, Johnson 1927), or (Simon 1957 Johnson),

but not (Simon 1222), (Simon), or (Simon Johnson).

Does anyone have any suggestions? Your help would be much appreciated.

Thanks very much.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Help for a Regex newcomer

Post by requinix »

Broken down a bit, what you want is

a left parenthesis, some amount of text that isn't a right parenthesis, a number between 1900 and 2100, and some more text up to a right parenthesis
The number is the most complex part but the rest should be fairly easy to guess.

\([^)]*(19\d\d|20\d\d|2100).*?\)
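
As a sketch (Python `re` assumed, not from the thread), the same pattern can be written with `re.VERBOSE` so each piece lines up with the plain-English description, and then checked against the poster's examples. In verbose mode whitespace and `#` comments outside character classes are ignored, so the layout does not change what is matched.

```python
import re

# requinix's pattern, laid out piece by piece with re.VERBOSE.
YEAR_IN_PARENS = re.compile(r"""
    \(                       # a left parenthesis
    [^)]*                    # text that isn't a right parenthesis
    (19\d\d|20\d\d|2100)     # a number from 1900 to 2100
    .*?                      # some more text, as little as possible
    \)                       # up to a right parenthesis
""", re.VERBOSE)

should_match = ["(Simon 1957)", "(Simon 1957, Johnson 1927)", "(Simon 1957 Johnson)"]
should_not = ["(Simon 1222)", "(Simon)", "(Simon Johnson)"]

for s in should_match + should_not:
    print(s, bool(YEAR_IN_PARENS.search(s)))
```

The first three strings report `True` and the last three `False`.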
That leaves one issue open. Consider the following:

(Simon 19575)
(Simon 51957)
(Simon 1957 (Johnson))
All of them literally match the description above (read it carefully).
1. Is that text even possible to encounter?
2. Do you want to match it?
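
To see the open issue concretely, the sketch below (Python `re`, assumed) runs requinix's pattern on the three strings. It also shows one hypothetical stricter variant, not proposed in the thread, for the case where the answer to question 2 is "no": `\b` on both sides requires the year to stand alone as a whole number, and `[^()]` keeps the match inside a single, non-nested pair of parentheses. One trade-off: it also rejects a year glued directly to a word, such as `(Simon1957)`.

```python
import re

# requinix's pattern from the post above.
LOOSE = re.compile(r"\([^)]*(19\d\d|20\d\d|2100).*?\)")

# Hypothetical stricter variant (not from the thread): \b requires the year
# to be a whole number, and [^()] forbids nested parentheses in the match.
STRICT = re.compile(r"\([^()]*\b(19\d\d|20\d\d|2100)\b[^()]*\)")

edge_cases = ["(Simon 19575)", "(Simon 51957)", "(Simon 1957 (Johnson))"]

for s in edge_cases:
    loose = LOOSE.search(s)
    strict = STRICT.search(s)
    print(s,
          loose.group(0) if loose else None,
          strict.group(0) if strict else None)
```

All three strings match the loose pattern (for the nested one, the match is `(Simon 1957 (Johnson)`, which stops before the final `)`), while the strict variant matches none of them but still accepts `(Simon 1957, Johnson 1927)`.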
profhp
Forum Newbie
Posts: 3
Joined: Wed Aug 29, 2012 4:34 pm

Re: Help for a Regex newcomer

Post by profhp »

Thanks so much requinix! Your code is very helpful.

I have been playing with variations of your code for a while, and have one additional question.

How do I match (Simon 1957) but exclude (Simon 5195)? That is, the regex code looks for 19 anywhere and matches it. Is there a way to specify that the matched 19 needs to be preceded by a space?

Thanks again - I really appreciate your help!
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Help for a Regex newcomer

Post by requinix »

profhp wrote:How do I match (Simon 1957) but exclude (Simon 5195)? That is, the regex code looks for 19 anywhere and matches it. Is there a way to specify that the matched 19 needs to be preceded by a space?
That won't be matched anyway, because the 19XX part has to be four digits.

If you want to require a space before the number, just put one in the pattern:

\([^)]* (19\d\d|20\d\d|2100).*?\)
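
A quick check of this version, again as a Python `re` sketch with sample strings made up for illustration. The space only constrains the left side of the year: `(Simon 5195)` and `(Simon 51957)` are rejected, but `(Simon 19575)` still matches, and a year placed directly after the opening parenthesis, as in `(1957 Simon)`, no longer matches at all.

```python
import re

# requinix's second pattern: identical to the first except for the
# literal space in front of the year group.
SPACE_BEFORE_YEAR = re.compile(r"\([^)]* (19\d\d|20\d\d|2100).*?\)")

samples = ["(Simon 1957)", "(Simon 5195)", "(Simon 51957)",
           "(Simon 19575)", "(1957 Simon)"]

for s in samples:
    print(s, bool(SPACE_BEFORE_YEAR.search(s)))
```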
profhp
Forum Newbie
Posts: 3
Joined: Wed Aug 29, 2012 4:34 pm

Re: Help for a Regex newcomer

Post by profhp »

requinix, thanks so much for your help!! Very much appreciated.