Extract text?

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

jemmer
Forum Newbie
Posts: 1
Joined: Tue Sep 26, 2006 4:16 pm

Post by jemmer »

Hi,

I am totally incompetent when it comes to regular expressions, so I come asking for help. I'm sure this is easy, but I have no idea how to do this.

I need some help constructing an expression to parse out some particular text from a text file.

I know I could do this the "hard way" by scanning the input strings looking for the particular characters I'm interested in, but there are enough potential variations in the text that I think a regular expression is the best solution to my problem.

The text will look something like:

CASE NUMBER: 123456789

There may be leading or trailing whitespace in this line. There may be any number of intervening spaces, but there will always be at least 1, between the NUMBER: text literal and the digits which follow. There may or may not be a colon after NUMBER. The number of digits may vary; there are always at least 5, and there may be as many as 12 digits.

I need to extract those digits into a named group, at least I think I need to - I need the case number for subsequent processing, and a named group seems the best way to handle that, but what do I know? Once I find the case number in the text, I'm done - there is no more need to parse anything else.

That's it. Simple really, but I've wasted a lot of time fumbling around, so I thought I'd ask for some help.

Thanks,

- Jeff
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Be sure to use the case-insensitive flag.

Code:

\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}[0-9]{5,12}
To capture the case number, wrap the whole digit pattern, quantifier included, in ():

Code:

\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}([0-9]{5,12})
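
One detail worth checking when adding the parentheses is where the quantifier sits. The sketch below uses Python's re module purely for illustration (the thread does not name a language, and the sample line is made up). It compares a group that wraps a single digit with a group that wraps the whole run of digits. A repeated capturing group keeps only its last repetition, so the first form yields a single digit.

```python
import re

# Made-up sample line with mixed case and extra spaces.
line = "  Case Number:   123456789  "

# Quantifier outside the group: the group is re-captured for every digit,
# so only the last digit is left in group(1).
outside = re.search(r"CASE\s+NUMBER:?\s+([0-9]){5,12}", line, re.IGNORECASE)

# Quantifier inside the group: the whole run of digits is captured.
inside = re.search(r"CASE\s+NUMBER:?\s+([0-9]{5,12})", line, re.IGNORECASE)

print(outside.group(1))  # 9
print(inside.group(1))   # 123456789
```

Both patterns match the line; only the captured text differs.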
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

"but there are enough potential variations in the text that I think a regular expression is the best solution to my problem."

Quite right, excellent thinking there.

"There may be leading or trailing whitespace in this line. There may be any number of intervening spaces, but there will always be at least 1, between the NUMBER: text literal and the digits which follow. There may or may not be a colon after NUMBER. The number of digits may vary; there are always at least 5, and there may be as many as 12 digits."

Actually, this is the main part of the job: specifying what you want. From here on it's just a matter of translating it into preg syntax.

Astions' example is right, except for a minor matter:

Code:

\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}[0-9]{5,12}
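should be (no upper limit on the leading whitespace, and at least one whitespace character between CASE and NUMBER)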

Code:

\s{0,}CASE\s{1,}NUMBER:{0,1}\s{1,5}[0-9]{5,12}
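
For the named group the original question asks about, here is a minimal sketch in Python. Treat it as one possible reading of the requirements, not a definitive answer. The names CASE_RE, case_number and find_case_number are made up for illustration. The sketch also makes two choices that go beyond the patterns above: it lifts the five-space cap after NUMBER: (the question says "any number" of spaces), and it adds a lookahead so that a run of more than 12 digits is rejected outright. Leading and trailing whitespace need no handling because search() is not anchored to the start of the line.

```python
import re

# Hypothetical names throughout; (?P<name>...) is Python's named-group syntax.
CASE_RE = re.compile(
    r"CASE\s+NUMBER:?\s+(?P<case_number>[0-9]{5,12})(?![0-9])",
    re.IGNORECASE,
)

def find_case_number(text):
    """Return the first case number found in text, or None if there is none."""
    match = CASE_RE.search(text)  # stops at the first match
    return match.group("case_number") if match else None

# Made-up sample input: no colon, mixed case, extra whitespace.
sample = "Report header\n   case number   0012345  \nOther text\n"
print(find_case_number(sample))  # 0012345
```

PCRE, which the preg functions are built on, also accepts the (?P<name>...) form, so the pattern itself should carry over largely unchanged. That is an assumption worth testing against the actual input files.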