Extract text?

Posted: Tue Sep 26, 2006 4:18 pm
by jemmer
Hi,

I am totally incompetent when it comes to regular expressions, so I come asking for help. I'm sure this is easy, but I have no idea how to do this.

I need some help constructing an expression to parse out some particular text from a text file.

I know I could do this the "hard way" by scanning the input strings for the particular characters I'm interested in, but there are enough potential variations in the text that I think a regular expression is the best solution to my problem.

The text will look something like:

CASE NUMBER: 123456789

There may be leading or trailing whitespace in this line. There may be any number of intervening spaces, but there will always be at least 1, between the NUMBER: text literal and the digits which follow. There may or may not be a colon after NUMBER. The number of digits may vary; there are always at least 5, and there may be as many as 12 digits.

I need to extract those digits into a named group, at least I think I need to - I need the case number for subsequent processing, and a named group seems the best way to handle that, but what do I know? Once I find the case number in the text, I'm done - there is no more need to parse anything else.

That's it. Simple really, but I've wasted a lot of time fumbling around, so I thought I'd ask for some help.

Thanks,

- Jeff

Posted: Tue Sep 26, 2006 4:28 pm
by Benjamin
Be sure to use the case-insensitive flag.

\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}[0-9]{5,12}
To get the case number, wrap the digits part in ():

\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}([0-9]{5,12})
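
Since a named group was asked for, here is a minimal sketch of how the pattern above could be used with one. It is written in Python's re module purely for illustration, not in preg; the group name case_number and the helper extract_case_number are made-up names. The {5,12} quantifier sits inside the group so that the group holds every digit rather than only the last one matched.

```python
import re

# Pattern from the post above, with the digits in a named group.
# "case_number" is an illustrative name, not something from the thread.
CASE_RE = re.compile(
    r"\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}(?P<case_number>[0-9]{5,12})",
    re.IGNORECASE,  # the case-insensitive flag mentioned above
)


def extract_case_number(text):
    """Return the first case number found in text, or None if there is none."""
    match = CASE_RE.search(text)
    if match is None:
        return None
    return match.group("case_number")


print(extract_case_number("   Case Number   123456789  "))
```

The search stops at the first hit, which fits the "once I find the case number, I'm done" requirement. Note that a run of more than 12 digits would still match on its first 12 digits.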

Posted: Wed Sep 27, 2006 9:25 am
by Mordred
jemmer wrote: "but there is enough potential variations in the text that I think a regular expression is the best solution to my problem."
Quite right, excellent thinking there.
jemmer wrote: "There may be leading or trailing whitespace in this line. There may be any number of intervening spaces, but there will always be at least 1, between the NUMBER: text literal and the digits which follow. There may or may not be a colon after NUMBER. The number of digits may vary; there are always at least 5, and there may be as many as 12 digits."
Actually, this here is the main part of the job - specifying what you want. From here on it's just translating into preg language.

Astions' example is right, except for one minor matter:

\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}[0-9]{5,12}
should be

\s{0,}CASE\s{1,}NUMBER:{0,1}\s{1,5}[0-9]{5,12}
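
To make the difference between the two versions concrete, here is a rough comparison, again sketched in Python's re rather than preg. The names ORIGINAL, REVISED and matches, and the sample strings, are invented for the illustration.

```python
import re

# The two patterns exactly as posted, compiled case-insensitively.
ORIGINAL = re.compile(
    r"\s{0,5}CASE\s{0,5}NUMBER:{0,1}\s{1,5}[0-9]{5,12}", re.IGNORECASE
)
REVISED = re.compile(
    r"\s{0,}CASE\s{1,}NUMBER:{0,1}\s{1,5}[0-9]{5,12}", re.IGNORECASE
)

# Made-up sample lines that differ only in the gap between the two words.
samples = [
    "CASE NUMBER: 123456789",         # one space
    "CASENUMBER: 123456789",          # no space at all
    "CASE        NUMBER: 123456789",  # eight spaces
]


def matches(pattern, text):
    """True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


for line in samples:
    print(repr(line), matches(ORIGINAL, line), matches(REVISED, line))
```

With \s{0,5} between the words, the first version accepts CASENUMBER run together and rejects a gap wider than five spaces; \s{1,} requires at least one whitespace character and puts no upper limit on it. Both versions still cap the gap before the digits at five, via \s{1,5}.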