Make it stop!

is_blank
Forum Commoner
Posts: 36
Joined: Sat Jun 25, 2005 6:05 pm
Location: Tennessee, USA

Post by is_blank »

An elementary question again--I've got text in this format:
Firstname Lastname (born Month, day, Year) was blah blah blah. Blah Blah blah blah, etc. etc.
I'd like to cut out the (born M D, Y), which may appear in several different formats: (b. M D, Y), (b. M D, Y - d. M D, Y), etc. I was matching with a simple pattern:

/\(.*\)/
The text is never that long; they're just little biographical blurbs. I've run into this, though:
Firstname Lastname (born Month, day, Year) was blah blah blah. Blah Blah blah (Blah BLAH blah!) blah blah, etc. etc.
So now the above pattern matches (born Month, day, Year) was blah blah blah. Blah Blah blah (Blah BLAH blah!) instead of just the first bit.

What's the trick to getting the match to stop after the first ")", instead of going on?

Thanks!
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

It's the evil greedy dot star :P

We all pull hair out over this one at some point lol.

It's greedy, i.e. the dot-star matches as much as it possibly can, and only gives characters back when the rest of the pattern can't match otherwise. Stop it being greedy by following the star * with a "?".

/\(.*?\)/
That same principle works with any of the quantifiers, i.e. "+", "*" and "{n,m}" ;)
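
To see the difference on the blurb from the first post, here's a minimal sketch. It uses Python's `re` module purely for illustration; the sample text and the variable names are made up, and the `count=1` removal at the end is just one way you might cut out only the first parenthetical.

```python
import re

# Sample blurb modelled on the format in the first post (illustrative text only).
blurb = ("Firstname Lastname (born Month, day, Year) was blah blah blah. "
         "Blah Blah blah (Blah BLAH blah!) blah blah, etc. etc.")

# Greedy: runs from the first "(" to the LAST ")".
greedy = re.search(r"\(.*\)", blurb).group(0)

# Lazy: stops at the first ")" after the opening "(".
lazy = re.search(r"\(.*?\)", blurb).group(0)

# Remove only the first parenthetical (plus the space before it);
# count=1 leaves any later parentheses alone.
cleaned = re.sub(r"\s*\(.*?\)", "", blurb, count=1)

print(greedy)
print(lazy)
print(cleaned)
```

The lazy version prints just the (born ...) part, while the greedy one runs on to the end of (Blah BLAH blah!).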
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

try using {1} after your pattern you should.
is_blank
Forum Commoner
Posts: 36
Joined: Sat Jun 25, 2005 6:05 pm
Location: Tennessee, USA

Post by is_blank »

Cool. I think I get it. Thanks!
(Er, Cool it is? Get it, I do? :D)
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Hmmm how do I explain greediness simply? To be honest, it confuses everyone ;)

OK, let's personify the regex pattern a bit and off we go....

We'll run it against the string "Hello World!".

Scenario 1 - /(.*)\s\w+!/
Hello, I'm a regex pattern and I like to eat anything you tell me. Let me introduce my fellow (meta)characters.

.* Meet dotstar: He can eat anything he likes any number of times
\s Meet whitespace: She just eats the things that don't use ink when printing
\w Meet wordchar: He can eat any letter, number or underscore.

OK I'm a hungry regex pattern so I'm gonna start eating this string now.

Off you go dotstar...
dotstar: Oh, a letter H I can eat that, and "e" I'll have that too, and "l", and "l" and "o" and this space - well I can eat ANYTHING ANY number of times so I'll have that too, and this lovely "W" and the "o" and the "r", then this "l" and the "d" and the exclamation mark I can eat that too. Doh! I've eaten it all :( I'm still hungry too, I could have just eaten and eaten and eaten til there was nothing left.

Your turn whitespace...

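whitespace: :cry: Hmmm there's nothing for me to eat, that greedy piggy dotstar went and ate everything again!

The same goes for wordchar too. So dotstar has to cough characters back up, one at a time from the end, until whitespace and wordchar (and the "!") can all eat. The pattern still matches; dotstar just grabbed everything first and gave back only as much as he had to.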
Scenario 2 - /(.*?)\s\w+!/
Off you go dotstar...

dotstar: Hey! Someone stuck this question mark on me, huh... that's not fair, I have to let someone else eat today. Well I'll see what I'm allowed anyway.
Yummy, an "H", oh hold on a minute, who's next in the queue? Oh that's OK, it's whitespace and she can't eat this anyway so I'll have it. Look, I can eat this "e" too cos whitespace can only eat things that don't use ink when printing. I'll have the "l", and the next "l", and the "o". Oh, what's this? A space.... ermm, whitespace, you hungry?

...
whitespace: Yes! Oh and look, there's a space for me to eat! Yummy. What's next? :cry: Just a "W", I can't eat that, it gives me a really bad stomach ache. wordchar, do you want it?
...
wordchar: Yippee.... It's my go at last! OK I'll eat this "W" and the "o" and the "r" and the "l" and the "d" but I can't eat this exclamation mark... I'm not allowed so I'm told.

The "!" in the pattern matches the "!" in the string and it's done.
Basically... stick a question mark on a quantifier and you tell that bit of the pattern to eat as little as possible: before each extra bite it checks whether the following metacharacter (or part of the pattern) is able to match the string, and it only eats more if that fails.
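
A quick way to check the story, sketched in Python's `re` module for illustration (the second sample string and the variable names are made up). Worth noting: on "Hello World!" both scenarios end up with the same capture, because the greedy dotstar is made to hand characters back until the rest of the pattern fits. The two only part ways when there's more than one place the rest of the pattern could match.

```python
import re

text = "Hello World!"

# Scenario 1: greedy. dotstar grabs everything, then gives characters
# back until \s\w+! can match.
greedy = re.search(r"(.*)\s\w+!", text)

# Scenario 2: lazy. dotstar takes one character at a time, letting the
# rest of the pattern try first at each step.
lazy = re.search(r"(.*?)\s\w+!", text)

print(greedy.group(1))  # Hello
print(lazy.group(1))    # Hello

# With two possible stopping points, the two versions differ.
text2 = "Hello World! Goodbye World!"
greedy2 = re.search(r"(.*)\s\w+!", text2).group(1)
lazy2 = re.search(r"(.*?)\s\w+!", text2).group(1)

print(greedy2)  # Hello World! Goodbye
print(lazy2)    # Hello
```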
Burrito
Spockulator
Posts: 4715
Joined: Wed Feb 04, 2004 8:15 pm
Location: Eden, Utah

Post by Burrito »

d11 was a children's book writer in his past life...
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

and is feeding random quotes into a database to get it to write children's books now.