Little question

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Sander
Forum Commoner
Posts: 38
Joined: Sat Aug 06, 2005 12:43 pm

Post by Sander »

I was reading this tutorial, and at some point I came to this regex:

--
"/A[A-Z]*?B/". In English, this means "match an A, followed by only as many capital letters as are needed to find a B."
--

I don't really understand the part that I bolded. Why does it do this? Is it because the '*' and the '?' come right after each other?
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

That seems a little odd to me. The '?' isn't even needed in this case, I think.

'*' means match zero or more of the preceding item; '?' means match zero or one.

So /A[A-Z]*B/ is equal in matching terms to /A[A-Z]*?B/. I can't think of any example where it wouldn't be...
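
A quick way to check this claim is to run both patterns over a few strings. The sketch below is only an illustration: it assumes Python's `re` module rather than PHP's preg functions (so the `/` delimiters are dropped), and the sample strings are made up for the test. Under those assumptions, the two patterns agree on whether a string matches at all, but not on how much text they match.

```python
import re

greedy = re.compile(r"A[A-Z]*B")   # the pattern without the '?'
lazy = re.compile(r"A[A-Z]*?B")    # the pattern from the tutorial

# Hypothetical sample strings, chosen only for illustration.
samples = ["AB", "AXB", "AXBYB", "A1B", "ab", "xxAQBZZBxx"]

for text in samples:
    g = greedy.search(text)
    l = lazy.search(text)
    # Both patterns agree on whether a match exists at all...
    assert (g is None) == (l is None)
    # ...but the matched text can differ when more than one B is reachable.
    print(text, g.group(0) if g else None, l.group(0) if l else None)

# For "AXBYB" the greedy pattern matches "AXBYB", the lazy one only "AXB".
```

So the '?' makes no difference to a yes/no test, but it does change what ends up in the match.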
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I'll do one better and explain the whole thing.

For the pattern:

/A[A-Z]*?B/
/ has been selected as the pattern delimiter; this particular mark starts the pattern, which begins with the character that follows it. The delimiter can be almost any symbol (any non-alphanumeric, non-whitespace character other than a backslash). /, @, and # are often used, mostly because they rarely appear as characters to match in patterns.
A simply matches a capital A.
[ this is a metacharacter. It marks the start of a character class. The characters listed after it may appear in any order, and the class matches a single instance of any one of them unless otherwise modified.
A-Z matches any capital letter.
] ends the character class.
* a match modifier. This particular metacharacter matches the previous object (character, character class, grouping) zero or more times, as many as possible unless modified by a ?
? a match modifier. When not after a * or + modifier, it works against the previous object (character, character class, grouping) to find zero or one instance. When after a * or +, it tells that metacharacter to match the shortest possible set that satisfies the pattern. (Behaviour is reversed if the ungreedy pattern modifier is in effect.)
B matches a capital B.
/ now ends the pattern. Any character(s) after it are modifiers for the entire pattern.

Putting it all in plain English: find a capital 'A' followed by as few other capital letters as are needed to reach the closest capital 'B', anywhere in the string.
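
The two roles of `?` described above can be seen side by side in a small sketch. It assumes Python's `re` module instead of PHP (so no delimiters), and the pattern `AB?` and the test strings are hypothetical examples, not from the tutorial. It also shows one caveat: the search still starts at the leftmost 'A' that can lead to a match, so "closest B" means closest to that starting 'A'.

```python
import re

# '?' directly after a plain character: that character becomes optional.
optional = re.compile(r"AB?")
# '?' directly after '*': the '*' becomes lazy (shortest run that still works).
lazy = re.compile(r"A[A-Z]*?B")

print(optional.search("AC").group(0))    # A    (zero instances of B)
print(optional.search("ABBB").group(0))  # AB   (at most one B)

print(lazy.search("AXBYB").group(0))     # AXB  (stops at the first B)

# The match still begins at the leftmost usable 'A', so the result here
# is "AXAB", not the shorter "AB" that starts later in the string.
print(lazy.search("AXAB").group(0))      # AXAB
```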
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

By default, the regexp parser is "greedy", that is, it will try to take in as many tokens as it can before it has to stop. For instance:

/[A-Z]+B/
when used on

8fsajASDFASBIASDFKBKSB234e
will match

ASDFASBIASDFKBKSB
which is the last possible B it can get, not

ASDFASB
which is the first possible B it can get. When it's ungreedy, the regexp will return the latter.
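
The example above can be reproduced with a few lines of code. This is a sketch in Python's `re` module rather than PHP, which is an assumption on my part; the ungreedy version is written as `[A-Z]+?B`, since Python has no ungreedy pattern modifier.

```python
import re

subject = "8fsajASDFASBIASDFKBKSB234e"

# Greedy: '+' grabs the whole run of capitals, then backs off to the last B.
greedy_match = re.search(r"[A-Z]+B", subject)
# Ungreedy: '+?' takes as few capitals as possible before a B.
lazy_match = re.search(r"[A-Z]+?B", subject)

print(greedy_match.group(0))  # ASDFASBIASDFKBKSB
print(lazy_match.group(0))    # ASDFASB
```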
Sander
Forum Commoner
Posts: 38
Joined: Sat Aug 06, 2005 12:43 pm

Post by Sander »

Thanks guys, I think I understand it now.
feyd wrote: ? a match modifier. When not after a * or + modifier, it works against the previous object (character, character class, grouping) to find zero or one instance. When after a * or +, it tells that metacharacter to match the shortest possible set that satisfies the pattern. (Behaviour is reversed if the ungreedy pattern modifier is in effect.)
You couldn't have explained it better, thanks man :)

Just one more little question: how do you 'turn on' the ungreedy pattern modifier?
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

It's a pattern modifier: http://us2.php.net/manual/en/reference. ... ifiers.php

/This is the pattern/U
Sander
Forum Commoner
Posts: 38
Joined: Sat Aug 06, 2005 12:43 pm

Post by Sander »

Ahh, I see. Thanks once again.

So these two patterns do the same thing?

"/A[A-Z]*?B/"

"/A[A-Z]*B/U"

I think I'm starting to get the hang of this 8)
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Yup.

And...

"/A[A-Z]*?B/U"

is the same as

"/A[A-Z]*B/"