Little question

Posted: Tue Aug 16, 2005 5:50 pm
by Sander
I was reading this tutorial, and at some point I came to this regex:

--
"/A[A-Z]*?B/". In English, this means "match an A, followed by only as many capital letters as are needed to find a B."
--

I don't really understand the part that I bolded. Why does it do this? Is it because the '*' and the '?' come right after each other?

Posted: Tue Aug 16, 2005 6:43 pm
by nielsene
That seems a little odd to me. The '?' isn't even needed in this case I think.

'*' means match zero or more of the preceding item, and '?' means match zero or one.

So
/A[A-Z]*B/ is equal in matching terms to /A[A-Z]*?B/. I can't think of any example where it wouldn't be...
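
One way to check this claim is to run both patterns on a string that contains more than one B. The sketch below uses Python's `re` module as a stand-in for PHP's preg functions (an assumption: the two engines treat `*` and `*?` the same way for a pattern this simple), and the sample string is made up for illustration.

```python
import re

subject = "AXBYB"  # made-up sample with two candidate B's

greedy = re.search(r"A[A-Z]*B", subject)
lazy = re.search(r"A[A-Z]*?B", subject)

# As a yes/no test the two patterns agree: both find a match.
print(bool(greedy), bool(lazy))  # True True

# The text they capture is different, though.
print(greedy.group(0))  # AXBYB
print(lazy.group(0))    # AXB
```

So the two patterns succeed on the same strings, but they can disagree about which part of the string is the match.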

Posted: Tue Aug 16, 2005 6:44 pm
by feyd
I'll do one better and explain the whole thing.

for pattern:

/A[A-Z]*?B/

/ has been selected as the pattern delimiter; this particular mark starts the pattern (which begins with the following character). The delimiter can be any character that is not alphanumeric, a backslash, or whitespace. /, @, and # are often used, mostly because they rarely appear as characters to match in the patterns.
A simply matches a capital A.
[ is a metacharacter. It starts a character class. The contents following it may be listed in any order, and the class matches a single character from that set unless otherwise modified.
A-Z matches any capital letter.
] ends the character class.
* is a match modifier. This particular metacharacter matches the previous object (character, character class, grouping) zero or more times, as many as possible unless modified by a ?
? is a match modifier. When not after a * or + modifier, it works on the previous object (character, character class, grouping) to match zero or one instance. When after a * or +, it tells that modifier to match the shortest possible set that satisfies the pattern. (Behaviour is reversed if the ungreedy pattern modifier is in effect.)
B matches a capital B.
/ now ends the pattern. Any characters after it are pattern modifiers that apply to the entire pattern.

Putting it all in plain English: find a capital 'A' followed by any number of other capital letters, up to the closest capital 'B', anywhere in the string.
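
The breakdown above can be written straight into a commented pattern. This is a sketch in Python rather than PHP (assumptions: `re.VERBOSE` plays the role of PCRE's `x` modifier, Python needs no delimiters, and the test strings are invented).

```python
import re

pattern = re.compile(r"""
    A        # a capital A
    [A-Z]    # character class: any one capital letter
    *?       # zero or more of the class, as few as possible
    B        # a capital B
""", re.VERBOSE)

# The match starts at the first A and stops at the closest B after it.
print(pattern.search("xxAQQBZZB").group(0))  # AQQB

# Zero capitals between A and B is allowed too.
print(pattern.search("AB").group(0))  # AB

# A bare ? (not after * or +) means "zero or one" instead.
print(re.fullmatch(r"AB?", "A") is not None)    # True
print(re.fullmatch(r"AB?", "AB") is not None)   # True
print(re.fullmatch(r"AB?", "ABB") is not None)  # False
```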

Posted: Tue Aug 16, 2005 8:19 pm
by Ambush Commander
By default, the regexp parser is "greedy"; that is, it will try to take in as many characters as it can before it has to stop. For instance:

/[A-Z]+B/

when used on

8fsajASDFASBIASDFKBKSB234e

will match

ASDFASBIASDFKBKSB

which runs to the last possible B it can get, not

ASDFASB

which stops at the first possible B it can get. When it's ungreedy, the regexp will return the latter.
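
To try this yourself, here is a sketch in Python (an assumption: its `re` module handles greedy and lazy `+` the same way PCRE does for this pattern). Python has no ungreedy pattern modifier, so the ungreedy version is spelled `+?` here.

```python
import re

subject = "8fsajASDFASBIASDFKBKSB234e"  # the string from the post above

# Greedy: + grabs as many capitals as it can, then backs off to a B.
greedy = re.search(r"[A-Z]+B", subject).group(0)

# Ungreedy: +? grabs as few capitals as it can before a B.
lazy = re.search(r"[A-Z]+?B", subject).group(0)

print(greedy)  # ASDFASBIASDFKBKSB
print(lazy)    # ASDFASB
```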

Posted: Wed Aug 17, 2005 3:55 pm
by Sander
Thanks guys, I think I understand it now.
feyd wrote: ? a match modifier. When not after a * or + modifier, it will work against the previous object (character, character class, grouping) to find zero or one instance. When after a * or + it will tell the metacharacter to match the shortest possible set that satisfies the pattern. (Behaviour is reversed if the ungreedy pattern modifier is in effect.)
You couldn't have explained it better, thanks man :)

Just one more little question: how do you 'turn on' the ungreedy pattern modifier?

Posted: Wed Aug 17, 2005 3:59 pm
by Ambush Commander
It's a pattern modifier: http://us2.php.net/manual/en/reference. ... ifiers.php

/This is the pattern/U

Posted: Wed Aug 17, 2005 4:02 pm
by Sander
Ahh, I see. Thanks once again.

So, these two patterns do the same thing?

"/A[A-Z]*?B/"

"/A[A-Z]*B/U"

I think I'm starting to get the hang of this 8)

Posted: Wed Aug 17, 2005 4:04 pm
by Ambush Commander
Yup.

And...

"/A[A-Z]*?B/U"

is the same as

"/A[A-Z]*B/"