Why does the regex "a+|aabb" match the test string "aabb" as "aa" and not "aabb".
My understanding is
The regex "a+|aabb" means
1. "one or more a's"
or
2. the exact string "aabb"
So for the test string "aabb", condition #2 i.e. "aabb" is satisfied. Also it is the longest match.
Also if you change the regex to "aabb|a+" for the same test string "aabb" it matches "aabb" .
Could some one clear my doubt
Many thanks in advance.
A regex doubt
Moderator: General Moderators
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: A regex doubt
Good question.
In regex-land, there are roughly two different regex implementations:
1 - nondeterministic finite automaton (NFA);
2 - deterministic finite automaton (DFA).
An NFA will go through all possible matches in and will will eventually return, or match, the longest match it has encountered. A DFA will stop matching when it finds the first positive match it encounters. Also, PHP's regex engine evaluates from left to right: with the regex '/a|b|c/', first a check is made to see if there's an 'a', then if there's a 'b' and finally it checks to see if there's a 'c'.
You can probably guess which flavour the PHP preg-library is... indeed: it's a DFA.
A big disadvantage of an NFA is that it will take a very long time when matching large strings and certain regex-es. That's probably why most regex-flavours are DFA's (Java, Perl, JS, Python, etc.).
In regex-land, there are roughly two different regex implementations:
1 - nondeterministic finite automaton (NFA);
2 - deterministic finite automaton (DFA).
An NFA will go through all possible matches in and will will eventually return, or match, the longest match it has encountered. A DFA will stop matching when it finds the first positive match it encounters. Also, PHP's regex engine evaluates from left to right: with the regex '/a|b|c/', first a check is made to see if there's an 'a', then if there's a 'b' and finally it checks to see if there's a 'c'.
You can probably guess which flavour the PHP preg-library is... indeed: it's a DFA.
A big disadvantage of an NFA is that it will take a very long time when matching large strings and certain regex-es. That's probably why most regex-flavours are DFA's (Java, Perl, JS, Python, etc.).
Re: A regex doubt
Thank you prometheuzz for you reply.
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: A regex doubt
You're welcome.a0001_jay wrote:Thank you prometheuzz for you reply.