What am I doing wrong...

Post by andersod2 »

Am trying to do a perl regex:

string is "row:1:2"

regex is /^(.+):.+$/

but $1 is "row:1"

rather than "row" as I would expect....am I thinking wrong here? I would like to be able to make the regex get just that first string before the first colon...is this a greedy vs lazy thing?

??

Post by prometheuzz »

andersod2 wrote:... is this a greedy vs lazy thing?

??
Correct. The usual terms are "lazy", "reluctant", or "non-greedy" matching; they all mean the same thing.

The regex "(.+):.+" can be read as follows: match as many characters as possible, then back off until a colon can be matched, then again match as many characters as possible. In other words, the first ".+" initially matches the entire string (hence the word greedy). The regex engine is then forced to backtrack one character at a time, and the first colon it reaches while backtracking is the last colon in the string. The final ".+" then consumes whatever follows that last colon.
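To see that backtracking result concretely, here is a minimal sketch using the string and regex from the question (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $str = "row:1:2";

# Greedy: the first .+ grabs the whole string, then gives characters back
# only until a colon can match, which happens at the LAST colon.
my ($greedy) = $str =~ /^(.+):.+$/;

print "greedy: $greedy\n";   # prints "greedy: row:1"
```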

So, to make it reluctant (i.e. non-greedy), add a question mark after the greedy + quantifier:

Code:

/^(.+?):.+$/
Or, a better (and safer and faster!) option is to use a negated character class "[^:]+", which means: match one or more characters of any type except colons. The regex would look like this:

Code:

/^([^:]+):.+$/
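As a quick check, this small sketch runs both suggested patterns against the string from the question. The variable names are made up for the example; for this particular input both versions should capture the same thing:

```perl
use strict;
use warnings;

my $str = "row:1:2";

# Reluctant quantifier: .+? takes as few characters as it can,
# so it stops at the first colon.
my ($lazy) = $str =~ /^(.+?):.+$/;

# Negated character class: [^:]+ cannot cross a colon at all.
my ($negated) = $str =~ /^([^:]+):.+$/;

print "lazy:    $lazy\n";      # prints "lazy:    row"
print "negated: $negated\n";   # prints "negated: row"
```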
HTH

Post by andersod2 »

Thank you prometheuzz! That was an awesome answer and exactly what I was looking for. I managed to figure this out by trial and error last night, but your faster/safer version is much better.

Question: why is the second version safer?

Post by prometheuzz »

andersod2 wrote:Thank you prometheuzz! That was an awesome answer and exactly what I was looking for. I managed to figure this out by trial and error last night, but your faster/safer version is much better.
You're welcome.
andersod2 wrote:Question: why is the second version safer?
Not "safer" as in better security. I call it safer because greedy DOT-STAR and DOT-PLUS match practically any character (everything except a newline, unless the /s modifier is set), so you have little control over what is matched. If you know up front that you don't want to match a colon, I call it "safer" to use a negated character class instead of a DOT-STAR or DOT-PLUS.
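A small sketch of what I mean by control. The stricter pattern here (a name, one colon, then only digits up to the end of the string) is a hypothetical variation, not the regex from the original question. With it, the reluctant dot is still allowed to cross a colon when the rest of the pattern forces it to, while the negated class refuses to match at all:

```perl
use strict;
use warnings;

my $str = "row:1:2";

# Hypothetical stricter pattern: name, colon, digits, end of string.
# .+? first tries "row", but ":1" is not followed by the end of the
# string, so the engine extends .+? across the first colon.
my ($lazy) = $str =~ /^(.+?):\d+$/;

# [^:]+ can never include a colon, so no match is possible here.
my ($negated) = $str =~ /^([^:]+):\d+$/;

print "lazy:    ", (defined $lazy    ? $lazy    : "no match"), "\n";   # row:1
print "negated: ", (defined $negated ? $negated : "no match"), "\n";   # no match
```

Whether "no match" or "row:1" is the right outcome depends on your data, but with the negated class the first capture can never silently contain a colon.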

IMO, it is generally a good idea to avoid greedy DOT-STARs or DOT-PLUSes in your regex: you have little control over them, and they frequently hurt performance because of the backtracking involved (the performance cost only matters when matching larger chunks of text, though). Of course, there are cases where this greedy behaviour is desired.