Pretty URL matching (omitting particular urls)

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Glycerine
Forum Commoner
Posts: 39
Joined: Wed May 07, 2008 2:11 pm

Pretty URL matching (omitting particular urls)

Post by Glycerine »

I have a mod_rewrite rule that works:

^.*?(apple.+)$
will match

http://www.strangemother.com/pages/apple/food

great - but then it also matches

http://www.strangemother.com/images/apple

Do you know a way to not match if the URL has 'images' in it? I kind of know it's a negative lookbehind - but from then on, I struggle...
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Pretty URL matching (omitting particular urls)

Post by prometheuzz »

No, not negative look behind. Most PCRE engines will not allow "variable length look behinds", which means that you can't say: "look behind zero or more characters and see if the word 'images' is there".
But, of course, you can use look ahead for this just as well:

^(?=.*apple)(?!.*images).*$
which will match any string (without new line characters) containing the word "apple" (positive look ahead) and NOT containing the word "images" (negative look ahead).
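
To check the pattern against the two URLs from the first post, here is a minimal sketch in Python (my choice for illustration; the thread itself is about mod_rewrite, and the helper names `is_pretty_url` and `lookbehind_compiles` are made up). It also shows that Python's `re` module, like PCRE, rejects the variable-length look behind mentioned above.

```python
import re

# The pattern from the post above: require "apple", forbid "images".
PATTERN = re.compile(r"^(?=.*apple)(?!.*images).*$")


def is_pretty_url(url):
    """Return True if url contains 'apple' and does not contain 'images'."""
    return PATTERN.match(url) is not None


def lookbehind_compiles():
    """Try a variable-length negative look behind; report whether it compiles."""
    try:
        re.compile(r"(?<!images.*)apple")
    except re.error:
        # Python's re module only accepts fixed-width look behinds.
        return False
    return True


print(is_pretty_url("http://www.strangemother.com/pages/apple/food"))
print(is_pretty_url("http://www.strangemother.com/images/apple"))
print(lookbehind_compiles())
```

Note that, unlike the original `^.*?(apple.+)$`, this pattern has no capturing group, so a rewrite rule that relies on `$1` would need one added back.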
Glycerine
Forum Commoner
Posts: 39
Joined: Wed May 07, 2008 2:11 pm

Re: Pretty URL matching (omitting particular urls)

Post by Glycerine »

I dunno how you ever got your head around this stuff...
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Pretty URL matching (omitting particular urls)

Post by prometheuzz »

Glycerine wrote: I dunno how you ever got your head around this stuff...
If you break it down, it ain't that hard*:

^           // start of the string
(?=         // start positive look ahead
  .*        //   zero or more characters other than a new line
  apple     //   match the string 'apple'
)           // end positive look ahead
(?!         // start negative look ahead
  .*        //   zero or more characters other than a new line
  images    //   match the string 'images'
)           // end negative look ahead
.*          // zero or more characters other than a new line
$           // end of the string
* Perhaps I am not the best judge of when a regex is easy or not... ;)
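
The breakdown above is an annotated reading, not something you can paste into a regex engine as-is, since `//` is not regex comment syntax. As a rough sketch, assuming Python's `re` module (not part of the original thread), the same layout can be made runnable with the `re.VERBOSE` flag, which ignores whitespace and allows `#` comments. The names `VERBOSE_PATTERN` and `matches_verbose` are hypothetical.

```python
import re

# Same pattern as the compact one-liner, laid out with re.VERBOSE so that
# whitespace is ignored and '#' starts a comment.
VERBOSE_PATTERN = re.compile(
    r"""
    ^           # start of the string
    (?=         # start positive look ahead
      .*        #   zero or more characters other than a new line
      apple     #   match the string 'apple'
    )           # end positive look ahead
    (?!         # start negative look ahead
      .*        #   zero or more characters other than a new line
      images    #   match the string 'images'
    )           # end negative look ahead
    .*          # zero or more characters other than a new line
    $           # end of the string
    """,
    re.VERBOSE,
)


def matches_verbose(url):
    """Return True if the verbose pattern matches the whole url."""
    return VERBOSE_PATTERN.match(url) is not None


print(matches_verbose("http://www.strangemother.com/pages/apple/food"))
print(matches_verbose("http://www.strangemother.com/images/apple"))
```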