Simple regex issue - need solving

sbi85
Forum Newbie
Posts: 3
Joined: Fri Sep 06, 2013 9:21 am

Simple regex issue - need solving

Post by sbi85 »

Hey everyone,

I am really new to regular expressions. I am trying to set up rules in a web analytics system (Adobe Sitecatalyst).

So here is the dilemma:
I have 2 types of tracking strings that I use for campaigns. They have different lengths.

Type 1
email_promo_EN.promo.BTC__en_ENG_ENG_111222

Type 2
affiliate_landing_BTC_home_en_ENG_ENG

Now what I want to accomplish is to split the string into the parts that are separated by "_" symbols.

So ideally they will look like this:
Type 1
$1 - email
$2 - promo
$3 - EN.promo.BTC
$4 - *empty*
$5 - en
$6 - ENG
$7 - ENG
$8 - 111222

Type 2
$1 - affiliate
$2 - landing
$3 - BTC
$4 - home
$5 - en
$6 - ENG
$7 - ENG

Problem 1
Whatever I try, the system doesn't recognize the empty field in Type 1 (so two underscores "__" don't count as an empty field). I would be willing to insert a constant value, but it would be better if the original idea worked.

Problem 2
Since Type 2 is shorter by one part, whatever regex I tried wasn't working.

My ideas so far:

1. Alternation
a | b. So the string contains either 7 or 8 parts:

^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$|^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$

Not working.

2. Having a non-capturing group
(?: - the idea was that the last bit doesn't need to be captured, so it wouldn't matter whether there are 7 or 8 parts.

^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)(?:\_([^\:]+))$

Not working.

That's all I had.

Can you guys help me out with this one? Any advice would be greatly appreciated!!

Thanks

Balint
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Simple regex issue - need solving

Post by Christopher »

You can just use explode('_', $mystring) and then either remove or ignore empty elements created by double underscores. Or maybe explode('_', str_replace('__', '_', $mystring))
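
For readers outside PHP, the same idea can be sketched in Python, where `str.split` plays the role of `explode` and `str.replace` the role of `str_replace`. This translation is only an illustration and is not something Sitecatalyst itself is known to offer. It shows that splitting keeps the empty field as an empty string, while collapsing "__" first shifts every later field one position to the left:

```python
# Sample tracking strings from the question.
type1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
type2 = "affiliate_landing_BTC_home_en_ENG_ENG"

# Splitting on "_" keeps the empty field produced by "__" as "".
fields1 = type1.split("_")
fields2 = type2.split("_")

# Collapsing "__" first drops the empty field, so later fields shift left.
collapsed1 = type1.replace("__", "_").split("_")

print(fields1)     # 8 parts, the 4th one empty
print(fields2)     # 7 parts
print(collapsed1)  # 7 parts, "en" now sits in the 4th position
```

If the field positions matter (as they do with $1 to $8 here), the first variant is the safer one.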
sbi85
Forum Newbie
Posts: 3
Joined: Fri Sep 06, 2013 9:21 am

Re: Simple regex issue - need solving

Post by sbi85 »

Hey Christopher,

I tried them both but they don't seem to work. I think the Sitecatalyst regex "engine" is simpler than you think.
Here is a page that explains the things that are working: http://microsite.omniture.com/t2/help/e ... tion_Rules

So far I haven't seen functions like "explode" mentioned anywhere. I think the options are more or less these: http://www.rexegg.com/regex-quickstart.html

Maybe I just have some error in my solution, I don't know. But if you have any other ideas, I am happy to listen :)
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Simple regex issue - need solving

Post by requinix »

So it's not for PHP then? Well, the fact that it supports a{3,6} syntax and the \w and \s metacharacters suggests that it supports PCRE syntax at least. Which helps.

You were on the right track except for two things:
1. + means "one or more of", and since some/all of the fields can be empty you should be using * ("zero or more of") for some/all of them instead.
2. Attempt #2 was right to use the (?: group but left out the ? after it that makes the group optional.
An improvement would also be to make your character sets exclude underscores too. Otherwise the regex engine will let the first set run all the way to the end of the string, then spend a very long time backtracking until it can match everything. We're talking, like, exponentially longer to match than it needs to.

([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)(?:_([^:_]*))?$
It's also possible the thing doesn't support some of that, like the non-capturing group or the ^ and $ anchors. If an expression doesn't work, try writing one to match a part of what you need, then continue adding to it until it doesn't match anymore. Then you'll know where the problem is and can work around it.
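
As a quick check outside Sitecatalyst, the expression can be tried in Python's `re` module. This assumes Python handles this syntax the same way the tool's engine does, which is not confirmed. The sketch runs the pattern on both sample strings, then applies the build-up approach: growing a pattern one field at a time to see where a match first fails. The helper name `first_failing_length` is made up for illustration.

```python
import re

# Pattern from the post, unchanged.
PATTERN = re.compile(
    r"([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)_([^:_]*)(?:_([^:_]*))?$"
)

type1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
type2 = "affiliate_landing_BTC_home_en_ENG_ENG"

# re.match starts at the beginning of the string; "$" pins the end.
groups1 = PATTERN.match(type1).groups()
groups2 = PATTERN.match(type2).groups()
print(groups1)  # 4th group is "", 8th group is "111222"
print(groups2)  # 8th group is None because the optional part is absent


def first_failing_length(text, field, max_fields=10):
    """Grow an anchored pattern one field at a time and return the
    first field count that no longer matches, or None if all match."""
    for n in range(1, max_fields + 1):
        pattern = "^" + "_".join([field] * n)
        if re.match(pattern, text) is None:
            return n
    return None


# With "+", the build-up stops at the empty 4th field of Type 1.
print(first_failing_length(type1, r"([^:_]+)"))
# With "*", it only stops once more fields are requested than exist.
print(first_failing_length(type1, r"([^:_]*)"))
```

The point where the count stops growing tells you which piece of the expression the string (or the engine) does not accept.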
sbi85
Forum Newbie
Posts: 3
Joined: Fri Sep 06, 2013 9:21 am

Re: Simple regex issue - need solving

Post by sbi85 »

This is beautiful. Working like a charm.

Thanks a lot. :)
I guess the thread can be closed.