Simple regex issue - need help solving it
Posted: Fri Sep 06, 2013 10:02 am
Hey everyone,
I am really new to using regular expressions. I am trying to set up rules in a web analytics system (Adobe SiteCatalyst).
So here is the dilemma:
I have two types of tracking strings that I use for campaigns. They have different lengths.
Type 1
email_promo_EN.promo.BTC__en_ENG_ENG_111222
Type 2
affiliate_landing_BTC_home_en_ENG_ENG
What I want to accomplish is to split the string into the parts that are separated by "_" characters.
Ideally, the captured groups would look like this:
Type 1
$1 - email
$2 - promo
$3 - EN.promo.BTC
$4 - *empty*
$5 - en
$6 - ENG
$7 - ENG
$8 - 111222
Type 2
$1 - affiliate
$2 - landing
$3 - BTC
$4 - home
$5 - en
$6 - ENG
$7 - ENG
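For reference, the target grouping is exactly what a plain split on "_" produces. A minimal Python sketch, used only to illustrate the expected result (the SiteCatalyst rule itself takes a regex, not code):

```python
# Tracking strings from the post
type1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
type2 = "affiliate_landing_BTC_home_en_ENG_ENG"

# Splitting on "_" keeps the empty field between the two consecutive underscores
parts1 = type1.split("_")
parts2 = type2.split("_")

print(parts1)  # ['email', 'promo', 'EN.promo.BTC', '', 'en', 'ENG', 'ENG', '111222']
print(parts2)  # ['affiliate', 'landing', 'BTC', 'home', 'en', 'ENG', 'ENG']
```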
Problem 1
Whatever I try, the system doesn't recognize the empty field in Type 1 (the two consecutive underscores "__" are not treated as an empty field). I would be willing to insert a constant value there, but it would be better if the original idea worked.
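A likely cause, assuming the rule engine follows the usual Perl-style regex semantics (not verified for SiteCatalyst): `+` requires at least one character, so a group like `([^_]+)` can never match the empty field between `__`, whereas `*` also accepts zero characters. A small Python sketch on a fragment of the Type 1 string:

```python
import re

# Assumption: Python's re stands in for the analytics tool's regex engine
field = "EN.promo.BTC__en"  # fragment of Type 1 containing the empty field

# "+" needs at least one non-underscore character in the middle group
plus = re.fullmatch(r"([^_]+)_([^_]+)_([^_]+)", field)
# "*" also accepts zero characters, so the empty field is captured as ""
star = re.fullmatch(r"([^_]*)_([^_]*)_([^_]*)", field)

print(plus)           # None
print(star.groups())  # ('EN.promo.BTC', '', 'en')
```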
Problem 2
Since Type 2 is one part shorter, none of the regexes I tried worked for both types.
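One common way to handle a variable number of parts is to make the last separator-plus-field optional with `(?:_([^_]*))?`. A sketch in Python, assuming the analytics tool supports the same `(?:...)` and `?` syntax (not checked here):

```python
import re

# Assumption: Python's re stands in for the analytics tool's regex engine.
# Seven "_"-separated fields, plus an optional eighth one.
# [^_]* stops each group at the next underscore and allows empty fields.
PATTERN = re.compile(
    r"^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?$"
)

strings = ["email_promo_EN.promo.BTC__en_ENG_ENG_111222",
           "affiliate_landing_BTC_home_en_ENG_ENG"]
results = [PATTERN.match(s).groups() for s in strings]

for groups in results:
    print(groups)
# ('email', 'promo', 'EN.promo.BTC', '', 'en', 'ENG', 'ENG', '111222')
# ('affiliate', 'landing', 'BTC', 'home', 'en', 'ENG', 'ENG', None)
```

In Python the unused eighth group comes back as `None`; how the analytics tool fills $8 for a Type 2 string (presumably empty) depends on the tool.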
My ideas so far:
1. Alternation
a|b, since the string contains either 7 or 8 parts:
^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$|^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$
Not working.
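Two things probably go wrong here. First, `[^\:]` means "any character except a colon", so it also matches `_`, and the 7-part branch can swallow an 8-part string by putting an underscore inside one of its groups. Second, each branch of `a|b` gets its own group numbers, so the 8-part branch would fill $8 to $15, not $1 to $8. A check in Python, again assuming its `re` module behaves like the tool's engine for this pattern:

```python
import re

# Assumption: Python's re stands in for the analytics tool's regex engine
ALTERNATION = re.compile(
    r"^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$"
    r"|^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$"
)

m = ALTERNATION.match("email_promo_EN.promo.BTC__en_ENG_ENG_111222")

print(ALTERNATION.groups)  # 15 capturing groups in total
print(m.groups()[:7])      # ('email', 'promo', 'EN.promo.BTC_', 'en', 'ENG', 'ENG', '111222')
print(m.group(8))          # None: the 8-part branch was never used
```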
2. Non-capturing group
(?: - the idea was that the last part doesn't need to be captured, so it wouldn't matter whether the string has 7 or 8 parts.
^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)(?:\_([^\:]+))$
Not working.
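Here `(?:...)` only stops those outer parentheses from creating a group; the `([^\:]+)` inside still captures as $8, and since there is no `?` after the group, the eighth part is mandatory. Combined with `+` rejecting the empty field, the pattern ends up matching neither string. A quick Python check, assuming Python's `re` behaves like the tool's engine here:

```python
import re

# Assumption: Python's re stands in for the analytics tool's regex engine
NON_CAPTURING = re.compile(
    r"^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)(?:\_([^\:]+))$"
)

type1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
type2 = "affiliate_landing_BTC_home_en_ENG_ENG"

print(NON_CAPTURING.groups)        # 8: the group inside (?:...) still counts
print(NON_CAPTURING.match(type1))  # None: the empty field fails ([^\:]+)
print(NON_CAPTURING.match(type2))  # None: only 6 underscores, 7 are required
```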
That's all I have.
Can you help me out with this one? Any advice would be greatly appreciated!
Thanks
Balint