Removing a query parameter if it contains certain values
Posted: Mon Apr 20, 2009 8:25 am
by Odenwalder
Hi everybody,
I am currently trying to build a search/replace filter in Google Analytics, which removes certain query parameters of a URI.
I have a query parameter "t_action", which has a number of possible values. I am trying to remove "t_action" from all URIs, but only if it contains certain values ("index" and "play").
Example:
In this URI, the t_action parameter should be matched and removed:
/index.html?t_action=index
In this URI, t_action should NOT be matched:
/fun/fruit/apples.html?t_action=input
So far I've tried this expression:
^(.*?)[\?&]t_action=[(index)|(play)]
The problem is that this expression, for some reason, matches everything up to the "i" of "index", so it also removes the t_action parameter if it has the value "input".
However, removing "play" seems to work.
I think this should be rather simple, but I can't seem to figure it out.
Does anyone see where I made a mistake here?
Re: Removing a query parameter if it contains certain values
Posted: Mon Apr 20, 2009 3:21 pm
by jazz090
This should work better: "/&?t_action=(index|play){1}&?/". Square brackets should only be used to define ranges.
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 1:04 am
by prometheuzz
jazz090 wrote:... the square brackets should only be used to define ranges.
@OP:
More specifically, the square brackets (called a character set, or character class)
can be used to define ranges, but more importantly, they only match a single character. Also, most "normal" regex metacharacters lose their special meaning inside them.
So this part of your regex: "[(index)|(play)]" will only match
one of the following (single!) characters:
(,
),
|,
i,
n,
d,
e,
x,
p,
l,
a or
y.
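To make that concrete, here is a minimal sketch using Python's re module. That engine is an assumption on my part, since the thread does not say which regex flavour Google Analytics uses, but character classes behave this way in every common flavour. It runs the original expression against the "input" URI from the first post:

```python
import re

# The expression from the first post, unchanged.
buggy = re.compile(r"^(.*?)[\?&]t_action=[(index)|(play)]")

# The class after "=" consumes exactly one character from this set.
class_chars = sorted(set("(index)|(play)"))

m = buggy.search("/fun/fruit/apples.html?t_action=input")
matched_text = m.group(0) if m else None

print(class_chars)
print(matched_text)
```

The match stops right after the "i" of "input", because that single "i" is a member of the class.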
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 2:27 am
by Odenwalder
Hi and thanks for your response,
I am not quite sure which script language Google Analytics is using, but the documentation on filters says that with the square brackets [ ], you can group elements, the | sign serves as a logical OR, and the brackets ( ) are used to define their content as a character string, not as single characters.
I thought this should work to match either "t_action=index" or "t_action=play".
The complicated version of my expression would be
(.*?)[\?&]t_action=index|(.*?)[\?&]t_action=play
which does work the way I want it, but this doesn't seem useful if you are trying to match a larger set of values.
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 2:51 am
by prometheuzz
Odenwalder wrote:Hi and thanks for your response,
I am not quite sure which script language Google Analytics is using, but the documentation on filters says that with the square brackets [ ], you can group elements, ...
This Google-Analytics-help page says otherwise:
http://www.google.com/support/googleana ... swer=91152
On that page they talk about a "character list", which IMO is a confusing name, because in regex terms this is always called a "set" or "class". But they all do the same thing: they match a single character.
foo[abc] // matches 'foo' followed by an 'a', 'b' or 'c'
foo[a-c] // the same as the one above
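A quick way to check that the two forms agree, sketched in Python (the sample strings are made up for illustration, and Python's re is only a stand-in for whatever engine Google Analytics runs):

```python
import re

explicit = re.compile(r"foo[abc]")  # lists each allowed character
ranged = re.compile(r"foo[a-c]")    # same set written as a range

samples = ["fooa", "foob", "fooc", "food", "foo"]

# For each sample: (text, matched by explicit form, matched by range form)
results = [
    (s, bool(explicit.fullmatch(s)), bool(ranged.fullmatch(s)))
    for s in samples
]

for row in results:
    print(row)
```

Both patterns accept "fooa", "foob" and "fooc", and both reject "food" and a bare "foo", since the class must consume exactly one character.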
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 3:55 am
by Odenwalder
That seems logical. The problem, in this case, is that I would have to group "index" and "play" somehow. Otherwise it would only match either "t_action=index" or "play".
What would you suggest, how to separate certain defined character strings using an OR like above, without having to repeat the "t_action="?
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 3:57 am
by prometheuzz
Odenwalder wrote:That seems logical. The problem, in this case, is that I would have to group "index" and "play" somehow. Otherwise it would only match either "t_action=index" or "play".
What would you suggest, how to separate certain defined character strings using an OR like above, without having to repeat the "t_action="?
Like this:
t_action=(index|play)
which matches either "t_action=index" or "t_action=play" and stores "index" or "play" in match group 1.
This is what jazz090 already suggested, only I left out the "{1}" part which is redundant.
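For anyone who wants to see the group capture in action, here is a small sketch in Python. The engine choice and the three sample strings are my assumptions; only the grouping with parentheses and "|" comes from the thread:

```python
import re

# Parentheses group the alternatives, so "t_action=" is written only once.
grouped = re.compile(r"t_action=(index|play)")

captured = {}
for query in ["t_action=index", "t_action=play", "t_action=input"]:
    m = grouped.search(query)
    # group(1) holds whichever alternative matched, or None if nothing did.
    captured[query] = m.group(1) if m else None

print(captured)
```

More values can be added to the same group by appending further "|value" alternatives.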
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 4:46 am
by Odenwalder
Now I see. This one did the trick
(.*?)[\?&]t_action=(index|play)
I just messed up with the positioning of the brackets. And the [ ] really only match each single character inside them.
Thanks for the advice everyone! And sorry for my slow brain.
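For reference, the working expression can be tried outside Google Analytics with a sketch like the following. It assumes Python's re module and a replacement of "\1" (keep only the part before the parameter). The helper name strip_t_action and the "/video.html" URI are made up for illustration, and the actual filter settings are not shown in the thread:

```python
import re

# The expression that did the trick, used as the search pattern.
pattern = re.compile(r"(.*?)[\?&]t_action=(index|play)")

def strip_t_action(uri: str) -> str:
    """Drop t_action from the URI when its value is index or play."""
    # Hypothetical replacement string: keep capture group 1 only.
    return pattern.sub(r"\1", uri)

cleaned = [
    strip_t_action("/index.html?t_action=index"),
    strip_t_action("/video.html?t_action=play"),
    strip_t_action("/fun/fruit/apples.html?t_action=input"),
]

for uri in cleaned:
    print(uri)
```

The first two URIs lose their parameter, while the "input" one passes through unchanged.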
Re: Removing a query parameter if it contains certain values
Posted: Tue Apr 21, 2009 4:52 am
by prometheuzz
Odenwalder wrote:Now I see. This one did the trick
(.*?)[\?&]t_action=(index|play)
I just messed up with the positioning of the brackets. And the [ ] really only match each single character inside them.
Thanks for the advice everyone! And sorry for my slow brain.
No problem. Note that inside a character class, the '?' does not have a special meaning, so:
[\?&]
could be written as
[?&]
But escaping it will also work, so if you find it easier to read with the backslash, you can leave it there.
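A short check of that equivalence, sketched in Python. This holds for Python's re and other Perl-style engines; I am assuming Google Analytics behaves the same way, which the thread does not confirm:

```python
import re

escaped = re.compile(r"[\?&]")  # '?' written with a backslash
plain = re.compile(r"[?&]")     # '?' written without one

samples = ["?", "&", "a", "="]

# For each sample: (text, matched by escaped form, matched by plain form)
comparison = [
    (s, bool(escaped.fullmatch(s)), bool(plain.fullmatch(s)))
    for s in samples
]

for row in comparison:
    print(row)
```

Both classes accept "?" and "&" and reject the other two characters.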