Removing a query parameter if it contains certain values

Posted: Mon Apr 20, 2009 8:25 am
by Odenwalder
Hi everybody,

I am currently trying to build a search/replace filter in Google Analytics, which removes certain query parameters of a URI.
I have a query parameter "t_action", which has a number of possible values. I am trying to remove "t_action" from all URIs, but only if it contains certain values ("index" and "play").

Example:
In this URI, the t_action parameter should be matched and removed:
/index.html?t_action=index
In this URI, t_action should NOT be matched:
/fun/fruit/apples.html?t_action=input

So far I've tried this expression:
^(.*?)[\?&]t_action=[(index)|(play)]

The problem is that this expression, for some reason, matches everything up to the "i" of "index", so it also removes the t_action parameter if it has the value "input".
However, removing "play" seems to work.

I think this should be rather simple, but I can't seem to figure it out.
Does anyone see where I made a mistake here?

Re: Removing a query parameter if it contains certain values

Posted: Mon Apr 20, 2009 3:21 pm
by jazz090
This should work better: "/&?t_action=(index|play){1}&?/". The square brackets should only be used to define ranges.

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 1:04 am
by prometheuzz
jazz090 wrote:... the square brackets should only be used to define ranges.
@OP:

More specifically, square brackets (called a character set, or character class) can be used to define ranges, but more importantly, they only match a single character. Also, most "normal" regex metacharacters lose their special meaning inside them.

So this part of your regex: "[(index)|(play)]" will only match one of the following (single!) characters: (, ), |, i, n, d, e, x, p, l, a or y.
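
To make this concrete, here is a small sketch that runs the expression from the first post against a few URIs. It uses Python's re module, which is an assumption on my part; Google Analytics may use a different regex flavour, and the helper name matched_text is made up for illustration.

```python
import re

# The expression from the first post. The square brackets form a character
# class, so exactly ONE character is matched after "t_action=".
original = re.compile(r"^(.*?)[\?&]t_action=[(index)|(play)]")

def matched_text(uri):
    """Return the part of the URI the expression matches, or None."""
    m = original.search(uri)
    return m.group(0) if m else None

print(matched_text("/index.html?t_action=index"))             # /index.html?t_action=i
print(matched_text("/fun/fruit/apples.html?t_action=input"))  # /fun/fruit/apples.html?t_action=i
print(matched_text("/index.html?t_action=play"))              # /index.html?t_action=p
print(matched_text("/index.html?t_action=show"))              # None ("s" is not in the class)
```

The second call shows why "input" is caught as well: only its first letter has to be in the class.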

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 2:27 am
by Odenwalder
Hi and thanks for your response,

I am not quite sure which script language Google Analytics is using, but the documentation on filters says that with the square brackets [ ], you can group elements, the | sign serves as a logical OR, and the brackets ( ) are used to define their content as a character string, not as single characters.
I thought this should work to match either "t_action=index" or "t_action=play".

The complicated version of my expression would be

(.*?)[\?&]t_action=index|(.*?)[\?&]t_action=play

which does work the way I want it, but this doesn't seem useful if you are trying to match a larger set of values.

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 2:51 am
by prometheuzz
Odenwalder wrote:Hi and thanks for your response,

I am not quite sure which script language Google Analytics is using, but the documentation on filters says that with the square brackets [ ], you can group elements, ...
This Google-Analytics-help page says otherwise:

http://www.google.com/support/googleana ... swer=91152

On that page they talk about a "character list", which IMO is a confusing name, because in regex terms this is usually called a "set" or "class". But they all do the same thing: they match a single character.

Code:

foo[abc] // matches 'foo' followed by an 'a', 'b' or 'c'
foo[a-c] // the same as the one above
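
If you want to check this yourself, the following sketch compares the two patterns. It assumes Python's re module as a stand-in for whatever engine the filter uses; the variable names are only for illustration.

```python
import re

listed = re.compile(r"foo[abc]")  # class written as an explicit list
ranged = re.compile(r"foo[a-c]")  # the same class written as a range

# fullmatch requires the whole string to match, which makes the
# "single character" rule easy to see.
for text in ["fooa", "foob", "fooc", "food", "fooab"]:
    print(text, bool(listed.fullmatch(text)), bool(ranged.fullmatch(text)))
```

Both patterns accept "fooa", "foob" and "fooc" and reject "food" (wrong character) and "fooab" (two characters where the class allows only one).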

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 3:55 am
by Odenwalder
That seems logical. The problem, in this case, is that I would have to group "index" and "play" somehow. Otherwise it would only match either "t_action=index" or "play".
What would you suggest, how to separate certain defined character strings using an OR like above, without having to repeat the "t_action="?

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 3:57 am
by prometheuzz
Odenwalder wrote:That seems logical. The problem, in this case, is that I would have to group "index" and "play" somehow. Otherwise it would only match either "t_action=index" or "play".
What would you suggest, how to separate certain defined character strings using an OR like above, without having to repeat the "t_action="?
Like this:

Code:

t_action=(index|play)
which matches either "t_action=index" or "t_action=play" and stores "index" or "play" in match group 1.
This is what jazz090 already suggested, except that I left out the "{1}" part, which is redundant.
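
A quick sketch of the grouping, again assuming Python's re module rather than the Analytics engine itself (captured_value is a hypothetical helper):

```python
import re

grouped = re.compile(r"t_action=(index|play)")
with_quantifier = re.compile(r"t_action=(index|play){1}")  # jazz090's variant

def captured_value(pattern, text):
    """Return what match group 1 captured, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

for uri in ["/index.html?t_action=index",
            "/index.html?t_action=play",
            "/fun/fruit/apples.html?t_action=input"]:
    # Both patterns give the same answer, so "{1}" adds nothing.
    print(uri, captured_value(grouped, uri), captured_value(with_quantifier, uri))
```

The first two URIs yield "index" and "play"; the "input" URI does not match at all.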

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 4:46 am
by Odenwalder
Now I see. This one did the trick

(.*?)[\?&]t_action=(index|play)

I just messed up with the positioning of the brackets. And the [ ] really only match each single character inside them.
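
For anyone who wants to try the expression outside of Analytics, here is a rough sketch in Python. The replacement string "\1" and the function name strip_t_action are assumptions for illustration; the actual replace field in the filter may need a different syntax.

```python
import re

# The working expression from this post.
remove_t_action = re.compile(r"(.*?)[\?&]t_action=(index|play)")

def strip_t_action(uri):
    # Replace the matched text with group 1, which drops the separator
    # ("?" or "&") together with the t_action parameter.
    return remove_t_action.sub(r"\1", uri)

print(strip_t_action("/index.html?t_action=index"))             # /index.html
print(strip_t_action("/fun/fruit/apples.html?t_action=input"))  # unchanged
print(strip_t_action("/a.html?x=1&t_action=play"))              # /a.html?x=1
```

Note that this sketch does not check where the value ends, so a value that merely starts with "index" or "play" would be affected as well.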

Thanks for the advice everyone! And sorry for my slow brain.

Re: Removing a query parameter if it contains certain values

Posted: Tue Apr 21, 2009 4:52 am
by prometheuzz
Odenwalder wrote:Now I see. This one did the trick

(.*?)[\?&]t_action=(index|play)

I just messed up with the positioning of the brackets. And the [ ] really only match each single character inside them.

Thanks for the advice everyone! And sorry for my slow brain.
No problem. Note that inside a character class, the '?' does not have a special meaning, so:

Code:

[\?&]
could be written as

Code:

[?&]
But escaping it will also work, so if you find it easier to read with the backslash, you can leave it there.
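
A small check of that equivalence, assuming Python's re module (other engines should behave the same way here, but I have not verified that for Analytics; match_positions is a made-up helper):

```python
import re

escaped = re.compile(r"[\?&]")
plain = re.compile(r"[?&]")

def match_positions(pattern, text):
    """Return the index of every character the class matches."""
    return [m.start() for m in pattern.finditer(text)]

uri = "/a.html?x=1&t_action=play"
print(match_positions(escaped, uri))  # [7, 11]
print(match_positions(plain, uri))    # [7, 11]
```

Both classes find the "?" and the "&" at the same positions.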