filter help
Posted: Mon Jun 09, 2008 5:13 am
by netstorm
Ok, so I'm trying to create a filter and my problem is with word boundaries and escaping characters.
I have a $sentence I look through and a $word I'm trying to match. The $word can be anything, but it's not an expression, so I escape the regex operators: $word = addcslashes($word,".[]()^$/*+|"); Everything works fine up to this point, and the following example returns what's expected:
Code: Select all
$word="[xyz]*";
$sentence1="[xyz]*";
$sentence2="xyz";
$word = addcslashes($word,".[]()^$/*+|"); // $word becomes "/[xyz/]/*"
preg_match("/$word/i", $sentence1); //returns true
preg_match("/$word/i", $sentence2); //returns false
That's exactly what I want it to do. The problems start when I add word boundaries:
Code: Select all
$word="[xyz]*";
$sentence1="[xyz]*";
$sentence2="xyz";
$word = addcslashes($word,".[]()^$/*+|"); // $word becomes "/[xyz/]/*"
preg_match("/\b$word\b/i", $sentence1); //returns false <--problem!!
preg_match("/\b$word\b/i", $sentence2); //returns false
Can anyone please tell me what I'm doing wrong?

Re: filter help
Posted: Mon Jun 09, 2008 5:22 am
by prometheuzz
A word boundary (\b) does not match the start, or end of your string. So, try to use this instead:
Also, I presume that your "escaping method" returns "\[xyz\]\*" instead of "/[xyz/]/*". Did you know that \Q causes the regex engine to treat all the meta characters that follow it as literals, so you don't need to escape them? So, try something like this:
HTH
Re: filter help
Posted: Mon Jun 09, 2008 5:39 am
by netstorm
Thanks for the fast reply
prometheuzz wrote:
I presume that your "escaping method" returns "\[xyz\]\*" instead of "/[xyz/]/*".
Yes, it does, It was a typo
prometheuzz wrote:
Did you know that the \Q will cause the regex to ignore all the meta characters so you don't need to escape them. So, try something like this:
HTH
I didn't know about \Q, but I tried what you said and it didn't work...
Code: Select all
preg_match('/(\b|^)\Q[xyz]*(\b|$)/', '[xyz]*') //returns false
Help again?

Re: filter help
Posted: Mon Jun 09, 2008 6:17 am
by prometheuzz
netstorm wrote:Thanks for the fast reply
...
I didn't know about \Q, but i tried what you said and it didn't work...
Code: Select all
preg_match('/(\b|^)\Q[xyz]*(\b|$)/', '[xyz]*') //returns false
Help again?

Sorry, I forgot to mention you need to end the \Q with \E, otherwise the entire regex after the \Q will be matched "as is", so this is really what I meant:
Code: Select all
preg_match('/(\b|^)\Q[xyz]*\E(\b|$)/', '[xyz]*')
I have no PHP interpreter at my disposal at the moment, but it should be ok.
HTH
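For anyone reading along, here is a minimal sketch of the \Q ... \E behaviour described above, next to PHP's built-in preg_quote(), which the thread does not mention but which does the same job as the manual addcslashes() call. The helper name literal_pattern() is made up for this illustration.

```php
<?php
// Hypothetical helper (not from the thread): wrap $word in a literal,
// case-insensitive pattern. preg_quote() escapes the regex metacharacters;
// its second argument also escapes the "/" delimiter.
function literal_pattern(string $word): string
{
    return '/' . preg_quote($word, '/') . '/i';
}

$word = '[xyz]*';

// \Q without \E: everything after it, including "(\b|$)", is taken literally,
// so the subject would have to contain the text "[xyz]*(\b|$)".
$unterminated = preg_match('/\Q[xyz]*(\b|$)/', '[xyz]*');

// \Q closed by \E: only "[xyz]*" is literal.
$terminated = preg_match('/\Q[xyz]*\E/', '[xyz]*');

// The preg_quote() route gives the escaped form netstorm was after.
$quoted = literal_pattern($word);
$hit    = preg_match($quoted, 'a [XYZ]* b');
$miss   = preg_match($quoted, 'xyz');
```

One reason to prefer preg_quote() for arbitrary input: a $word that itself contains "\E" would end the \Q block early, and as far as I know a "/" inside \Q ... \E still terminates a "/"-delimited pattern.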
Re: filter help
Posted: Mon Jun 09, 2008 6:49 am
by netstorm
Thanks! Well, it half works
Code: Select all
preg_match('/(\b|^)\Q[xyz]*\E(\b|$)/', '[xyz]*'); //returns true
preg_match('/(\b|^)\Q[xyz]*\E(\b|$)/', 'smtg [xyz]* smtg'); //returns false
Trying to fix it myself, I noticed that it matches 'b[xyz]*b' so the (\b|$) somehow gives the character 'b' and not the operator '\b' for word boundary. Any ideas?
Code: Select all
preg_match('/(\b|^)\Q[xyz]*\E(\b|$)/', 'b[xyz]*b'); //returns true
Re: filter help
Posted: Mon Jun 09, 2008 7:11 am
by prometheuzz
When testing the following:
Code: Select all
preg_match('/\b\Q[xyz]*\E\b/i', 'smtg [xyz]* sm[xyz]*tg', $result)
on
http://regex.larsolavtorvik.com/
$result is printed as:
Re: filter help
Posted: Mon Jun 09, 2008 7:15 am
by netstorm
Yeah, it matches the second occurrence, not the first one. So instead of getting me the separate word, it gets it only if it's inside the word.
Re: filter help
Posted: Mon Jun 09, 2008 7:16 am
by prometheuzz
w.r.t. my previous reply:
Now all of a sudden, the tool I posted in my previous reply gives something different... I'll have a look at this when I get home and can actually test the stuff I post here.
Re: filter help
Posted: Mon Jun 09, 2008 7:39 am
by prometheuzz
netstorm wrote:Yeah, it matches the second occurrence, not the first one. So instead of getting me the separate word, it gets it only if it's inside the word.
Wait, it's because '[', ']' and '*' are non-word characters themselves, so \b only matches next to them when the character on the other side is a word character. You could solve it using lookaround.
Both:
Code: Select all
preg_match('/(?<=^|[\s])\Q[xyz]*\E(?=[\s]|$)/i', '[xyz]*', $result);
# and
preg_match('/(?<=^|[\s])\Q[xyz]*\E(?=[\s]|$)/i', 'aaa [xyz]* aaa', $result);
should evaluate to true and match "[xyz]*".
Of course, you could expand the character class [\s] by adding punctuation marks to it.
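A rough sketch of how the lookaround approach above could be wrapped up for an arbitrary $word. The function name contains_word() is invented for this example, and preg_quote() is swapped in for the hard-coded \Q ... \E, so treat it as one possible way to do it rather than the definitive one.

```php
<?php
// Hypothetical helper: true if $word appears in $sentence as a separate
// token, i.e. with whitespace or a string edge on both sides.
function contains_word(string $word, string $sentence): bool
{
    // Escape metacharacters and the "/" delimiter so $word is matched literally.
    $literal = preg_quote($word, '/');

    // Same lookarounds as in the post above:
    // (?<=^|\s) -> preceded by the start of the string or whitespace
    // (?=\s|$)  -> followed by whitespace or the end of the string
    $pattern = '/(?<=^|\s)' . $literal . '(?=\s|$)/i';

    return preg_match($pattern, $sentence) === 1;
}

$alone    = contains_word('[xyz]*', '[xyz]*');           // whole string
$spaced   = contains_word('[xyz]*', 'smtg [xyz]* smtg'); // separate word
$embedded = contains_word('[xyz]*', 'b[xyz]*b');         // glued to letters
$plain    = contains_word('[xyz]*', 'xyz');              // brackets missing
```

To allow punctuation next to the word, \s in both lookarounds could be widened to a character class such as [\s.,;!?], as suggested above.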
Re: filter help
Posted: Mon Jun 09, 2008 9:01 am
by netstorm
They do and thank you very much for all your help!

Re: filter help
Posted: Mon Jun 09, 2008 9:09 am
by prometheuzz
netstorm wrote:They do and thank you very much for all your help!

Good to hear it, and you're welcome!
Re: filter help
Posted: Tue Jun 10, 2008 12:17 am
by GeertDD
prometheuzz wrote:A word boundary (\b) does not match the start, or end of your string.
It does. Well, it does when there is a word boundary.
Code: Select all
preg_match('~\bhello~', 'hello'); // returns (int)1
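To make the point concrete, a small sketch (the variable names are just for illustration): \b matches at the start or end of the string only when the character sitting there is a word character, which is why it worked for 'hello' but not for '[xyz]*'.

```php
<?php
// The subject starts and ends with word characters, so \b matches at both edges.
$wordEdges = preg_match('~\bhello\b~', 'hello');

// The subject starts with "[", a non-word character, so there is no
// boundary at position 0 and the match fails.
$bracketEdge = preg_match('~\b\[xyz\]~', '[xyz]');

// The boundaries sit between "[" and "x" and between "z" and "]".
$innerEdges = preg_match('~\[\bxyz\b\]~', '[xyz]');

// With letters on both sides of the brackets, \b does match around them,
// which is the 'b[xyz]*b' case from earlier in the thread.
$glued = preg_match('~\b\[xyz\]\*\b~', 'b[xyz]*b');
```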
Re: filter help
Posted: Tue Jun 10, 2008 2:53 am
by prometheuzz
GeertDD wrote:prometheuzz wrote:A word boundary (\b) does not match the start, or end of your string.
It does. Well, it does when there is a word boundary.
Code: Select all
preg_match('~\bhello~', 'hello'); // returns (int)1
You're right of course, I was a bit confused because of the fact that the string to match had word boundaries in it. Good to have it on the record!
Well Geert, that's what happens if you leave me alone in here too long!
; )