Regular Expressions question
Posted: Mon Dec 20, 2004 9:42 am
by visionmaster
Hello,
I always get 'A match was found.' What am I doing wrong?
I really just want to match distinct words. For the value in the code below I should get 'A match was NOT found.', but I don't.
"Meller Str. 33 · 49082 Osnabrueck"
->should match
"Meller Str. 33 · 49082 Osnabrueck"
->should match
"Meller Str. 33 · 490821 Osnabrueck"
->should not match
Probably a small mistake with a big impact...
Code:
$arrDaten['Plz'] = '49082';
$arrDaten['Ort'] = 'Osnabrueck';
/* The \b in the pattern indicates a word boundary, so only the distinct
* word "49082" should be matched, and not part of a longer one like "490821" */
$pattern = "'|\b".preg_quote($arrDaten['Plz'])."\b\s+\b".preg_quote($arrDaten['Ort'])."\b|i'";
$value = "Meller Str. 33 · 490821 Osnabrueck";
echo $pattern;
if (preg_match( $pattern, $value))
{
echo "A match was found.";
}
else
{
echo "A match was NOT found.";
}
Posted: Mon Dec 20, 2004 2:44 pm
by rehfeld
I don't know why you're using those single quotes next to the delimiter in your pattern, but try it without them:
Code:
$pattern = "|\b".preg_quote($arrDaten['Plz'])."\b\s+\b".preg_quote($arrDaten['Ort'])."\b|i";
I think you're trying to use the pipe character as your delimiter, but the regex engine takes your single quote as the delimiter because it comes first, so the pipe character becomes a branch (alternation) operator.
Since there is nothing before the first pipe, the first branch is empty and can match "nothing". Regex engines take the first alternative that succeeds, so the pattern matches the empty string and never even tries the other branches.
Just my theory, though.
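The theory can be checked directly. The following is a minimal sketch, assuming PHP's standard PCRE functions; the variable names are made up for illustration. It runs the original pattern and the version without the single quotes against the value that should not match, and captures the matched text to show that the first one only ever matches the empty string.

```php
<?php
$plz   = '49082';
$ort   = 'Osnabrueck';
$value = "Meller Str. 33 · 490821 Osnabrueck";

// Original pattern: the single quote comes first, so it is the delimiter.
// What is left inside is an alternation: (empty) | \b49082\b\s+\bOsnabrueck\b | i
$quoted = "'|\b" . preg_quote($plz) . "\b\s+\b" . preg_quote($ort) . "\b|i'";

// Without the single quotes, | is the delimiter and i is a modifier.
$piped = "|\b" . preg_quote($plz) . "\b\s+\b" . preg_quote($ort) . "\b|i";

$quotedResult = preg_match($quoted, $value, $quotedMatches);
$pipedResult  = preg_match($piped, $value, $pipedMatches);

var_dump($quotedResult);      // int(1): the empty first branch matched
var_dump($quotedMatches[0]);  // string(0) "": nothing was actually consumed
var_dump($pipedResult);       // int(0): "490821" is not the distinct word "49082"
```

Looking at `$quotedMatches[0]` is a handy habit in general: a "successful" match of an empty string usually means the pattern is not doing what was intended.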

Posted: Tue Dec 21, 2004 2:17 am
by visionmaster
Hi rehfeld,
Thanks for your hint and explanation, you're absolutely right! This is how it works:
$pattern = "|\b".preg_quote($arrDaten['Plz'])."\b\s+\b".preg_quote($arrDaten['Ort'])."\b|i";
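As a quick check of that pattern, here is a small sketch that runs it over the sample addresses from the first post. The variant with several spaces between postal code and town is an assumption added to exercise `\s+`; it does not come from the original post.

```php
<?php
$arrDaten = ['Plz' => '49082', 'Ort' => 'Osnabrueck'];

// Corrected pattern from the thread: | is the delimiter, i the modifier.
$pattern = "|\b" . preg_quote($arrDaten['Plz']) . "\b\s+\b" . preg_quote($arrDaten['Ort']) . "\b|i";

// Sample value => whether it is expected to match.
$samples = [
    "Meller Str. 33 · 49082 Osnabrueck"    => true,
    "Meller Str. 33 · 49082    Osnabrueck" => true,  // assumed variant with extra spaces
    "Meller Str. 33 · 490821 Osnabrueck"   => false,
];

$results = [];
foreach ($samples as $value => $expected) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$value] = preg_match($pattern, $value) === 1;
    echo $value, ' -> ', $results[$value] ? 'A match was found.' : 'A match was NOT found.', "\n";
}
```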
Posted: Thu Jun 23, 2005 3:31 am
by Heavy
When writing regular expressions, it is wise to try to use a pattern delimiter that is not a special token, and doesn't appear in the pattern.
You use | as the delimiter. | is a special regexp token, which may lead to confusion when reading the pattern later.
Furthermore, it is extremely common to use / as the delimiter. (Probably because that seems to be very common in Perl. I don't know if it's the only one they use.) But I would encourage anyone NOT to use / as the delimiter when writing regexps for XML, URLs or Unix paths, since all these things often include / in the string we want to match.
Consider three cases:
Code:
Match tag or end tag:
/<\/?[a-z]+[^>]*?\/?>/i
Match climbing path:
/\/[^\/]*\/\.\.\//
Match [protocol]://:
/[a-z]+:\/\//i
Wouldn't it be a lot nicer to use some other delimiter that doesn't appear in the pattern?
Code:
Match tag or end tag:
%</?[a-z]+[^>]*?/?>%i
Match climbing path:
%/[^/]*/\.\./%
Match [protocol]://:
%[a-z]+://%i
I am not sure those patterns of mine actually work as intended, but my point is readability.
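To make the readability point concrete, here is a small PHP sketch that uses % as the delimiter for the three cases. The sample strings are invented for illustration, and one matching input per pattern is of course not a thorough test. When part of a pattern comes from a variable, `preg_quote()` takes the delimiter as an optional second argument so that it gets escaped as well, which is shown at the end with a hypothetical directory name.

```php
<?php
// The three patterns written with % so that / needs no escaping.
$tag      = '%</?[a-z]+[^>]*?/?>%i';
$climb    = '%/[^/]*/\.\./%';
$protocol = '%[a-z]+://%i';

$tagHit      = preg_match($tag, '<a href="index.html">');
$climbHit    = preg_match($climb, '/var/www/../log/');
$protocolHit = preg_match($protocol, 'http://example.com/');

var_dump($tagHit, $climbHit, $protocolHit); // int(1) for each

// If / has to stay the delimiter, let preg_quote() escape it.
$dir    = '/usr/local/';                 // hypothetical input
$quoted = preg_quote($dir, '/');         // gives \/usr\/local\/
$dirHit = preg_match('/^' . $quoted . '/', '/usr/local/bin');

var_dump($dirHit); // int(1)
```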
Posted: Thu Jun 23, 2005 10:20 pm
by Ambush Commander
Regexps are notoriously difficult to read.

Posted: Thu Jun 23, 2005 10:43 pm
by Skara
They're not so hard to read if you do what Heavy said.

Posted: Fri Jun 24, 2005 6:04 am
by Heavy
Hehe, thanks!
You know, regexp can really bite you.
As soon as you get the hang of it, you start fantasizing about changing all the editors and all the parsers in the world...
