Reg Expression not working fully...
Posted: Tue May 18, 2004 2:33 pm
by binjured
OK, here's my function to turn
http://www.yada.com into a link on the page...
Code:
function make_links($string) {
$match = "[a-zA-Z]{3,}://[a-zA-Z0-9\-\.]+/*[a-zA-Z0-9/\\%_.]*\?*[a-zA-Z0-9/\\%_.=&]*";
$replace = '<a href="\\0" target=_blank>\\0</a>';
$string = eregi_replace($match,$replace,$string);
return $string;
}
This works fine except where a "-" is present: it stops reading at the "-". I can't figure out where to put "\-" in this expression to fix this, because wherever I put it after "[a-zA-Z0-9\-\.]" it doesn't work and the post doesn't show up. What am I missing?
Posted: Tue May 18, 2004 7:16 pm
by Weirdan
You don't need to escape the dash (-) or the dot (.) inside a character class. To have the dash interpreted as a member of the class, make it the first or last character in the class, i.e. just after the opening square bracket or just before the closing one. You don't need to escape % either.
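A minimal sketch of that advice, not the original poster's final code. It assumes a current PHP, where the ereg family no longer exists, so preg_replace() with the i flag is swapped in for eregi_replace(). The ~ delimiter, the $0 backreference and the sample URL are choices made for this illustration.

```php
<?php
// Sketch only: preg_replace() stands in for the original eregi_replace().
// The dash is the last character of each class, so it is taken literally.
function make_links($string) {
    $match = '~[a-z]{3,}://[a-z0-9.-]+/*[a-z0-9/%_.-]*\?*[a-z0-9/%_.=&-]*~i';
    $replace = '<a href="$0" target="_blank">$0</a>';
    return preg_replace($match, $replace, $string);
}

// Hypothetical sample input containing dashes in the host and the path.
echo make_links('see http://www.my-site.com/some-page.html?a=1'), "\n";
```

With the dash at the end of each class, the whole URL, dashes included, ends up inside the link.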
Posted: Tue May 18, 2004 7:18 pm
by feyd
looks more like he's adding backslash as a valid character.
Posted: Tue May 18, 2004 7:28 pm
by Weirdan
Hmmm.... I doubt he does. Backslash is listed among the "unwise" (excluded) characters in RFC 2396, and it does not appear as a delimiter character in RFC 2616. So a backslash has to be URL-encoded to be included in an HTTP URL.
Posted: Tue May 18, 2004 7:37 pm
by feyd
right.. however.. he has double backslashes. Last I checked, that adds a regular backslash to the string..
I'm not saying his intention was that.. but that it is doing that.
Posted: Wed May 19, 2004 2:32 am
by binjured
Yeah, it's an escaped backslash... not sure why I thought I'd need to check for a backslash in a URL but hell.. can't hurt
And thanks, the - at the end fixed it all. Thanks a lot. I just started learning reg. expressions like... a day ago.
Posted: Wed May 19, 2004 12:51 pm
by Weirdan
feyd wrote:right.. however.. he has double backslashes. Last I checked, that adds a regular backslash to the string..
Sure, but then this string is passed to regexp function which uses backslash to escape its own special characters... so....

Posted: Wed May 19, 2004 1:23 pm
by feyd
where still, a double backslash creates a backslash character match..
Posted: Wed May 19, 2004 2:22 pm
by Weirdan
So as not to be groundless:
Code:
weirdan@home:~$ php -r 'echo (ereg("\\\\", "\\")?"true":"false")."\n";';
true
weirdan@home:~$ php -r 'echo (ereg("\\", "\\")?"true":"false")."\n";';
PHP Warning: ereg(): REG_EESCAPE in Command line code on line 1
Warning: ereg(): REG_EESCAPE in Command line code on line 1
false
weirdan@home:~$
Huh? What do you think now? To create a regexp which will match a literal backslash you need to use four backslashes.
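The same check can be sketched on a current PHP, where ereg() is no longer available. preg_match() is swapped in here, and the variable names are only for illustration. Printing the pattern shows what the regex engine actually receives once PHP has processed the string literal.

```php
<?php
// Four backslashes in the PHP source become two characters in the string,
// which the regex engine then reads as one escaped (literal) backslash.
$pattern = "/\\\\/";   // the string holds: / \ \ /
$subject = "\\";       // the string holds a single backslash

echo strlen($subject), "\n";                                   // 1
echo $pattern, "\n";                                           // prints /\\/
echo preg_match($pattern, $subject) ? "true" : "false", "\n";  // true
```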
Posted: Wed May 19, 2004 2:28 pm
by feyd
I stand corrected.
Posted: Wed May 19, 2004 3:01 pm
by redmonkey
In many cases when using regex a double backslash is all that is required; however, PHP seems to handle things slightly differently.
For anyone who happens to be interested, the reason you require four backslashes when trying to pass a literal backslash character is that PHP seems to parse the regex pattern first, prior to invoking the regex engine. Therefore the pattern '\\\\' is actually passed to the regex engine as '\\'.
Carrying on from Weirdan's example above, what makes life even more confusing is that....
Code:
echo (preg_match('/\\\/', '\\')?"true":"false");
...also results in true even though we are only using three backslashes in the pattern.
Even though the example I give above is functionally identical to....
Code:
echo (preg_match('/\\\\/', '\\')?"true":"false");
....I would recommend always using four backslashes when passing the literal backslash character as I think it is more technically correct.
Posted: Wed May 19, 2004 3:31 pm
by Weirdan
redmonkey wrote:In many cases when using regex a double backslash is all that is required; however, PHP seems to handle things slightly differently.
As many other languages do
redmonkey wrote:
For anyone who happens to be interested, the reason you require four backslashes when trying to pass a literal backslash character is that PHP seems to parse the regex pattern first, prior to invoking the regex engine.
It does so because the regexp pattern is a PHP string (unlike JavaScript, where regexes are one of the built-in datatypes). PHP uses the backslash for its own purposes, e.g. escaping special characters in strings.
redmonkey wrote:
Therefore the pattern '\\\\' is actually passed to the regex engine as '\\'.
Carrying on from Weirdan's example above, what makes life even more confusing is that....
Code:
echo (preg_match('/\\\/', '\\')?"true":"false");
...also results in true even though we are only using three backslashes in the pattern.
Excellent example, I happened to check it too.

This is unusual behaviour among the programming languages I've used: PHP treats a backslash before a non-special character not as an escape character, but as a literal backslash!

Ugly, but that's how it works. It's a documented feature:
http://php.net/manual/en/language.types ... tax.double
Let's have a look at the pattern you used:
- The first character is a forward slash. It's a non-special character, so it is parsed as is.
- The second character is a backslash, followed by a special character (a backslash again). So the entire two-character sequence is parsed as its last character, giving us a literal backslash.
- (Here lies the ugly dragon) The fourth character is a backslash, followed by a non-special character! It is interpreted as a literal backslash.
- The fifth character is a forward slash. It's a non-special character, interpreted as is.
We've just seen Black PHP Magic in action

So, the pattern passed to preg_match is: /\\/
Obviously, this pattern matches literal backslash.
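The walkthrough above can be checked directly. This is a small sketch that compares the three-backslash and four-backslash literals; the variable names are made up for the illustration.

```php
<?php
// In single-quoted strings, two backslashes collapse into one, while a
// backslash before any other character (except a single quote) is kept.
$three = '/\\\/';    // three backslashes in the source
$four  = '/\\\\/';   // four backslashes in the source

var_dump($three === $four);            // bool(true)
echo strlen($three), "\n";             // 4
echo preg_match($three, '\\'), "\n";   // 1
```

Both literals produce the same four-character string, so the regex engine sees the same pattern either way.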
redmonkey wrote:
Even though the example I give above is functionally identical to....
Code:
echo (preg_match('/\\\\/', '\\')?"true":"false");
....I would recommend always using four backslashes when passing the literal backslash character as I think it is more technically correct.
It isn't more correct in the PHP world, but I consider it a good habit. Someday you could move to another language.......
Posted: Wed May 19, 2004 4:58 pm
by redmonkey
Weirdan,
Good explanation. I too consider it unusual behaviour so thought it worthy of note.
Weirdan wrote:
It isn't more correct in the PHP world, but I consider it a good habit. Someday you could move to another language.......
Yes, perhaps my phrasing on that one was not the best.