Reg Expression not working fully...

binjured
Forum Newbie
Posts: 7
Joined: Fri May 14, 2004 6:07 pm

Post by binjured »

OK, here's my function to turn a URL such as http://www.yada.com into a link on the page:

Code:

function make_links($string) {
    $match = "[a-zA-Z]{3,}://[a-zA-Z0-9\-\.]+/*[a-zA-Z0-9/\\%_.]*\?*[a-zA-Z0-9/\\%_.=&]*";
    $replace = '<a href="\\0" target=_blank>\\0</a>';
    $string = eregi_replace($match,$replace,$string);
    return $string;
}
This works fine except where a "-" is present: the match stops at the "-". I can't figure out where to put "\-" in this expression to fix it, because wherever I put it after "[a-zA-Z0-9\-\.]" it doesn't work and the post doesn't show up. What am I missing?
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

You don't need to escape the dash (-) or the dot (.) inside a character class. To have the dash treated as a member of the class, make it the first or last character, i.e. just after the opening square bracket or just before the closing one. You don't need to escape % either.
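A minimal sketch of that advice, not binjured's final code (the thread never shows it). It assumes a current PHP, where the ereg family no longer exists (it was removed in PHP 7.0), so preg_replace() with the i flag stands in for eregi_replace() and $0 replaces \\0. The sample URL is made up for illustration.

```php
<?php
// Sketch only: preg_replace() is swapped in for the removed eregi_replace().
function make_links(string $text): string
{
    // The hyphen is the LAST character of each class, so it is a literal member.
    // "~" is the delimiter, so "/" needs no escaping; "i" makes it case-insensitive.
    $pattern = '~[a-z]{3,}://[a-z0-9.-]+/*[a-z0-9/%_.-]*\?*[a-z0-9/%_.=&-]*~i';
    $replace = '<a href="$0" target="_blank">$0</a>';

    // preg_replace() returns null on error; fall back to the untouched text.
    return preg_replace($pattern, $replace, $text) ?? $text;
}

// Hypothetical input containing hyphens in both host and path.
echo make_links('see http://www.my-site.com/some-page.php?id=1 now'), "\n";
```

With the hyphen at the end of each class, the whole URL ends up inside the link instead of being cut off at the first "-".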
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

looks more like he's adding backslash as a valid character.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Hmmm... I doubt he does. RFC 2396 puts the backslash among its "unwise" characters, which must be escaped in a URI, and RFC 2616 does not use it as a delimiter either. So a backslash has to be URL-encoded to appear in an http URL.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

right.. however.. he has double backslashes. Last I checked, that adds a regular backslash to the string..

I'm not saying his intention was that.. but that it is doing that.
Last edited by feyd on Wed May 19, 2004 2:39 am, edited 1 time in total.
binjured
Forum Newbie
Posts: 7
Joined: Fri May 14, 2004 6:07 pm

Post by binjured »

Yeah, it's an escaped backslash... not sure why I thought I'd need to check for a backslash in a URL, but hell, can't hurt :?

And thanks, the - at the end fixed it all. Thanks a lot. I just started learning reg. expressions like... a day ago.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

feyd wrote:right.. however.. he has double backslashes. Last I checked, that adds a regular backslash to the string..
Sure, but then this string is passed to a regexp function, which uses the backslash to escape its own special characters... so.... ;)
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

where still, a double backslash creates a backslash character match..
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

So as not to be groundless:

Code:

weirdan@home:~$ php -r 'echo (ereg("\\\\", "\\")?"true":"false")."\n";';
true
weirdan@home:~$ php -r 'echo (ereg("\\", "\\")?"true":"false")."\n";';
PHP Warning:  ereg(): REG_EESCAPE in Command line code on line 1

Warning: ereg(): REG_EESCAPE in Command line code on line 1
false
weirdan@home:~$
Huh? What do you think now? To create a regexp that matches a literal backslash you need to use four backslashes.
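The two layers of escaping can be sketched on a current PHP as well. This is an approximation of the test above, not a rerun of it: ereg() was removed in PHP 7.0, so preg_match() with "/" delimiters is used instead, and the failure shows up as a pattern that does not compile rather than as REG_EESCAPE.

```php
<?php
// Layer 1 is PHP's string parser: four typed backslashes become two characters.
$pattern = "/\\\\/";   // the regex engine receives /\\/
$subject = "\\";       // a single backslash character

var_dump(strlen($subject));               // int(1)
var_dump(preg_match($pattern, $subject)); // int(1)

// With only two typed backslashes the engine receives /\/ : the lone backslash
// escapes the closing delimiter, so the pattern does not compile.
// "@" suppresses the warning; preg_match() then returns false.
var_dump(@preg_match("/\\/", $subject));  // bool(false)
```

Layer 2 is the regex engine, which needs its own backslash pair to mean one literal backslash.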
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I stand corrected.
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

In many cases when using regex a double backslash is all that is required; however, PHP seems to handle things slightly differently.

For anyone who happens to be interested, the reason you need four backslashes to pass a literal backslash character is that PHP parses the pattern as a string first, before invoking the regex engine. Therefore the pattern '\\\\' is actually passed to the regex engine as '\\'.

Carrying on from Weirdan's example above, what makes life even more confusing is that....

Code:

echo (preg_match('/\\\/', '\\')?"true":"false");
...also results in true even though we are only using three backslashes in the pattern.

Even though the example I give above is functionally identical to....

Code:

echo (preg_match('/\\\\/', '\\')?"true":"false");
....I would recommend always using four backslashes when passing the literal backslash character as I think it is more technically correct.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

redmonkey wrote:In many cases when using regex a double backslash is all that is required; however, PHP seems to handle things slightly differently.
As many other languages do ;)
redmonkey wrote: For anyone who happens to be interested, the reason you need four backslashes to pass a literal backslash character is that PHP parses the pattern as a string first, before invoking the regex engine.
It does so because the regexp pattern is an ordinary PHP string (unlike JavaScript, where regexes are a built-in datatype). PHP uses the backslash for its own purposes, e.g. escaping special characters in strings.
redmonkey wrote: Therefore the pattern '\\\\' is actually passed to the regex engine as '\\'.

Carrying on from Weirdan's example above, what makes life even more confusing is that....

Code:

echo (preg_match('/\\\/', '\\')?"true":"false");
...also results in true even though we are only using three backslashes in the pattern.
Excellent example, I happened to check it too. ;) It's unusual behaviour among the programming languages I've used: PHP treats a backslash before a non-special character not as an escape character but as a literal backslash! :evil: Ugly, but that's how it works. It's a documented feature: http://php.net/manual/en/language.types ... tax.double
Let's have a look at the pattern you used:
  • The first character is a forward slash. It's a non-special character, so it is parsed as is.
  • The second character is a backslash, followed by a special character (a backslash again ;) ). So the entire two-character sequence is parsed as its last character, giving us a literal backslash.
  • (Here lies the ugly dragon ;) ) The fourth character is a backslash, followed by a non-special character! It is interpreted as a literal backslash.
  • The fifth character is a forward slash. It's a non-special character, interpreted as is.
We've just seen Black PHP Magic in action ;)
So, the pattern passed to preg_match is: /\\/
Obviously, this pattern matches a literal backslash.
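The walkthrough above can be checked directly. A small sketch (the variable names are only for illustration) comparing the three-backslash and four-backslash spellings as PHP strings before any regex engine is involved:

```php
<?php
$three = '/\\\/';   // three backslashes typed
$four  = '/\\\\/';  // four backslashes typed

// Both literals produce the same four-character string: / \ \ /
var_dump($three === $four);          // bool(true)
var_dump(strlen($three));            // int(4)

// So both match a single literal backslash.
var_dump(preg_match($three, '\\'));  // int(1)

// Double quotes behave the same way here, because \/ is not a recognised escape.
var_dump("/\\\/" === $four);         // bool(true)
```

The three-backslash form only works because the character after the third backslash is not special to the string parser; followed by a quote or another backslash, it would be read differently.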
redmonkey wrote: Even though the example I give above is functionally identical to....

Code:

echo (preg_match('/\\\\/', '\\')?"true":"false");
....I would recommend always using four backslashes when passing the literal backslash character as I think it is more technically correct.
It isn't more correct in the PHP world, but I consider it a good habit. Someday you could move to another language...
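One possible way to avoid counting backslashes by hand, offered as a side note and not something the posters above used, is to let preg_quote() do the regex-level escaping. The subject string 'C:\\temp' is a made-up example.

```php
<?php
// One backslash character; this is the only place PHP-level escaping is needed.
$backslash = '\\';

// preg_quote() escapes regex metacharacters (backslash included) and,
// given the second argument, the delimiter as well.
$pattern = '/' . preg_quote($backslash, '/') . '/';

var_dump($pattern);                          // string(4) "/\\/"
var_dump(preg_match($pattern, 'C:\\temp'));  // int(1)
```

The resulting pattern is the same four-character string as the hand-written four-backslash version.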
redmonkey
Forum Regular
Posts: 836
Joined: Thu Dec 18, 2003 3:58 pm

Post by redmonkey »

Weirdan,
Good explanation. I too consider it unusual behaviour so thought it worthy of note.
Weirdan wrote: It isn't more correct in php world, but I consider it's good habit. Someday you could move to another language.......
Yes, perhaps my phrasing on that one was not the best.