Matching URLs *not* inside [code] tags

Posted: Sun Nov 15, 2009 3:15 pm
by Goldeneye
In my script, URLs are automatically parsed into anchored HTML links. I also have a [ code ] tag (I call it [raw], as in raw text) that is used when you don't want text to be formatted. The problem is that URLs inside [raw] tags still get automatically hyperlinked.

So, how would I match URLs not inside [ code ] tags?

I tried this (and several other variations):

Code:

'/[^(\&\#91;raw\])] ((mailto:|(http|ftp|nntp|news):\/\/).*?)(\s|<|\)|"|\\\\|\'|$) [^(\&\#91;\/raw\])]/si'
But it doesn't work at all.

I use [^(\&\#91;raw\])] instead of [^(\[raw\])] because I replace the opening square bracket with its corresponding entity to prevent it from being matched later on by my other regexes.

A preemptive thanks for your time.

Re: Matching URLs *not* inside [code] tags

Posted: Mon Nov 16, 2009 9:05 am
by ridgerunner
This would be extremely difficult (if not downright impossible) to achieve reliably with regex alone. I would recommend first splitting out the [raw] sections, then applying the linkification regex to the text outside them, and finally piecing it all back together. This is what the parse_message() function in parser.php does in the punBB 1.2 forum software. You can download punbb-1.2.22.zip here and look at how they do it.
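
To make the split / linkify / rejoin idea concrete, here is a minimal sketch. It is written in Python only to keep it short, and it is not punBB's actual code: the names RAW_BLOCK, URL, linkify() and linkify_outside_raw() are made up for illustration, the URL pattern is a simplified stand-in for yours (it also accepts https, which is my assumption), and it assumes the split runs before you convert "[" to its entity.

```python
import re

# Hypothetical patterns for illustration; not taken from punBB's parser.php.
# The capturing group makes re.split() keep the [raw]...[/raw] blocks.
RAW_BLOCK = re.compile(r'(\[raw\].*?\[/raw\])', re.IGNORECASE | re.DOTALL)

# Simplified URL matcher: a scheme, then everything up to whitespace,
# an angle bracket, a quote, a closing parenthesis, or a backslash.
URL = re.compile(r'(?:mailto:|(?:https?|ftp|nntp|news)://)[^\s<>"\')\\]+',
                 re.IGNORECASE)


def linkify(text):
    """Wrap every URL found in text in an HTML anchor."""
    return URL.sub(lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text)


def linkify_outside_raw(message):
    """Linkify URLs everywhere except inside [raw]...[/raw] blocks."""
    # With one capturing group, the result alternates:
    # even indexes are ordinary text, odd indexes are [raw] blocks.
    pieces = RAW_BLOCK.split(message)
    for i in range(0, len(pieces), 2):
        pieces[i] = linkify(pieces[i])
    return ''.join(pieces)


if __name__ == '__main__':
    print(linkify_outside_raw(
        'see http://a.example/x and [raw]http://b.example/y[/raw] end'))
```

The [raw] blocks are never passed to the URL regex at all, so there is nothing to exclude with lookarounds or negated classes. An unclosed [raw] tag is simply treated as ordinary text here. In PHP the same shape should be reachable with preg_split() and the PREG_SPLIT_DELIM_CAPTURE flag, though I have not tested that against your script.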

Good luck!

Re: Matching URLs *not* inside [code] tags

Posted: Mon Nov 16, 2009 2:16 pm
by Goldeneye
Really? It seems like it'd be possible to do with Regular Expressions. I'll check out how PunBB does it, though. Thanks a lot! This should prove to be easier than figuring out a Regular Expression for it. Thank you, ridgerunner!