Nested BBCode Tags

Syntac
Forum Contributor
Posts: 327
Joined: Sun Sep 14, 2008 7:59 pm

Post by Syntac »

I'm trying to write a BBCode parser. I've had excellent success so far, except for one sticking point: Tags don't nest properly. Does anyone know why this is and how I can fix it?

I'm using preg_replace, with the modifiers "i" and "s".
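The pattern itself was not posted, so the following is only a guess at the symptom: a minimal sketch assuming a typical lazy pattern for a single [b] tag. The function name naive_bold and the pattern are hypothetical.

```php
<?php
// Hypothetical reproduction: a lazy, non-recursive pattern for [b]...[/b].
function naive_bold(string $text): string
{
    // .*? stops at the FIRST [/b] it finds, so the outer opening tag
    // gets paired with the inner closing tag.
    return preg_replace('#\[b\](.*?)\[/b\]#is', '<b>$1</b>', $text);
}

echo naive_bold('[b]outer [b]inner[/b] tail[/b]'), "\n";
// prints: <b>outer [b]inner</b> tail[/b]
```

The inner opening tag and the outer closing tag are left unconverted, because a single pass of a non-recursive pattern has no way to count how many tags are currently open.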
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Nested BBCode Tags

Post by VladSun »

Syntac wrote: I'm trying to write a BBCode parser.
Why?

http://bg2.php.net/bbcode
There are 10 types of people in this world, those who understand binary and those who don't
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Nested BBCode Tags

Post by prometheuzz »

Regex is not well suited to building entire parsers.
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: Nested BBCode Tags

Post by GeertDD »

There is an interesting PCRE feature which you don't often hear talked about: recursive patterns. Well, to be honest, I have never really used it myself either. However, you can do some cool stuff with it if you can wrap your head around it.

See http://www.pcre.org/pcre.txt and scroll down to the "recursive patterns" heading.

You can have a look at it, but I guess that it won't build an entire parser for you. While regular expressions are an awesome help in creating a parser, you will need more than just that. I agree with prometheuzz.
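For a rough idea of what a recursive pattern looks like in PHP, here is a minimal sketch assuming a single hard-coded [b] tag. The function name match_balanced_b and the pattern are illustrative and not taken from the PCRE documentation; only the (?R) construct itself comes from there.

```php
<?php
// Hypothetical helper: find the first balanced [b]...[/b] block in $text.
function match_balanced_b(string $text): ?array
{
    // Inside the block, accept: any non-"[" character, a "[" that does not
    // start a [b] or [/b] tag, or (via (?R)) a complete nested [b]...[/b] block.
    $pattern = '#\[b\]((?:[^\[]|\[(?!/?b\])|(?R))*)\[/b\]#i';

    return preg_match($pattern, $text, $m) === 1 ? $m : null;
}

$m = match_balanced_b('x [b]outer [b]inner[/b] tail[/b] y');
echo $m[0], "\n"; // [b]outer [b]inner[/b] tail[/b]
echo $m[1], "\n"; // outer [b]inner[/b] tail
```

This only finds the balanced block. Converting it to HTML, handling several tag types, and dealing with malformed input would still need code around the pattern.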
Syntac
Forum Contributor
Posts: 327
Joined: Sun Sep 14, 2008 7:59 pm

Re: Nested BBCode Tags

Post by Syntac »

Thank you all for your help. However, I recently decided it would be easier to write a parser from scratch than figure out the complexities of PCRE. Prometheuzz made a good point: Regular expressions aren't suited for this sort of thing.
ASDen
Forum Commoner
Posts: 55
Joined: Fri Aug 24, 2007 10:27 am

Re: Nested BBCode Tags

Post by ASDen »

Well, regexes really won't build a full parser for you, but for the specific issue of parsing nested BBCode you can use this:

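Code:

    /**
     * Template for a recursive tag-matching regex.
     * Generates the pattern for a given tag, opening bracket and closing bracket.
     * $O and $C must already be escaped for use in a regex, including any
     * # characters (# is the pattern delimiter).
     * This is a class method; drop "public" to use it as a plain function.
     *
     * @param string $tag Tag to be parsed recursively
     * @param string $O   Opening bracket of the tag
     * @param string $C   Closing bracket of the tag
     * @return string     The generated pattern, with the "i" and "s" modifiers
     */
    public function Recursive_RE_Generator($tag, $O, $C)
    {
        // Group 1: tag name plus any arguments. Group 2: everything between
        // the opening and closing tag, where (?R) matches nested occurrences.
        $re = "#{$O}({$tag}.*?){$C}((?>{$O}(?!/?{$tag}[^{$O}]*?{$C})|[^{$O}]|(?R))*){$O}/{$tag}{$C}#is";
        return $re;
    }
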
To parse nested BBCode, use it with preg_replace or preg_replace_callback.
You can also extract the BBCode arguments (they are captured together with the tag name in the first group).
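One way this might be wired up with preg_replace_callback is sketched below. It assumes square brackets as delimiters and a tag that maps directly to an HTML element of the same name. The names recursive_re and bbcode_to_html are hypothetical; recursive_re is a plain-function copy of the generator above.

```php
<?php
// Plain-function copy of the generator above (same pattern, no class needed).
function recursive_re(string $tag, string $O, string $C): string
{
    return "#{$O}({$tag}.*?){$C}((?>{$O}(?!/?{$tag}[^{$O}]*?{$C})|[^{$O}]|(?R))*){$O}/{$tag}{$C}#is";
}

// Hypothetical helper: turn [tag]...[/tag] into <tag>...</tag> at every depth.
function bbcode_to_html(string $text, string $tag): string
{
    $re = recursive_re($tag, '\[', '\]');

    $result = preg_replace_callback($re, function (array $m) use ($tag) {
        // $m[1] is the tag name plus any arguments, $m[2] is the inner text.
        // The regex matches only the outermost block, so the inner text is
        // processed again to convert the nested occurrences.
        $inner = bbcode_to_html($m[2], $tag);
        return "<{$tag}>{$inner}</{$tag}>";
    }, $text);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo bbcode_to_html('[b]outer [b]inner[/b] tail[/b]', 'b'), "\n";
// prints: <b>outer <b>inner</b> tail</b>
```

The sketch ignores any arguments captured in $m[1] and does no HTML escaping, both of which a real parser would need.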
Syntac
Forum Contributor
Posts: 327
Joined: Sun Sep 14, 2008 7:59 pm

Re: Nested BBCode Tags

Post by Syntac »

Awesome man, thanks. However, I decided to build a parser with no regex whatsoever. It seems to be working fine.
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Re: Nested BBCode Tags

Post by Jenk »

Tokenising is the only way to go when parsing anything that has nesting.
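As a rough illustration of the tokenising approach (not Syntac's parser, which was never posted), here is a minimal sketch that assumes argument-free tags and a small whitelist. The function name bbcode_render and the whitelist are hypothetical. preg_split is used only to cut the input into tokens; the nesting is tracked with a stack.

```php
<?php
// Hypothetical sketch: split into tag and text tokens, then pair tags with a stack.
function bbcode_render(string $text, array $allowed = ['b', 'i', 'u']): string
{
    // Tokenise: [b] or [/b] style tags become their own tokens, the rest is text.
    $tokens = preg_split('#(\[/?[a-z]+\])#i', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $stack = []; // names of the currently open tags, innermost last

    foreach ($tokens as $tok) {
        if (preg_match('#^\[(/?)([a-z]+)\]$#i', $tok, $m)
            && in_array(strtolower($m[2]), $allowed, true)) {
            $name = strtolower($m[2]);
            if ($m[1] === '') {              // opening tag
                $stack[] = $name;
                $out .= "<{$name}>";
                continue;
            }
            if (end($stack) === $name) {     // closer for the innermost open tag
                array_pop($stack);
                $out .= "</{$name}>";
                continue;
            }
        }
        // Plain text, unknown tags and mismatched closers are emitted as escaped text.
        $out .= htmlspecialchars($tok);
    }

    // Close whatever the input left open, innermost first.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo bbcode_render('[b]a [i]b[/i] [b]c[/b][/b]'), "\n";
// prints: <b>a <i>b</i> <b>c</b></b>
```

Because the stack always knows which tag is innermost, nesting depth is not a special case, and malformed input degrades to visible text instead of broken markup.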