Nested BBCode Tags

Posted: Sun Sep 14, 2008 8:07 pm
by Syntac
I'm trying to write a BBCode parser. I've had excellent success so far, except for one sticking point: Tags don't nest properly. Does anyone know why this is and how I can fix it?

I'm using preg_replace, with the modifiers "i" and "s".

Re: Nested BBCode Tags

Posted: Sun Sep 14, 2008 9:06 pm
by VladSun
Syntac wrote: I'm trying to write a BBCode parser.
Why?

http://bg2.php.net/bbcode

Re: Nested BBCode Tags

Posted: Mon Sep 15, 2008 1:00 am
by prometheuzz
Regex is not well suited to building entire parsers.

Re: Nested BBCode Tags

Posted: Mon Sep 15, 2008 7:43 am
by GeertDD
There is an interesting PCRE feature which you don't often hear talked about: recursive patterns. Well, to be honest, I have never really used it myself either. However, you can do some cool stuff with it if you can wrap your head around it.

See http://www.pcre.org/pcre.txt and scroll down to the "recursive patterns" heading.

You can have a look at it, but I guess that it won't build an entire parser for you. While regular expressions are an awesome help in creating a parser, you will need more than just that. I agree with prometheuzz.
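For a feel of what a recursive pattern looks like, here is a minimal sketch that matches a balanced group of parentheses. The pattern and the sample string are illustrative assumptions, not taken from the thread; the only feature relied on is (?R), which re-enters the whole pattern at that point.

```php
<?php
// One balanced parenthesised group: "(" then any mix of
// non-parenthesis runs or nested groups, then ")".
// (?R) recurses into the entire pattern to match a nested group.
$pattern = '/\((?:[^()]++|(?R))*\)/';

preg_match($pattern, 'f(a(b)c) + g(d)', $m);
echo $m[0], "\n"; // prints "(a(b)c)"
```

The same idea carries over to BBCode: replace the parentheses with an opening and a closing tag, and the recursion handles a tag nested inside itself.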

Re: Nested BBCode Tags

Posted: Mon Sep 15, 2008 2:36 pm
by Syntac
Thank you all for your help. However, I recently decided it would be easier to write a parser from scratch than figure out the complexities of PCRE. Prometheuzz made a good point: Regular expressions aren't suited for this sort of thing.

Re: Nested BBCode Tags

Posted: Sun Sep 21, 2008 6:09 pm
by ASDen
Well, regexes really won't build a full parser for you,
but for the specific issue of parsing nested BBCode tags you can use this:

 
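    /**
     * Template for the recursive tag-matching regex.
     * Generates the pattern for a given tag, opening bracket and closing bracket.
     * $O and $C must already be regex-escaped and must not contain '#',
     * which is used as the pattern delimiter.
     *
     * @param string $tag Tag to be parsed recursively
     * @param string $O   Opening bracket of the tag
     * @param string $C   Closing bracket of the tag
     * @return string     The generated PCRE pattern
     */
    public function Recursive_RE_Generator($tag, $O, $C)
    {
        // Group 1: tag name plus any arguments; group 2: tag body.
        // (?R) recurses into the whole pattern to match nested tags.
        $re = "#{$O}({$tag}.*?){$C}((?>{$O}(?!/?{$tag}[^{$O}]*?{$C})|[^{$O}]|(?R))*){$O}/{$tag}{$C}#is";
        return $re;
    }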
 
To parse nested BBCode, use it with preg_replace or preg_replace_callback.
You can also extract BBCode arguments: they are captured in group 1 along with the tag name.
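As a rough sketch of the preg_replace_callback route, the snippet below feeds the generated pattern to a callback that converts the tag body recursively. The names recursiveTagRegex and renderQuotes, the choice of the quote tag, and the blockquote output are all assumptions made for illustration; only the pattern itself comes from the post above.

```php
<?php
// Hypothetical standalone copy of the generator above.
// $open and $close must already be regex-escaped.
function recursiveTagRegex(string $tag, string $open, string $close): string
{
    return "#{$open}({$tag}.*?){$close}((?>{$open}(?!/?{$tag}[^{$open}]*?{$close})|[^{$open}]|(?R))*){$open}/{$tag}{$close}#is";
}

// Hypothetical helper: turns [quote]...[/quote] into <blockquote>,
// including quotes nested inside other quotes.
function renderQuotes(string $text): string
{
    $re = recursiveTagRegex('quote', '\[', '\]');
    $result = preg_replace_callback($re, function (array $m): string {
        // $m[2] is the tag body; convert any nested quotes inside it first.
        return '<blockquote>' . renderQuotes($m[2]) . '</blockquote>';
    }, $text);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $text;
}

echo renderQuotes('[quote]outer [quote]inner[/quote] tail[/quote]'), "\n";
// prints "<blockquote>outer <blockquote>inner</blockquote> tail</blockquote>"
```

The regex matches the outermost tag pair as one unit, so the callback has to call itself on the body to reach the inner pairs. Plain preg_replace would only convert the outermost level in a single pass.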

Re: Nested BBCode Tags

Posted: Mon Oct 20, 2008 4:57 pm
by Syntac
Awesome man, thanks. However, I decided to build a parser with no regex whatsoever. It seems to be working fine.

Re: Nested BBCode Tags

Posted: Mon Oct 27, 2008 12:39 am
by Jenk
Tokenising is the only way to go when parsing anything that has nesting.
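A minimal sketch of the tokenising approach, assuming a small whitelist of argument-less tags: the input is split into tag tokens and text tokens, and a stack tracks which tags are currently open. TAG_MAP, tokenise and render are hypothetical names, and this is not Syntac's parser (which used no regex at all); here a regex is used only to find token boundaries, while the nesting is handled by the stack.

```php
<?php
// Hypothetical whitelist: BBCode tag name => HTML element.
const TAG_MAP = ['b' => 'strong', 'i' => 'em', 'quote' => 'blockquote'];

// Split the input into tokens, keeping [tag] and [/tag] as their own tokens.
function tokenise(string $text): array
{
    return preg_split('#(\[/?[a-z]+\])#i', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) ?: [];
}

function render(string $text): string
{
    $out = '';
    $stack = []; // names of the tags that are currently open, innermost last

    foreach (tokenise($text) as $token) {
        $isTag = preg_match('#^\[(/?)([a-z]+)\]$#i', $token, $m) === 1;
        $name = $isTag ? strtolower($m[2]) : '';

        if ($isTag && array_key_exists($name, TAG_MAP)) {
            if ($m[1] === '') {
                // Opening tag: remember it and emit the HTML element.
                $stack[] = $name;
                $out .= '<' . TAG_MAP[$name] . '>';
                continue;
            }
            if (end($stack) === $name) {
                // Closing tag that matches the innermost open tag.
                array_pop($stack);
                $out .= '</' . TAG_MAP[$name] . '>';
                continue;
            }
        }

        // Plain text, unknown tag, or a closing tag that does not match.
        $out .= htmlspecialchars($token);
    }

    // Close anything the author left open so the HTML stays well-formed.
    while ($stack !== []) {
        $out .= '</' . TAG_MAP[array_pop($stack)] . '>';
    }
    return $out;
}

echo render('[b]bold [i]both[/i][/b]'), "\n";
// prints "<strong>bold <em>both</em></strong>"
```

Because a closing tag is only honoured when it matches the top of the stack, mis-nested input such as [b][i]x[/b][/i] still produces well-formed HTML; the stray [/b] is emitted as literal text.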