Nested BBCode Tags
Posted: Sun Sep 14, 2008 8:07 pm
by Syntac
I'm trying to write a BBCode parser. I've had excellent success so far, except for one sticking point: Tags don't nest properly. Does anyone know why this is and how I can fix it?
I'm using preg_replace, with the modifiers "i" and "s".
Re: Nested BBCode Tags
Posted: Sun Sep 14, 2008 9:06 pm
by VladSun
Syntac wrote:I'm trying to write a BBCode parser.
Why?
http://bg2.php.net/bbcode
Re: Nested BBCode Tags
Posted: Mon Sep 15, 2008 1:00 am
by prometheuzz
Regex is not well suited to building entire parsers.
Re: Nested BBCode Tags
Posted: Mon Sep 15, 2008 7:43 am
by GeertDD
There is an interesting PCRE feature that you don't often hear talked about: recursive patterns. To be honest, I have never really used it myself either. However, you can do some cool stuff with it if you can wrap your head around it.
See
http://www.pcre.org/pcre.txt and scroll down to the "recursive patterns" heading.
You can have a look at it, but I guess that it won't build an entire parser for you. While regular expressions are an awesome help in creating a parser, you will need more than just that. I agree with prometheuzz.
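To give a feel for what a recursive pattern looks like, here is a minimal sketch based on the classic balanced-parentheses case. The variable names and the sample string are only illustrative. The `(?R)` item re-enters the whole pattern at that point, which is what lets it follow nesting of arbitrary depth.

```php
<?php
// Matches one balanced, possibly nested, parenthesised group.
// [^()]++  consumes a run of non-parenthesis characters (possessive),
// (?R)     recurses into the whole pattern to consume a nested group.
$balanced = '/\((?:[^()]++|(?R))*\)/';

preg_match($balanced, 'f(a(b)c) tail', $m);
echo $m[0], "\n"; // prints "(a(b)c)"
```

The same idea carries over to BBCode by swapping the parentheses for opening and closing tags, although the pattern gets considerably harder to read.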
Re: Nested BBCode Tags
Posted: Mon Sep 15, 2008 2:36 pm
by Syntac
Thank you all for your help. However, I recently decided it would be easier to write a parser from scratch than figure out the complexities of PCRE. Prometheuzz made a good point: Regular expressions aren't suited for this sort of thing.
Re: Nested BBCode Tags
Posted: Sun Sep 21, 2008 6:09 pm
by ASDen
Well, regexps really won't build a full parser for you.
BUT for the specific issue of parsing nested BBCodes, you can use this:
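Code:
/**
 * Template for a recursive tag-matching regex.
 * Builds the pattern for a given tag, opening bracket and closing bracket.
 * $O and $C must already be escaped for use in a #-delimited pattern.
 *
 * @param string $tag Tag to be parsed recursively
 * @param string $O   Opening bracket of the tag
 * @param string $C   Closing bracket of the tag
 * @return string     The generated pattern, with the "i" and "s" modifiers
 */
public function Recursive_RE_Generator($tag, $O, $C)
{
    // Group 1: tag name plus any arguments. Group 2: the body, where (?R)
    // recurses into the whole pattern to consume nested occurrences of the tag.
    $re = "#{$O}({$tag}.*?){$C}((?>{$O}(?!/?{$tag}[^{$O}]*?{$C})|[^{$O}]|(?R))*){$O}/{$tag}{$C}#is";
    return $re;
}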
To parse nested BBCodes, use it with preg_replace or preg_replace_callback.
You can also extract the BBCode arguments.
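A rough usage sketch with preg_replace_callback follows. It assumes the method is lifted out of its class into a plain function. The names `recursive_re_generator` and `render_nested`, and the quote-to-blockquote mapping, are made up for this illustration. The callback calls itself on the captured body so that inner tags get converted too.

```php
<?php
// Same pattern as above, as a standalone function (hypothetical name).
function recursive_re_generator($tag, $O, $C)
{
    return "#{$O}({$tag}.*?){$C}((?>{$O}(?!/?{$tag}[^{$O}]*?{$C})|[^{$O}]|(?R))*){$O}/{$tag}{$C}#is";
}

// Replace every [$tag]...[/$tag] pair, however deeply nested, with <$html>...</$html>.
function render_nested($text, $tag, $html)
{
    $re = recursive_re_generator($tag, '\[', '\]');
    return preg_replace_callback($re, function ($m) use ($tag, $html) {
        // $m[1] holds the tag name plus any arguments (unused here).
        // $m[2] is the body, which may itself contain the same tag.
        $inner = render_nested($m[2], $tag, $html);
        return "<{$html}>{$inner}</{$html}>";
    }, $text);
}

echo render_nested('[quote]a [quote]b[/quote] c[/quote]', 'quote', 'blockquote'), "\n";
// prints "<blockquote>a <blockquote>b</blockquote> c</blockquote>"
```

One caveat: because the opening tag is matched with `{$tag}.*?`, a short tag name such as `b` will also match longer names that start with it, so this sketch is safest with tag names that are not prefixes of other tags.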
Re: Nested BBCode Tags
Posted: Mon Oct 20, 2008 4:57 pm
by Syntac
Awesome man, thanks. However, I decided to build a parser with no regex whatsoever. It seems to be working fine.
Re: Nested BBCode Tags
Posted: Mon Oct 27, 2008 12:39 am
by Jenk
Tokenising is the only way to go when parsing anything that has nesting.
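As a minimal sketch of the tokenising approach, and not the parser anyone in this thread actually wrote: split the input into text and tag tokens, then walk them with a stack of open tags. The function names and the tag whitelist are assumptions for illustration, and preg_split is used only for the splitting step; the nesting itself is handled by the stack.

```php
<?php
// Split the input into text tokens and tag tokens such as "[b]" or "[/b]".
function bb_tokenise($src)
{
    return preg_split('#(\[/?[a-z]+\])#i', $src, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Walk the tokens, using a stack to track which tags are currently open.
function bb_render($src)
{
    // Hypothetical whitelist: BBCode tag name => HTML element.
    $map = array('b' => 'strong', 'i' => 'em', 'quote' => 'blockquote');
    $out = '';
    $stack = array();
    foreach (bb_tokenise($src) as $tok) {
        if (preg_match('#^\[(/?)([a-z]+)\]$#i', $tok, $m)
            && isset($map[strtolower($m[2])])) {
            $name = strtolower($m[2]);
            if ($m[1] === '') {              // opening tag
                $stack[] = $name;
                $out .= '<' . $map[$name] . '>';
                continue;
            }
            if (end($stack) === $name) {     // closes the innermost open tag
                array_pop($stack);
                $out .= '</' . $map[$name] . '>';
                continue;
            }
        }
        // Plain text, unknown tag, or mismatched closing tag: emit as text.
        $out .= htmlspecialchars($tok);
    }
    while ($stack) {                         // close anything left open
        $out .= '</' . $map[array_pop($stack)] . '>';
    }
    return $out;
}

echo bb_render('[b]bold [i]both[/i][/b] [/i]'), "\n";
// prints "<strong>bold <em>both</em></strong> [/i]"
```

The stray `[/i]` at the end is left as literal text because nothing is open when it appears, which is the kind of case that is awkward to handle with a single replacement regex.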