Anyone who knows me knows that I swear by XHTML and naturally serve it as real XHTML, which means as application/xhtml+xml! I've been working on my own blog, chat room, forums, etc., and I want them all to support BBCode, but I did not want people sending broken BBCode that would simply convert into broken XHTML; that would, after all, break the entire page. Some people would say I should just stick to regular HTML, but take an example from a couple of months ago: someone's page was breaking in only one browser, and he blew his whole day tracking down the problem before realizing he was missing a quote. Served as real XHTML, that mistake would have been reported by the browser's XML parser right away, so you don't need to validate much, and if you do it right you can save a lot of time.
I've spent the past two or three weeks, more or less, working on this, and one of my goals was to avoid regular expressions entirely. I'm not sure how fast my validator/parser is compared to others, so I'm always open to suggestions for improving it! Another goal was to make absolutely sure that no invalid XHTML would be generated. Everyone has different needs, and one thing to take into consideration is that this will be paired with a JavaScript equivalent that is already half finished. I haven't made any big effort to provide detailed error messages, so if anyone wants to take this and clean them up, feel free!
---Functions---
As far as the BBCode itself is concerned, there are a few interesting notes I'll share. First and foremost, clean XHTML output was my highest priority: line breaks, whitespace, and all! The initial function explodes the input on line breaks, skips the empty lines left by excessive line breaks, and then sends each line off to a chain of functions to be validated and, if valid, parsed. There is currently a small problem with the paragraph counter when an error occurs; I will try to resolve this tonight or tomorrow.
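A standalone sketch of that first step, for anyone porting it (for example to the JavaScript version): split the submitted text on line breaks, drop blank lines, and number the remaining lines from 1 so an error can point at a paragraph. The function name split_paragraphs is hypothetical; it is not the bb_1 function below.

```php
<?php
// Hypothetical helper, not part of the code below: split a textarea value
// into its non-empty lines, keyed by a 1-based paragraph number.
function split_paragraphs($input)
{
    // Browsers submit textarea line breaks as "\r\n", so normalize them first.
    $input = str_replace("\r\n", "\n", $input);
    $paragraphs = array();
    $n = 1;
    foreach (explode("\n", $input) as $line)
    {
        // Blank lines (including runs of them) are skipped and never numbered.
        if (trim($line) === '') {continue;}
        $paragraphs[$n] = $line;
        $n++;
    }
    return $paragraphs;
}

print_r(split_paragraphs("First line\r\n\r\n\r\nSecond [b]line[/b]\r\n"));
```

Numbering only the lines that survive means an error message like "In paragraph 2" refers to what the user actually sees as the second paragraph.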
The second function does a quick nesting validation of BBCode quotes, which, if parsed, are converted into XHTML blockquote elements. These are the only block-level elements I've allowed, and thus they are the only elements that need to be validated before the initial string (for example, a $_POST variable from a textarea) is exploded and the inline elements are validated. The blockquote element is also the only element I allow to nest inside itself; all the inline elements you'll find later will trigger a nesting validation error if you try something like [b][b]double bold for that extra heavy flavor![/b][/b]
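The quote check boils down to a depth counter. Here is a minimal regex-free sketch of the idea using strpos(); quotes_balanced is a hypothetical name, and unlike a check that only looks at the final count, it rejects a [/quote] that appears before any [quote].

```php
<?php
// Minimal sketch (hypothetical function name): check that [quote] and
// [/quote] are balanced using a depth counter, without regular expressions.
function quotes_balanced($text)
{
    $depth = 0;
    $pos = 0;
    while (($pos = strpos($text, '[', $pos)) !== false)
    {
        if (substr($text, $pos, 7) === '[quote]') {$depth++;}
        else if (substr($text, $pos, 8) === '[/quote]')
        {
            $depth--;
            // A close with nothing open can never be repaired later on.
            if ($depth < 0) {return false;}
        }
        $pos++; // continue scanning after this "["
    }
    return $depth === 0;
}

var_dump(quotes_balanced('[quote]a [quote]b[/quote][/quote]')); // bool(true)
var_dump(quotes_balanced('[/quote]a[quote]'));                   // bool(false)
```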
The third function validates inline elements, as mentioned above. Opened BB tags are added to an array called $bb_open, and as the tags are closed, the array elements that represent them are removed. The $bb_allowed array lists the BBCode tags that will be validated and possibly parsed; all other "tags" are simply ignored.
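The $bb_open idea is a classic stack: push on an opening tag, and require each closing tag to match the top of the stack. A simplified standalone sketch follows; inline_tags_valid is a hypothetical name, attribute values after "=" are ignored, and it scans with strpos() instead of exploding on brackets.

```php
<?php
// Minimal sketch of stack-based inline validation (hypothetical function).
function inline_tags_valid($line, array $allowed)
{
    $open = array(); // stack of currently open tag names
    $pos = 0;
    while (($start = strpos($line, '[', $pos)) !== false)
    {
        $end = strpos($line, ']', $start);
        if ($end === false) {break;} // no more complete tags
        $tag = substr($line, $start + 1, $end - $start - 1);
        $pos = $end + 1;
        $closing = ($tag !== '' && $tag[0] === '/');
        $name = $closing ? substr($tag, 1) : explode('=', $tag, 2)[0];
        if (!in_array($name, $allowed, true)) {continue;} // ignore unknown "tags"
        if ($closing)
        {
            // Must close the most recently opened tag.
            if (end($open) !== $name) {return false;}
            array_pop($open);
        }
        else
        {
            // Same inline element nested in itself, e.g. [b][b]..[/b][/b].
            if (in_array($name, $open, true)) {return false;}
            $open[] = $name;
        }
    }
    return count($open) === 0; // everything opened must also be closed
}

var_dump(inline_tags_valid('[b][i]x[/i][/b]', array('b', 'i'))); // bool(true)
```

The final count check matters: a line such as [b]x[i]y[/i] has correct nesting but still leaves [b] open, which would produce an unclosed element.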
The fourth function parses only *BASIC* inline elements from BBCode to XHTML. I have to give credit to the person who showed how to use str_replace effectively at http://elouai.com/bbcode-sample.php.
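The str_replace() trick relies on parallel search/replace arrays, which PHP processes in order. That order matters for escaping: "&" has to be replaced before "<" and ">", otherwise the "&" inside "&lt;" would be escaped a second time. A minimal sketch with a reduced tag list (basic_bb_to_xhtml is a hypothetical name):

```php
<?php
// Minimal sketch of the str_replace() technique for simple tags. The tag
// list here is an illustrative subset, not the full list used below.
function basic_bb_to_xhtml($text)
{
    // Escape markup first; "&" goes first so the "&" in "&lt;"/"&gt;" is not
    // escaped again (str_replace() works through the arrays first to last).
    $text = str_replace(array('&', '<', '>'), array('&amp;', '&lt;', '&gt;'), $text);
    $bb  = array('[b]', '[/b]', '[i]', '[/i]', '[q]', '[/q]');
    $xml = array('<b>', '</b>', '<i>', '</i>', '<q>', '</q>');
    return str_replace($bb, $xml, $text);
}

echo basic_bb_to_xhtml('[b]Fish & <chips>[/b]'), "\n";
// prints: <b>Fish &amp; &lt;chips&gt;</b>
```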
The fifth function is called from the fourth, and it handles *ADVANCED* BBCode-to-XHTML parsing. Basically, that means any BBCode that is parsed into an element with an attribute/value pair: the color, size, and url tags. Double nesting could be argued for either way, but I simply don't see the point; it would have made this code horribly tangled (it wasn't exactly a piece of cake in the first place!) and it's completely unnecessary. I've done a lot of testing: nesting this in that, then that in this, then both inside a quote...that was inside another quote...then everything at once...mixed up, with some eyeballs thrown in...etc. This was the part where I ran into most of the cases of invalid (or even valid) BBCode being parsed into invalid XHTML. I also silently strip double quotes, single quotes, and backslashes from color and size values instead of raising an error. I'm not entirely sure about detecting a pound sign; I wanted to make sure you could just plop in "red" instead of #f00, since being reasonably flexible about variations in human input was one of my goals.
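If stripping quotes ever feels too permissive (a value such as red;background:url(...) still passes through into the style attribute), a character whitelist is one regex-free option. A sketch of two hypothetical helpers, assuming colors are names or hex values and sizes are plain pixel numbers clamped to an arbitrary 8 to 36 range:

```php
<?php
// Hypothetical helpers: accept only named colors ("red") and hex values
// ("#f00"), and only plain integer sizes; stricter than quote stripping.
function clean_bb_color($value)
{
    $value = str_replace(array("'", '"', "\\"), '', trim($value));
    $ok = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#';
    // strspn() counts the leading run of whitelisted characters; if that run
    // isn't the whole string, something like ";" or "(" slipped in.
    if ($value === '' || strspn($value, $ok) !== strlen($value)) {return false;}
    return $value;
}

// $min and $max are illustrative limits, not values from the original code.
function clean_bb_size($value, $min = 8, $max = 36)
{
    $value = str_replace(array("'", '"', "\\"), '', trim($value));
    if ($value === '' || strspn($value, '0123456789') !== strlen($value)) {return false;}
    return max($min, min($max, (int) $value));
}

var_dump(clean_bb_color('"red"'));                 // string(3) "red"
var_dump(clean_bb_color('red;background:url(x)')); // bool(false)
var_dump(clean_bb_size("'50'"));                   // int(36)
```

Returning false lets the caller decide whether to raise an error or just leave the tag unparsed.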
The sixth function is called from the fourth, and it takes all the parsed code and wraps it in paragraphs. I absolutely have to give credit to pytrin in the following thread for helping me figure this part out, as well as McInfo, who pointed out using the ampersand (pass by reference) in a foreach, which made things much easier to deal with. --> viewtopic.php?f=1&t=104857 <-- Instead of replacing line breaks with...break line elements, I figured, why not use paragraphs instead? It works better semantically and contextually, and it avoids br elements, which would spawn a lot of validation errors if for some crazy reason I ever converted from XHTML to HTML (when I checked about a month ago, there were only about a dozen br elements in the 29th version of my site, and they are completely unintentional). If a chunk of text doesn't begin with a blockquote, then I wrap the text in a paragraph element.
...can't forget to give credit to Jack, John, Ollie, and pytrin for their input in this thread too!
viewtopic.php?f=1&t=104849
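The paragraph approach can be sketched on its own as follows. This is a simplified, hypothetical lines_to_paragraphs function: it assumes a converted <blockquote> only ever sits at the very start of a line and </blockquote> at the very end, and it does not handle several quote boundaries on one line.

```php
<?php
// Hypothetical sketch: wrap each non-empty line in <p> instead of adding <br />.
function lines_to_paragraphs($html)
{
    $open_tag = '<blockquote>';
    $close_tag = '</blockquote>';
    $out = '';
    foreach (explode("\n", str_replace("\r", '', $html)) as $line)
    {
        if (trim($line) === '') {continue;} // blank lines produce nothing
        $parts = array();
        // Block-level tags stay outside the <p>, which may only hold inline content.
        if (strpos($line, $open_tag) === 0)
        {
            $parts[] = $open_tag;
            $line = (string) substr($line, strlen($open_tag));
        }
        $closes = (substr($line, -strlen($close_tag)) === $close_tag);
        if ($closes) {$line = (string) substr($line, 0, -strlen($close_tag));}
        // Skip the <p> entirely when only a quote tag was on the line.
        if (trim($line) !== '') {$parts[] = '<p>'.$line.'</p>';}
        if ($closes) {$parts[] = $close_tag;}
        $out .= implode("\n", $parts)."\n";
    }
    return $out;
}

echo lines_to_paragraphs("<blockquote>Quoted\r\n\r\nline</blockquote>\r\nReply");
```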
I've set up a live test environment where everyone is welcome to come and test things out. I *WANT* people to try to break the page, since *ANY* XML nesting error will break the entire page, though naturally I also don't want any test case to break it. I can't generate any more errors on my own, so I feel now is a good time to test this in a test environment.
http://www.jabcreations.com/bbcode/
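Since one well-formedness error takes down the whole page when it is served as application/xhtml+xml, it may be worth checking the generated markup on the server before output. A minimal sketch using PHP's DOM extension (the helper name is hypothetical). It checks well-formedness only, not validity against the XHTML DTD, and named entities such as &nbsp; will be rejected because no DTD is loaded.

```php
<?php
// Minimal sketch (hypothetical helper): is a parsed fragment well-formed XML?
function is_well_formed_fragment($fragment)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    $doc = new DOMDocument();
    // Wrap in one root element, since a fragment may contain several <p> siblings.
    $ok = $doc->loadXML('<div>'.$fragment.'</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(is_well_formed_fragment("<p><b>bold</b></p>"));  // bool(true)
var_dump(is_well_formed_fragment("<p><b>bold</p></b>"));  // bool(false)
var_dump(is_well_formed_fragment("<p>Fish & chips</p>")); // bool(false)
```

Running every test case from the test page through a check like this would catch a broken nesting before a browser ever sees it.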
As for the code itself...here it is! I will reply with an attachment that has the whitespace still intact (once this thread's URL has been generated). All constructive criticism is welcome! If I forgot to credit anyone for anything, please tell me! If you break the XHTML output, please tell me about that too!
________________________________________________
/*
Purpose: Explodes the entire input string by line breaks before passing it to the validation functions.
Details: Starting/ending function; the first half validates, the second half parses BBCode to XHTML.
*/
function bb_1($pizza)
{
    $result = bb_2_validate_block($pizza);
    if ($result == 'valid')
    {
        $pieces1 = explode("\n", $pizza);
        $i = 1; // 1-based paragraph number used in error messages
        foreach ($pieces1 as $key => $value)
        {
            if (!empty($value))
            {
                $result = bb_3_validate_inline($value);
                if ($result != 'valid') {return 'In paragraph '.$i.' '.$result;}
                $i++;
            }
        }
        unset($value);
        // Every line passed, so parse the whole input to XHTML.
        if ($result == 'valid') {$result = bb_4_xhtml($pizza);}
    }
    return $result;
}
/*
Purpose: Quote to blockquote validation.
Details: You can quote someone who quoted someone else.
*/
function bb_2_validate_block($pizza)
{
    $pieces1 = explode("[", $pizza);
    $open = 0;
    foreach ($pieces1 as $key => $value)
    {
        $pieces2 = explode("]", $value);
        if ($pieces2[0] == 'quote') {$open++;}
        else if ($pieces2[0] == '/quote') {$open--;}
        // A [/quote] before its [quote] can never become valid, so stop right away
        // (otherwise "[/quote][quote]" would end at 0 and pass).
        if ($open < 0) {return 'invalid quote nesting';}
    }
    unset($value);
    if ($open != 0) {$result = 'invalid quote nesting';}
    else {$result = 'valid';}
    return $result;
}
/*
Purpose: Validates inline element nesting.
Details: valid: [b][i][/i][/b]; invalid: [b][i][/b][/i].
*/
function bb_3_validate_inline($pizza)
{
    $pieces1 = explode("[", $pizza);
    $c = count($pieces1);
    $i = 1;
    $bb_open = array();
    $bb_allowed = array('b','code','color','i','img','q','size','u','url');
    //'quote',
    // Do a separate iteration for block-level quotes?
    foreach ($pieces1 as $key => $value)
    {
        $pieces2 = explode("]", $value);
        // Limit to 2 pieces so a URL containing "=" (e.g. a query string) stays in one piece.
        $pieces3 = explode("=", $pieces2[0], 2);
        if (in_array($pieces3[0], $bb_allowed))
        {
            if (in_array($pieces3[0], $bb_open)) {$result = 'double nesting of inline element'; break;}
            // Allow blockquote nesting though prevent inline elements from having blockquotes nested within them...
            if ($pieces2[0] == 'quote')
            {
                foreach ($bb_open as $key1 => $value1)
                {
                    if ($value1 != 'quote') {$result = 'quote block element nested inside of inline element!'; break 2;}
                }
            }
            // img and url validation: an address must be an absolute http:// or https:// URL.
            if (count($pieces3) == 2)
            {
                if (substr($pieces3[1], 0, 7) != 'http://' && substr($pieces3[1], 0, 8) != 'https://')
                {
                    if ($pieces3[0] == 'img') {$result = 'invalid img url'; break;}
                    else if ($pieces3[0] == 'url') {$result = 'invalid url'; break;}
                }
            }
            // bb_5_xhtml_advanced() only understands the [url=http://...] form.
            else if ($pieces3[0] == 'url') {$result = 'invalid url (use [url=http://...])'; break;}
            if (count($pieces2) == 2)
            {
                // Opening BB code: add it to the end of $bb_open to be compared with the next closing BB code.
                // An opening tag in the last piece of the line can never be closed.
                if ($i != $c) {array_push($bb_open, $pieces3[0]);}
                else {$result = 'INVALID! <b>'.$pieces3[0].'</b> Last BB is opening when should close!'; break;}
            }
        }
        else if (isset($pieces3[0][0]) && $pieces3[0][0] == '/')
        {
            // Closing BB code: compare to the last opened tag in $bb_open; if it isn't a match, the code is invalid!
            $pieces4 = explode("/", $pieces3[0]);
            if (in_array($pieces4[1], $bb_allowed))
            {
                if ($pieces4[1] != end($bb_open)) {$result = $pieces4[1].' == '.end($bb_open).' == INVALID!'; break;}
                // Only pop for whitelisted tags; unknown "tags" such as [/quote] are ignored.
                array_pop($bb_open);
            }
        }
        $i++;
    }
    // A tag that is still open at the end of the line was never closed.
    if (!isset($result)) {$result = empty($bb_open) ? 'valid' : 'unclosed inline element: '.end($bb_open);}
    return $result;
}
/*
Purpose: BBCode parser for inline XHTML elements.
Details: Receives the whole validated input, converts the basic tags (including [quote] to blockquote), then hands off to bb_5_xhtml_advanced() and bb_6_n2p().
*/
function bb_4_xhtml($text0)
{
    // CREDIT TO...
    // http://elouai.com/bbcode-sample.php
    // Escape markup characters; "&" must come first so the "&" in "&lt;"/"&gt;" isn't escaped again.
    $bb1 = array('&', '<', '>');
    $xml1 = array('&amp;', '&lt;', '&gt;');
    $bb2 = array(
        '[b]','[/b]',
        '[code]','[/code]',
        '[i]','[/i]',
        '[q]','[/q]',
        '[u]','[/u]',
        '[quote]','[/quote]',
    );
    $xml2 = array(
        '<b>','</b>',//'<span class="b">','</span>',
        '<code>','</code>',// inline only: <pre> is block-level and not allowed inside <p>
        '<i>','</i>',//'<span class="i">','</span>',
        '<q>','</q>',
        '<u>','</u>',//'<span class="u">','</span>',
        '<blockquote>','</blockquote>',
    );
    $text1 = str_replace($bb1, $xml1, $text0);
    $text2 = str_replace($bb2, $xml2, $text1);
    $text3 = bb_5_xhtml_advanced($text2);
    $result = bb_6_n2p($text3);
    return $result;
}
/*
Purpose: BBCode parser for "advanced" inline elements.
Details: Handles the tags that produce an attribute value: [url=...], [color=...] and [size=...].
*/
function bb_5_xhtml_advanced($pizza)
{
    /////////////////////////////////
    // URL!
    /////////////////////////////////
    $p1 = explode("[url", $pizza);
    foreach ($p1 as $key => &$value)
    {
        $p2 = explode("[/url]", $value);
        $p3 = explode("]", $p2[0]);
        // Limit to 2 pieces so an "=" inside a query string stays part of the address.
        $p4 = explode("=", $p3[0], 2);
        if (count($p2) == 2 && isset($p3[1], $p4[1]))
        {
            // Escape double quotes so the address cannot break out of the href attribute.
            $href = str_replace('"', '&quot;', $p4[1]);
            $value = '<a class="icon external" href="'.$href.'" rel="nofollow" tabindex="3">'.$p3[1].'</a>'.$p2[1];
        }
    }
    unset($value);
    // implode() takes the glue first; the reversed argument order was removed in PHP 8.
    $pizza = implode('', $p1);
    /////////////////////////////////
    // COLOR!
    /////////////////////////////////
    $p1 = explode("[color=", $pizza);
    foreach ($p1 as $key => &$value)
    {
        $p2 = explode("[/color]", $value);// detect whether [/color] exists via count! (1|2)
        if (count($p2) == 2)
        {
            $p3 = explode("]", $value);
            // Strip single quotes, double quotes and backslashes from the color value.
            $p4 = str_replace(array("'", '"', "\\"), '', $p3[0]);
            // Everything after the first "]" is the colored text.
            $inner = substr($p2[0], strlen($p3[0]) + 1);
            $value = '<span style="color: '.$p4.';">'.$inner.'</span>'.$p2[1];
        }
    }
    unset($value);
    $pizza = implode('', $p1);
    /////////////////////////////////
    // SIZE!
    /////////////////////////////////
    $p1 = explode("[size=", $pizza);
    foreach ($p1 as $key => &$value)
    {
        $p2 = explode("[/size]", $value);// detect whether [/size] exists via count! (1|2)
        if (count($p2) == 2)
        {
            $p3 = explode("]", $value);
            $p4 = str_replace(array("'", '"', "\\"), '', $p3[0]);
            $inner = substr($p2[0], strlen($p3[0]) + 1);
            $value = '<span style="font-size: '.$p4.'px;">'.$inner.'</span>'.$p2[1];
        }
    }
    unset($value);
    $pizza = implode('', $p1);
    return $pizza;
}
/*
Purpose: Wraps the parsed lines in paragraph elements instead of using br elements.
Details: Lines shorter than two characters are discarded; a line that opens or closes a blockquote gets the blockquote tags placed outside of its paragraph.
*/
function bb_6_n2p($pizza)
{
    $pieces = explode("\n", $pizza);
    foreach ($pieces as $key => &$value)
    {
        $value = str_replace(" ", " ", $value);
        $value = str_replace("\n", "", $value);
        $value = str_replace("\r", "", $value);
        if (strlen($value) < 2) {unset($pieces[$key]);}
    }
    unset($value);
    foreach ($pieces as $key => &$value)
    {
        $bb_q = explode("<blockquote>", $value);
        $bb_qe = explode("</blockquote>", $value);
        $bb_qb = explode("<blockquote>", $bb_qe[0]);
        if (count($bb_q) == 1 && count($bb_qe) == 1) {$value = "<p>".$bb_q[0]."</p>\n";}
        else if (count($bb_q) == 2 && count($bb_qe) == 1) {$value = "<blockquote>\n<p>".$bb_q[1]."</p>\n";}
        else if (count($bb_q) == 1 && count($bb_qe) == 2) {$value = "<p>".$bb_qe[0]."</p>\n</blockquote>\n\n";}
        else if (count($bb_q) == 2 && count($bb_qe) == 2) {$value = "<blockquote>\n<p>".$bb_qb[1]."</p>\n</blockquote>\n\n";}
    }
    unset($value);
    // implode() takes the glue first; the reversed argument order was removed in PHP 8.
    $result = implode('', $pieces);
    return $result;
}
________________________________________________