Conceptually speaking, I think I may have found a lightweight solution, though I'd like some input before I get too deep into this.
So the issue with XML that I'm trying to avoid is this...
[b][i]bad xml[/b][/i]
...this would generate the following XHTML code...
<b><i>bad xml</b></i>
...which would break the entire page.
So I was thinking: how can I avoid using something like regular expressions?
Well, there are plenty of BB code to HTML converters, and the most efficient one I came across is at http://elouai.com/bb2html.php.txt; however, it wouldn't prevent bad XML.
I'm a visual person, and after messing with implode and explode a bit today I thought: well, I can explode on [, right? What would I get then if I explode the following...
$pizza1 = 'stuff [b]1[/b] [b]2[/b] [i]3[/i] now for some bad bb to xml! [b][i]bad xml![/b][/i]';
$pieces1 = explode("[", $pizza1);
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';
I get the following output...
Array
(
[0] => stuff
[1] => b]1
[2] => /b]
[3] => b]2
[4] => /b]
[5] => i]3
[6] => /i] now for some bad bb to xml!
[7] => b]
[8] => i]bad xml!
[9] => /b]
[10] => /i]
)
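To make those pieces easier to walk through, each one could be split at its first ] into a tag name and the text that follows it. Here is a minimal sketch of that idea; bb_tokens() and the 'open' / 'close' / 'text' labels are hypothetical names I made up for illustration, not part of any library.

```php
<?php
// Hypothetical helper: turn the explode('[') pieces into [type, value] tokens.
function bb_tokens(string $input): array
{
    $pieces = explode('[', $input);
    $tokens = [];

    // The first piece is whatever precedes the first '[', so it is plain text.
    $first = (string) array_shift($pieces);
    if ($first !== '') {
        $tokens[] = ['text', $first];
    }

    foreach ($pieces as $piece) {
        $end = strpos($piece, ']');
        if ($end === false) {
            // A stray '[' with no closing ']': keep it as literal text.
            $tokens[] = ['text', '[' . $piece];
            continue;
        }

        $tag  = substr($piece, 0, $end);
        $rest = (string) substr($piece, $end + 1);

        if ($tag !== '' && $tag[0] === '/') {
            $tokens[] = ['close', (string) substr($tag, 1)];
        } else {
            $tokens[] = ['open', $tag];
        }
        if ($rest !== '') {
            $tokens[] = ['text', $rest];
        }
    }

    return $tokens;
}

print_r(bb_tokens('[b][i]bad xml![/b][/i]'));
```

With tokens like these, the "is the next item the matching close?" check becomes a comparison of tag names instead of string fiddling on raw pieces.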
While going through the array, I could potentially build temporary mini-arrays to keep track of tags that have been opened but not yet closed.
b...next item is /b? Yes? Good, convert to XML! No? Generate temporary array.
Moving down to the invalid XML...
b...next item is /b? No, it's i. Next item is /i? No, it's /b, so /i has not yet been detected.
That last bit, I think, may be tricky if I decide to go with this approach. The major challenge would be making sure deeply nested BB code works as desired...and of course stress testing it.
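One way the "temporary array" could work is as a stack of currently open tags: push on an opening tag, and on a closing tag pop until the matching name has been closed. What follows is only a sketch under several assumptions of mine: bb_to_xhtml() and the $allowed whitelist are hypothetical, a misnested closing tag closes everything opened after its match, a closing tag with no open match is dropped, and unknown tags are emitted as literal text.

```php
<?php
// Sketch only: bb_to_xhtml() and the $allowed whitelist are hypothetical names.
function bb_to_xhtml(string $input, array $allowed = ['b', 'i', 'u']): string
{
    $pieces = explode('[', $input);
    // Text before the first '[' can never contain a tag.
    $out   = htmlspecialchars((string) array_shift($pieces), ENT_QUOTES, 'UTF-8');
    $stack = []; // names of the elements that are currently open

    foreach ($pieces as $piece) {
        $end     = strpos($piece, ']');
        $tag     = $end === false ? '' : substr($piece, 0, $end);
        $isClose = $tag !== '' && $tag[0] === '/';
        $name    = $isClose ? (string) substr($tag, 1) : $tag;

        if ($end === false || !in_array($name, $allowed, true)) {
            // Stray '[' or unknown tag: emit the whole piece as literal text.
            $out .= htmlspecialchars('[' . $piece, ENT_QUOTES, 'UTF-8');
            continue;
        }

        if (!$isClose) {
            $stack[] = $name;
            $out .= "<$name>";
        } elseif (in_array($name, $stack, true)) {
            // Close everything opened after $name, then $name itself.
            do {
                $top = array_pop($stack);
                $out .= "</$top>";
            } while ($top !== $name);
        }
        // Assumption: a closing tag with no matching open element is dropped.

        $out .= htmlspecialchars((string) substr($piece, $end + 1), ENT_QUOTES, 'UTF-8');
    }

    // Close whatever is still open so the result stays well formed.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }

    return $out;
}

echo bb_to_xhtml('[b][i]bad xml![/b][/i]'), "\n";
```

For the bad example this prints <b><i>bad xml!</i></b>, so the page stays well formed even though the BB code was not. Because the stack can grow to any depth, deeply nested tags go through the same code path as simple ones, though it would still need stress testing.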
So that is what I'm currently thinking with regard to converting BB code to XML. I'm very open to suggestions for other ways to approach this interesting challenge!