BB code to XML as application/xhtml+xml
Posted: Mon Aug 17, 2009 9:24 pm
I've been thinking for a few days about the best way to convert BB code into XHTML that is sound enough to serve as application/xhtml+xml without generating a lot of server load. Naysayers of application/xhtml+xml need not apply: it's going to happen one way or another, and it's my unchanging view that broken code should break, otherwise I won't easily know it needs to be fixed!
Conceptually speaking, I think I may have found a lightweight solution, though I'd like some input before I get too deep into this.
So the issue with XML that I'm trying to avoid is this...
[b][i]bad xml[/b][/i]
...this would generate the following XHTML code...
<b><i>bad xml</b></i>
...which would break the entire page.
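To make the failure concrete, here is a minimal sketch of a converter that swaps each BB tag for its XHTML twin without looking at nesting. The function name naive_bb_to_xhtml and the four-tag list are made up for illustration; this is not any particular library's code.

```php
<?php
// Hypothetical naive converter: one-to-one tag substitution, no nesting check.
function naive_bb_to_xhtml($input)
{
    $bb    = ['[b]', '[/b]', '[i]', '[/i]'];
    $xhtml = ['<b>', '</b>', '<i>', '</i>'];
    // str_replace() swaps each BB tag for the entry at the same index.
    return str_replace($bb, $xhtml, $input);
}

// The mis-nested input passes straight through as mis-nested markup.
echo naive_bb_to_xhtml('[b][i]bad xml[/b][/i]'), "\n";
// prints: <b><i>bad xml</b></i>
```

Served as text/html a browser would shrug this off, but an XML parser stops at the first mismatched closing tag.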
So I was thinking: how can I avoid using something like regular expressions?
Well, there are plenty of BB code to HTML converters, the most efficient of which I came across at http://elouai.com/bb2html.php.txt; however, it wouldn't prevent bad XML.
I'm a visual person, and after messing with implode and explode a bit today I thought: well, I can explode on [, right? What would I get if I explode the following...
$pizza1 = 'stuff [b]1[/b] [b]2[/b] [i]3[/i] now for some bad bb to xml! [b][i]bad xml![/b][/i]';
$pieces1 = explode("[", $pizza1);
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';
I get the following output...
Array
(
[0] => stuff
[1] => b]1
[2] => /b]
[3] => b]2
[4] => /b]
[5] => i]3
[6] => /i] now for some bad bb to xml!
[7] => b]
[8] => i]bad xml!
[9] => /b]
[10] => /i]
)
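Each piece in that output still has the tag name glued to whatever text follows it. Here is a minimal sketch of one way to pull them apart, assuming a second explode on ] limited to two parts. The name bb_tokens and the tag/text array shape are hypothetical, picked only for this example.

```php
<?php
// Hypothetical helper: turn the exploded pieces into tag/text pairs.
function bb_tokens($input)
{
    $pieces = explode('[', $input);
    // The first piece is whatever came before the first "[", so it has no tag.
    $tokens = [['tag' => null, 'text' => array_shift($pieces)]];

    foreach ($pieces as $piece) {
        // Limit of 2 keeps any later "]" characters inside the text part.
        $parts = explode(']', $piece, 2);
        if (count($parts) < 2) {
            // No closing bracket: keep the stray "[" as literal text.
            $tokens[] = ['tag' => null, 'text' => '[' . $piece];
            continue;
        }
        $tokens[] = ['tag' => $parts[0], 'text' => $parts[1]];
    }
    return $tokens;
}
```

With this, the piece "/i] now for some bad bb to xml! " becomes the tag "/i" plus its trailing text, which is easier to reason about than the raw pieces.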
While walking through the array I could potentially build temporary mini-arrays.
b...next item is /b? Yes? Good, convert to XML! No? Generate a temporary array.
Moving down to the invalid XML...
b...next item is /b? No, it's i. Next item is /i? No, it's /b, so /i has not been detected yet.
That last bit may be tricky, at least at this point, if I decide to go with this approach. The major challenge, I think, would be making sure deeply nested BB code works as desired...and of course stress testing it.
Perhaps just checking each closing tag against the last item in the temporary array would do...it may not be all that difficult...
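Here is a rough sketch of that "check against the last item" idea, with the temporary array used as a stack of open tags. Everything in it is an assumption rather than a settled design: the names bb_to_xhtml and $allowed, the three-tag whitelist, and especially the policy of leaving a mismatched closing tag as literal text and closing leftover tags at the end.

```php
<?php
// Sketch only: bb_to_xhtml() and the $allowed whitelist are hypothetical.
function bb_to_xhtml($input, $allowed = ['b', 'i', 'u'])
{
    // Escape text so stray <, > and & cannot break the XML either.
    $esc = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };

    $pieces = explode('[', $input);
    $out    = $esc(array_shift($pieces)); // text before the first "["
    $open   = [];                         // stack of currently open tags

    foreach ($pieces as $piece) {
        $parts = explode(']', $piece, 2);
        $tag   = count($parts) === 2 ? $parts[0] : null;
        $text  = count($parts) === 2 ? $parts[1] : $piece;

        if ($tag !== null && in_array($tag, $allowed, true)) {
            // Known opening tag: push it and emit the start tag.
            $open[] = $tag;
            $out .= '<' . $tag . '>';
        } elseif ($tag !== null && $tag !== '' && $tag[0] === '/'
            && $open && substr($tag, 1) === end($open)) {
            // Closing tag matches the last opened tag: pop and emit.
            array_pop($open);
            $out .= '<' . $tag . '>';
        } else {
            // Unknown or mismatched tag: keep it as visible literal text.
            $out .= $esc($tag === null ? '[' : '[' . $tag . ']');
        }
        $out .= $esc($text);
    }

    // Close anything the poster forgot, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo bb_to_xhtml('[b][i]bad xml![/b][/i]'), "\n";
// prints: <b><i>bad xml![/b]</i></b>
```

The mismatched [/b] stays visible in the output so the poster can see the mistake, while the markup itself remains well-formed. Closing the inner tags automatically when an outer closing tag shows up would be another reasonable policy.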
So that is what I'm currently thinking with regard to converting BB code to XML. I'm very open to suggestions for other ways to approach this interesting challenge!