BB code to XML as application/xhtml+xml

JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Post by JAB Creations »

I've been thinking for a few days about the best way to convert BB code into XHTML that is sound enough to serve as application/xhtml+xml without generating a lot of server load. Nay-sayers of application/xhtml+xml need not apply; it's going to happen one way or another, and it's my unchanging view that broken code should break, otherwise I won't easily know it should be fixed! :wink:

Conceptually speaking I think I may have found a lightweight solution, though I'd like some input before I get too deep into this.

So the issue with XML that I'm trying to avoid is this...

[b][i]bad xml[/b][/i]

...this would generate the following XHTML code...

<b><i>bad xml</b></i>

...which would break the entire page.
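
For anyone who wants to see the failure without loading a page in a browser, here is a minimal sketch using PHP's bundled libxml and SimpleXML functions. The helper name is_well_formed() is made up for illustration, and the sketch assumes the SimpleXML extension is enabled.

```php
<?php
// Hypothetical helper: returns true when $fragment parses as well-formed XML.
function is_well_formed(string $fragment): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in a single root element so it forms a document.
    $parsed = simplexml_load_string('<div>' . $fragment . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $parsed !== false;
}

var_dump(is_well_formed('<b><i>good xml</i></b>')); // properly nested tags
var_dump(is_well_formed('<b><i>bad xml</b></i>'));  // overlapping tags
```

The first call should report true and the second false, which is the same well-formedness rule a browser applies to a page served as application/xhtml+xml.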

So I was thinking how can I avoid using something like regular expressions?

Well, there are plenty of BB code to HTML converters, the most efficient of which I came across at http://elouai.com/bb2html.php.txt; however, it wouldn't prevent bad XML.

I'm a visual person and after messing with implode and explode a bit today I thought, well, I can explode on [ right? What would I get then if I explode the following...

$pizza1 = 'stuff [b]1[/b] [b]2[/b] [i]3[/i] now for some bad bb to xml! [b][i]bad xml![/b][/i]';
$pieces1 = explode("[", $pizza1);
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';

I get the following output...

Array
(
[0] => stuff
[1] => b]1
[2] => /b]
[3] => b]2
[4] => /b]
[5] => i]3
[6] => /i] now for some bad bb to xml!
[7] => b]
[8] => i]bad xml!
[9] => /b]
[10] => /i]
)

While going through the array I could generate temporary mini-arrays potentially.

b...next item is /b? Yes? Good, convert to XML! No? Generate temporary array.

Moving down to the invalid XML...
b...next item is /b? No, i. Next item is /i? No it's /b.../i has not yet been detected.

That last bit may be tricky, at least at this point, if I decide to go with this approach. The major challenge, I think, would be making sure deeply nested BB code works as desired...and of course stress testing it. 8O Perhaps just checking against the last array item...it may not be all that difficult...
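
The "check against the last array item" idea is essentially a stack. Here is a rough sketch of it, assuming the tag names have already been pulled out of the text into a flat list; the function name tags_nest_properly() is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: $tags is a flat list such as ['b', 'i', '/i', '/b'].
function tags_nest_properly(array $tags): bool
{
    $open = []; // stack of tags still waiting for their closing tag

    foreach ($tags as $tag) {
        if ($tag !== '' && $tag[0] === '/') {
            // Closing tag: it must match the most recently opened tag.
            // array_pop() returns null on an empty stack, which never matches.
            if (array_pop($open) !== substr($tag, 1)) {
                return false;
            }
        } else {
            // Opening tag: remember it until its closing tag shows up.
            $open[] = $tag;
        }
    }

    // Anything left on the stack was opened but never closed.
    return count($open) === 0;
}

var_dump(tags_nest_properly(['b', 'i', '/i', '/b'])); // nested correctly
var_dump(tags_nest_properly(['b', 'i', '/b', '/i'])); // overlapping
```

Because only the top of the stack is ever compared, nesting depth does not matter; the cost is one push or pop per tag.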

So that is what I'm currently thinking in regards to converting BB code to XML. I'm very open to suggestions for other ways to approach this interesting challenge!
jackpf
DevNet Resident
Posts: 2119
Joined: Sun Feb 15, 2009 7:22 pm
Location: Ipswich, UK

Post by jackpf »

Regex isn't entirely "my thing"...but I think it'd be far better than what you're suggesting.

You could use a negative lookahead.
Eg:

/\[(.*)\](.*?)(?!.*\[.*\])\[\/$1\]/
This is completely untested, and...as I say, regex isn't my strong point. But what it should do, is match a tag in square brackets, and as long as there isn't a nested tag within it, parse it.

I think that's where you should go with this...

Post by JAB Creations »

First, Jack...thank you for creating that insane-looking regular expression. Unfortunately when I tested it against a couple of strings (both valid and invalid) it got stuck in an infinite loop. 8O I tested it with The Regex Coach, which I highly recommend downloading. I was able to cut the number of steps for an email regex in half using that program! Just add a regular expression and a string to validate against, click on the 'Step' tab at the bottom, click 'Start', and then keep clicking 'Next step' until it either validates, fails hard, or seems to be looping infinitely. I'm not all that good with regex either but this tool is just invaluable! :)

Thankfully a lot of the conceptual work translated into actual code...now keep in mind this is only my first working build! I haven't added support for BB code tags like img or url yet, though I do think this is much more lightweight than using regular expressions. There are a couple of pizzas (and naturally you can bake your own :mrgreen:) for testing a submitted post with both valid and invalid BB code. Any suggestions for improvement are welcome, of course!

I have the overall bb code to XHTML (as application/xhtml+xml) plan worked out as so...

1.) First validation, no point in converting invalid bb code to invalid XHTML!

2.) Parsing the bb code into XHTML; I will likely use the solution found here: http://elouai.com/bb2html.php.txt

3.) Clean-up; censoring curse words and paragraph formatting as was discussed here and here and again a big thanks to those who helped out there!

I will post the pizza after this post since using BB code would make reading this post less painful. :mrgreen:

<?php
$pieces1 = explode("[", $pizza1);
 
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';
 
$bb_open = array();
//$bb_close = array();
 
foreach($pieces1 as $key => $value)
{
 $pieces1 = explode("]", $value);
 if (count($pieces1)=='2')
 {
  $pieces2 = explode("/", $value);
 
   //IF--> Starting BB code tag? Add it to array to compare against!
   //ELSE--> Ending BB code tag? If it matches last array value remove last array value; otherwise this BB code will generate invalid XML markup!
   if (count($pieces2)=='1')
   {
    $pieces3 = explode("]", $value);
    //echo $pieces3[0].'<br />';
    array_push($bb_open,$pieces3[0]);
   }
   else if (count($pieces2)=='2')
   {
    //array_push($bb_close,$value);
 
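    // Known flaw: $pieces3 still holds the most recent *opening* tag, not this closing tag, so nested tags are misjudged (see the replies below).
    if ($pieces3[0]==end($bb_open)) {echo $pieces3[0].' == '.end($bb_open).' == Valid close<br />'; array_pop($bb_open);}
    else {echo $pieces3[0].' !! '.end($bb_open).' !! NOT Valid close<br />';}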
    
   }
   //echo count($pieces2).' = '.$value.'<br />';
 }
}
?>
I can't believe I'm actually getting this done! 8O

PIZZA!

Post by JAB Creations »

Here are your two pizzas, enjoy! :mrgreen:

$pizza1 = 'stuff [b]1[/b] [b]2[/b] [i]3[/i] now for some bad bb to xml! [b][i]bad xml![/b][/i]';
//$pizza1 = 'stuff [b]1[/b] [i]o[/i]';

Post by jackpf »

[b]a[/b][b][i][/i][/b]
That failed. But it looks valid to me...

Post by JAB Creations »

Doh! I forgot to add support for nested bb code. :oops: I was thinking about that a lot earlier today too! Ok, working on it now...

Post by jackpf »

I honestly think regex would be the better option...even though it's harder.

You should speak to this guy: viewtopic.php?f=38&t=104353#p558456

He's an absolute genius with regex.

Post by JAB Creations »

jackpf wrote:I honestly think regex would be the better option...even though it's harder.

You should speak to this guy: viewtopic.php?f=38&t=104353#p558456

He's an absolute genius with regex.
...and then pytrin pawns for the lulz. :twisted:

*edit* Sorry...but seriously...regex can't be the answer all the time, like he said.
Last edited by JAB Creations on Tue Aug 18, 2009 4:45 pm, edited 1 time in total.

Post by jackpf »

Yeah lol


I did find that quote quite funny though.

Post by JAB Creations »

I spent some time revising the script and cleaned up the logic. Really...I thought this was going to implode my head even when I decided to take a crack at it.

So first the code and then, in a quick follow-up post, some starting pizza. Again, I haven't implemented support for verifying img or url BB code attributes. Which makes me wonder...would it be faster to do a second validation pass for attribute-based BB code or to build it into the first validation function (when I turn this into a function, of course)?

Anyway, here is what I have, will it blend? :mrgreen: Comments and suggestions for improvement are welcome! :)

<?php
// *** Please see following post for sample BB code to validate.
$pieces1 = explode("[", $pizza1);
 
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';
 
$bb_open = array();
 
$i = '0';
foreach($pieces1 as $key => $value)
{
 echo '<div>'.$i.'</div>';
 $pieces1 = explode("]", $value);
 echo '<div>pieces1 == '.$pieces1[0].'</div>';
 
 if ($pieces1[0][0]!='/')
 {
  // Opening BB Code
  // Add BB code to end of array to be compared with next closing BB code.
  array_push($bb_open,$pieces1[0]);
  //echo '<div>';
  //print_r($bb_open);
  //echo '</div>';
 }
 else if ($pieces1[0][0]=='/')
 {
  $pieces2 = explode("/", $pieces1[0]);
  echo '<div>pieces2 == '.$pieces2[1].'</div>';
 
  // Closing BB Code
  // Compare to the last element of $bb_open (the opened tags); if not a match then the code is invalid!
  if ($pieces2[1]==end($bb_open)) {echo '<div>'.$pieces2[1].' == '.end($bb_open).' == VALID!</div>';}
  else {echo '<div>'.$pieces2[1].' == '.end($bb_open).' == INVALID!</div>';}
  array_pop($bb_open);
 }
 echo '<br />';
 $i++;
}
?>

Post by JAB Creations »

Get your fresh pizza!

$pizza1 = '[b]a[/b][b][i][u][/u][img][a][/a][/img][/i][/b]';

The next thing I'm going to work on is turning this into a function that returns invalid on the first validation failure, or valid if all of the string's BB code checks out.

Post by jackpf »

[i][i][i][/i][/i][i]
Says valid when it isn't :P


*cough* regex *cough*

Post by JAB Creations »

Thanks Jack, you're doing an excellent job of emulating people with no sense of validity! :lol:

You actually have half a point for that...notice with my last pizza that the last BB code is labeled as either valid or invalid, while with your pizza it isn't labeled at all. :P Also keep in mind that up until this point this hasn't been adapted into a function that ultimately says all the BB code is valid or rejects it as invalid. However you do get credit for pointing out how I need to handle the last BB code bit, so yeah, it's still an important point you made, thanks! :mrgreen:

I actually have been using the $i variable just to help me visualize what is happening...seems like I'll have to ensure the last element in the array is a closing element. I've implemented the fix on line 24: if the BB code is an opening tag rather than a closing tag when $i equals the count of the array elements, it fails validation.

Ok here is the updated version. I got distracted by the material world so I'm only now going to implement this as a function after posting this. Feel free to bring more pizza to the party. :lol:

<?php
// Look for pizza here on the forum thread.
$pieces1 = explode("[", $pizza1);
 
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';
 
 
$c = count($pieces1);
$i = '1';
$bb_open = array();
 
foreach($pieces1 as $key => $value)
{
 echo '<div>i = '.$i.' and c = '.$c.'</div>';
 $pieces1 = explode("]", $value);
 echo '<div>pieces1 == '.$pieces1[0].'</div>';
 
 if ($pieces1[0][0]!='/')
 {
  // Opening BB Code
  // Add BB code to end of array to be compared with next closing BB code.
  if ($i!=$c)
  {
   array_push($bb_open,$pieces1[0]);
   //echo '<div>';
   //print_r($bb_open);
   //echo '</div>';
  }
  else {echo '<div>INVALID! Last BB is opening when should close!</div>';}
 }
 else if ($pieces1[0][0]=='/')
 {
  $pieces2 = explode("/", $pieces1[0]);
  echo '<div>pieces2 == '.$pieces2[1].'</div>';
 
  // Closing BB Code
  // Compare to the last element of $bb_open (the opened tags); if not a match then the code is invalid!
  if ($pieces2[1]==end($bb_open)) {echo '<div>'.$pieces2[1].' == '.end($bb_open).' == VALID!</div>';}
  else {echo '<div>'.$pieces2[1].' == '.end($bb_open).' == INVALID!</div>';}
  array_pop($bb_open);
 }
 echo '<br />';
 $i++;
}
?>

Post by JAB Creations »

Added support for img and url, since their tag names are followed by an equals sign (previous builds would mark them invalid, though I wasn't yet testing that).

So here is the code, followed by another hurdle I'm preparing to deal with...

$pieces1 = explode("[", $pizza1);
 
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div>';
 
 
$c = count($pieces1);
$i = '1';
$bb_open = array();
 
foreach($pieces1 as $key => $value)
{
 echo '<div>i = '.$i.' and c = '.$c.'</div>';
 $pieces1 = explode("]", $value);
 echo '<div>pieces1 == '.$pieces1[0].'</div>';
 
 $pieces2 = explode("=", $pieces1[0]);
 
 if ($pieces2[0][0]!='/')
 {
  // Opening BB Code
  // Add BB code to end of array to be compared with next closing BB code.
  if ($i!=$c)
  {
   array_push($bb_open,$pieces2[0]);
   //echo '<div>';
   //print_r($bb_open);
   //echo '</div>';
  }
  else {echo '<div>INVALID! Last BB is opening when should close!</div>';}
 }
 else if ($pieces2[0][0]=='/')
 {
  $pieces3 = explode("/", $pieces2[0]);
  echo '<div>pieces3 == '.$pieces3[1].'</div>';
 
  // Closing BB Code
  // Compare to the last element of $bb_open (the opened tags); if not a match then the code is invalid!
  if ($pieces3[1]==end($bb_open)) {echo '<div>'.$pieces3[1].' == '.end($bb_open).' == VALID!</div>';}
  else {echo '<div>'.$pieces3[1].' == '.end($bb_open).' == INVALID!</div>';}
  array_pop($bb_open);
 }
 echo '<br />';
 $i++;
}
So another hurdle I thought up: what happens if you have a valid opening and closing BB code tag but with a line break between them? Well, I'm not going to use line break (<br />) elements; with help from a couple of other threads I'm aiming for paragraphs...so the current issue I'm eyeing is preventing the following...

<p>stuff <b>stuff</p> <p>stuff </b></p>
I was thinking...oh no, this is going to become a real mess if I have to somehow reverse the explodes...but then I thought...wait a second! This is going into a function...so why not just explode the original string by line breaks and then run this function on each chunk of the resulting array? :mrgreen: So that's what I'm working on now!
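
A minimal sketch of that per-chunk plan, sticking to explode() rather than regex. Both function names (bb_chunk_is_valid() and bb_post_is_valid()) are hypothetical, and the sketch assumes each line of the post becomes its own paragraph. The "=" handling is one possible way to cope with tags such as url and img, not a definitive one.

```php
<?php
// Hypothetical helper: checks one chunk of text (one future paragraph).
function bb_chunk_is_valid(string $chunk): bool
{
    $pieces = explode('[', $chunk);
    array_shift($pieces); // text before the first "[" is never a tag

    $open = []; // stack of tags still waiting to be closed
    foreach ($pieces as $piece) {
        $end = strpos($piece, ']');
        if ($end === false) {
            continue; // stray "[" with no "]": treat it as plain text
        }
        // "url=http://..." becomes "url"; limit 2 leaves later "=" signs alone.
        $name = explode('=', substr($piece, 0, $end), 2)[0];
        if ($name === '' || $name === '/') {
            continue; // "[]" or "[/]": treat it as plain text
        }

        if ($name[0] === '/') {
            // Closing tag must match the most recently opened tag.
            if (array_pop($open) !== substr($name, 1)) {
                return false;
            }
        } else {
            $open[] = $name;
        }
    }
    return count($open) === 0; // leftover opening tags are invalid too
}

// Hypothetical wrapper: a post is valid only if every chunk between line
// breaks is valid on its own, so no tag can span two paragraphs.
function bb_post_is_valid(string $post): bool
{
    $chunks = explode("\n", str_replace("\r\n", "\n", $post));
    foreach ($chunks as $chunk) {
        if (!bb_chunk_is_valid($chunk)) {
            return false; // stop at the first bad chunk
        }
    }
    return true;
}

var_dump(bb_post_is_valid("stuff [b]stuff[/b]\nstuff")); // tags closed per line
var_dump(bb_post_is_valid("stuff [b]stuff\nstuff [/b]")); // tag spans a break
```

Rejecting a tag that spans a line break is stricter than most boards; an alternative would be to close and reopen the tag around each paragraph, at the cost of more bookkeeping.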

Post by jackpf »

I don't think you'll ever get it 100% with what you're doing...there'll always be an exception.

Here, I had another go at regex. (not posting in code tags because it'll parse the bbcode)

<?php
function foo($code)
{
return preg_replace('/\[(.*)\](.*?)(?!.*?\[^\1\])\[\/\1\]/', '<$1>$2</$1>', $code);
}

echo foo('[b]bold[/b]').'<br />';

echo foo('[i]italic[/i]').'<br />';

echo foo('[b][i]skips italic, parses bold[/b][/i]').'<br />';

Atm it doesn't parse nested tags at all...but I reckon that's where you should go with this. Looking at all your explodes and loops and stuff makes me shudder... :P

Maybe it would be better to do a preg_replace_callback(), and recurse through the tags...although it may not be the most efficient method.
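
One way the preg_replace_callback() idea could look, as a rough sketch rather than a finished parser. Instead of true recursion it converts the innermost pair on each pass and loops until nothing changes. The function name bb_to_xhtml() and the b/i/u whitelist are assumptions made for the example.

```php
<?php
// Hypothetical converter: only the whitelisted tags b, i and u are handled.
function bb_to_xhtml(string $text): string
{
    // Escape &, < and > first so user text cannot inject markup of its own.
    $out = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

    // Match a pair whose content holds no square brackets, i.e. the
    // innermost pair, and repeat until a pass makes no replacement.
    $pattern = '~\[(b|i|u)\]([^\[\]]*)\[/\1\]~';
    do {
        $out = preg_replace_callback(
            $pattern,
            function (array $m): string {
                return '<' . $m[1] . '>' . $m[2] . '</' . $m[1] . '>';
            },
            $out,
            -1,
            $count
        );
    } while ($count > 0);

    return $out;
}

echo bb_to_xhtml('[b]bold [i]both[/i][/b]'), "\n";
echo bb_to_xhtml('[b][i]bad[/b][/i]'), "\n";
```

Overlapping tags never match the pattern, so they stay behind as literal bracketed text and the output remains well-formed. The trade-off is one full pass over the string per nesting level.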