Re: BB code to XML as application/xhtml+xml

Posted: Tue Aug 18, 2009 10:11 pm
by JAB Creations
More expressions? Stop reading poetry and join the dark side, we have explosives! :twisted:

Kidding aside, as far as I am concerned it's not a matter of if but when I can achieve a goal! :wink: ...though you do have a point: your regex is beautifully short in comparison to my loops and explosions (and it works!)...but I'm just having too much fun blowing things up. :mrgreen:

I appreciate that you're trying to do this with regex as an alternative...I want to see my experiment through to the end and see how it holds up against regex over a much higher number of iterations. Plus it never hurts to have two approaches to the same goal. It's very likely that each solution is the more effective one depending on the context in which it is applied (say casual talk versus a coding forum, for example...not sure, just making something up).

However I do have one issue with your regex: the validity of the resulting code...sort of. I serve my site as XHTML 1.1, and with your regex I'm not sure how to restrict elements to a white list, which is what I have spent the past few minutes planning for my exploding loop function.

Here is an example of invalid XHTML 1.1...even though it wouldn't break XML it's still invalid...

Code:

echo foo('[jab]bold[/jab]').'<br />';
I think I just thought of a valid application for your regex: pure XML. My example of invalid XHTML would require my solution (and a white list I have yet to implement at this point); however, my solution would be excessive in sheer amount of code (load-wise, not sure?) if you're working with plain XML, which as far as I can tell accepts any well-formed element name. Maybe you've created a pure XML validator! :drunk:

It's sort of nice...I'm not feeling overwhelmed by the challenges...they're coming at a desirable pace. :) Ok ok, nearing the end of my day, gotta get at least one more build done! :)

Re: BB code to XML as application/xhtml+xml

Posted: Tue Aug 18, 2009 11:34 pm
by JAB Creations
One last build for the night! :)

I've done two things and patched a small bug (that I think I left out from the last build I posted).

First, I've worked this into a function that returns its result (and simply commented out the debugging echoes).

Secondly, I've thrown this into a loop over multiple lines...the idea is that a post (like this one) will naturally have multiple line breaks...I explode on \n and use a foreach loop to process each array element individually.

Now that I think of it there are a couple of benefits, one of which I just realized! First, I don't have to add any sort of special adaptation to scan a chunk of text that spans a line break...that's just not viable...but XHTML and XML don't work that way either. Secondly, instead of just throwing an outright valid or invalid flag for an entire post I can (at least with the current code) let the user know which chunk of text contains the invalid bb code, which I think is a whole lot more friendly than, 'Hey, there's something wrong with your novel sized post!' :mrgreen:

So here is the latest build...

Code:

$multiple_line = "stuff[b]stuff![/b]\n[i]stuff[/i][b]stuff![/b]\nstuff[b]stuff![/b]\n";
$pieces1 = explode("\n", $multiple_line);
 
 echo '<div><pre>';
 print_r($pieces1);
 echo '</pre></div><br /><br /><br />';
 
 
foreach($pieces1 as $key => $value)
{
 //echo '<div>'.$value.'</div>';
 echo '<div>'.bb_validator($value).'</div><br />';
}
 
 
function bb_validator($pizza1)
{
 $pieces1 = explode("[", $pizza1);
 
 //echo '<div><pre>';
 //print_r($pieces1);
 //echo '</pre></div>';
 
 $c = count($pieces1);
 $i = '1';
 $bb_open = array();
 $bb_allowed = array('b','color','i','img','link','q','quote','size','u','url');
 
 foreach($pieces1 as $key => $value)
 {
  //echo '<div>i = '.$i.' and c = '.$c.'</div>';
  $pieces2 = explode("]", $value);
  //echo '<div>pieces1 == '.$pieces2[0].'</div>';
  //echo count($pieces2);
 
  $pieces3 = explode("=", $pieces2[0]);
 
  if (!empty($pieces3[0]) && count($pieces2)=='2')
  {
   if ($pieces3[0][0]!='/')
   {
    // Opening BB Code
    // Add BB code to end of array to be compared with next closing BB code.
    if ($i!=$c)
    {
     if (in_array($pieces3[0],$bb_allowed))
     {
      array_push($bb_open,$pieces3[0]);
      //echo '<div>';
      //print_r($bb_open);
      //echo '</div>';
     }
     else {$result = '<div>INVALID! element not in white list! == '.$pieces3[0].'</div>';}
    }
    else {$result = '<div>INVALID! Last BB is opening when should close!</div>';}
   }
   else if ($pieces3[0][0]=='/')
   {
    $pieces4 = explode("/", $pieces3[0]);
    //echo '<div>pieces2 == '.$pieces4[1].'</div>';
 
    // Closing BB Code
    // Compare to last element in $bb_code opened tag, if not a match then code is invalid!
    //if ($pieces4[1]==end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == VALID!</div>';}
    //else {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>';}
    
    if ($pieces4[1]!=end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>';}
    array_pop($bb_open);
   }
  }
  //echo '<br />';
  $i++;
 }
 if (!isset($result)) {$result = 'valid';}
 return $result;
}
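
For comparison, the same open-tag stack can be written more compactly and made to report which line is at fault, which is the friendlier error reporting described above this build. This is only a sketch under assumptions: bb_check_line() and bb_report() are made-up names, tags are tokenised with preg_match_all rather than explode, and the white list is shortened for the example.

```php
<?php
// Sketch only: bb_check_line() and bb_report() are hypothetical names,
// not functions from the builds in this thread.
function bb_check_line($line, array $allowed)
{
    $open = array(); // stack of currently open tag names
    // Match [tag], [tag=value] and [/tag]; group 1 is the slash, group 2 the name.
    preg_match_all('~\[(/?)([a-z]+)(?:=[^\]]*)?\]~i', $line, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (!in_array($name, $allowed, true)) {
            return 'tag not in white list: ' . $name;
        }
        if ($tag[1] === '') {
            $open[] = $name; // opening tag: push onto the stack
        } elseif (array_pop($open) !== $name) {
            // closing tag: it must match the most recently opened tag
            return 'unexpected closing tag: ' . $name;
        }
    }
    return count($open) === 0 ? 'valid' : 'unclosed tag: ' . end($open);
}

function bb_report($text, array $allowed)
{
    $errors = array();
    foreach (explode("\n", $text) as $index => $line) {
        $result = bb_check_line($line, $allowed);
        if ($result !== 'valid') {
            $errors[$index + 1] = $result; // 1-based line number for the user
        }
    }
    return $errors;
}

$allowed = array('b', 'i', 'u', 'url');
print_r(bb_report("stuff[b]ok[/b]\n[i]oops[/b]\n[jab]nope[/jab]", $allowed));
// Reports line 2 (unexpected closing tag: b) and line 3 (tag not in white list: jab).
```

The keys of the returned array are 1-based line numbers, so an empty array means every line passed.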

Re: BB code to XML as application/xhtml+xml

Posted: Wed Aug 19, 2009 8:41 am
by jackpf
Hmm....I'll test it out in a sec. Just woke up.

But yeah, in regard to my regex allowing any tag, just replace the .*? in the first square brackets with a list of allowed tags separated by |.

So:
(i|b|u)

See, this is why I love regex, even though it can be a biatch at times... :)
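
For anyone else not in regex mode, here is a rough sketch of how such a white list might be applied. The original pattern was posted earlier in the thread and is not reproduced on this page, so the pattern below is an assumed stand-in, and bb_to_xml_whitelisted() is a hypothetical name.

```php
<?php
// Hypothetical stand-in: the regex from earlier in the thread is not shown on this page.
function bb_to_xml_whitelisted($text, array $allowed = array('i', 'b', 'u'))
{
    // Build the (i|b|u) alternation from the white list.
    $tags = implode('|', array_map('preg_quote', $allowed));
    // \1 forces the closing tag to carry the same name as the opening tag.
    $pattern = '~\[(' . $tags . ')\](.*?)\[/\1\]~s';
    // Repeat until nothing changes so that nested tags are converted too.
    do {
        $text = preg_replace($pattern, '<$1>$2</$1>', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo bb_to_xml_whitelisted('[b]bold [i]both[/i][/b] [jab]left alone[/jab]');
// prints: <b>bold <i>both</i></b> [jab]left alone[/jab]
```

Tags outside the white list are simply left as plain text, so a separate validation step would still be needed to reject them outright.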

Re: BB code to XML as application/xhtml+xml

Posted: Wed Aug 19, 2009 12:43 pm
by JAB Creations
I'm not sure how to correctly apply that filter...I'm just not in regex mode right now. :P

Unless there are any remaining issues I think I'll create a second function to verify the structure of img and url bb code.

*Edit* - Pizza for post following this one...

$multiple_line = "stuff[b]stuff![/b]\n[i]stuff[/i][url=http://www.example.com/]example[/url][b]stuff![/b]\nstuff[b]stuff![/b]\n";

Re: BB code to XML as application/xhtml+xml

Posted: Wed Aug 19, 2009 2:11 pm
by JAB Creations
In this build I've added basic, though not fool-proof, img and url validation. I implemented a basic substr check to detect whether a piece starts with 'http://' or not. I'm mostly interested in anything that has the potential to break XHTML code after the bb code is converted, though naturally I'm still open to all suggestions. :)
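
As one possible way to tighten the 'http://' prefix check, the sketch below combines PHP's built-in filter_var() and parse_url() with a scheme white list. bb_is_safe_url() is a hypothetical helper, and the choice of http and https as the only allowed schemes is an assumption.

```php
<?php
// Hypothetical helper, not part of the builds in this thread.
function bb_is_safe_url($url, array $schemes = array('http', 'https'))
{
    // filter_var() rejects strings that are not absolute URLs (e.g. relative paths).
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Only allow white-listed schemes, so javascript: and friends never pass.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $schemes, true)) {
        return false;
    }
    // Refuse characters that could break out of an href or src attribute.
    return strpbrk($url, "\"'<>") === false;
}

var_dump(bb_is_safe_url('http://www.example.com/'));                  // bool(true)
var_dump(bb_is_safe_url('styles/subsilver2/imageset/site_logo.gif')); // bool(false)
var_dump(bb_is_safe_url('javascript:alert(1)'));                      // bool(false)
```

An accepted URL can still contain &, so it would need to pass through htmlspecialchars() before being written into an attribute.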

The only thing I can think of right now is implementing the blockquote element via quote; naturally for an inline quote it'll be [ q ] for the <q> element. I'll have to implement it so that only the first and last bb code tags in the loop are quote tags. I'm not sure how to implement cite; I've never really used or styled it with CSS, though I'm thinking of something along the lines of how a legend element is positioned relative to a fieldset, only I'd likely have it on the bottom-right versus the top-left.

Anyway, here is the latest build...

Code:

<?php
// For pizza please see post just before this one.
$pieces1 = explode("\n", $multiple_line);
 
 
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div><br /><br /><br />';
 
 
foreach($pieces1 as $key => $value)
{
 //echo '<div>'.$value.'</div>';
 echo '<div>'.bb_validator_1($value).'</div><br />';
}
 
 
function bb_validator_1($pizza1)
{
 $pieces1 = explode("[", $pizza1);
 
 //echo '<div><pre>';
 //print_r($pieces1);
 //echo '</pre></div>';
 
 $c = count($pieces1);
 $i = '1';
 $bb_open = array();
 $bb_allowed = array('b','color','i','img','link','q','quote','size','u','url');
 
 foreach($pieces1 as $key => $value)
 {
  //echo '<div>i = '.$i.' and c = '.$c.'</div>';
  $pieces2 = explode("]", $value);
  //echo '<div>pieces1 == '.$pieces2[0].'</div>';
  //echo count($pieces2);
 
  $pieces3 = explode("=", $pieces2[0]);
  
  // img and url validation!!!!!!!!!!!
  if (count($pieces3)=='2')
  {
   //echo '<div>'.$pieces3[0].'</div>';
   if (substr($pieces3[1], 0,7)!='http://')
   {
    if ($pieces3[0]=='img') {$result = 'invalid img url'; break;}
    else if ($pieces3[0]=='url') {$result = 'invalid url'; break;}
   }
  }
 
  if (!empty($pieces3[0]) && count($pieces2)=='2')
  {
   if ($pieces3[0][0]!='/')
   {
    // Opening BB Code
    // Add BB code to end of array to be compared with next closing BB code.
    if ($i!=$c)
    {
     if (in_array($pieces3[0],$bb_allowed))
     {
      array_push($bb_open,$pieces3[0]);
      //echo '<div>';
      //print_r($bb_open);
      //echo '</div>';
     }
     else {$result = '<div>INVALID! element not in white list! == '.$pieces3[0].'</div>'; break;}
    }
    else {$result = '<div>INVALID! Last BB is opening when should close!</div>'; break;}
   }
   else if ($pieces3[0][0]=='/')
   {
    $pieces4 = explode("/", $pieces3[0]);
    //echo '<div>pieces2 == '.$pieces4[1].'</div>';
 
    // Closing BB Code
    // Compare to last element in $bb_code opened tag, if not a match then code is invalid!
    //if ($pieces4[1]==end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == VALID!</div>';}
    //else {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>';}
    
    if ($pieces4[1]!=end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>'; break;}
    array_pop($bb_open);
   }
  }
  //echo '<br />';
  //echo '<div>'.$i.'</div>';
  $i++;
 }
 if (!isset($result)) {$result = 'valid';}
 return $result;
}
?>

Re: BB code to XML as application/xhtml+xml

Posted: Wed Aug 19, 2009 7:05 pm
by JAB Creations
I've implemented a basic feature to ensure bb quote tags (which will be converted into blockquote elements) aren't nested inside inline tags (which will be converted to inline elements).

To avoid making a double post each time, just do a find and replace on the commented out string on line three below: double underscores become [ and double plus symbols become ].
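
The find and replace can also be done in PHP itself. A minimal sketch, assuming the placeholder pairs described above; bb_decode_placeholders() is a hypothetical name.

```php
<?php
// Placeholder pairs as described in the post: __ stands for [ and ++ stands for ].
function bb_decode_placeholders($text)
{
    return str_replace(array('__', '++'), array('[', ']'), $text);
}

echo bb_decode_placeholders('stuff__b++stuff!__/b++');
// prints: stuff[b]stuff![/b]
```

This is only suitable for test strings, since any genuine __ or ++ in the text would be rewritten as well.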

Code:

<?php
//Do a find and replace on double underscores with [ and double plus symbols with ]
//$multiple_line = "stuff__b++stuff!__/b++\n__i++__b++stuff__/b++__/i++__quote++__quote++__quote++__b++__quote++stuff__/quote++__/b++__/quote++__/quote++__/quote++__url=http://www.example.com/++example__/url++__b++stuff!__/b++\nstuff__b++stuff!__/b++\n";
$pieces1 = explode("\n", $multiple_line);
 
 
echo '<div><pre>';
print_r($pieces1);
echo '</pre></div><br /><br /><br />';
 
 
foreach($pieces1 as $key => $value)
{
 //echo '<div>'.$value.'</div>';
 echo '<div>'.bb_validator_1($value).'</div><br />';
}
 
 
function bb_validator_1($pizza1)
{
 $pieces1 = explode("[", $pizza1);
 
 //echo '<div><pre>';
 //print_r($pieces1);
 //echo '</pre></div>';
 
 $c = count($pieces1);
 $i = '1';
 $bb_open = array();
 $bb_allowed = array('b','color','i','img','link','q','quote','size','u','url');
 
 foreach($pieces1 as $key => $value)
 {
  //echo '<div>i = '.$i.' and c = '.$c.'</div>';
  $pieces2 = explode("]", $value);
  //echo '<div>pieces1 == '.$pieces2[0].'</div>';
 
  // Allow blockquote nesting though prevent inline elements from having blockquotes nested within them...
  if ($pieces2[0]=='quote')
  {
   foreach($bb_open as $key1 => $value1)
   {
    //echo '<div>val1 = '.$value1.'</div>';
    if ($value1!='quote') {$result = 'quote block element nested inside of inline element!'; break;}
   }
  }
 
  $pieces3 = explode("=", $pieces2[0]);
 
  // img and url validation!!!!!!!!!!!
  if (count($pieces3)=='2')
  {
   //echo '<div>'.$pieces3[0].'</div>';
   if (substr($pieces3[1], 0,7)!='http://')
   {
    if ($pieces3[0]=='img') {$result = 'invalid img url'; break;}
    else if ($pieces3[0]=='url') {$result = 'invalid url'; break;}
   }
  }
 
  if (!empty($pieces3[0]) && count($pieces2)=='2')
  {
   if ($pieces3[0][0]!='/')
   {
    // Opening BB Code
    // Add BB code to end of array to be compared with next closing BB code.
    if ($i!=$c)
    {
     if (in_array($pieces3[0],$bb_allowed))
     {
      array_push($bb_open,$pieces3[0]);
      //echo '<div>';
      //print_r($bb_open);
      //echo '</div>';
     }
     else {$result = '<div>INVALID! element not in white list! == '.$pieces3[0].'</div>'; break;}
    }
    else {$result = '<div>INVALID! Last BB is opening when should close!</div>'; break;}
   }
   else if ($pieces3[0][0]=='/')
   {
    $pieces4 = explode("/", $pieces3[0]);
    //echo '<div>pieces2 == '.$pieces4[1].'</div>';
 
    // Closing BB Code
    // Compare to last element in $bb_code opened tag, if not a match then code is invalid!
    //if ($pieces4[1]==end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == VALID!</div>';}
    //else {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>';}
   
    if ($pieces4[1]!=end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>'; break;}
    array_pop($bb_open);
   }
  }
  //echo '<br />';
  //echo '<div>'.$i.'</div>';
  $i++;
 }
 if (!isset($result)) {$result = 'valid';}
 return $result;
}
?>
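
The nesting rule in the build above boils down to one question: is every tag that is still open itself a quote? Isolated into a helper it might look like the sketch below, where bb_quote_may_open() is a hypothetical name and the argument plays the role of $bb_open.

```php
<?php
// Hypothetical helper: $open_tags is the stack of tags opened so far.
function bb_quote_may_open(array $open_tags)
{
    foreach ($open_tags as $tag) {
        if ($tag !== 'quote') {
            return false; // a block-level quote would land inside an inline element
        }
    }
    return true; // nothing open, or only other quotes
}

var_dump(bb_quote_may_open(array('quote', 'quote'))); // bool(true)
var_dump(bb_quote_may_open(array('quote', 'b')));     // bool(false)
```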

Re: BB code to XML as application/xhtml+xml

Posted: Thu Aug 20, 2009 11:12 pm
by JAB Creations
Since my code contains BB code I'm just posting the code at the bottom of my post.

So this pretty much does everything so far, with only a couple of notes.

First, I'm not wild about implementing font size with quotes; I'll likely look at existing implementations, and I have already looked at how phpBB implements it. The converter function does have its limitations even after implementing it to execute multiple passes, so I'm looking to make the more commonly typed BB codes work as presumed, and to leave rarely typed BB codes subject to the replacement function and thus to how it's implemented.
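
One likely reason size misbehaves in the build below: str_replace works through its arrays in order, so the "] pair registered for color is consumed before the size group ever sees it. A possible alternative is to handle size with a single regex that accepts the value with or without quotes. This is a sketch only; bb_size_to_span() is a hypothetical name, the 8 to 36 pixel range is an assumed limit, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper; the 8-36 pixel range is an assumed limit, not one from the thread.
function bb_size_to_span($text, $min = 8, $max = 36)
{
    // Accept [size=24] and [size="24"]; only digits can reach the style attribute.
    $pattern = '~\[size="?(\d{1,3})"?\](.*?)\[/size\]~s';
    return preg_replace_callback($pattern, function ($m) use ($min, $max) {
        $px = max($min, min($max, (int) $m[1])); // clamp to the allowed range
        return '<span style="font-size: ' . $px . 'px;">' . $m[2] . '</span>';
    }, $text);
}

echo bb_size_to_span('[size="24"]big[/size] [size=500]huge[/size]');
// prints: <span style="font-size: 24px;">big</span> <span style="font-size: 36px;">huge</span>
```

Nested size tags are not handled by this single pass.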

Also, how it handles the BB code element to XHTML needs a lot of attention. I've thrown in a pre element, though currently if there are line breaks inside a code element it'll either fail or do something funky. I'm going to look into existing PHP functions to support advanced conversion of code, such as how the phpBB forums here handle code.
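
One common way around the line break problem is to lift each code block out of the text before it is split on \n, and to put it back once everything else has been converted. The sketch below assumes that approach; bb_protect_code() and bb_restore_code() are hypothetical names, the NUL-delimited token format is made up for the example, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Sketch only: both function names and the token format are assumptions.
function bb_protect_code($text, array &$blocks)
{
    return preg_replace_callback('~\[code\](.*?)\[/code\]~s', function ($m) use (&$blocks) {
        $blocks[] = $m[1]; // remember the raw code body
        // The token contains no newline and no brackets, so later steps ignore it.
        return "\0code" . (count($blocks) - 1) . "\0";
    }, $text);
}

function bb_restore_code($text, array $blocks)
{
    foreach ($blocks as $i => $code) {
        // Escape the code body so it is shown literally, line breaks included.
        $html = '<pre><code>' . htmlspecialchars($code, ENT_NOQUOTES, 'UTF-8') . '</code></pre>';
        $text = str_replace("\0code" . $i . "\0", $html, $text);
    }
    return $text;
}

$blocks = array();
$text = bb_protect_code("before\n[code]line 1\nline <2>[/code]\nafter", $blocks);
echo count(explode("\n", $text)), "\n"; // 3: the code body no longer adds lines
echo bb_restore_code($text, $blocks);
```

Validation and paragraph wrapping would run between the two calls, on text that no longer contains the code bodies.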

However, beyond the font size and code elements I haven't been able to find any bugs. It could definitely use more polish and perhaps better error handling, though all good things come with time.

Here is my latest build...

_______________________________

<?php
//$multiple_line = "stuff[b]stuff![/b]\n[i][b]stuff[/b][/i][quote][quote][quote][quote][url=http://www.jabcreations.com/]stuff[/url][/quote][/quote][/quote][/quote][url=http://www.example.com/]example[/url][b]stuff![/b]\nstuff[b]stuff![/b]\n";
$multiple_line = '<a href="stuff">stuff!</a>[quote][size="24"][color="red"]big red text![/color][/size] [color="orange"]orange text![/color] link = [url=http://forums.devnetwork.net/]DevNetwork[/url][img]styles/subsilver2/imageset/site_logo.gif[/img][b]bold text![/b][i]italic![/i][/quote]'."\n".'stuff!';
//$multiple_line = '[url=http://www.jabcreations.com]JAB Creations[/url][img]styles/subsilver2/imageset/site_logo.gif[/img]';
//$multiple_line = '[url=http://www.jabcreations.com][img]styles/subsilver2/imageset/site_logo.gif[/img][/url]';
echo bb_1($multiple_line);

function bb_1($pizza1)
{
 $pieces1 = explode("\n", $pizza1);
 
 foreach($pieces1 as $key => $value)
 {
  $result = bb_2_validator($value);
  if ($result!='valid') {return $result;}
 }
 if ($result=='valid') {$result = bb_3_xhtml($pizza1);}
 // Return instead of echo: the call above this function already echoes bb_1().
 return $result;
}
 
 
function bb_2_validator($pizza1)
{
 $pieces1 = explode("[", $pizza1);
 
 //echo '<div><pre>';
 //print_r($pieces1);
 //echo '</pre></div>';
 
 $c = count($pieces1);
 $i = '1';
 $bb_open = array();
 $bb_allowed = array('b','code','color','i','img','q','quote','size','u','url');
 
 foreach($pieces1 as $key => $value)
 {
  //echo '<div>i = '.$i.' and c = '.$c.'</div>';
  $pieces2 = explode("]", $value);
  //echo '<div>pieces1 == '.$pieces2[0].'</div>';
 
  // Allow blockquote nesting though prevent inline elements from having blockquotes nested within them...
  if ($pieces2[0]=='quote')
  {
   foreach($bb_open as $key1 => $value1)
   {
    //echo '<div>val1 = '.$value1.'</div>';
    if ($value1!='quote') {$result = 'quote block element nested inside of inline element!'; break;}
   }
  }
 
  $pieces3 = explode("=", $pieces2[0]);
 
  // img and url validation!!!!!!!!!!!
  if (count($pieces3)=='2')
  {
   //echo '<div>'.$pieces3[0].'</div>';
   if (substr($pieces3[1], 0,7)!='http://')
   {
    if ($pieces3[0]=='img') {$result = 'invalid img url'; break;}
    else if ($pieces3[0]=='url') {$result = 'invalid url'; break;}
   }
  }
 
  if (!empty($pieces3[0]) && count($pieces2)=='2')
  {
   if ($pieces3[0][0]!='/')
   {
    // Opening BB Code
    // Add BB code to end of array to be compared with next closing BB code.
    if ($i!=$c)
    {
     if (in_array($pieces3[0],$bb_allowed))
     {
      array_push($bb_open,$pieces3[0]);
      //echo '<div>';
      //print_r($bb_open);
      //echo '</div>';
     }
     else {$result = '<div>INVALID! element not in white list! == '.$pieces3[0].'</div>'; break;}
    }
    else {$result = '<div>INVALID! Last BB is opening when should close!</div>'; break;}
   }
   else if ($pieces3[0][0]=='/')
   {
    $pieces4 = explode("/", $pieces3[0]);
    //echo '<div>pieces2 == '.$pieces4[1].'</div>';
 
    // Closing BB Code
    // Compare to last element in $bb_code opened tag, if not a match then code is invalid!
    //if ($pieces4[1]==end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == VALID!</div>';}
    //else {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>';}
   
    if ($pieces4[1]!=end($bb_open)) {$result = '<div>'.$pieces4[1].' == '.end($bb_open).' == INVALID!</div>'; break;}
    array_pop($bb_open);
   }
  }
  //echo '<br />';
  //echo '<div>'.$i.'</div>';
  $i++;
 }
 if (!isset($result)) {$result = 'valid';}
 return $result;
}
 
 
function bb_3_xhtml($text0)
{
 $bb1 = array('<', '>');
 $xml1 = array('&lt;', '&gt;');
 
 
 $bb2 = array(
  '[b]','[/b]',
  '[code]','[/code]',
  '[color="','[/color]','"]',
  '[i]','[/i]',
  '[q]','[/q]',
  '[u]','[/u]',
  '[quote]', '[/quote]',
  //'[list]', '[*]', '[/list]',
 );
 
 $xml2 = array(
  '<b>','</b>',//'<span class="b">','</span>',
  '<pre><code>','</code></pre>',
  '<span style="color: ', '</span>',';">',
  '<i>','</i>',//'<span class="i">','</span>',
  '<q>','</q>',
  '<u>','</u>',//'<span class="u">','</span>',
  //'<ul>', '<li>', '</ul>',
  "\n<blockquote>", "</blockquote>\n",
 );
 
 $bb3 = array('[img]','[/img]');
 $xml3 = array('<img alt="" src="', '" />');
 
 
 $bb4 = array('[size="','[/size]','"]');
 $xml4 = array('<span style="font-size: ', '</span>','px;">');
 
 
 $bb5 = array('[url=','[/url]',']');
 $xml5 = array('<a class="icon external" href="', '</a>','" rel="nofollow" tabindex="3">');
 
 
  $text1 = str_replace($bb1, $xml1, $text0);
  $text2 = str_replace($bb2, $xml2, $text1);
  $text3 = str_replace($bb3, $xml3, $text2);
  $text4 = str_replace($bb4, $xml4, $text3);
  $text5 = str_replace($bb5, $xml5, $text4);
 
  $result = bb_4_n2p($text5);
  return $result;
}
 
 
function bb_4_n2p($pizza)
{
 $pieces = explode("\n",$pizza);
 $result = ''; // start empty so the .= below never appends to an undefined variable
 
 foreach($pieces as $key => $value) {if ($value=="") {unset($pieces[$key]);}}
 
 foreach($pieces as $key => $value)
 {
  $bb_q = explode("<blockquote>",$value);
  $bb_c = explode("<code>",$value);
 
  if (count($bb_c)=='1' && count($bb_q)=='1') {$result .= '<p>'.$value."</p>\n\n";}
  else {$result .= $value."\n\n";}
 }
 
 return $result;
}
?>
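
One thing worth sketching for output served as application/xhtml+xml is escaping the text before any BB code is converted, since a stray & or < makes the document ill-formed. A minimal sketch, assuming the escaping runs before the tag replacement; bb_escape_text() is a hypothetical name, while htmlspecialchars() is the standard PHP function.

```php
<?php
// Hypothetical wrapper around the built-in htmlspecialchars().
function bb_escape_text($text)
{
    // ENT_NOQUOTES leaves " and ' alone so tags such as [size="24"] still match later.
    return htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
}

echo bb_escape_text('<a href="stuff">Tom & Jerry</a> [b]bold[/b]');
// prints: &lt;a href="stuff"&gt;Tom &amp; Jerry&lt;/a&gt; [b]bold[/b]
```

Because quotes are left untouched here, any value that ends up inside an attribute (a url or img address, for instance) would still need its own check.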

Re: BB code to XML as application/xhtml+xml

Posted: Fri Aug 21, 2009 6:28 am
by jackpf
Nice one. I must admit I'm surprised.