Page 1 of 2

Is this basic bbcode safe?

Posted: Thu Aug 23, 2007 4:36 am
by WaldoMonster
I found this code example on: http://www.skoom.de/tutorials/php/bbcode-to-html.xtc
Is it safe enough to prevent XSS injections when using it this way?

Code: Select all

<?php
$post = $_POST['post'];
echo bbcode(htmlentities($post));

function bbcode ($entry) {
$entry = eregi_replace("\[br\]","<br>",$entry);
$entry = eregi_replace("\[b\]([^\[]+)\[/b\]","<b>\\1</b>",$entry);
$entry = eregi_replace("\[i\]([^\[]+)\[/i\]","<i>\\1</i>",$entry);
$entry = eregi_replace("\[u\]([^\[]+)\[/u\]","<u>\\1</u>",$entry);
$entry = eregi_replace("\[img\]([^\[]+)\[/img\]","<img src=\"\\1\" border=\"0\">",$entry);
$entry = eregi_replace("\[mail\]([^\[]+)\[/mail\]","<a href=\"mailto:\\1\">\\1</a>",$entry);
$entry = eregi_replace("\[url\]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\1</a>",$entry);
$entry = eregi_replace("\[url=\"([^\"]+)\"]([^\[]+)\[/url\]","<a href=\"\\1\" target=\"_blank\">\\2</a>",$entry);
return $entry; }
?>

Posted: Thu Aug 23, 2007 8:40 am
by superdezign
Your only worries in XSS here would be the 'src' attribute. Using it without validating it will inevitably be your downfall.

Why does this code use ereg? And, FYI, it will fail miserably on nested tags.

Posted: Thu Aug 23, 2007 9:07 am
by Oren
You are mixing 2 things here WaldoMonster. Your parser should not care for XSS, and your XSS defense layer should not care for the parser either.

Posted: Thu Aug 23, 2007 10:09 am
by WaldoMonster
superdezign wrote:Your only worries in XSS here would be the 'src' attribute. Using it without validating it will inevitably be your downfall.

I have used htmlentities before calling the bbcode.
Can you give an example how this can be abused?
Loading an image from another server can't hurt or can it?
superdezign wrote:Why does this code use ereg?

Just cut and past the code from an example page.
Is it better to use preg_replace?
superdezign wrote:And, FYI, it will fail miserably on nested tags.

I think this can be fixed by replacing ([^\[]+) by (.+)

Posted: Thu Aug 23, 2007 10:34 am
by superdezign
WaldoMonster wrote:I have used htmlentities before calling the bbcode.
Can you give an example how this can be abused?
Loading an image from another server can't hurt or can it?
The 'src' attribute doesn't discriminate.
WaldoMonster wrote:Just cut and past the code from an example page.
Is it better to use preg_replace?
Yes.
WaldoMonster wrote:I think this can be fixed by replacing ([^\[]+) by (.+)
That'll make it better (or worse, since it's greedy), but it'll still have problems with nested tags. I used to do it that way for my blog, and ran into problem after problem.

Posted: Thu Aug 23, 2007 11:01 am
by WaldoMonster
Thanks for the help superdezign.
superdezign wrote:The 'src' attribute doesn't discriminate.
What do you mean with "doesn't discriminate"?

Posted: Thu Aug 23, 2007 11:19 am
by superdezign
The 'src' attribute doesn't validate the type of file that is put into it. Just because it's in an image tag doesn't mean it has to be an image.

Posted: Thu Aug 23, 2007 5:09 pm
by WaldoMonster
Thanks for the feedback so far.
Now I have enough information to make a start.
I will come back to this topic later.

Posted: Fri Aug 24, 2007 6:47 am
by WaldoMonster
I think the code is now working very well.
The bbcode() function is made for not encoded data like post or get data.

The bburl() function need to be updated somehow.
I think the url can be reconstructed with the parse_url() function
in combination with rawurlencode().

Code: Select all

<?php
function bbcode($entry)
{
$bbcode = array(
	'/\[b\](.*?)\[\/b\]/s',
	'/\[i\](.*?)\[\/i\]/s',
	'/\[li\](.*?)\[\/li\]/s',
	'/\[ul\](.*?)\[\/ul\]/s',
	'/\[br\]/s');
$html = array(
	'<strong>$1</strong>',
	'<em>$1</em>',
	'<li>$1</li>',
	'<ul>$1</ul>',
	'<br>');

$entry = htmlentities($entry);
$entry = preg_replace($bbcode, $html, $entry);
$entry = preg_replace_callback("/\[url=(.*?)\](.*?)\[\/url\]/s", 'bburl', $entry);
return $entry;
}

function bburl($maches)
{
$url = html_entity_decode($maches[1]);
$url = rawurlencode($url);
$url = str_replace('%3A%2F%2F', '://', $url);
$url = str_replace('%2F', '/', $url);
$url = str_replace('%3F', '?', $url);
$url = str_replace('%3D', '=', $url);
$url = str_replace('%26', '&', $url);
return '<a href="' . $url . '">' . $maches[2] . '</a>';
}

echo bbcode('
normal[br]
[b]bold[/b][br]
[i]italic[/i][br]
[i]italic [b]bold + italic[/b] italic[/i][br]
[ul]
	[li]one[/li]
	[li]two[/li]
[/ul]
[url=http://www.test.com/test.php?action=action&option=option%20space]test[/url]');
?>

Posted: Fri Aug 24, 2007 7:13 am
by superdezign
WaldoMonster wrote:The bburl() function need to be updated somehow.
I think the url can be reconstructed with the parse_url() function
in combination with rawurlencode().
Why do you feel the need to URL-encode a URL?

Posted: Fri Aug 24, 2007 7:58 am
by WaldoMonster
superdezign wrote:Why do you feel the need to URL-encode a URL?
Htmlentities() is not the right way to encode a url.
That is why I decoded it and then encode it with rawurlencode().
Without rawurlencode() it is possible to close the href tag and add for example a mouseover with javascript code.

Posted: Fri Aug 24, 2007 8:08 am
by superdezign
Okay then. You should use strtr() instead of all of those str_replace() calls.

Posted: Sat Aug 25, 2007 2:41 am
by WaldoMonster
WaldoMonster wrote:
superdezign wrote:Why do you feel the need to URL-encode a URL?
Htmlentities() is not the right way to encode a url.
That is why I decoded it and then encode it with rawurlencode().
Without rawurlencode() it is possible to close the href tag and add for example a mouseover with javascript code.
In second thought it isn’t needed to encode the url.
I send the bbcode with a hidden POST form, in this way the url won’t be decoded.
Than I can use it this way:

Code: Select all

<?php
function bbcode($string)
{
$bbcode = array(
	'/\[br\]/s',
	'/\[b\](.*?)\[\/b\]/s',
	'/\[i\](.*?)\[\/i\]/s',
	'/\[li\](.*?)\[\/li\]/s',
	'/\[ul\](.*?)\[\/ul\]/s',
	'/\[url=(.*?)\](.*?)\[\/url\]/s');
$html = array(
	'<br>',
	'<strong>$1</strong>',
	'<em>$1</em>',
	'<li>$1</li>',
	'<ul>$1</ul>',
	'<a href="$1">$2</a>');
return preg_replace($bbcode, $html, htmlentities($string));
}

echo bbcode(' 
normal[br] 
[b]bold[/b][br] 
[i]italic[/i][br] 
[i]italic [b]bold + italic[/b] italic[/i][br] 
[ul] 
        [li]one[/li] 
        [li]two[/li] 
[/ul] 
[url=http://www.test.com/test.php?action=action&option=option%20space]test[/url]');
?>php
Normaly bbcode uses these tags for a list:

Code: Select all

[list]
[*]Red
[*]Blue
[*]Yellow
[/list]
How can I implement it that way instead of the [ul][/ul][li][/li] tags?

Posted: Sat Aug 25, 2007 3:33 am
by WaldoMonster
WaldoMonster wrote:Normaly bbcode uses these tags for a list:

Code: Select all

[list]
[*]Red
[*]Blue
[*]Yellow
[/list]
How can I implement it that way instead of the [ul][/ul][li][/li] tags?
When not using the </li> close tag than it is easy.
Someone got a solution with the </li> close tag?

Code: Select all

<?php
function bbcode($string)
{
$bbcode = array(
	'/\[br\]/s',
	'/\[b\](.*?)\[\/b\]/s',
	'/\[i\](.*?)\[\/i\]/s',
	'/\[list\](.*?)\[\/list\]/s',
	'/\[\*\]/s',
	'/\[url=(.*?)\](.*?)\[\/url\]/s');
$html = array(
	'<br>',
	'<strong>$1</strong>',
	'<em>$1</em>',
	'<ul>$1</ul>',
	'<li>',
	'<a href="$1">$2</a>');
return preg_replace($bbcode, $html, htmlentities($string));
}
?>

Posted: Sat Aug 25, 2007 5:56 am
by feyd
Pull the whole
  • in as a string and parse it separately then replace as needed.