String Value Object for Security.

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

String Value Object for Security.

Post by Ollie Saunders »

mods: this is not a duplicate post, I have deleted the original as code critique was the correct board.

I've been toying with the idea of using the value object pattern to ensure my strings are always escaped when they should be but never twice. The problem I find is that escaping at the last possible moment e.g. view layer of MVC means you can't preprocess anything with HTML formatting and escaping as early as possible means length calculations and the like are out of wack. I don't particularly want to decode HTML entities because I consider this an expensive action that can be avoided so my solution is something where tracking is possible. I'm not really sure what the implications/limitations of this are so comment away.

Outputs:

Code: Select all

<a><b>k
Code:

Code: Select all

<?php
class Str
{
    private $_string = '';
    private $_destination;
    public function __construct($stringData, Destination_Interface $destination)
    {
        $this->_string = $stringData;
        $this->_destination = $destination;
    }
    public function __tostring()
    {
        return $this->_destination->clean($this);
    }
    public function length()
    {
    	return strlen($this->_string);
    }
    public function getRaw()
    {
    	return $this->_string;
    }
    public function setRaw($raw)
    {
    	$this->_string = $raw;
    }
    public function append(self $str)
    {
    	$new = clone $this;
    	$new->setRaw($this->getRaw() . $str->getRaw());
    	return $new;
    }
}
Interface Destination_Interface
{
    function clean(Str $str);
}
class HtmlDest implements Destination_Interface
{
    public function clean(Str $str)
    {
    	return htmlspecialchars($str->getRaw(), ENT_QUOTES);
    }
}


$a = new Str('<a>', new HtmlDest());
$b = new Str('<b>k', new HtmlDest());
echo $a->append($b);
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Having a class that represents strings sets off warning bells in my head, I think the reason is because they offer abstraction at too fine a granularity, which causes performance problems and makes them clunky to use. While your length concerns are well-founded, I don't think creating a string class will solve that problem.

I will stew about this a little and get back to you on it.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Those are precisely the reasons I wanted some opinion on this.
I will stew about this a little and get back to you on it.
Great.
User avatar
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

You need to remove the setRaw()/append() method for starters - a Value Object represents a specific value with a specific type (i.e. the class type). Since the Value is immutable, the class should be immutable also - i.e. you can't change the data once the object is constructed.

If you need that functionality, then the pattern isn't a Value Object - but more a generic Container for accumulating String values.

On the when of escaping - it always occurs as late as possible. Because of this the unescaped version should be available up to the last moment. I don't see any problem here - if the timing is correct then you apply escaping to the raw data if required easily enough. Using a container object to package all this just seems to be patching the symptom, not the underlying problem.
Begby
Forum Regular
Posts: 575
Joined: Wed Dec 13, 2006 10:28 am

Post by Begby »

I don't think that PHP is ready for a class like this. It would be incredibly useful if operator and function overloading were allowed, but I have a problem if the implementation of using strings within code is not standardized (i.e. you have to call a bunch methods on it).

If you could do this in php it would be great

Code: Select all

$str1 = new String() ;
$str2 = new String() ;


// This is not gay
// overloaded '=' operator
$str1 = 'My Name is' ;
$str2 = 'fred' ;

// overloaded .= operator
$str1 .= ' '.$str2 ;


// This is gay
$str1.setRaw('My Name is') ;
$str2.setRaw('fred') ;

$str1.append(' '.$str2->toString()) ;
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Thanks for the comments guys.
You need to remove the setRaw()/append() method for starters - a Value Object represents a specific value with a specific type (i.e. the class type). Since the Value is immutable, the class should be immutable also - i.e. you can't change the data once the object is constructed.
What exactly is wrong with these? Esp. with append where I return a new object in order to avoid "side effects". And while we're at it what exactly what is so bad about side effects? Do you know any examples where there is one version that uses side-effects and another that does not
If you need that functionality, then the pattern isn't a Value Object - but more a generic Container for accumulating String values.
I tried to find some documentation on the value object pattern online but I couldn't find nothing of real use and I'd left my PHP Patterns Book at work, although I'm slowly beginning to lose faith and interest in that book anyway. So if you can point me to a source of information on the value object pattern I would appreciate it. That said, I have no self imposed rule about obeying the pattern so if it works I'll do it.
On the when of escaping - it always occurs as late as possible.
It does or it should? The problem with late escaping is that you cannot pre-process strings with HTML because they will end up escaped later and when most PHP applications depend on this very ability it becomes a bit of a problem.
I don't see any problem here - if the timing is correct then you apply escaping to the raw data if required easily enough.
Well there are more reasons for a class like this as well. I mean with a class like this + some type hinting you can be sure that things are secure by default, you won't ever forget to escape something. Also doesn't it make OO sense to group these things together/
Using a container object to package all this just seems to be patching the symptom, not the underlying problem.
I don't see what you mean. I'm going to have to ask you to qualify that in a lot more detail.

I don't think that PHP is ready for a class like this
Well I'm not going to wait until it is. This isn't PHP's style, so I don't think operating overloading will ever happen. C++ is the language for this sort of thing. In fact it seems like C++ was specifically designed for the value object pattern in places. Many people disagree with operator overloading in principle and I can see their point - it can definitely be abused.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Anyone had any new thoughts?
Z3RO21
Forum Contributor
Posts: 130
Joined: Thu Aug 17, 2006 8:59 am

Post by Z3RO21 »

I like the theory behind this class but I agree with out the ability to overload operators it might be more hassle in the end than it is worth.
User avatar
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Anyone had any new thoughts?
It would be great if destination interface was not fixed on string value creation but was dependent on the context the value was used in. Though I don't see how it would be possible in php.
Begby
Forum Regular
Posts: 575
Joined: Wed Dec 13, 2006 10:28 am

Post by Begby »

If you are going to use a class for it, I don't think it should be setup with delegates. What if I wanted to take a string from a form input, then on the next page both display it in HTML and insert it into a db? I would have to either create two string objects, each with a different delegate, or switch the delegate.

Code: Select all

$str = new Str( $_POST['input'], new HTML_Destination() ) ;
$dbStr = new Str( $_POST['input'], new DB_Destination() ) ;

$this->view()->set('str', $str) ;
$model->str = $dbStr ;
$model->insert() ;

// Or do this

$str = new Str( $_POST['input'], new HTML_Destination() ) ;
$this->view()->set('str', $str) ;
$str->setDestination( new DB_Destination() ) ;
$model->str = $str ;
$model->insert() ;

To me this is just too much code to solve a relatively simple problem.

Here are some ideas I came up with...

Code: Select all

$str = new Str( $_POST['input'] ) ;

$this->view->set('str', $str->escape_html()) ;
$model->str = $str->escape_mysql() ;
$model->insert() ;


// maybe make it work like this, this could be accomplished by always storing a copy of the original string, then have a
// holder for an escaped string, toString() would return what was escaped, unescape() would restore the original string

$str = new Str('&nbsp;') ;

$str->escape_html() ;
echo $str ; // &nbsp;

$str->escape_mysql() ;
echo $str ; // &nbsp\;

$str->unescape() ;
echo $str ; // &nbsp;


// Maybe have it be pluggable by overloading __call(), you would need to set this up
// so you could register the plugins at load time instead of in your logic

interface IStr_Escape {....}

class Str_Escape_HTML implements IStr_Escape {...}

class Str_Escape_Custom implements IStr_Escape {...}

$str = new Str('&nbsp;') ;

// This would need to magically happen in some config somewhere
$str->register_escape('HTML', new Str_Escape_HTML() ) ;
$str->register_escape('Custom', new Str_Escape_Custom() ) ;

$str->escape_HTML() ;
$str->escape_Custom() ;

One reason that I would like to see a non-crappy string class though is mainly because of code insight in Zend and Eclipse. Its amazing what a time saver that is not having to remember a plethora of functions that are not named consistently (and god do I hate the inconsistency of php functions). I go out of my way to put things in methods just because of that. I even make registries have a method for each of the common items I put in the registry so that I can get the type inline and proper code hints.

A float or integer class would be killer too

Code: Select all

// I would use this, I don't care if its redundant or slow, it would just save me a lot of time and is very readable
$str = new Str('Fred') ;
$str->length() ;
$newStr = $str->substr(2,3) ;
$lower = $str->toLower() ;
$upper = $lower->toUpper() ;
$upper->escape_HTML() ;

// this might be handy if it was intuitive enough
$float = new float( 5.329 ) ;
$int = $float->ceil() ;
$int2 = $float->floor() ;

$float = $int->div($int2) ;
User avatar
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog »

User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Hmm a lot of people are quite against this. This doesn't surprise me, I was and am still cautious about the implications of this.
But I think I'm going to suck it and see. I've got a small(ish) project I need to do that involves a lot of string manipulation and we'll see how it fairs with that.

I may try those float and integer objects as well actually.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

I've had a few thoughts and actually I don't think this is going to work. Consider the following use case:
  • Get some data from the database. It's tainted.
  • We want to format it to make one half bold and one half normal. HTML has to be used to do this. If the data was "Foo&Bar" the result we are after would be:

    Code: Select all

    <b>Foo</b>&Bar
  • The tags must be maintained but the text between escaped.
The Str object is only going to be capable of giving you

Code: Select all

<b>Foo</b>&Bar
or

Code: Select all

<b>Foo</b>&Bar
never the desired data (with the ampersand escaped).

So what is the solution? DOM possibly?
User avatar
kyberfabrikken
Forum Commoner
Posts: 84
Joined: Tue Jul 20, 2004 10:27 am

Post by kyberfabrikken »

ole wrote:So what is the solution? DOM possibly?
I sound to me like you're formatting your data at the wrong point in your application. The model layer should return semantic data, while it's up to the view layer to format this into a representation (Usually HTML).
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Yeah, I suppose given that, this isn't really necessary.

btw your framework looks cool.
Post Reply