PHP DataObjects: Simplifying Business Logic without SQL

Tutorials on PHP, databases and other aspects of web development. Before posting a question, check in here to see whether there's a tutorial that covers your problem.

Moderator: General Moderators

Post Reply
User avatar
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

PHP DataObjects: Simplifying Business Logic without SQL

Post by Maugrim_The_Reaper »

I was recently talked into writing an article on building a simple ORM solution in PHP for a local developers group. As my reward is a number of free beers, I managed to finish this lengthy tutorial. So here it is for Devnetwork members to marvel at (and ask pointy questions!). Enjoy...

Introduction


The goal of this tutorial is to demonstrate a system for manipulating a database without writing SQL queries. I introduce a number of concepts including the DataObject and DataAccessObject which together allow developers focus on manipulating data without needing to write, or even know SQL. This kind of system can be referred to as Object Relational Mapping (ORM) where tables and rows in a database are represented by PHP objects which may be used to perform common database operations. It is a very simple system to use in practice, and has become a common component of many popular PHP frameworks.

In the tutorial I implement such a system integrating database abstraction, i.e. the ability to operate on several or more database management systems such as MySQL or PostgreSQL, using ADOdb Lite. The tutorial is not very complex, but does assume you are aware of Object Oriented Programming basics in PHP and are familiar with the basics of using a database in PHP.

So if using a system where you can manipulate a database without writing lines of SQL appeals to you, then read on...:).


General Concepts


The purpose of a DataObject (which can also be termed a TransferObject) can be described quite simply. The quick explanation is that a DataObject stores all the data associated with a table row in a single object which is easily transferred between different application layers. For example, whenever we need to pass database data to another class, layer or library we can do so by passing around our DataObjects.

The DataObject in turn is utilised by a DataAccessObject which performs row operations - the most common of which are SELECTs, INSERTs, UPDATEs and DELETEs. These can also be referred to as CRUD operations (Create, Read, Update and Delete). Such operations can be performed with a minimum of code which does not require writing SQL since SQL use by the programmer is replaced with a shorthand API specific to each database table. In this tutorial we focus solely on implementing CRUD methods - though in a real application you could easily extend the basics with a more complex API specific to your application's requirements.

Such a system of abstracting database operations (which is what we'll implement below) can be referred to as Object Relational Mapping (ORM). Just don't expect the example classes below to solve all your problems; if you utilise foreign key constraints, prepared statements and other semi-advanced features you may require something more heavyweight. The tutorial demonstrates a simple system useful for either small applications, or as a platform to roll your own more feature-packed variant.

If the above sounded like so much gibberish, don't despair. The tutorial will make everything clear.

Those interested in researching Design Patterns useful for this tutorial should look up the ActiveRecord, TransferObject, RowDataGateway, and TableDataGateway patterns for specifics. The tutorial also refers to the Singleton, Registry and Facade patterns in passing. I don't claim to implement patterns strictly - I'll be mixing and matching. A good PHP specific Patterns resource is PHP Patterns if you can ignore the colour scheme.

All code used in this tutorial may be downloaded from: Download. Windows users will need a 3rd party compression utility like WinZip to extract the archive.


A Quick Example


Note: All code is written to be compatible with PHP4 (still very common on shared hosts). It will still work in PHP5, have no fear.

A quick usage example:

Code: Select all

<?php
 
$user = new User();
$user->name = 'Maugrim';
$user->email = 'email@example.com';
$user->www = 'http://blog.quantum-star.com';
$user->save(); // insert data on database table
 
?>
An alternate approach is to make use of getters and setters, without direct access to properties. For PHP5, this typically includes class properties being declared as "private" or if used by child classes "protected".

Code: Select all

<?php
 
$user = new User();
$user->setName('Maugrim');
$user->setEmail('email@example.com');
$user->setWww('http://blog.quantum-star.com');
$user->save(); // insert data on database table
 
?>
Personally I prefer the second approach as the use of setters or getters (which are all object functions) enables some additional work to be performed when setting values. One example of additional work may be to track which values have been changed since the last database update/insert. This can add a way to prevent the creation of duplicate records and/or limit the size of subsequent updates to changed data only. This convention is often debatable - you can implement a similar system without using getters or setters at the cost of more complex legwork behind the scenes to manage updates, etc.

At this point - alternatives! In case you're fascinated even after following the tutorial there are many similar and hugely more in-depth and feature packed libraries implementing DataObjects or ORM. Some of the popular ones include Propel, PEAR::DB_DataObject, EZPDO and MetaStorage. There are many others out there also. The one implemented here is, in a word, simplistic. But it gets the point across, hopefully, and it is actually used in real applications.


Introducing the DataObject


The DataObject is, in its simplest form, a class containing a list of object properties. It will also contain relevant getters and setters as below. Most of you will be familiar with similar examples from your first excursions into OOP:

Code: Select all

<?php
 
class User {
 
    var $name;
    var $email;
    var $www;
 
    function User() {}
 
    function setName($value) { $this->name = $value; }
    function getName() { return $this->name }
 
    function setEmail($value) { $this->email = $value; }
    function getEmail() { return $this->email }
 
    function setWww($value) { $this->www = $value; }
    function getWww() { return $this->www }
 
}
 
?>
As a sidenote, a class of this structure (when it contains many extra fields) will quickly become an extremely tedious file to both hand type and then maintain. A fair approach to avoiding such tedious typing is to automatically generate the class file based on your database table's structure. This is actually quite simple to do and may save you from hours of boredom. Most databases offer a syntax to gather information on field names, types, etc. Armed with this information, you can utilise a standard naming scheme and generate such DataObjects as needed. This is the "convention over configuration" method at work. Certain alternatives, such as Ruby On Rails' ActiveRecord even do this on the fly during execution (though honestly I prefer to cut out such repetitive performance hits). Many RoR inspired PHP implementations do the same.


The DataObject: Extra Functionality


We're not quite finished yet. As you may have noticed in the usage example, there is one method to save data, namely save(), but no specific functions which differentiate between the SQL CREATE and UPDATE commands. To reduce complexity in practice, we will let the DataObject be aware of whether its data exists in the database by adding an extra property, $exists, to track this state. This can be used to determine whether to UPDATE or CREATE a row.

Additionally, to limit the fields updated to only those values actually changed, we will also track which properties have been altered. This means our DataObject will also store an array of changed field names. And, as if that were not enough, we will also make the DataObject aware of the table's primary key, i.e. the combination of fields which identify it as a unique row. In practice, this tutorial assumes all rows will have a single unique field (the primary key), typically a row id of some description. In reality it's just as valid to have a primary key which is a combination of two or more fields whose combined values are unique to a row.

We also need something to store the primary key value and the field name of the key. So we add a new field called user_id which is the primary key for rows in our "user" table, and a $primaryKey property to store the field name. Additionally we may recognise that it might be useful to keep all our row field values segregated from the other class properties (simply so we can access them separately and in isolation). In the example below, row data is now added to a $data array and we add a small method call to return this array if required.

Adding in these extra facets, we arrive at a modified class as follows:

Code: Select all

<?php
 
class User {
 
    var $primaryKey = 'user_id';
    var $changedFields = array();
    var $tableName = 'user';
    var $className = 'User';
    var $exists = false;
    var $data = array();
 
    function User() {}
    
 
    function setUserId($value) { $this->data['user_id'] = $value; $this->changedFields[] = 'user_id'; }
    function getUserId() { return $this->data['user_id']; }
 
    function setName($value) { $this->data['name'] = $value; $this->changedFields[] = 'name'; }
    function getName() { return $this->data['name']; }
 
    function setEmail($value) { $this->data['email'] = $value; $this->changedFields[] = 'email'; }
    function getEmail() { return $this->data['email']; }
 
    function setWww($value) { $this->data['www'] = $value; $this->changedFields[] = 'www'; }
    function getWww() { return $this->data['www']; }
 
    function setPrimaryKeyValue($value) { $this->setUserId($value); }
    function getPrimaryKeyValue() { return $this->getUserId(); }
 
    function &getArray() { return $this->data; }
 
}
 
?>
Having all those specific properties like $className and $tableName seem out of place initially since we declare their values with no input - they can however be added during a Code Generator stage (see later) or worked out through some additional PHP logic. Here I just throw them in for brevity. Those with PHP5 can probably work out such things a little easier than those of us restricted to PHP4 compliance. As you can see, I've used the setters to implement a system of tracking changed fields - the setters/getters were useful afterall.


The DataObject: Common Ground


So far we've assembled a simple DataObject. However we will be appending additional requirements to the basic class, and some of these will be common to all DataObjects irrespective of their type (or which table they are related to). Rather than copy such common elements across to all DataObjects as we create them, we should add them to a parent class which is inherited by all DataObjects. This ensures duplicate code is centralised making for easier maintenance. We can call this the DataObjectParent class.

This class will also be useful in other ways later on. As you will see we are using only one DataAccessObject class (a Singleton), which will be injected into all DataObjects for re-use. We also need to track some common data such as whether the DataObject's data exists on the database at the current time or not. We'll also have getters and setters for DataObject properties like the table name, primary key field, etc. See the sample class below. As you will notice I moved some class variables out of the User DataObject (they are common to all DataObjects and hence candidates for the parent class). These will all be inherited by our DataObjects, and therefore accessible from them. (Hint: In PHP5 many of these would be declared as protected properties.)

Obviously, the more functionality we add to a DataObject the more it departs from the TransferObject pattern. Don't worry too much about deviating from a core Pattern - they exist as general ideas, not concrete standards. Soon we'll be capable of doing some basic table manipulation with our DataObjects. As an aside, keep in mind there are many ways of implementing a similar system - some simpler, others far more complex and feature packed...

Code: Select all

<?php
 
class DataObjectParent {
 
    var $exists = false;
    var $data = array();
    var $changedFields = array();
    var $primaryKey;
    var $tableName;
    var $className;
    var $dao;
 
    function DataObjectParent($fields=null) {
        $appHelper =& ApplicationHelper::getInstance();
        $dbConn =& $appHelper->getDA();
        $this->dao =& DataAccess::getInstance($dbConn);
        if($fields)
        {
            $this->import($fields);
        }
    }
 
    function import($fields=null) {
        foreach($fields as $key=>$val)
        {
            $this->setValue($key, $val);
        }
    }
 
    function setValue($key, $val) {
        $this->data[$key] = $val;
    }
 
    function getValue($key) {
        return $this->data[$key];
    }
 
    function &getArray() {
        return $this->data;
    }
 
    function clearData() {
        $this->data = array();
        $this->changedFields = array();
        $this->exists = false;
    }
 
    // ...
 
?>
We've added a few additional methods over what we had in the DataObject class. One of the ways used to let this system operate with little intervention is to ensure we use a naming convention (or we could get stuck with configuration files instead). There are many such conventions, but we'll take the simplest one. Notice that all our $data field names are identical to the field names we expect to see on our database. These each segregate separate words/terms in the name with an underscore. Our related setters and getters, remove the underscore and capitalise the first letter of each term in the fieldname, prepending this with "get" or "set". This is a small example of a naming scheme - but it makes our work simpler when importing data into a DataObject, and setting values. Likewise our DataObjects are similarly named, using the database table name as a basis, hence a "user" table gives rise to a "User" class, and "user_permission" to "UserPermission" (if we created such a table). This is the "Convention over Configuration" principle at work as popularised by Ruby on Rails.

We have two new setValue() and getValue() methods - our DataAccessObject class will use this as they do not alter the changedFields array (the data is not changed, it's exactly as it is on the database - i.e. unchanged). We've added a getArray() method for exporting our row data in array form. The parent's constructor also has an odd reference to an ApplicationHelper class - basically this provides the class with an instance of an ADOdb Lite connection object via getDA(), and is not covered in this tutorial - it's a simple static method which creates an ADOdb Lite connection object, and returns a reference to that object. Replace it at will with your own system.

We can also instantiate this object by passing it an array of row field names and matching values. No particular field order is required for this but the array must contain matching fieldnames and values. We later use this when fetching rows to easily create new DataObjects from the results array of our SELECT queries. Continuing...

Code: Select all

<?php
 
    // ...
 
    function getPrimaryKey() {
        return $this->primaryKey;
    }
 
    function getPrimaryKeyValue() {
        return $this->data[$this->primaryKey];
    }
 
    function setPrimaryKeyValue($value) {
        $this->data[$this->primaryKey] = $value;
    }
 
    function getClassName() {
        return $this->className;
    }
 
    function getTableName() {
        return $this->tableName;
    }
 
    // ...
 
?>
We also add in some extra getters and setters for common class properties such as tableName, primaryKey, etc. In addition we have added two methods for accessing the value of this DataObject's primary key - which is user_id in our example.

Finally, we add methods to set whether a DataObject holds a pre-existing table row (i.e. whether the data exists on the database, or needs to be created), and also a method of fetching the $changedFields array which tells which fields we may need to update on the database during a save() operation.

Code: Select all

<?php
 
    // ...
 
    function getExists() {
        return $this->exists;
    }
 
    function setExists($var = true) {
        $this->exists = $var;
    }
 
    function getChangedFields() {
        $this->changedFields = array_unique($this->changedFields); // remove duplicate values
        return $this->changedFields;
    }
 
    function clearChangedFields() {
        $this->changedFields = array();
    }
 
}
 
?>
We now have a fair portion of our DataObjectParent class written. We need a few more additions because of the design being used, but we'll come to that later on.


The Revised DataObject Class


With the addition of the DataObjectParent class, we can now amend our DataObject class for Users:

Code: Select all

<?php
 
class User extends DataObjectParent {
 
    function User() {
        parent::DataObjectParent(); // run the parent class's constructor
        $this->primaryKey = 'user_id';
        $this->tableName = 'user';
        $this->className = 'User';
    }
    
 
    function setUserId($value) { $this->data['user_id'] = $value; $this->changedFields[] = 'user_id'; }
    function getUserId() { return $this->data['user_id']; }
 
    function setName($value) { $this->data['name'] = $value; $this->changedFields[] = 'name'; }
    function getName() { return $this->data['name']; }
 
    function setEmail($value) { $this->data['email'] = $value; $this->changedFields[] = 'email'; }
    function getEmail() { return $this->data['email']; }
 
    function setWww($value) { $this->data['www'] = $value; $this->changedFields[] = 'www'; }
    function getWww() { return $this->data['www']; }
 
}
 
?>
All we have done is amend the constructor to set property values specific to our DataObject (the actual properties are now declared in the parent class), and added a line to ensure the parent's constructor is called upon instantiation of any DataObjects. All we are then left with are the getters/setters specific to row data. As noted previously, you can replace the in-file table/pk/class values with PHP logic instead of leaving explicit values in there. This class is a prime candidate for code generation - so I left them as is since they require no maintenance if subject to dynamic generation (where they are input as parameters to a generating function).


Writing the DataAccessObject


Since this tutorial is intended to abstract common database operations, we will be taking the option of creating a DataAccessObject capable of both creating and executing generic CRUD (Create, Read, Update, Delete) SQL query strings from any DataObject. This means we will not require CRUD functions specific to any one DataObject. Instead we will have a one-size-fits-all DataAccessObject which can generate SQL from any DataObject it is given. This requirement complicates things a little. Typically we could simply extend the DataObject with our DataAccessObject thus creating a simple hierarchical design with CRUD methods specific to each database table. However since we only have ONE DataAccessObject (or will have!) we need to either:

a) pass all DataAccessObject methods an instance of a specific DataObject to operate with; or
b) extend our DataObject (actually the DataObjectParent) class to utilise methods from a standalone DataAccessObject instance

Here, I opt for option (b); we will write a DataAccessObject which is re-useable, and which is a suitable candidate for being either a Singleton or registered to an Object Registry. We'll use a Singleton for, yes, brevity. That word again... There will be a touch of option (a), but it will be hidden from the end user who is likely only interested in the DataObject API.

The reasoning behind the one-size-fits-all decision is that this design is actually implemented in real life for several open source projects used predominantly on shared hosting services. Such services share memory and processing resources, and so re-using a Singleton is a more efficient memory usage than having to compile a more specific class for every DataObject we use in an application (which could run to quite a number). You can of course just use simple inheritance as I noted in this section's introduction if that's not a concern - it's the typical method in many frameworks.


The Singleton DataAccessObject


The basic skeletal structure of our DataAccessObject will be:

Code: Select all

<?php
 
class DataAccess {
 
    var $db; // instance of [url=http://adodblite.sourceforge.net]ADOdb Lite[/url]
 
    function DataAccess() {}
 
    function &getInstance() {} // singleton method
 
    function getByPk() {}
 
    function save() {}
 
    function delete() {}
 
    function getRow() {}
 
    function getAll() {}
 
    function getChangedFields() {}
 
    function getSetFields() {}
 
}
 
?>
Since we're interested for this tutorial in CRUD, preferably the simple uncomplicated kind, the methods we require are quite intuitive. We have three "get" methods: one for fetching a row based on a primary key, one for fetching a single row based on the current DataObject values (these are fed into a WHERE condition with a LIMIT 1 condition), and the last for fetching all rows containing the current DataObject values.

As you can see I'm already setting up the scene for accomodating WHERE conditions. Basically, setting values to a new DataObject and calling getRow() or getAll() will use those same values to create a WHERE condition for our SQL query. The results are then injected as the new values for the DataObject. getAll() conversely will actually return an array of results (all rows) each of which will be used to create separate DataObjects.

Remember that we are assuming that one DataObject is equivalent to one row on the database. An array of DataObjects we will refer to from now on as an "Object Collection". Just remember its a simple array, with each element being a DataObject of the same type, each representing a unique row of data from our database table. I am not however adding any extra functionality for dealing with foreign keys, relationships or even the SQL LIMIT condition. Those simple open source applications rarely require such and they add extra complexity.

Yes...it's that brevity idea again.


Adding a Database Abstraction library


So we have our DataObject. But we need to flesh out our DataAccessObject so those methods actually do something interesting. Our first assumption is that each method (excluding the singleton getInstance()) will need to be passed a DataObject on which to perform an operation. Now this seems to disagree with our usage example (no parameters) - but we'll get to that later.

Secondly we need to decide how to interact with the database itself. Since most of my personal work involves multiple databases I tend towards using a database abstraction library. This lets me use an application on a few Database types, typically MySQL and PostgreSQL. Here, I will make use of ADOdb Lite which is a smaller, lighter alternative to ADOdb which I find quite useful since I tend to work on open source applications used by people with access to resource-limited shared hosts. ADOdb Lite certainly uses far less memory than big cousin, but its just as intelligent.

The exact same principles however are easily applied if you want to use a very specific PHP library such as mysql, mysqli or PDO. Same goes for any alternate database abstraction libraries like ADOdb, PEAR::DB, etc.


Filling Out The DataAccessObject


DataAccessObject: The Singleton getInstance() and getByPk()


I won't detail the use of the Singleton pattern here - instead see the write up at http://www.phppatterns.com/docs/design/ ... on_pattern . Our DataAccessObject class constructor is currently empty. As we've mentioned we need at least one external resource - namely our ADOdb Lite instance and connection object. We'll simply pass this into the DataAccessObject class when it is being instantiated. Our singleton method simply allows us to re-use a single instance of this object, rather than each DataObject being extended by separate DataAccessObject classes specific to each DataObject. This reduces memory usage - which is one of the requirements I set to be shared-hosting friendly.

Code: Select all

<?php
 
class DataAccess {
 
    var $db;
 
    function DataAccess(&$db) {
        $this->db = $db;
    }
 
    function &getInstance(&$db) {
        static $thisInstance;
        if(is_null($thisInstance))
        {
            $thisInstance =& new DataAccess($db);
        }
        return $thisInstance;
    }
 
    // ...
 
?>
Our getByPk() method will simply fetch a row based on the value of the primary key associated with our selected database table. It accepts a value and fetches the row associated with that primary key value. Here we assume a table will only have one primary key - in real life it can have more... A primary key can be composed of many fields, which combined are unique to a table. $do is a reference to the DataObject which is passed as a parameter to DataAccess methods.

Code: Select all

<?php
    
    // ...
 
    function &getByPk(&$do, $value=null) {
        $sql = 'SELECT * FROM ' . $do->getTableName() . ' WHERE ' . $do->getPrimaryKey() .' = ?';
        if(isset($value))
        {
            $result = $this->db->Execute($sql, array( $value ));
        }
        else
        {
            $result = $this->db->Execute($sql, array( $do->getPrimaryKeyValue() ));
        }
        if(!$result)
        {
            trigger_error($sql . '<br /><br />' . get_class($this) . ': ' . $this->db->ErrorMsg(), E_USER_ERROR);
        }
        if(!empty( $result->fields ))
        {
            $do->import( $result->fields );
            $do->setExists();
        }
        else
        {
            $do->clearData(); // no data if no results 
        }
        $result->Close();
    }
 
    // ...
 
?>
So what's happening in here? Our first method line constructs the SQL query string we intend using. This is a simple SELECT statement. However it also replaces the value of the primary key with a question mark. Since I'm aiming for simplicity, I make use of an ADOdb Lite feature called variable binding (this is distinct from prepared statements). Where a value occurs in an SQL statement, I enter a question mark. When the SQL statement is sent to ADOdb Lite for execution I will also pass an array of the missing values. ADOdb Lite will detect this, and replace all the question marks with the values from the attached array. However in doing so it will also automatically detect the value type and apply appropriate value escaping and quoting for the database currently in use. This use of an automated feature saves me (and you if following this) from a lot of messy, error prone manual escaping and quoting.

You will also note that passing a primary key value is optional. The method will use the passed value, or the value preset in the DataObject - just make sure one option or the other is used. Next we output an error message (fatal user error) if something goes wrong, and finally we fetch the row data and import it into our DataObject before closing out the results (this does not close the connection!). We also call setExists() since this data does exist on the database, and we want the DataObject $exists property to track this state.

This behaviour - importing the row data into the same object we passed to our DataAccessObject class explains how the following works: $user2 is given "name" value of "Feyd" which is used by the getRow() method (see next section) to form a WHERE condition in the SQL statement constructed by getRow(). getRow() fetches the result, and the row data is imported into the $user2 object - no new objects are instantiated.

Code: Select all

<?php
 
// create new user
$user = new User();
$user->setName('Feyd');
$user->setEmail(null);
$user->setWww('http://forums.devnetwork.net');
$user->save();
 
// fetch same user's details
$user2 = new User();
$user2->setName('Feyd');
$user2->getRow(); // imports result data to $user2
echo $user2->getWww(); // http://forums.devnetwork.net
 
?>

DataAccessObject: getRow()


getRow() is used to fetch a single row - the equivalent of using a LIMIT 1 condition in an SQL query. It will return the first result only, irrespective of how many rows match the WHERE condition it uses.

Code: Select all

<?php
 
    // ...
 
    function getRow(&$do) {
        $conditions = $this->getSetFields($do);
        $sql = 'SELECT * FROM ' . $do->getTableName() . ' WHERE ' . implode(' AND ', $conditions['fields']);
        $result = $this->db->SelectLimit($sql, 1, -1, $conditions['values']);
        if(!$result)
        {
            trigger_error($sql . '<br /><br />' . get_class($this) . ': ' . $this->db->ErrorMsg(), E_USER_ERROR);
        }
        if(!empty( $result->fields ))
        {
            $do->import( $result->fields );
            $do->setExists();
        }
        else
        {
            $do->clearData();
        }
        $result->Close();
    }
 
    function getSetFields(&$do) {
        $setFields =& $do->getArray();
        $fields = array();
        $fields['fields'] = array();
        $fields['values'] = array();
        foreach($setFields as $key=>$val)
        {
            $fields['fields'][] = $key . ' = ?';
            $fields['values'][] = $val;
        }
        return $fields;
    }
 
    // ...
 
?>
There is little here beyond what we encountered in getByPk(). getRow does something very similar, applying a WHERE condition, executing the query, and importing the row data. Here however we expect a more complicated WHERE condition with multiple fields. To pick these up we grab the full array of set data ($data in DataObjectParent) and construct two arrays. The first is our field names in the form:

Code: Select all

fieldname = ?
This we recognise as the familiar variable binding format we discussed earlier. The second array is an array of all row data values held by the DataObject. These two arrays are then used to construct a full SQL WHERE condition for our query - from there we are once again in getByPk() territory. Also note we use ADOdb Lite's SelectLimit method which ensures our query uses a LIMIT condition consistent with the current database's syntax.


DataAccessObject: getAll()


Our next method, getAll(), will not import row data into the original DataObject. Instead it will return an array of objects, an Object Collection, each of which will be a DataObject corresponding to a row matching our WHERE condition.

Code: Select all

<?php
 
    // ...
 
    function &getAll(&$do) {
        $conditions = $this->getSetFields($do);
        if(empty($conditions['fields']))
        {
            $sql = 'SELECT * FROM ' . $do->getTableName();
            $result = $this->db->Execute($sql);
        }
        else
        {
            $sql = 'SELECT * FROM ' . $do->getTableName() . ' WHERE ' . implode(' AND ', $conditions['fields']);
            $result = $this->db->Execute($sql, $conditions['values']);
        }
        if(!$result)
        {
            trigger_error($sql . '<br /><br />' . get_class($this) . ': ' . $this->db->ErrorMsg(), E_USER_ERROR);
        }
        $records = $result->GetArray();
        $result->Close();
        if(!empty($records))
        {
            $objectArray = array();
            $class = $do->getClassName();
            foreach($records as $fields)
            {
                $object =& new $class($fields);
                $object->setExists();
                $objectArray[] = $object;
            }
            return $objectArray;
        }
        else
        {
            return false;
        }
    }
 
    // ...
 
?>
Again we see a lot of similarity with getByPk() and getRow(). The main difference is that this time we expect more than one result. On this basis we create a new DataObject for each result, and insert these into a simple array, $objectArray. This array value is then set to the variable (which originally held our DataObject) which we originally used to add values for our WHERE condition.

An example: We set no WHERE condition, i.e. an empty DataObject, to return all possible rows from the user table.

Code: Select all

<?php
 
$user_collection = new User();
$user_collection->getAll();
echo '<strong>List of User Names</strong><br /><br />';
foreach($user_collection as $key=>$user)
{
    echo $user->getName(), '<br />';
}
 
?>
This is not the best way of handling multiple rows, however that "brevity" term is ticking away in my head so you can worry about optimising it with a proper RecordSet at some other time. What's important is that it works. Next you'll be expecting an object for every field - let alone rows....


DataAccessObject: save()


Of course fetching data is great, but we also need to save it occassionally! The save() method has two alternate operations - it will either CREATE or UPDATE a row depending on whether the row data held by the DataObject already exists in the database or not. As you will have noted from examining the get methods earlier - we set the resulting DataObject's $exist flag to true by calling setExists(). In this case, changing the data and calling save() would perform an UPDATE. If you had added data to a new and therefore empty DataObject, save() should execute a CREATE query, and then set the $exists flag to true.

Code: Select all

<?php
    
    // ...
 
    function save(&$do) {
        if($do->getExists()) // $exists = true; UPDATE
        {
            $touched = $this->getChangedFields($do);
            if(empty($touched['fields']))
            {
                // We don't update rows which haven't changed. No changes = no UPDATE!
                return;
            }
            $sql = 'UPDATE ' . $do->getTableName() . ' SET ' . implode(',', $touched['fields'] ) . ' WHERE ' . $do->getPrimaryKey() . ' = ?';
            $touched['values'][] = $do->getPrimaryKeyValue(); // append primary key value to value array
            $result = $this->db->Execute($sql, $touched['values']);
            if(!$result)
            {
                trigger_error($sql . '<br /><br />' . get_class($this) . ': ' . $this->db->ErrorMsg(), E_USER_ERROR);
            }
            $do->clearChangedFields();
        }
        else // $exists = false; CREATE
        {
            $insertArray =& $do->getArray();
            $sql = 'INSERT INTO ' . $do->getTableName() . ' (' . implode(',', array_keys( $insertArray )) . ') VALUES (';
            // Evil complications detected... Sorry for the mess below.
            $a2 = str_repeat('? ', count($insertArray));
            $a1 = explode(' ', rtrim( $a2 ) );
            $bindings = implode(',', $a1);
            $sql .= $bindings . ')';
            $result = $this->db->Execute($sql, array_values($insertArray));
            if(!$result)
            {
                trigger_error($sql . '<br /><br />' . get_class($this) . ': ' . $this->db->ErrorMsg(), E_USER_ERROR);
            }
            if($do->getPrimaryKeyValue() === NULL) // set primary key if not explicitly set by user
            {
                $do->setPrimaryKeyValue( $this->db->Insert_ID() );
            }
            $do->setExists();
            $do->clearChangedFields();
        }
    }
 
    function getChangedFields(&$do) {
        $touched = array();
        $touched['fields'] = array();
        $touched['values'] = array();
        $touchedArray = $do->getChangedFields();
        $fields =& $do->getArray();
        foreach($fields as $key=>$val)
        {
            if(in_array($key, $touchedArray))
            {
                $touched['fields'][] = $key . ' = ?';
                $touched['values'][] = $val;
            }
        }
        return $touched;
    }
 
    // ...
 
?>
Hopefully you can forgive the gynastics in the CREATE section of the save() method....

By now you can probably follow the logic in the save() method. We check the $exists flag, and use it to determine whether to perform an UPDATE or a CREATE. UPDATES are organised to update only those DataObject values which have actually changed since the last update, or since the values were either created or fetched from the database. The new method getChangedFields() helps us compose the variable binding strings needed to take advantage of ADOdb Lite's auto quoting and escaping mechanism.

The generation of a CREATE query is more complex (either that or my mind drew a blank) so it fumbles about a bit in creating an SQL query with valid syntax. It also uses the variable binding formatting. Not to worry - it does work! Both create and update operations will clear the current $changedFields array in the relevant DataObject so we can be certain we do not end up saving the same data to the database twice - an error that could lead to duplication of rows or, if primary keys are used, a lot of database errors. Or even more annoying, attempting multiple updates when there is absolutely no need - a surprisingly common occurance which can impose a lot of extra wasteful database calls unless there's a safeguard in place.

Hint: DataObjects should be considered to inherit similar constraints as the table rows they represent. As an example, if a row field is given the property of NOT NULL, attempting to save a representative DataObject which was not given a value for this field will result in a database error. Remember to populate all NOT NULL fields before attempting a save() operation.

Finally to round off this lengthy in-depth view of the DataAccessObject class, here's the delete() method...


DataAccessObject: delete()

Code: Select all

<?php
 
    // ...
 
    function delete(&$do) {
        $sql = 'DELETE FROM ' . $do->getTableName() . ' WHERE ' . $do->getPrimaryKey() . ' = ?';
        $result = $this->db->Execute($sql, array( $do->getPrimaryKeyValue() ));
        if(!$result)
        {
            trigger_error($sql . '<br /><br />' . get_class($this) . ': ' . $this->db->ErrorMsg(), E_USER_ERROR);
        }
        $do->setExists(false); // deleted rows no longer exist <!-- s;) --><img src=\"{SMILIES_PATH}/icon_wink.gif\" alt=\";)\" title=\"Wink\" /><!-- s;) -->
    }
 
    // ...
 
?>

Onwards to Integration!


We now have covered the three major classes required to operate our simple-minded (relatively speaking) Data Access system. We have a DataObject which contains table specific information and the getters/setters we will use to manipulate row data. We have the DataObjectParent class which contains methods and properties common to all DataObjects - such as tracking changed fields, manipulating data, tracking row existence and offering import/export functions, etc. We also have a viable DataAccessObject which allows simple CRUD (Create, Read, Update and Delete) operations, while supporting WHERE conditions and automating any data escaping and quoting.

Our last step is pulling all three into a cohesive unit so we can actually perform tasks. Our current design involves the top level DataObjectParent class. This will include a reference to the Singleton instance of DataAccessObject (remember we constantly re-use this same object - without creating multiple copies).

The question becomes how to pass our DataObject into DataAccess methods, and how to let a DataObject directly call DataAccess methods? Remember, when we instantiate a $user DataObject, we call DataAccess methods such as $user->save() - each call does not require a user to provide the DataObject as a parameter. Sounds all backwards doesn't it?

We do this through delegation, or for reference, by implementing a Facade-like Pattern whereby DataAccessObject methods are called from any DataObject directly, and the user makes use instead of an alternate dummy API in DataObjectParent. Basically, we create methods in the DataObjectParent with the same name as those in the DataAccessObject and let the DataObjectParent worry about managing DataAccessObject calls.

Honestly - I think I confuse myself at times... The code will resolve any confusion.

Example:

Calling $user->save(), where $user is the DataObject, results in the DataObject calling:

$this->dao->save($this);

It simply delegates this function call to the DataAccessObject, passing a reference of itself as the DataObject parameter. The rest of the DataObjectParent class which allows this (and which all DataObjects will inherit) is:

Code: Select all

<?php
 
    // ...
 
    function getByPk($value=null) {
        if(isset($value))
        {
            $this->dao->getByPk($this, $value);
        }
        else
        {
            $this->dao->getByPk($this);
        }
    }
 
    function save() {
        $this->dao->save($this);
    }
 
    function delete() {
        $this->dao->delete($this);
    }
 
    function getRow() {
        $this->dao->getRow($this);
    }
 
    function getAll() {
        return $this->dao->getAll($this);
    }
 
    // ...
 
?>
As you can notice there are no return values (except for getAll() which returns an array) since the original object is passed as a parameter to the DataAccessObject methods which all inject their results back into the original object, or if an Object Collection, overwrite the object with an array of DataObjects. If you are familiar with simple variable Referencing in PHP, the end result is very similar.


Extending the DataObject: Convenience Methods


If you're following me so far, you may be thinking that utilising the current DataObject API for every single data read and write might end up being unnecessarily complex. One way of reducing complexity, particularly for common repetitive operations, is to extend the API with convenience methods. These methods can be created simply by mixing and matching our existing API with other logic, building them into single method calls.

Let's say for example that we know certain parts of our application gather a list of users based on their status from the database, updating each to an alternate status flag. Typically we would use:

Code: Select all

<?php
 
$user_collection = new User();
$user_collection->setStatus('invalid'); // used as a WHERE condition
$user_collection->getAll(); // SELECT * WHERE status = 'invalid'
foreach($user_collection as $key=>$user)
{
    $user->setStatus('valid');
    $user->save();
}
 
?>
If this was a commonly used code snippet we could simply build it into a convenience method which may be called as:

Code: Select all

<?php
 
$user = new User();
$user->updateStatusAll('valid', 'invalid'); // update all users to 'valid' if 'invalid'
 
?>
So where did this new method come from? What happened to our object collection? Is it updating all users? Consider this method appended to our normal DataObject class:

Code: Select all

<?php
 
    // ...
 
    function updateStatusAll($originState, $newState) {
        $this->setStatus($originState);
        $_collection = $this->dao->getAll($this); // delegate to DAO, passing self as reference
        foreach($_collection as $key=>$_user)
        {
            $_user->setStatus($newState);
            $_user->save();
        }
    }
 
    // ...
 
}
 
?>
This new addition is a convenience method. It encapsulates a piece of logic we commonly use in our application. Of course manually adding such extra methods directly to the DataObject makes it less amenable to Code Generation (see later section), but it makes sense to build up such extra methods to simplify even further the code we will eventually be using in our application's business logic. The more often a piece of logic is used, the more convenient it is to centralise it in a dedicated method. Such methods, by centralising code duplicated frequently in our application, are far easier to maintain and use in the long run.

A few alternatives designs exist besides appending these methods to the DataObject class. We could for example add these to a separate class and include through inheritance, i.e. DataObjectExtended inherits from DataObject which in turn inherits from DataObjectParent. Or you could re-use our previous encounter with delegation and the Facade Pattern to add this as a Singleton object being delegated to via Facade methods in our DataObjects, etc. Any such decision is yours to make.

For Code Generation fans, you could perhaps simply adapt the Builder class I introduce later to accept a list of such convenience methods (and their parameters) to be generated, where each delegates to a Singleton of such a standalone object. This will require a configuration file of some description, but is by no means difficult.



A Working Example: Delete all Invalid Users


This is just a quick example of a file using the entire system from database connection to final exit. As you can see, quite a bit can be centralised (e.g. initial database connections and ADOdb Lite instantiation). The rest does everything we need very simply, and without lots of SQL that could turn into a maintenance nightmare and leave those trying to extend our files confused until they learned SQL properly .

The MySQL specific table SQL without data population:

Code: Select all

CREATE TABLE `user` (
  `user_id` int(11) NOT NULL auto_increment,
  `name` varchar(64) NOT NULL default '',
  `status` varchar(10) default NULL,
  `email` varchar(64) NOT NULL default '',
  `www` varchar(128) default NULL,
  PRIMARY KEY  (`user_id`)
)

Code: Select all

<?php
 
// setup [url=http://adodblite.sourceforge.net]ADOdb Lite[/url] connection object
include 'ApplicationHelper.php'
include 'adodb_lite/adodb.inc.php';
$credentials = 'mysqli://username:password@localhost/tutorialdb'; // PEARDB-style DSN format using MySQLi library
$adodb = &ADONewConnection($credentials);
 
// use a Singleton object to store [url=http://adodblite.sourceforge.net]ADOdb Lite[/url] connection object - DataAccess calls getDA() in constructor. Could be any helper class.
$apphelper =& ApplicationHelper->getInstance();
$apphelper->setDA($adodb);
 
// include our DataAccess class files
include 'DataObjectParent.php';
include 'DataAccess.php';
include 'User.php'; // User DataObject class
 
// fetch our errant Users
$user_collection = new User();
$user_collection->setStatus('invalid');
$user_collection->getAll();
 
// delete all invalid users
foreach($user_collection as $key=>$user)
{
    $user->delete();
}
 
exit('All invalid users have been successfully deleted.');
 
?>
The code for the DataObjectParent, User DataObject and DataAccessObject classes may be downloaded from [not yet available]: Download. Windows users will need a 3rd party compression utility like WinZip to extract the archive. Other code used can be changed as you wish.


Code Generation: Never Hand Type DataObjects!


Never is probably a bit strong, but to a large extent I find manually typing dozens of DataObjects and maintaining them as my database structure alters or expands to be one of the most tedious tasks possible in PHP. For starters DataObjects usually follow a very specific and predictable format with few, if any, deviations. One solution to avoiding such tedium is to write a class capable of generating all those DataObjects using information about each database table you use.

The approach I'll demonstrate amounts to a quick hack - the class is highly specific to this tutorial's DataObject class and requires MySQL so take it as an example only. This section is more to raise awareness that generating code is worth doing where appropriate.

Our DataObject Generator is an active code generator. This means it can be run at any time you make changes to database table structure, field naming, etc. In fact it runs entirely off meta information provided on a table's columns from the database itself, i.e. it requires no configuration file. The opposite is a passive generator, which creates code which must be edited or maintained in the future without additional generation runs. Avoid the second if possible - I wasn't joking when I said it was tedious.

Using MySQL, here is our rapidly coded class:

Code: Select all

<?php
 
// set new line for local operating system if not available
if (!defined('PHP_EOL'))
{
    switch (strtoupper(substr(PHP_OS, 0, 3)))
    {
        case 'WIN':
            define('PHP_EOL', "\r\n"); // Windows
            break;
        case 'DAR':
            define('PHP_EOL', "\r"); // Mac OS
            break;
        default:
            define('PHP_EOL', "\n"); // *nix
    }
}
 
class Builder_DataObject {
 
    var $db;
    var $lastField;
    var $tableName;
    var $className;
    var $primaryKey;
 
    function Builder_DataObject(&$db, $tableName, $className=null) {
        $this->db = $db; // [url=http://adodblite.sourceforge.net]ADOdb Lite[/url] connection object
        $this->tableName = $tableName;
        $this->className = $className;
    }
 
    function GenerateClass($savepath) {
        // this is MySQL specific sql
        $fields = $this->db->GetAll('SHOW COLUMNS FROM ' . $this->tableName);
        $fields_start = '';
        $vars = array();
        $methods = '';
        $fields_end = '';
 
        foreach($fields as $array2)
        {
            // we only accept one primary key in our DataObjects (for this tutorial only mind!)
            //$primarykey = false;
 
            // iterate through all fields this table stores
            foreach ($array2 as $key=>$val)
            {
                // our generator only requires field and primary key names
                if($key !== 'Field' && $key !== 'Key')
                {
                    continue;
                }
                if($key == 'Key')
                {
                    if($val == 'PRI')
                    {
                        $this->primaryKey = $this->lastField;
                    }
                    continue;
                }
                $this->lastField = $val;
                // This section splits field names into terms, and capitalises first
                // letter of each to create our method names
                $method_array = explode('_', $val);
                $method_array_filtered = array_map('ucfirst', $method_array);
                $methodName = implode('', $method_array_filtered);
                // Append new method strings
                $methods .= "\tfunction get".$methodName.'() { return $this->data[\''.$val."']; }".PHP_EOL;
                $methods .= "\tfunction set".$methodName.'($var) { $this->data[\''.$val."'] = \$var;  \$this->changedFields[] = '" . $val . "'; }".PHP_EOL.PHP_EOL;
            }
        }
 
        // Create the start of the class (using readable formatting)
        $file = '';
        $file .= "<?php".PHP_EOL.PHP_EOL;
        $file .= 'class '.$this->className." extends DataObjectParent {".PHP_EOL.PHP_EOL;
        $file .= "\tfunction " . $this->className . '($fields' . "=null) {".PHP_EOL;
        $file .= "\t\tparent::DataObjectParent(".'$fields'.");".PHP_EOL;
        $file .= "\t\t\$this->tableName = '" . $this->tableName . "';".PHP_EOL;
        $file .= "\t\t\$this->className = '" . $this->className . "';".PHP_EOL;
        $file .= "\t\t\$this->primaryKey = '" . $this->primaryKey . "';".PHP_EOL;
        $file .= "\t}".PHP_EOL.PHP_EOL;
 
        // Add in our methods
        $file .= $methods;
 
        // Close the class
        $file .= "}".PHP_EOL.PHP_EOL;
        $file .= '?>';
 
        // Write out this file to a $savepath directory
        clearstatcache();
        $fs = @fopen("$savepath/" . $this->className . '.php', 'w+t');
        @fwrite($fs, $file);
        @fclose($fs);
    }
 
}
 
?>
All we are doing is gathering information for a table - capturing its field names and primary key. Using this data we then construct the class according to the predictable format (remember our naming convention?) we have put in place. Once constructed, we simply write out the file where it sits waiting for us to use.

As you can see this is hardly the most complex class ever written. Using this Builder, as I call it, you can generate all the DataObjects needed for your database simply by calling it's primary method for each table you have added. You can conceivably even skip specifying the table, and just grab a list of all tables from the database using the "SHOW TABLES" query or similar. Now doesn't this beat hand coding each DataObject manually?

Usage Example:

Code: Select all

<?php
 
// assume $db is already created as an [url=http://adodblite.sourceforge.net]ADOdb Lite[/url] connection object
 
$tableName = 'user';
 
require 'Builders/Builder_DataObject.php';
$builder = new Builder_DataObject($db, 'user', 'User');
$builder->GenerateClass('Generated/DataObjects');
 
?>
This might also demonstrate a key difference between this tutorial's approach to SQL abstraction, and the approach used by a number of other commonly used systems such as the ActiveRecord implementation of Ruby's Ruby-On-Rails, or several RoR inspired ActiveRecord implementations for PHP. These implementations often rely on an element of constant code generation and meta programming. Rather than generate DataObjects once only, these often generate them on the fly for every new ActiveRecord instance. This also involves extra database queries to gather the meta information required for such code generating. Both approaches are valid, just keep the differences in mind, and the inevitable performance cost in the second approach.


Conclusion


This tutorial's objective was to introduce the concepts of the DataObject (or TransferObject), DataAccessObject, and to demonstrate building a small light system around these capable of performing common row operations without the need to write SQL. It also demonstrates how using Code Generation effectively removes from this system the need for a lot of manual class typing.

As with many such tutorials it is not without limitations. We have ignored for its duration several possible additions such as conditional multi-row updates, the use of foreign keys, limiting result data (all our queries use the infamous asterisk instead), etc. As you can guess, depending on your application's requirements and more specifically its database structure and need for advanced database features (all those things MySQL was missing until recently for instance!), your own DataAccess/DataObject or Object Relational Mapping system will need further development. Plan for the future, not the present, if possible.

One truly common pitfall of sorts is the lack of conditional multi-row updates. Many DataObject and ActiveRecord implementations (this tutorial included) seem to suggest that updating multiple rows may be performed using:

Code: Select all

<?php
 
// switch all invalid users to valid status
$user_collection = new User();
$user_collection->setStatus('invalid');
$user_collection->getAll(); //fetch all invalid users
 
// update all users of invalid status to valid
foreach($user_collection as $key=>$user)
{
    $user->setStatus('valid');
    $user->save();
}
 
?>
This is inherently sub-optimal since a simple UPDATE...WHERE query would do it a lot faster. This might be worth improving in any implementation - maybe an optional save() parameter for creating the WHILE condition for multi-row updates. It does however demonstrate that in many circumstances when using DataObjects, you will be trading between simplicity and performance to some degree. Bear that trade-off in mind when designing an ORM solution for your applications.

Many developers rather than opting for a roll-your-own ORM implementation may feel more secure using a more widely supported and feature heavy alternative such as the excellent Propel, or the popular PEAR::DB_DataObject libraries among many others. We don't all develop small or non-complex applications afterall. There is also the upcoming Zend Framework implementation of ActiveRecord to consider.

Final note, all code herein is licensed under the GNU Lesser General Public License. Feel free to borrow, steal and mutate any pieces you wish. Any questions relating to this tutorial may be posted here on the Devnetwork forums.

Copyright (c) 2006 Pádraic Brady.
Last edited by RobertGonzalez on Fri May 23, 2008 12:24 pm, edited 1 time in total.
Reason: Just doing a brief resave so that the new bbCode GeSHi plugin parses the php tags
mikethompsonuk82
Forum Newbie
Posts: 1
Joined: Thu Dec 08, 2011 10:05 am

Re: PHP DataObjects: Simplifying Business Logic without SQL

Post by mikethompsonuk82 »

Wow, now that was a lot to get through, but glad I took the time. Really appreciate the bathmate guide Maurgrim, top effort.
Post Reply