
Ideas for lightweight ORM implementation

Posted: Tue Jun 02, 2009 8:37 am
by allspiritseve
Arborint and I are going to be working on a lightweight ORM in the same vein as our Pagination classes. Basically, we want to write something that will fill the void between a Table Data Gateway/Active Record implementation and heavy duty ORMs like Doctrine or Propel. Ideally it would be a layered solution, possibly even built over Skeleton's existing TDG or AR classes. We are using Fowler's ORM patterns as a starting point, but may go in a couple of different directions depending on where the code takes us.

I was wondering if anyone had any suggestions as to what they would like to see in a lightweight ORM. Any input would be very much appreciated.

Re: Ideas for lightweight ORM implementation

Posted: Tue Jun 02, 2009 12:49 pm
by alex.barylski
So we can all better understand, why not list the classes that will be used and maybe some basic interactions?

Re: Ideas for lightweight ORM implementation

Posted: Tue Jun 02, 2009 2:02 pm
by allspiritseve
Well, I just wanted to open it up to see if I can get ideas on what people look for in an ORM. But here's some code from Skeleton's existing DataMapper implementation:
arborint wrote:Here is use case from examples/db/datamapper. The idea is that a Mapper is created for each class to be mapped to a database. Each individual class/property to table/column mapping is defined with a Mapping. As mentioned we should be able to map to a class/method (setter), and also probably map to a database/table/column.

Code: Select all

class User_Mapper extends A_Db_Datamapper {
 
    public function __construct($db) {
        $this->setDb($db);
        $this->setClass('User');
        $this->setTable('users');
        $this->addMapping(new A_Db_Datamapper_Mapping('username', 'userid', 'string', 20, true, '', array()));
        $this->addMapping(new A_Db_Datamapper_Mapping('password', 'passwd', 'string', 24, false, '', array()));
        $this->addMapping(new A_Db_Datamapper_Mapping('active', 'inactive', 'string', 1, false, '', array()));
        // uncomment these two lines and comment previous line to show join generation
        $this->addMapping(new A_Db_Datamapper_Mapping('dept', 'dept_field', 'string', 1, false, 'company', array()));
        $this->addJoin(new A_Db_Datamapper_Join('users', 'userid', 'company', 'users_id', 'LEFT'));
    }
}
 
class User {
    public $username = '';
    public $password = '';
    public $active = false;
    public $dept = '';
    
    public function __construct($username='', $password='', $active='', $dept='') {
        $this->username = $username;
        $this->password = $password;
        $this->active = $active;
        $this->dept = $dept;
    }
}
 
// there are several ways to configure the mapper
#$Mapper = new A_Db_Datamapper(new Mock_Db(), 'User', 'users'); // need to add mappings like in User_Mapper
$Mapper = new A_Db_Datamapper_Xml(new Mock_Db(), 'mapping01.xml');
#$Mapper = new User_Mapper(new Mock_Db());
#$Mapper->allowKeyChanges(false);       // allow the key to be changed in loaded properties
 
// load() fetches a database record by the key
$User1 = $Mapper->load('Steve');
$User2 = $Mapper->load('Sally');
$User3 = $Mapper->load('Sam');
 
// calling load() with an already loaded key will return the object already in memory
$User4 = $Mapper->load('Steve');
 
// objects can then be used normally
$User1->username = 'adsf';
$User2->password = 'xxxxx';
$User2->active = 'Y';
$User3->active = 'N';
$User4->active = 'Y';
 
// new objects can be added that will be inserted later
$User5 = $Mapper->add(new User('Stephanie', 'kaboom', 'Y', 'South'), false);
 
unset($User3);
// commit will generate SQL and then call db object if present
$Mapper->commit();
My idea was that you could get metadata from various sources to configure the mappings. There is also an A_Db_Schema class that gets EXPLAIN data from a database table; it was intended as a way to auto-configure the mappings (reflection would be used for the object).
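A rough sketch of what that auto-configuration could look like, matching reflected public properties against column metadata. The Account class and discoverMappings() function are hypothetical, the sample rows only imitate the shape of MySQL DESCRIBE output, and the real A_Db_Schema API is not shown in this thread, so it is not used here:

```php
<?php
// Hypothetical sketch: derive property-to-column mappings by matching
// reflected public property names against column metadata rows.

class Account {
    public $userid = '';
    public $passwd = '';
    public $nickname = '';   // no matching column, so it stays unmapped
}

/**
 * @param string $class   class to inspect with reflection
 * @param array  $columns rows shaped like MySQL DESCRIBE output
 * @return array list of mapping definitions (plain arrays in this sketch)
 */
function discoverMappings(string $class, array $columns): array {
    // Index the column metadata by column name for quick lookup.
    $byName = [];
    foreach ($columns as $column) {
        $byName[$column['Field']] = $column;
    }

    $mappings = [];
    $reflection = new ReflectionClass($class);
    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $name = $property->getName();
        if (!isset($byName[$name])) {
            continue; // names differ: this one needs a manual mapping
        }
        $mappings[] = [
            'property' => $name,
            'field'    => $name,
            'type'     => $byName[$name]['Type'],
            'is_key'   => $byName[$name]['Key'] === 'PRI',
        ];
    }
    return $mappings;
}

// Sample rows standing in for the result of "DESCRIBE users".
$columns = [
    ['Field' => 'userid',   'Type' => 'varchar(20)', 'Key' => 'PRI'],
    ['Field' => 'passwd',   'Type' => 'varchar(24)', 'Key' => ''],
    ['Field' => 'inactive', 'Type' => 'char(1)',     'Key' => ''],
];
$mappings = discoverMappings('Account', $columns);
```

Only properties whose names match a column are mapped automatically; anything else (such as the active/inactive pair in the example above) would still need an explicit mapping.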

As a demonstration I added the A_Db_Datamapper_Xml class to show getting metadata. The files (for the above example) look like this:

Code: Select all

<?xml version="1.0" encoding="utf-8" ?>
<map>
    <class>User</class>
 
    <table>users</table>
 
    <mapping>
        <property>username</property>
        <field>userid</field>
        <type>string</type>
        <size>20</size>
        <is_key>1</is_key>
        <table></table>
        <filters></filters>
    </mapping>
 
    <mapping>
        <property>password</property>
        <field>password</field>
        <type>string</type>
        <size>24</size>
        <is_key></is_key>
        <table></table>
        <filters></filters>
    </mapping>
 
    <mapping>
        <property>active</property>
        <field>inactive</field>
        <type>string</type>
        <size>20</size>
        <is_key></is_key>
        <table>company</table>
        <filters></filters>
    </mapping>
 
    <join>
        <table1>users</table1>
        <field1>userid</field1>
        <table2>company</table2>
        <field2>users_id</field2>
        <join_type>LEFT</join_type>
    </join>
</map>
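For illustration only, a file of this shape could be read into plain arrays with PHP's SimpleXML extension. This is a sketch and not the actual A_Db_Datamapper_Xml code; readMappingXml() is a hypothetical helper and the XML is a shortened copy of the file above:

```php
<?php
// Sketch: read a mapping definition into plain arrays with SimpleXML.

$xml = <<<'XML'
<map>
    <class>User</class>
    <table>users</table>
    <mapping>
        <property>username</property>
        <field>userid</field>
        <type>string</type>
        <size>20</size>
        <is_key>1</is_key>
        <table></table>
    </mapping>
    <mapping>
        <property>password</property>
        <field>password</field>
        <type>string</type>
        <size>24</size>
        <is_key></is_key>
        <table></table>
    </mapping>
</map>
XML;

function readMappingXml(string $xml): array {
    $map = simplexml_load_string($xml);
    if ($map === false) {
        throw new InvalidArgumentException('Mapping XML could not be parsed');
    }
    $config = [
        'class'    => (string) $map->class,
        'table'    => (string) $map->table,   // direct child of <map> only
        'mappings' => [],
    ];
    foreach ($map->mapping as $mapping) {
        $config['mappings'][] = [
            'property' => (string) $mapping->property,
            'field'    => (string) $mapping->field,
            'type'     => (string) $mapping->type,
            'size'     => (int) $mapping->size,
            'is_key'   => (string) $mapping->is_key === '1',
            'table'    => (string) $mapping->table,
        ];
    }
    return $config;
}

$config = readMappingXml($xml);
```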

Re: Ideas for lightweight ORM implementation

Posted: Tue Jun 02, 2009 4:37 pm
by Christopher
PCSpectra wrote:So we can all better understand, why not list the classes that will be used and maybe some basic interactions?
While I think the interactions are important, they tend to be very similar in general design, with small differences in some details (e.g. dealing with lists as an array or through an interface). So the ideas usually look something like this:

Data Mapper

Code: Select all

$UserMapper = new UserDataMapper();     // extends DataMapper class and initialized internally
$User = $UserMapper->load('Steve');
// objects can then be used normally
$User->username = 'adsf';
$User->password = 'xxxxx';
$User->active = 'Y';
// mapper tracks objects and can persist them
$UserMapper->commit();     // all new or changed objects tracked by the mapper saved
Active Record

Code: Select all

$User = new User('Steve');     // extends ActiveRecord class and initialized internally
// objects can then be used normally
$User->username = 'adsf';
$User->password = 'xxxxx';
$User->active = 'Y';
$User->save();     // object contains persistence code
I guess part of the discussion that allspiritseve is starting here is about what people have found useful in practice. But the other side to the discussion is that the whole point of these systems is that a mountain of code sits behind a very simple interface and makes that simplicity possible. Conventions are what allow a simple interface to the various features of the system.

One question I put to allspiritseve is whether separating the database schema from the class design is the main thing that people need, because it is a core feature of O/RMs, or whether making it easier to create a rich Domain Model is a better goal. Do you want O/RM for everything or only where mapping is needed? For example, a system that allows you to really easily work with a database table, using the exact schema of the database table, might be more useful more often, and have much less overhead. If that system and the system backed with an O/RM had the same interface, then you could choose mapping only when needed. Which starts to get back to the real goal of making coding and changes faster, easier, more expressive, etc.
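One way to picture "the same interface, with mapping only when needed" is a decorator. This is only a sketch under stated assumptions: RecordSource, TableSource and MappedSource are made-up names, and an in-memory array stands in for a database table.

```php
<?php
// Hypothetical sketch: one finder interface, with mapping as an optional layer.

interface RecordSource {
    /** @return array|null the record for the key, or null when not found */
    public function find($key);
}

// Works directly with the table's own column names (no mapping overhead).
class TableSource implements RecordSource {
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;   // stands in for a real table
    }

    public function find($key) {
        return $this->rows[$key] ?? null;
    }
}

// Wraps any source and renames columns to the names the domain code uses.
class MappedSource implements RecordSource {
    private $source;
    private $fieldToProperty;

    public function __construct(RecordSource $source, array $fieldToProperty) {
        $this->source = $source;
        $this->fieldToProperty = $fieldToProperty;
    }

    public function find($key) {
        $row = $this->source->find($key);
        if ($row === null) {
            return null;
        }
        $mapped = [];
        foreach ($row as $field => $value) {
            // Unlisted fields keep their column name.
            $name = $this->fieldToProperty[$field] ?? $field;
            $mapped[$name] = $value;
        }
        return $mapped;
    }
}

$table  = new TableSource(['Steve' => ['userid' => 'Steve', 'passwd' => 'secret']]);
$mapped = new MappedSource($table, ['userid' => 'username', 'passwd' => 'password']);

$plain = $table->find('Steve');    // keys follow the table schema
$rich  = $mapped->find('Steve');   // keys follow the class design
```

Calling code depends only on RecordSource, so the mapping layer can be added or dropped per table without touching that code.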

Re: Ideas for lightweight ORM implementation

Posted: Tue Jun 02, 2009 7:35 pm
by wei
Simple CRUD operations are relatively easy to implement; what is hard is allowing for arbitrary SQL, which is often required when generating reports from the db. Often these report queries can span a few pages of SQL. What is time consuming, for me at least, is not writing the SQL but mapping the results to objects that often have nested object structures.

For the data mapper example, why does the User class have a save method? Shouldn't it be $UserMapper->save($user) ?

Re: Ideas for lightweight ORM implementation

Posted: Tue Jun 02, 2009 7:58 pm
by Christopher
wei wrote:Simple CRUD operations are relatively easy to implement; what is hard is allowing for arbitrary SQL, which is often required when generating reports from the db. Often these report queries can span a few pages of SQL. What is time consuming, for me at least, is not writing the SQL but mapping the results to objects that often have nested object structures.
That is a big part of the conversation that we are having. It is funny because many of the things that people call ORM have no real mapping at all (call it hard coded 1:1 ;)). Many of them are RoR-influenced OO query builders, which is a direction that suffers from the Law of Diminishing Returns.

I think you make a great point. Nothing matches the expressiveness of SQL. It should be one of the front ends to any system. And I agree that the hard part is managing, tracking and persisting the nested object structures that result. I am starting to think that this system should be divided into front-end query systems, middleware that manages organization, persistence, locks and identity, and back-end systems that deal with the datasource (mapping and non-mapping).
wei wrote:For the data mapper example, why does the User class have a save method? Shouldn't it be $UserMapper->save($user) ?
Thanks! Well spotted. That was my copy/paste error. I fixed the code above.

Re: Ideas for lightweight ORM implementation

Posted: Wed Jun 03, 2009 9:50 am
by temidayo
arborint wrote:
wei wrote:Simple CRUD operations are relatively easy to implement; what is hard is allowing for arbitrary SQL, which is often required when generating reports from the db. Often these report queries can span a few pages of SQL. What is time consuming, for me at least, is not writing the SQL but mapping the results to objects that often have nested object structures.
That is a big part of the conversation that we are having. It is funny because many of the things that people call ORM have no real mapping at all (call it hard coded 1:1 ;)). Many of them are RoR-influenced OO query builders, which is a direction that suffers from the Law of Diminishing Returns.

I think you make a great point. Nothing matches the expressiveness of SQL.

If the ORM maps an object to a database table, how do you intend to support transactions when multiple tables need to be accessed with 'atomic' queries?

Re: Ideas for lightweight ORM implementation

Posted: Wed Jun 03, 2009 9:57 am
by inghamn
Here is what I look for in an ORM. Whenever I'm working on an ORM layer, I ask myself whether it satisfies these goals.

Goals: Things an ORM should aspire to...or facilitate

Quick to develop with
Getting up and going with an ORM should be fast.

Rich Domain Model
This is what we're doing when we write an application using an ORM. Any ORM needs to make it easy for a developer to create a domain-centered API. The domain API is subservient to the user interface. If you're like me, what the API will be is usually unknown while you're developing. Instead, the domain API is going to grow over time in response to user interface requests and feature requests.

For my stuff, the order goes Model -> Controller -> View. Each step does things in a way as to make it simpler to write code in the layer above. For my part, I have been combining the database code and business logic all into a single ActiveRecord-style model. My latest application, however, has left me right on the edge of maybe wanting to split out the database layer from the business logic.



No hidden code
This comes from ORMs where you extend base class after base class. Pretty soon, the inheritance runs so deep that you don't really know where some functionality is coming from, nor how to change it. At least, not without some hard core documentation. And no one likes to write that documentation, nor really wants to read it.

DI can address this, but also starts running into complexity issues pretty quickly. I think, rather, an ORM should try to be a simple, bare-bones architecture, facilitating developers to do the work. Which leads to...

As close to SQL as possible
If you only need to support one database server, you can probably get away with using raw SQL everywhere in the ORM. Once you need to support multiple database servers with the same code, you really have to use some sort of query building language. The different databases have enough variations in SQL that you will always be bumping into the need for different raw SQL to be generated.





In order to meet those goals, an ORM should provide:

Basic CRUD
Once you are sure of the soundness and security of your database interaction, you can start building the domain model from the basic CRUD functions.

Examples of code using the ORM
The ORM is not the final goal. It is only a starting point, and should point the developer down a good path for development. The ORM shouldn't enforce this; it is up to the developer to follow through. There will always be a few cases where you'll need to deviate from some idealized style.

Re: Ideas for lightweight ORM implementation

Posted: Wed Jun 03, 2009 12:10 pm
by Christopher
inghamn wrote:Quick to develop with
Getting up and going with an ORM should be fast.
Agreed, but not always that easy. ;) That is a major goal. What do you consider some of the qualities that make a system easy to learn?
inghamn wrote:Rich Domain Model
This is what we're doing when we write an application using an ORM. Any ORM needs to make it easy for a developer to create a domain-centered API. The domain API is subservient to the user interface. If you're like me, what the API will be is usually unknown while you're developing. Instead, the domain API is going to grow over time in response to user interface requests and feature requests.

For my stuff, the order goes Model -> Controller -> View. Each step does things in a way as to make it simpler to write code in the layer above. For my part, I have been combining the database code and business logic all into a single ActiveRecord-style model. My latest application, however, has left me right on the edge of maybe wanting to split out the database layer from the business logic.
One of the big questions we have is whether we are building a traditional O/RM or whether we are building a system for creating Rich Domain Models (exactly the term we used ;)) where some of the individual objects in the Domain Model may need mapping to decouple the class structure from the database schema.
inghamn wrote:No hidden code
This comes from ORMs where you extend base class after base class. Pretty soon, the inheritance runs so deep that you don't really know where some functionality is coming from, nor how to change it. At least, not without some hard core documentation. And no one likes to write that documentation, nor really wants to read it.

DI can address this, but also starts running into complexity issues pretty quickly. I think, rather, an ORM should try to be a simple, bare-bones architecture, facilitating developers to do the work. Which leads to...
We started with the goal that objects from the ORM should not have to inherit a base class, or even be passed back to the ORM to be persisted. It should track them for you.
inghamn wrote:As close to SQL as possible
If you only need to support one database server, you can probably get away with using raw SQL everywhere in the ORM. Once you need to support multiple database servers with the same code, you really have to use some sort of query building language. The different databases have enough variations in SQL that you will always be bumping into the need for different raw SQL to be generated.
We have talked about supporting scaffolding-generated classes, classes that use custom coded mappers (containing SQL) and classes that are configured with data (whether manually or auto-discovered). I don't know if there is one best solution.
inghamn wrote:In order to meet those goals, an ORM should provide:

Basic CRUD
Once you are sure of the soundness and security of your database interaction, you can start building the domain model from the basic CRUD functions.
Interesting.
inghamn wrote:Examples of code using the ORM
The ORM is not the final goal. It is only a starting point, and should point the developer down a good path for development. The ORM shouldn't enforce this; it is up to the developer to follow through. There will always be a few cases where you'll need to deviate from some idealized style.
I think we may end up supporting a bunch of different combinations and styles, so you make a good point that we should present what we consider a best practice approach.

How does what you use compare to what I posted above?

Re: Ideas for lightweight ORM implementation

Posted: Wed Jun 03, 2009 7:45 pm
by wei
// mapper tracks objects and can persist them
This is very hard to do and there are so many edge cases. For example, just think about related objects: sometimes one object needs to be persisted before its parent to obtain data from the insert/update (e.g. last insert id), and sometimes it will be the other way around, persist parent first, then child. Just let the developers do it themselves, e.g. $mapper->save($user); Then there are transactions across multiple connections...

Basic CRUD operations should employ SQL generation, but nothing above simple one-table interactions. Even at this level, things like select with limit and offset will create headaches in the generation of SQL code. For example, MSSQL has no LIMIT/OFFSET clauses; there are workarounds, but they require rewriting the SQL on the fly and are thus limited to nice queries...

For building the object graph, just consider building the graph from a list of array values and some mapping definitions. This means that the mappings are based on the results returned by the db and not on some SQL code. This is an important distinction that allows for arbitrary queries, including legacy dbs, stored procs, and even distributed hash-tables. Thus, the design is really a data mapper, not just an SQL data mapper.
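A minimal sketch of that idea, hydrating a one-to-many graph from flat result rows. Everything here is hypothetical (the Writer and Title classes, the buildGraph() and hydrate() functions, the shape of the mapping arrays); the only input is a list of rows, so it does not matter whether they came from SQL, a stored procedure or something else:

```php
<?php
// Sketch: build nested objects from flat result rows using mapping
// definitions keyed on result column names, not on the query that made them.

class Writer { public $id; public $name; public $books = []; }
class Title  { public $id; public $name; }

// Create one object of the mapped class and copy the mapped columns into it.
function hydrate(array $map, array $row) {
    $class = $map['class'];
    $object = new $class();
    foreach ($map['columns'] as $column => $property) {
        $object->$property = $row[$column];
    }
    return $object;
}

// Group rows by the parent key and attach each child to its parent.
function buildGraph(array $rows, array $parentMap, array $childMap, string $collection): array {
    $parents = [];
    foreach ($rows as $row) {
        $parentKey = $row[$parentMap['key']];
        if (!isset($parents[$parentKey])) {
            $parents[$parentKey] = hydrate($parentMap, $row);
        }
        // Rows from an outer join may carry no child at all.
        if ($row[$childMap['key']] !== null) {
            $parents[$parentKey]->{$collection}[] = hydrate($childMap, $row);
        }
    }
    return array_values($parents);
}

// Rows as any data source might return them.
$rows = [
    ['author_id' => 1, 'author_name' => 'Ann', 'book_id' => 10,   'book_title' => 'First Book'],
    ['author_id' => 1, 'author_name' => 'Ann', 'book_id' => 11,   'book_title' => 'Second Book'],
    ['author_id' => 2, 'author_name' => 'Bob', 'book_id' => null, 'book_title' => null],
];

$authorMap = ['class' => 'Writer', 'key' => 'author_id',
              'columns' => ['author_id' => 'id', 'author_name' => 'name']];
$bookMap   = ['class' => 'Title', 'key' => 'book_id',
              'columns' => ['book_id' => 'id', 'book_title' => 'name']];

$authors = buildGraph($rows, $authorMap, $bookMap, 'books');
```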

WEi.

Re: Ideas for lightweight ORM implementation

Posted: Wed Jun 03, 2009 8:57 pm
by Christopher
wei wrote:This is very hard to do and there are so many edge cases. For example, just think about related objects: sometimes one object needs to be persisted before its parent to obtain data from the insert/update (e.g. last insert id), and sometimes it will be the other way around, persist parent first, then child. Just let the developers do it themselves, e.g. $mapper->save($user); Then there are transactions across multiple connections...
You are probably right, but for a lot of cases having the mapper track things is really nice. Probably the solution is to support both.
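Supporting both could look roughly like this. It is a sketch, not Skeleton code: Member and TrackingMapper are hypothetical, an array stands in for the table, and changes are detected by comparing against a snapshot taken at load time.

```php
<?php
// Hypothetical sketch: a mapper that tracks loaded objects (identity map plus
// snapshot comparison) and also accepts an explicit save().

class Member {
    public $username = '';
    public $active = 'N';
}

class TrackingMapper {
    private $store;            // array standing in for a table, keyed by username
    private $loaded = [];      // identity map: key => object
    private $snapshots = [];   // key => property values at load/save time
    public $written = [];      // keys written, in order, for demonstration

    public function __construct(array $store) {
        $this->store = $store;
    }

    public function load(string $key): ?Member {
        if (isset($this->loaded[$key])) {
            return $this->loaded[$key];   // same instance every time
        }
        if (!isset($this->store[$key])) {
            return null;
        }
        $member = new Member();
        $member->username = $key;
        $member->active = $this->store[$key]['active'];
        $this->loaded[$key] = $member;
        $this->snapshots[$key] = get_object_vars($member);
        return $member;
    }

    // Explicit route: the developer decides what is written and when.
    public function save(Member $member): void {
        $key = array_search($member, $this->loaded, true);
        if ($key === false) {
            $key = $member->username;     // new object: start tracking it
            $this->loaded[$key] = $member;
        }
        $this->store[$key] = ['active' => $member->active];
        $this->snapshots[$key] = get_object_vars($member);
        $this->written[] = $key;
    }

    // Tracking route: write every loaded object whose state has changed.
    public function commit(): void {
        foreach ($this->loaded as $key => $member) {
            if (get_object_vars($member) !== $this->snapshots[$key]) {
                $this->save($member);
            }
        }
    }
}

$mapper = new TrackingMapper(['Steve' => ['active' => 'N'], 'Sally' => ['active' => 'Y']]);
$steve = $mapper->load('Steve');
$sally = $mapper->load('Sally');
$same  = $mapper->load('Steve');   // returns the object already in memory
$steve->active = 'Y';
$mapper->commit();                 // only Steve has changed, so only Steve is written
```

The sketch sidesteps the ordering problem wei describes: commit() writes in load order, which is fine for unrelated objects but would need dependency ordering once parents and children are involved.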
wei wrote:Basic CRUD operations should employ SQL generation, but nothing above simple one-table interactions. Even at this level, things like select with limit and offset will create headaches in the generation of SQL code. For example, MSSQL has no LIMIT/OFFSET clauses; there are workarounds, but they require rewriting the SQL on the fly and are thus limited to nice queries...
Limiting query results is handled by the individual database adapters. I think CRUD should be able to handle simple JOINs, but you may be right that it is too complicated. Since we plan to map individual properties or setters to table/columns I think we should be able to handle JOINs.
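As an illustration of pushing the limit syntax into the adapters, here is a small sketch. The LimitDialect interface and both classes are hypothetical and are not Skeleton's adapters; the MSSQL variant only handles a plain SELECT with no offset, since an offset needs the kind of query rewriting wei mentions.

```php
<?php
// Hypothetical sketch: each adapter knows its own row-limiting syntax.

interface LimitDialect {
    public function limit(string $select, int $count, int $offset = 0): string;
}

class MysqlDialect implements LimitDialect {
    public function limit(string $select, int $count, int $offset = 0): string {
        return $select . ' LIMIT ' . $count . ' OFFSET ' . $offset;
    }
}

class MssqlDialect implements LimitDialect {
    public function limit(string $select, int $count, int $offset = 0): string {
        if ($offset !== 0) {
            // An offset requires rewriting the query (for example around
            // ROW_NUMBER()), which this sketch does not attempt.
            throw new LogicException('Offset is not supported by this sketch');
        }
        // TOP goes right after the leading SELECT keyword. This naive rewrite
        // assumes a plain SELECT (no DISTINCT, no leading comment or CTE).
        return preg_replace('/^\s*SELECT\b/i', 'SELECT TOP ' . $count, $select, 1);
    }
}

$mysql = (new MysqlDialect())->limit('SELECT * FROM users', 10, 20);
$mssql = (new MssqlDialect())->limit('SELECT * FROM users', 10);
```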
wei wrote:For building the object graph, just consider building the graph from a list of array values and some mapping definitions. This means that the mappings are based on the results returned by the db and not on some SQL code. This is an important distinction that allows for arbitrary queries, including legacy dbs, stored procs, and even distributed hash-tables. Thus, the design is really a data mapper, not just an SQL data mapper.
Agreed, the object graph needs to be based on the actual data and not on SQL.

Re: Ideas for lightweight ORM implementation

Posted: Thu Jun 04, 2009 8:23 am
by inghamn
I apologize for my slow posting. I'm on vacation, and haven't really set aside any time for PHP in between the beach, and the sailboats, and sleeping.
arborint wrote: One of the big questions we have is whether we are building a traditional O/RM or whether we are building a system for creating Rich Domain Models (exactly the term we used ;)) where some of the individual objects in the Domain Model may need mapping to decouple the class structure from the database schema.
I believe you should be doing the latter. Traditional ORM attempts to support too much. It tries to support every possible thing you'd want to do with the database, and to map it for you.

I believe you want a system (maybe not call it an ORM anymore) that helps you (the developer) rapidly grow the Rich Domain model. The domain model is the goal. (At least inasmuch as it supports the goal of quickly writing an application that satisfies your users.)

arborint wrote: How does what you use compare to what I posted above?
I scanned above and I am guessing you're referring to the Data Mapper stuff? I'm finding it hard to find some common ground in our implementations.

I do not use any mappers. My framework does not provide any mapper functions. The basic relationships usually get stubbed out from the generators, but it's up to the developer to write functions to return whatever objects are desired based on the goal of simplifying the controller and view code.

My models are Active Record style, with both DB and business logic in the same class file. They don't extend anything and are responsible for all their own database interaction, although they can and should call other classes for data, instead of looking it up themselves. The starting point is one class for each table with a single primary key, and the basic, starting code is stubbed out by code generators.


Consider a simplified example of a Book having an Author. I've left out the functions for loading and saving the Book so I could show how to do relationships. (Loading would be done in the constructor, and there'd be a save function.)

Code: Select all

 
class Book
{
    private $id;
    private $author_id;
    
    private $author;
    
    public function getId()
    {
        return $this->id;
    }
    
    public function getAuthor_id()
    {
        return $this->author_id;
    }
    
    public function getAuthor()
    {
        if (!$this->author) {
            $this->author = new Author($this->author_id);
        }
        return $this->author;
    }
}
 
So, the needed relationships are expressed in code, and written over time, as the application is developed. That really hairy many-to-many relationship may never need to be expressed, given what the user interface requires. Why build it until you need it?


I expect these objects to only return data from their own table, and to return the objects of other tables when that data might be needed. These objects are what's passed to the views, and the views will get whatever data they need from them.

Re: Ideas for lightweight ORM implementation

Posted: Thu Jun 04, 2009 9:19 am
by wei
Data mapper usually means that the domain model does not need to know that it can be persisted. A good problem to solve is how to transfer the data between the domain model and the persistent storage when there are mismatches in the two representations, i.e. domain models are object graphs and storage is usually relational.

Solving the problem where the object model and the relational model are fairly alike has been shown in many implementations, e.g. most Active Record implementations. Even in Active Record implementations, the difficulties are not in save/update/delete; the magic usually happens in the finder methods, i.e. where the impedance mismatch between object models and relational models is near its peak.

Data mapper seems promising in that it may act as the bridge between the object model and the relational model.

Re: Ideas for lightweight ORM implementation

Posted: Thu Jun 04, 2009 9:37 am
by allspiritseve
inghamn wrote:I believe you want a system (maybe not call it an ORM anymore) that helps you (the developer) rapidly grow the Rich Domain model. The domain model is the goal. (At least inasmuch as it supports the goal of quickly writing an application that satisfies your users.)
Could you give some specific examples of things a system like this would do?

Re: Ideas for lightweight ORM implementation

Posted: Fri Jun 05, 2009 8:00 am
by inghamn
wei wrote: Solving the problem where the object model and the relational model are fairly alike has been shown in many implementations, e.g. most Active Record implementations. Even in Active Record implementations, the difficulties are not in save/update/delete; the magic usually happens in the finder methods, i.e. where the impedance mismatch between object models and relational models is near its peak.
You're right, the real thinking needs to be applied to the aggregate stuff. My reasoning is that, if you're doing Active Record, then that's the one and only place for persisting (save) to happen. Active Record + Data Mapper is just duplicating code.

I've been going the route of Active Record + Collections. The collection classes are where we can put all the SQL needed to handle returning a bunch of Active Record objects. With a flexible find function, you can do just about anything you want. And if there's some reporting you just can't figure out how to do with the Active Record object or the Collection's find function, you can admit defeat, and just write your own static function into the collection class to do whatever.
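A sketch of what such a flexible find function might generate. BookList and buildFind() are hypothetical names, the column list is assumed, and only the SQL-building step is shown; running the statement and wrapping each row in an Active Record object is left out.

```php
<?php
// Hypothetical sketch: a collection class that owns the SQL for fetching
// many Active Record objects through one flexible find function.

class BookList {
    private static $allowed = ['id', 'author_id', 'title'];   // assumed columns

    /** @return array [sql, params] ready for a prepared statement */
    public static function buildFind(array $fields = [], string $orderBy = 'title'): array {
        $where = [];
        $params = [];
        foreach ($fields as $column => $value) {
            // Column names cannot be bound as parameters, so whitelist them.
            if (!in_array($column, self::$allowed, true)) {
                throw new InvalidArgumentException("Unknown column: $column");
            }
            $where[] = "$column = ?";
            $params[] = $value;
        }
        if (!in_array($orderBy, self::$allowed, true)) {
            throw new InvalidArgumentException("Unknown sort column: $orderBy");
        }

        $sql = 'SELECT id FROM books';
        if ($where) {
            $sql .= ' WHERE ' . implode(' AND ', $where);
        }
        $sql .= " ORDER BY $orderBy";
        return [$sql, $params];
    }
}

list($sql, $params) = BookList::buildFind(['author_id' => 7]);
list($allSql, $allParams) = BookList::buildFind();
```

Selecting only the id fits the style described above, where each object loads itself in its constructor; a real collection could just as well select whole rows.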
allspiritseve wrote: Could you give some specific examples of things a system like this would do?
Ideally, I'd want the ORM to handle the complicated stuff, and leave the easy stuff to the developer.

Feature bullet points I think I'm looking for:
Database vendor SQL translation
It should know how to write SQL that works with all major databases, not just MySQL. Supporting multiple database servers is probably the biggest and only reason I'm currently shopping around for an ORM.

Date translations
Each database returns date fields in a different string format. It would be nice to have the ORM layer handle date translations. I'd like to be able to set how I want dates returned and stored from the database.

In my stuff, I prefer to handle all dates as timestamps in PHP. It's a pain to keep converting (both on read and update).

Auto-Increment
Not all databases do auto-increment like MySQL. Oracle and Postgres use sequences. The ORM should handle whatever steps are necessary to do auto-incrementing in any database. It's a useful enough feature, and I wish all databases supported it.
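The date translation point above can be sketched with a small converter between a database's date string format and the PHP timestamps the application works with. DateTranslator is a hypothetical class, UTC is assumed throughout, and 'Y-m-d H:i:s' is the MySQL DATETIME layout; other databases would be configured with their own format string.

```php
<?php
// Hypothetical sketch: convert database date strings to Unix timestamps on
// read, and back to the database's format on write.

class DateTranslator {
    private $format;
    private $utc;

    public function __construct(string $format) {
        $this->format = $format;
        $this->utc = new DateTimeZone('UTC');   // assumption: dates are stored as UTC
    }

    public function toTimestamp(string $value): int {
        // The leading "!" resets fields missing from the format to the epoch.
        $date = DateTime::createFromFormat('!' . $this->format, $value, $this->utc);
        if ($date === false) {
            throw new InvalidArgumentException("Unrecognized date: $value");
        }
        return $date->getTimestamp();
    }

    public function fromTimestamp(int $timestamp): string {
        return gmdate($this->format, $timestamp);
    }
}

$mysqlStyle = new DateTranslator('Y-m-d H:i:s');
$timestamp  = $mysqlStyle->toTimestamp('2009-06-05 08:00:00');
$roundTrip  = $mysqlStyle->fromTimestamp($timestamp);
```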


Well, that's all I can think of right now. I've actually left relations off the list. I'm not sure I want to get bogged down in some custom mapping system from an ORM. The relationships fall into the easy stuff category, and are best left to the developer.

Maybe that's why I'm having a hard time finding a workable, existing ORM. I keep feeling like they're trying to do too much.