Page 1 of 1
Model to Database design patterns...
Posted: Tue Apr 25, 2006 10:57 am
by nielsene
I've been re-architecting a large system; cleaning up prototype code that snuck into a production system, etc. The plan has been to do a complete code rewrite, and I've been working on most of the internal plumbing so far. The application is a Rich Domain Model, Template View, Front Controller->Application Controllor style implementation of MVC and those aspects are working well.
I've yet to find a system I like for connecting the underlying objects to the DB. The old system was basicaly a Active Record-Data Mapper hybrid style system, where the classes all inherited from a "StoreableObject" type that handled creation of CRUD queries from registered field->table,field entries. (Ie the constructor of each subcalss contained the mapping information needed for the base class to construct the queries needed.) My database is highly normalized (5/6th-normal form (6th normal is used in temporal/history tracking applications)) and my objects have complex relationships to the underlying tables. I make extensive use of the full feautures of my database (PostGreSQL) so using many of the existing ORM-style frameworks is not possible due to their lowest common denomiator or MySQL only focus.
I've been wanting to seperate the Model from the DB, so I've been exploring some of the other patterns, most notably a true non-hybrid Data Mapper. However, I don't want to parse out a huge ORM document on every page load, as there are hundreds of classes invovled and most won't be needed on any given page (whole Java versus PHP one-time versus every-time issue). This issue is basically why I merged the mapper into the Active Record in the first place. I've been pushing most of the "find" type functionality into a DatabaseGatewway to create the single point of access to the DB. I desprately want to add some sort of lazy load to these system as well. Non-class based queries (mainly statistic/reporting tyoe queries) are bundled into a PhraseBook.
To top matters off, I'm following TDD and (attempting) Agile Data Modelling, so my database model IS changing rather rapidly so I definitely can't afford to have database model information scatter too much in the model.
I'm currently leaning towards just reverting back to the hybrid Data Mapper/Active Record, adding a test harness to it, and refactoring it to meet my new standards, etc. It does mean that almost all of the Domain will have a depedency to the DB, though.
Any thoughts?
Posted: Tue Apr 25, 2006 11:17 am
by Christopher
The direction that many systems are going these days is convention over configuration. That helps reduce the amount of "ORM information" needed and distributes it. Convention may also help with Lazy Load as well.
Also, going with ActiveRecord may be the source of some of your problems because it may not be the best way to handle complexity. It is a little hard to know what to do without seeing how the various classes work together.
Posted: Tue Apr 25, 2006 11:37 am
by nielsene
Its a little hard to explain (isn't it always)
The old application's "main" ORM classes were Couple, Person (and its Competitor subclass), Address, and Organization.
Person contained an Address and Organization instance via composition. Couple contained a leader and a follower (both Competitors) via composition and a simple list of eventids.
The new system is bringing more of the application into a strong domain model, so things like Competition and Events are expected in the very near future. Competitions will contain multiple People (organizers/contact people), Dates (registration deadlines, event date, etc) as well as Events. Events contain identification information plus list of couples.
There are a ton of 1:1 relationships in the Database as one way of avoiding nulls (I'm a Date/Pascal-style DB designer) and almost all the classes need some form of 1:m at times as well.
In almost all cases composed objects should be lazy loaded, so at least its a convientent place to add that functionality.
I agree that an Active Record isn't the best choice for managing complexity, but I'm not sure that my hybrid suffers from all the limitations. The descendents of Storeable Object expose: save() (determines if update or insert is needed), loadByKey(), and delete() at present, along with any domain logic. It looks like I want to move loadByKey to the DatabaseGateway. I've never liked having to Load an object to delete it, so that's doing to have to move as well. Thus all that's left is save. Now obviously that's moving the mapping information to two places, so it might as well be pulled out as well.
I was aso think about some "polymorphic" functions in the DatabaseGateway, save($foo) for instance that used a get_class call to find the name of the class to load the mapping for before saving. With the mappings either stored in another Phrasebook (but that's back to loading all the configurations to use one), or one per file in a mappings directory. It would need a little bit of logic to trace up the class hierarchy in case the subcalss didn't change/extend the db mappings, but its doable. Search and delete would either need the typical findBarByID($foo) explosion of methods in the Gateway or findByID($className,$foo), neither of which are particularly attractive in my opinion.
Posted: Tue Apr 25, 2006 11:48 am
by Christopher
I think I get a little better idea of the problem, but would need to see some code to be of any help. Have you looked at things like Propel?
Posted: Tue Apr 25, 2006 12:43 pm
by nielsene
I had looked at a long list of ORMs for PHP4 (still haven't been able to migrate to PHP5). I just looked at Propel, aside from the PHP5 requirement I haven't seen anything that looks too deal breaker for me... I might bite the bullet and see about setting up a PHP5 development system .... (I've taken a week of vacation from paying work to try to get a jump start on this system re-architecture, so if I have to do some more infrastructure type upgrades, now is the time....)
I just updated the on-line developer documentation to link source code (using phpDocumentor). That's now a mix of production code and an abandoned attempt to retrofit a new design over the existing design, as opposed to the rewrite I've started. Only a small fraction of the files are properly tagged with phpDoc tags, so its not the best docs...
This code is among the oldest in the project and the most warty/least experienced author (me just starting with php) syndrome.
However here is the link to the documentation home page:
http://compinabox.sf.net/official-docs/
And here are links to what I think might be the three most relevant files for review:
StoredObject base class Why is this link not rendering???
Person Subclass of above
One of the "cleaner" (cough, cough) scripts that use the abbove: prime-check.php
Posted: Tue Apr 25, 2006 10:12 pm
by Christopher
Eric, I know I asked for code

, but that's a little more dense code than I have time to understand right now. Can you give me some examples?
Posted: Tue Apr 25, 2006 10:53 pm
by nielsene
arborint wrote:Eric, I know I asked for code

, but that's a little more dense code than I have time to understand right now. Can you give me some examples?
What kind of examples woud help? Examples of usage of the classes, examples of the current way I configure the ORM, etc?
Posted: Tue Apr 25, 2006 10:55 pm
by Christopher
Probably more usage and a little of the config.
Posted: Wed Apr 26, 2006 1:52 pm
by nielsene
OK here's two sample configurations:
Code: Select all
function Couple($db=0, $id=0) {
$this->primeField="id";
$this->setID($id);
StoredObject::StoredObject($db);
$this->addTable("couples",
"coupleid","id",
array("leader","follower"),
array("leader","follower"));
$this->addTable("events_registration",
"coupleid","id",
array("eventid","competitorNumber"),
array("eventid", "number"));
}
This one is rather simple. The first addTable (the main configuration instruction) sets up that the "primeField" ID maps to the primary key "coupleid" of the "couples" table. It also maps attributes "leader" and "follower" to identically named columns of that same table.
The second addTable links in a second table along a foriegn key link. The ORM code behind this will figure out that that its a one-many relationships, "eventid" and "number" will be stored in parrallel array member variables.
A slightly more involved example (because which tables are used depends on configuration data -- same software hosts multiple events with different information requirements for registration.)
Code: Select all
function Person($db=0, $id=0) {
$this->primeField="id";
$this->setID($id);
StoredObject::StoredObject($db);
GLOBAL $SD_elements;
if (in_array("Name",$SD_elements))
$this->addTable("people",
"peopleid","id",
array("firstname","lastname"),
array("firstname","lastname"));
if (in_array("Affiliation",$SD_elements)) {
$this->addTable("people",
"peopleid","id",
array("organization"),
array("organizationID"));
$this->org = new Organization($this->db);
}
if (in_array("Email",$SD_elements))
$this->addTable( "people_contact",
"peopleid","id",
array("emailday"),array("email"));
if (in_array("Fee Category",$SD_elements))
$this->addTable( "people_classification",
"peopleid", "id",
array("agelevel"),
array("feeLevel"));
if (in_array("Birth Year",$SD_elements)) {
$this->addTable("people_birthdays",
"peopleid","id",
array("year"),array("birthyear"));
}
if (in_array("Home Phone", $SD_elements)) {
$this->addTable("people_contact",
"peopleid","id",
array("phoneeve"),array("phonehome"));
}
if (in_array("Address", $SD_elements)) {
$this->addTable("people",
"peopleid","id",
array("address"),array("addressID"));
$this->address= new Address($this->db);
}
if (in_array("USABDA Number", $SD_elements)) {
$this->addTable("people_id_numbers",
"peopleid","id",
array("organization","idnumber"),
array("USABDANO","usabdaID"));
}
if (in_array("Package",$SD_elements)) {
$this->addTable("people_paying",
"peopleid","id",
array("org_pays","orgid"),
array("orgPays","orgPaying"));
$this->addTable("packages_purchased",
"peopleid","id",
array("packageid"),array("packageid"));
}
if ($id)
$this->retrieve();
}
Almost all of these tables are in a 1:1 mapping off of the primeID. They are split out to all the subtables primary because I'm not a beleiver in NULLs. You may also note that in the section on "address" it's alble to "add" the same table twice to add extra column:attribute mappings to an existing table.
Typical code usage:
Code: Select all
$person = new Person($db);
$person->setFirstName("Eric");
$person->setLastName("Nielsen");
$matches = $person->searchDB(); // returns list of primary keys
if (count($matches)==1)
$person->retrieve($matches[0]);
else
// disambigulate selection
$curEmail = $person->getEmail(); // populated from DB on the above call to retrieve
$person->setEmail($newEmail);
$person->postToDB(); // issues an update only against the people_contact table, and only touching the email column. (tracks dirty fields)
Actual searchDB isn't used that often -- retrieve with a primary key is much more common as typically enough state is stored in the application to follow the object graph.
Posted: Wed Apr 26, 2006 4:04 pm
by nielsene
And the system that my current TDD session seems to be leading towards would be more of a
Code: Select all
$person = $databaseGateway->findByKey("Person",142);
$person = $databaseGateway->findByCriterion("Person",new Criterion("firstname","equals","Eric"));
// Criterion design is not touched yet, just a mental wisp at present
$databaseGateway->store($person); // (no need to specify the class for update/insert)
$datebaseGateway->deleteByKey("Person",142);
// not sure about a delete by Criterion... will wait til I need it...
The database Gateway is likely to be stored in a Singleton Registry.
Within the gateway would be an associative array of class names to object mappers. Each mapper is loaded from a configuration file upon first use in the script, ie unused configuration files aren't parsed. A lot of the my existing StoreableObject code will be repurposed into the ObjectMapper. The configuration will move out of the domain class's constructor and into the mapper configuration files. The domain objects will no longer have need to know about the DB. I might lose some of my "automatic" dirty field checking; not sure which approach I'll take now either of the "easy ones" -- update everything or query then update -- or a "harder" one -- Identity map-esque style cache of retrieved objects within the DatabaseGateway than can compare the to-be-stored object's state with the previously retrieved values.
I hope to avoid a JavaBeans style convention (default constructor and all get/setters) and enforce correct state at all times, but that might be a requirement in my first few iterations.