Page 1 of 1

Designing database-efficient classes

Posted: Thu Jul 24, 2003 6:20 pm
by nullbyte
Hola.

I'm looking for some ideas on how to design a database-efficient set of classes. I'm experienced with DB design, PHP and OOP, and using PHP4 as my platform. I'm also trying to avoid all but the most common, DB-independent SQL features, since this will be used on multiple database systems. As an example of my situation, here's a description of the type of situation where this would be desirable.

I might have a database table products, with each row representing (naturally) a product. Now, I have another table, attributes, with a list of product attributes, and a table (product_attributes) which correlates the two. Each entry in products may have any number of attributes associated with it.

I'd also like to have PHP classes that represent products and attributes: Product and Attribute. The logical way to represent a product, then, is to have a Product instance, which stores an array of Attributes among its member data.

Now the difficulty: Let's say I want to do a search for products based on a certain criterion.
Option 1: I might do a query like:

Code: Select all

SELECT product_id
  FROM products
WHERE product_name LIKE '%chevy%'
Easy -- I take the results, instantiate a Product for each ID that is returned, and I have a list of products to display.
Problem there, each Product instance needs to perform another query to load the rest of the product information. In addition, each instance also needs to perform yet another query to load all of the attributes for that product.
An elegant solution: Each class takes care of its own database access -- but not very efficient.

Option 2: I'll try loading the product information up front, in my search procedure:

Code: Select all

SELECT *
  FROM products
 WHERE product_name LIKE '%chevy%'
A little more work now. If I write a loadRow() method for Product, I can instantiate each Product, then pass it a row returned from the query. Much more efficient.. so far.
Each Product instance still needs to load its own set of attributes.
This option is more efficient, but slightly less elegant: the search method is telling each Product instance what it should contain.

Option 3: Load everything up front, in two queries. To do this, a Product no longer loads its own attributes.

Code: Select all

SELECT *
  FROM products
 WHERE product_name LIKE '%chevy%'

Code: Select all

<?php
// Create list of Product instances, and use the rows to populate them. 
// Create list of product_ids that were returned
?>

Code: Select all

SELECT attributes.*, product_attributes.product_id
  FROM product_attributes, attributes
 WHERE product_attributes.product_id IN (&lt;&lt;Generated List&gt;&gt;)
   AND product_attributes.attribute_id = attributes.attribute_id
Then I'd need a method in Product called loadAttributes(). The search mechanism would determine which attribute_ids are associated with each product_id, and call loadAttributes() on each product instance.
Even more efficient, even less elegant. Now there are only two queries retrieving all the data, but much more front-end work for the search mechanism. In addition, the Product and Attribute classes do less and less of their own work.

Option 4: One big query.

Code: Select all

SELECT products.*, attributes.*
  FROM products, product_attributes, attributes
 WHERE products.product_name LIKE '%chevy%'
   AND products.product_id = product_attributes.product_id
   AND product_attributes.attribute_id = attributes_id
So every bit of information is loaded with this query. It would be parsed and loaded by separating out unique products, then separating out which attributes each has. Issue one: product data is duplicated if there are multiple attributes -- bad form. Issue two: without using table joins or some other mechanism, products without any attributes will get excluded (less critical, since there are ways around this). Mainly, though, this option a) gets too much information from the database and b) gives no control to Product and Attribute over their own data -- not to mention the huge mass of work for the search mechanism.

So, those are all options I've considered at one time or another. To date, I've not been able to come up with a good solution for this issue. It's also a situation that is duplicated many times in code I've come across, and is even much more of an issue with more complex systems (where you may have two or three levels of information nesting, where only one is outlined here -- the Product-Attribute relationship).

If anyone has any experience dealing with anything like this, I'd appreciate any insight you gained. I'm just hoping I don't slap my head and realize the perfect solution in 5 seconds.

Thanks for reading this ridiculously long post.

Posted: Thu Jul 24, 2003 11:45 pm
by nielsene
I don't exactly understand your problem/question...

I've dealt with the oft-cited "database-object impedance" problem in the following manner.

I create a base class named StoreableObject. Storeable object has three public methods: postToDB, retrieve, search and two pairs of get/setters: get/setID, get/setGeneralData. It also has a private set of helper functions for adding/removing a table from the object. It has a large number of internal attributes that map the class's variable names to the tables and columns in the database. The get/set's handle flagging dirty data so the minimal number of queries are executed on a postToDB. postToDB also handles determining if an update/insert is needed. retrieve uses an primary key value to populate the class. search finds all primary keys that match whatever set of attributes that have been set.

Then I subclass StoreableObject for each data object I need. For instance I might have a Person object. The person constructor handles all the calls to addTable to handle the mapping. Typically the only other code I need to write are the specific get/sets to avoid calling get/setGeneralData directly --

Code: Select all

function getFirstName()
  {
      return $this->getGeneralData("FirstName");
  }
-- or class functionality that doesn't deal with the database.

This method is definately NOT the most efficient, however, it is clean and powerful. I've found it to be an extremely useful abstraction -- I can forget I'm dealing with a database if I want to.

In my case the Person class pulls information from four tables -- two are sateillite 1:(0-1) tables from the main perople tables, the other is a 1:m
table of registrations. StoreableObject can handle array or single value attributes with ease.

Posted: Fri Jul 25, 2003 2:26 am
by nullbyte
Well, the issue is twofold. First is dealing with objects which logically "contain" data held in other tables, where there is no 1-to-1 correspondence. Second is instantiation of a number of objects where each is the representation of a table row -- and each row is obtained from the same query.

I don't know if this clears up the issue for you, but I think it summarizes it.

Posted: Fri Jul 25, 2003 2:59 am
by Judas
i like relational data base's. Only i use my own data make up.

Posted: Fri Jul 25, 2003 8:54 am
by nielsene
nullbyte wrote:Well, the issue is twofold. First is dealing with objects which logically "contain" data held in other tables, where there is no 1-to-1 correspondence. Second is instantiation of a number of objects where each is the representation of a table row -- and each row is obtained from the same query.

I don't know if this clears up the issue for you, but I think it summarizes it.
Yes, in my example above, my Person object contains data from other tables, some of which were multiple rows.

Here's some sample code (this is stuff I wrote about 2-3 years ago when iwas just starting PHP.... it desperately needs to be cleaned up...)
http://ballroom.mit.edu/compinabox/Publ ... t.inc.phps
http://ballroom.mit.edu/compinabox/Publ ... n.inc.phps
http://ballroom.mit.edu/compinabox/Publ ... e.inc.phps

So I was forgetful in my first post, Person only has the prime table and some number of 1: (0-1) tables. Couples has the prime table and then one 1:m table (like you product-attributes relation).

This completely offers one solution method for the first issue you raise. The second issue is not so much an issue as a design choice. If you have to commonly instantiate arrays of objects, then yes I would design a bulk load version of my StoredObject, for instance search() cound return an array of objects, instead of an array of primary keys.

Posted: Tue Nov 11, 2003 3:56 pm
by ahundiak
I struggled with this issue when writing a small soccer referee scheduling application. Had about 30 tables with quite a few many-to-many relations. I finally settled on your option three. For each table I made a collection class which contains an array of individual record objects.

It's fairly easy to write methods like $product->getAttributes() which looks up the attributes in the attributes collection and returns the list. The business logic layer neither knows nor cares how the attributes get stored.

I don't really like the one query per table approach but the huge queries just return too much duplicate data.

Posted: Tue Dec 30, 2003 11:30 pm
by psychohist@aol.com
If you're really going to follow an object oriented approach, you should let the object do the work, in this case loading its own data from the database.

For most applications, the loss of efficiency, if any, really won't be an issue, especially if you have properly indexed your tables. Remember, "Premature optimization is the root of all evil" (Knuth).

If after things are up and running, you actually have problems with speed, you can worry about optimization. The cheapest optimization these days - after adding table indices - is likely to buy faster hardware. If you've already done that, buy faster software (probably Oracle, in this case). If you've done that too, you can add a batch creation function that creates a batch of objects more efficiently, skipping from method 1 to method 4 (though I bet there's a way to get Oracle to cache query plans that would be just as beneficial).

Posted: Fri Jan 16, 2004 2:06 pm
by jaxn
One big query is by far the fastest option. Make sure you refine the query so that you are only getting the information that you need and nothing more.

-Jackson

Posted: Sun Jan 18, 2004 9:51 am
by nufferkay
Well, remember what they say about premature optimization....

You also might consider which is going to be more maintainable - probably having each class handle its own queries, but I could be wrong.

Posted: Thu Jan 22, 2004 11:52 am
by BDKR
I'd also like to have PHP classes that represent products and attributes:
In other words, you realize that you don't need this. You would just like it.

But if you must, you shouldn't be using a relational database in the first place. Take a look at some of the various 0DBMS (Object DataBase Management System) or ORDBMS (Object Relational DataBase Management System) systems out there. As NielsenE pointed out, there is what they call an impedance mismatch here. It's the result of trying to represent a hierchal data structure (trees as an example) in a relational system. Code bases managing objects in relational systems need to be at least 40% to 60% larger than those sitting on top of dedicated ODBMS's.

There are a ton of other issues to consider here as well. Mapping objects onto an ODBMS will be faster than trying to do so with an RDBMS, but a conventional 'result-set' approach will still be fastest. The ODMBS camp also suggest the use of application caches (Intersystems Cache is a good example) to help deal with the performance issues. They will say that it's to help persist data, but that's BS. The act of traversing a tree type data structure in a databse will be slower than standard SQL queries against a relational schema. This use of caches also introduces the possibility of data corruption.

This can go on and on......

Before going down this road, do your homework! I blogged some of my experience with this stuff.
http://mgaps.highsidecafe.com/index.php?p=106
There is also a link on that page to a site that talks a good deal about it, but it's extremely vauge and misleading. If you don't pay attention to what's really being said here, it'll have you off on the wrong vector.

If you want some seriously hardcore info on this, look into what Fabian Pascal has to say about it.

Cheers,
BDKR