Designing database-efficient classes
Posted: Thu Jul 24, 2003 6:20 pm
Hello.
I'm looking for some ideas on how to design a database-efficient set of classes. I'm experienced with DB design, PHP and OOP, and I'm using PHP4 as my platform. I'm also trying to avoid all but the most common, DB-independent SQL features, since this will be used on multiple database systems. Here's an example of the kind of situation where this matters.
I might have a database table products, with each row representing (naturally) a product. Now, I have another table, attributes, with a list of product attributes, and a table (product_attributes) which correlates the two. Each entry in products may have any number of attributes associated with it.
I'd also like to have PHP classes that represent products and attributes: Product and Attribute. The logical way to represent a product, then, is to have a Product instance, which stores an array of Attributes among its member data.
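To make the setup concrete, here is a minimal sketch of the three tables and the two classes. It is written in Python with the standard-library sqlite3 module rather than PHP4, only so that it runs on its own. The attribute_name column, the sample rows and the load_product() helper are illustrative assumptions, not part of the original design.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Attribute:
    attribute_id: int
    attribute_name: str  # hypothetical column; the post only names the id columns

@dataclass
class Product:
    product_id: int
    product_name: str
    attributes: list = field(default_factory=list)  # Attribute instances

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, attribute_name TEXT);
-- One row per (product, attribute) pair, so a product can have any number of attributes.
CREATE TABLE product_attributes (
    product_id INTEGER NOT NULL REFERENCES products (product_id),
    attribute_id INTEGER NOT NULL REFERENCES attributes (attribute_id),
    PRIMARY KEY (product_id, attribute_id)
);
INSERT INTO products VALUES (1, 'Chevy Malibu'), (2, 'Ford Focus');
INSERT INTO attributes VALUES (10, 'red'), (11, 'automatic');
INSERT INTO product_attributes VALUES (1, 10), (1, 11);
""")

def load_product(conn, product_id):
    """Build one Product, with its list of Attribute objects, from the three tables."""
    (name,) = conn.execute(
        "SELECT product_name FROM products WHERE product_id = ?", (product_id,)
    ).fetchone()
    product = Product(product_id, name)
    # Follow the link table from the product to each of its attributes.
    for attr_id, attr_name in conn.execute(
        "SELECT attributes.attribute_id, attributes.attribute_name"
        " FROM product_attributes, attributes"
        " WHERE product_attributes.product_id = ?"
        " AND product_attributes.attribute_id = attributes.attribute_id"
        " ORDER BY attributes.attribute_id",
        (product_id,),
    ):
        product.attributes.append(Attribute(attr_id, attr_name))
    return product

print(load_product(conn, 1))
```

A product with no rows in product_attributes, such as product 2 here, simply ends up with an empty attribute list.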
Now the difficulty: Let's say I want to do a search for products based on a certain criterion.
Option 1: I might do a query like:
Code:
SELECT product_id
FROM products
WHERE product_name LIKE '%chevy%'
Easy -- I take the results, instantiate a Product for each ID that is returned, and I have a list of products to display.
The problem there is that each Product instance needs another query to load the rest of the product information, and yet another query to load all of the attributes for that product. For N matching products, that is 2N + 1 queries.
An elegant solution: Each class takes care of its own database access -- but not very efficient.
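One rough way to see the cost of Option 1 is to count queries. The sketch below uses Python and an in-memory SQLite database as a stand-in for PHP4 and the real database. The DB wrapper with its query() counter, search(), the attribute_name column and the sample data are all hypothetical. Each Product is given only an ID and loads its own row and its own attributes.

```python
import sqlite3

# Hypothetical sample data; attribute_name is an assumed column.
SETUP = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE product_attributes (product_id INTEGER, attribute_id INTEGER);
INSERT INTO products VALUES (1, 'Chevy Malibu'), (2, 'Chevy Tahoe'), (3, 'Ford Focus'), (4, 'Chevy Spark');
INSERT INTO attributes VALUES (10, 'red'), (11, 'automatic'), (12, '4WD');
INSERT INTO product_attributes VALUES (1, 10), (1, 11), (2, 12), (3, 10);
"""

class DB:
    """Hypothetical wrapper that counts every query it sends."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = sqlite3.Row  # rows accessible by column name
        self.conn.executescript(SETUP)
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

ATTRS_FOR_ONE = ("SELECT attributes.attribute_name FROM product_attributes, attributes"
                 " WHERE product_attributes.product_id = ?"
                 " AND product_attributes.attribute_id = attributes.attribute_id"
                 " ORDER BY attributes.attribute_id")

class Product:
    """Option 1: given only an id, the instance loads its own row and attributes."""
    def __init__(self, db, product_id):
        row = db.query("SELECT * FROM products WHERE product_id = ?", (product_id,))[0]
        self.product_id = product_id
        self.product_name = row["product_name"]
        self.attributes = [r["attribute_name"] for r in db.query(ATTRS_FOR_ONE, (product_id,))]

def search(db, term):
    rows = db.query("SELECT product_id FROM products WHERE product_name LIKE ?"
                    " ORDER BY product_id", ("%" + term + "%",))
    return [Product(db, r["product_id"]) for r in rows]

db = DB()
products = search(db, "chevy")
print([p.product_name for p in products])  # ['Chevy Malibu', 'Chevy Tahoe', 'Chevy Spark']
print(db.queries)  # 7: 1 search query + 2 per product, i.e. 2N + 1 with N = 3
```

With three matching products this takes seven round trips, and the count grows linearly with the number of results.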
Option 2: I'll try loading the product information up front, in my search procedure:
Code:
SELECT *
FROM products
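WHERE product_name LIKE '%chevy%'
A little more work now. If I write a loadRow() method for Product, I can instantiate each Product and then pass it a row returned from the query. Much more efficient... so far.
Each Product instance still needs to load its own set of attributes, so N matching products still cost N + 1 queries.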
This option is more efficient, but slightly less elegant: the search method is telling each Product instance what it should contain.
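Here is the same kind of sketch for Option 2, under the same assumptions: a Python/SQLite stand-in, plus a hypothetical DB wrapper and sample data. load_row() plays the role of the post's loadRow(). load_own_attributes() is a hypothetical name for the per-instance attribute query that remains.

```python
import sqlite3

# Hypothetical sample data; attribute_name is an assumed column.
SETUP = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE product_attributes (product_id INTEGER, attribute_id INTEGER);
INSERT INTO products VALUES (1, 'Chevy Malibu'), (2, 'Chevy Tahoe'), (3, 'Ford Focus'), (4, 'Chevy Spark');
INSERT INTO attributes VALUES (10, 'red'), (11, 'automatic'), (12, '4WD');
INSERT INTO product_attributes VALUES (1, 10), (1, 11), (2, 12), (3, 10);
"""

class DB:
    """Hypothetical wrapper that counts every query it sends."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = sqlite3.Row
        self.conn.executescript(SETUP)
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

ATTRS_FOR_ONE = ("SELECT attributes.attribute_name FROM product_attributes, attributes"
                 " WHERE product_attributes.product_id = ?"
                 " AND product_attributes.attribute_id = attributes.attribute_id"
                 " ORDER BY attributes.attribute_id")

class Product:
    """Option 2: the caller supplies the row; attributes are still loaded per instance."""
    def __init__(self):
        self.product_id = None
        self.product_name = None
        self.attributes = []

    def load_row(self, row):
        # Counterpart of the post's loadRow(): fill member data from an already-fetched row.
        self.product_id = row["product_id"]
        self.product_name = row["product_name"]

    def load_own_attributes(self, db):
        self.attributes = [r["attribute_name"] for r in db.query(ATTRS_FOR_ONE, (self.product_id,))]

def search(db, term):
    products = []
    for row in db.query("SELECT * FROM products WHERE product_name LIKE ? ORDER BY product_id",
                        ("%" + term + "%",)):
        product = Product()
        product.load_row(row)
        product.load_own_attributes(db)  # still one query per product
        products.append(product)
    return products

db = DB()
products = search(db, "chevy")
print(db.queries)  # 4: 1 search query + 1 per product, i.e. N + 1 with N = 3
```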
Option 3: Load everything up front, in two queries. To do this, a Product no longer loads its own attributes.
Code:
SELECT *
FROM products
WHERE product_name LIKE '%chevy%'
Code:
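<?php
$products = array(); $ids = array(); // product_id => Product, plus the returned product_ids
foreach ($rows as $row) { // $rows: the rows fetched by the query above (assumed variable)
    $product = new Product(); $product->loadRow($row); // populate each Product from its row
    $products[$row['product_id']] = $product; $ids[] = (int) $row['product_id'];
}
$idList = implode(', ', $ids); // fills <<Generated List>> in the next query
?>
Code: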
SELECT attributes.*, product_attributes.product_id
FROM product_attributes, attributes
WHERE product_attributes.product_id IN (<<Generated List>>)
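AND product_attributes.attribute_id = attributes.attribute_id
Then I'd need a method in Product called loadAttributes(). The search mechanism would determine which attribute_ids are associated with each product_id and call loadAttributes() on each Product instance.
Even more efficient, even less elegant. Now only two queries retrieve all the data, regardless of how many products match, but the search mechanism has much more front-end work to do. In addition, the Product and Attribute classes do less and less of their own work.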
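Option 3 can be sketched the same way. Here load_attributes() stands in for the post's loadAttributes(). The IN (...) list is built from bound-parameter placeholders instead of pasting IDs into the SQL. The Python/SQLite setting, the DB wrapper and the sample data are again assumptions. The grouping step, a dictionary from product_id to that product's attributes, is the front-end work that the search mechanism takes on.

```python
import sqlite3

# Hypothetical sample data; attribute_name is an assumed column.
SETUP = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE product_attributes (product_id INTEGER, attribute_id INTEGER);
INSERT INTO products VALUES (1, 'Chevy Malibu'), (2, 'Chevy Tahoe'), (3, 'Ford Focus'), (4, 'Chevy Spark');
INSERT INTO attributes VALUES (10, 'red'), (11, 'automatic'), (12, '4WD');
INSERT INTO product_attributes VALUES (1, 10), (1, 11), (2, 12), (3, 10);
"""

class DB:
    """Hypothetical wrapper that counts every query it sends."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = sqlite3.Row
        self.conn.executescript(SETUP)
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

class Product:
    """Option 3: the instance never queries; the search hands it all of its data."""
    def __init__(self):
        self.product_id = None
        self.product_name = None
        self.attributes = []

    def load_row(self, row):
        self.product_id = row["product_id"]
        self.product_name = row["product_name"]

    def load_attributes(self, names):
        # Counterpart of the post's loadAttributes(): no query here.
        self.attributes = list(names)

def search(db, term):
    products = {}  # product_id -> Product, in search order
    for row in db.query("SELECT * FROM products WHERE product_name LIKE ? ORDER BY product_id",
                        ("%" + term + "%",)):
        product = Product()
        product.load_row(row)
        products[product.product_id] = product
    if not products:
        return []  # skip the second query: an empty IN () list is rejected by many databases
    ids = list(products)
    placeholders = ", ".join("?" for _ in ids)  # the post's <<Generated List>>, as bound parameters
    by_product = {pid: [] for pid in ids}
    for r in db.query("SELECT attributes.*, product_attributes.product_id"
                      " FROM product_attributes, attributes"
                      " WHERE product_attributes.product_id IN (" + placeholders + ")"
                      " AND product_attributes.attribute_id = attributes.attribute_id"
                      " ORDER BY attributes.attribute_id", ids):
        by_product[r["product_id"]].append(r["attribute_name"])
    for pid, product in products.items():
        product.load_attributes(by_product[pid])
    return list(products.values())

db = DB()
products = search(db, "chevy")
print(db.queries)  # 2, however many products match
```

The query count stays at two no matter how many products match. Very long ID lists can hit a database's limit on statement length or on the number of bound parameters. In that case the IDs would have to be sent in batches.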
Option 4: One big query.
Code:
SELECT products.*, attributes.*
FROM products, product_attributes, attributes
WHERE products.product_name LIKE '%chevy%'
AND products.product_id = product_attributes.product_id
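AND product_attributes.attribute_id = attributes.attribute_id
So every bit of information is loaded with this one query. The results would be parsed by separating out the unique products, then separating out which attributes each one has. Issue one: product data is repeated in every row when a product has multiple attributes -- bad form. Issue two: with this plain join, products without any attributes get excluded unless I use an outer join or some other mechanism (less critical, since there are ways around this). Mainly, though, this option a) pulls far more data from the database than needed and b) gives Product and Attribute no control over their own data -- not to mention the huge mass of work for the search mechanism.
So, those are all options I've considered at one time or another. To date, I haven't been able to come up with a good solution. It's also a situation I've come across many times in other code, and it is much more of an issue in more complex systems, where there may be two or three levels of nested information instead of the single Product-Attribute relationship outlined here.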
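For Option 4, the joined rows have to be collapsed back into objects. This sketch keeps the same Python/SQLite assumptions and hypothetical DB wrapper. It swaps the post's implicit inner join for LEFT JOINs, which are standard SQL-92 outer joins and are assumed to be available on every target database. With LEFT JOINs, a product with no attributes still comes back, paired with a NULL attribute. search() also returns the raw row count so that the repeated product columns are visible.

```python
import sqlite3

# Hypothetical sample data; attribute_name is an assumed column.
SETUP = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE product_attributes (product_id INTEGER, attribute_id INTEGER);
INSERT INTO products VALUES (1, 'Chevy Malibu'), (2, 'Chevy Tahoe'), (3, 'Ford Focus'), (4, 'Chevy Spark');
INSERT INTO attributes VALUES (10, 'red'), (11, 'automatic'), (12, '4WD');
INSERT INTO product_attributes VALUES (1, 10), (1, 11), (2, 12), (3, 10);
"""

class DB:
    """Hypothetical wrapper that counts every query it sends."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = sqlite3.Row
        self.conn.executescript(SETUP)
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

class Product:
    def __init__(self, product_id, product_name):
        self.product_id = product_id
        self.product_name = product_name
        self.attributes = []

def search(db, term):
    rows = db.query(
        "SELECT products.product_id, products.product_name,"
        " attributes.attribute_id, attributes.attribute_name"
        " FROM products"
        " LEFT JOIN product_attributes ON products.product_id = product_attributes.product_id"
        " LEFT JOIN attributes ON product_attributes.attribute_id = attributes.attribute_id"
        " WHERE products.product_name LIKE ?"
        " ORDER BY products.product_id, attributes.attribute_id", ("%" + term + "%",))
    products = {}  # product_id -> Product; collapses the repeated product columns
    for r in rows:
        product = products.get(r["product_id"])
        if product is None:  # first row for this product: build it once
            product = Product(r["product_id"], r["product_name"])
            products[product.product_id] = product
        if r["attribute_id"] is not None:  # NULL means the product has no attributes
            product.attributes.append(r["attribute_name"])
    return list(products.values()), len(rows)

db = DB()
products, n_rows = search(db, "chevy")
print(db.queries, n_rows)  # 1 query, 4 rows for 3 products
```

Here three products arrive as four rows, because the product with two attributes appears twice.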
If anyone has any experience dealing with anything like this, I'd appreciate any insight you gained. I'm just hoping I don't slap my head and realize the perfect solution in 5 seconds.
Thanks for reading this ridiculously long post.