Page 1 of 1

When to extract into extra table

Posted: Sun Aug 23, 2009 9:25 am
by matthijs
Even though I've designed a few db schemes and learned a thing or two about normalization, I still struggle with the question of when and what to extract in separate tables sometimes. To give an example: I build an app in which you can log your (sport) training. For each training unit you can choose a training method. Now say you have this db table

Code: Select all

CREATE TABLE `units` (      
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `method`  ENUM('ROUTETRAINING','BOULDERTRAINING','CAMPUSTRAINING','FINGERBOARD','SYSTEMBOARD','OTHER') NOT NULL,       
    `duration` SMALLINT UNSIGNED NULL, 
    `category` ENUM('strength','resistance','endurance') NOT NULL,
    `details` VARCHAR(255) NULL,
    `training_id` INT UNSIGNED NOT NULL,        
    PRIMARY KEY (`id`),
    FOREIGN KEY (`training_id`) REFERENCES `trainings` (`id`)
) ENGINE=InnoDB;
You could extract out the method to its own table and thus rewrite the above to:

Code: Select all

CREATE TABLE `unit` (       
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `method_id` SMALLINT UNSIGNED NOT NULL,     
    `duration` SMALLINT UNSIGNED NULL, 
    `category` ENUM('strength','resistance','endurance') NOT NULL,
    `details` VARCHAR(255) NULL,
    `training_id` INT UNSIGNED NOT NULL,        
    PRIMARY KEY (`id`),
    FOREIGN KEY (`training_id`) REFERENCES `trainings` (`id`),
    FOREIGN KEY (`method_id`) REFERENCES `methods` (`id`)
) ENGINE=InnoDB;
 
CREATE TABLE `methods` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(255) NOT NULL,
    PRIMARY KEY (`id`),
) ENGINE=InnoDB;
The reason you might want to do that is that this way
- there is less space needed in the unit table (because you have a SMALLINT instead of repeating an ENUM value)
- if you want a lot of flexibility in having more methods, it easier to add them to the table methods
- when a method name would change, it's easier to change that in the mehod table instead of doing a replace in the unit table

However, what if you are certain there will never be more then 5 - 10 methods? And in case in a year time I want to add a method, adding one choice to the ENUM field in the unit table is still easy and possible. Might you still want to extract the methods to a separate table? To clarify, methods will always be added by myself as the developer, not by users of the system. And adding/changing a method would mean a lot more changes to the whole application would be needed, so it's not something I would do frequently.

But if it is indeed better to extract them to a separate table, what about
`category` ENUM('strength','resistance','endurance') NOT NULL,
One could say that this belongs in a separate table as well. So that's table no 3 and another join. Getting a simple list of trainings for a person out of the db now means having to join 5 tables (users, trainings, units, methods, categories). Besides the possible performance issues, the logic needed for this is more complex as well.

In short, when do you stop? How far would you go? And when do you go back?

Re: When to extract into extra table

Posted: Sun Aug 23, 2009 1:44 pm
by califdon
It helps for you to think in terms of entities to be represented in your database. If you want to consider something as an entity, it should be represented as a separate table. This doesn't immediately resolve every question, but it frames the questions differently. For the purposes of your database, are Methods an entity to themselves? Do they have their own attributes? Similarly for Categories. This manner of thinking about your data should be more helpful than worrying about database internals, such as storage, which are rarely of a magnitude that should concern you, unless you're designing something for Amazon or eBay or Google or the CIA. What is more important to most projects is whether the schema makes sense semantically and conforms to Codd's rules of normalization, which guarantee that queries will work as expected. That's my assessment, anyway.

Re: When to extract into extra table

Posted: Sun Aug 23, 2009 2:15 pm
by matthijs
Interesting point. So if something is an entity in my model, then use a separate table. If it's not an entity, then don't.

In the example I gave, methods are an attribute of the training unit. There will be a few of them. Methods will not be treated as an "entity" I guess. Viewing all methods, or adding or editing one, etc. will not happen. I can not guarantee I will never change them. Maybe I might add one or two in the future. But probably not very often.

The same goes for categories. There are only 3 of them, they are not entities on their own.

[edit]I guess one reason I also ask this question is my use of the ENUM field type. Reading about it it seems like one of those things that people either find fine or really hate. If I may believe some people all hell will break loose if I use that type and if I ever have the need to change something it will be an impossible, massive job, etc

Re: When to extract into extra table

Posted: Sun Aug 23, 2009 2:35 pm
by califdon
if something is an entity in my model, then use a separate table. If it's not an entity, then don't.
Definitely the first part of that statement. There is often a legitimate need for a lookup table, but typically when there is a large number of possibilities and/or when the individual choices are lengthy and/or (as you discussed) when the maintenance of the list is somewhat dynamic. Just as a personal assessment, I think that I would probably stay with the ENUM fields in your description.