When to extract into extra table
Posted: Sun Aug 23, 2009 9:25 am
Even though I've designed a few db schemes and learned a thing or two about normalization, I still struggle with the question of when and what to extract in separate tables sometimes. To give an example: I build an app in which you can log your (sport) training. For each training unit you can choose a training method. Now say you have this db table
You could extract out the method to its own table and thus rewrite the above to:
The reason you might want to do that is that this way
- there is less space needed in the unit table (because you have a SMALLINT instead of repeating an ENUM value)
- if you want a lot of flexibility in having more methods, it easier to add them to the table methods
- when a method name would change, it's easier to change that in the mehod table instead of doing a replace in the unit table
However, what if you are certain there will never be more then 5 - 10 methods? And in case in a year time I want to add a method, adding one choice to the ENUM field in the unit table is still easy and possible. Might you still want to extract the methods to a separate table? To clarify, methods will always be added by myself as the developer, not by users of the system. And adding/changing a method would mean a lot more changes to the whole application would be needed, so it's not something I would do frequently.
But if it is indeed better to extract them to a separate table, what about
`category` ENUM('strength','resistance','endurance') NOT NULL,
One could say that this belongs in a separate table as well. So that's table no 3 and another join. Getting a simple list of trainings for a person out of the db now means having to join 5 tables (users, trainings, units, methods, categories). Besides the possible performance issues, the logic needed for this is more complex as well.
In short, when do you stop? How far would you go? And when do you go back?
Code: Select all
CREATE TABLE `units` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`method` ENUM('ROUTETRAINING','BOULDERTRAINING','CAMPUSTRAINING','FINGERBOARD','SYSTEMBOARD','OTHER') NOT NULL,
`duration` SMALLINT UNSIGNED NULL,
`category` ENUM('strength','resistance','endurance') NOT NULL,
`details` VARCHAR(255) NULL,
`training_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (`training_id`) REFERENCES `trainings` (`id`)
) ENGINE=InnoDB;Code: Select all
CREATE TABLE `unit` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`method_id` SMALLINT UNSIGNED NOT NULL,
`duration` SMALLINT UNSIGNED NULL,
`category` ENUM('strength','resistance','endurance') NOT NULL,
`details` VARCHAR(255) NULL,
`training_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (`training_id`) REFERENCES `trainings` (`id`),
FOREIGN KEY (`method_id`) REFERENCES `methods` (`id`)
) ENGINE=InnoDB;
CREATE TABLE `methods` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB;- there is less space needed in the unit table (because you have a SMALLINT instead of repeating an ENUM value)
- if you want a lot of flexibility in having more methods, it easier to add them to the table methods
- when a method name would change, it's easier to change that in the mehod table instead of doing a replace in the unit table
However, what if you are certain there will never be more then 5 - 10 methods? And in case in a year time I want to add a method, adding one choice to the ENUM field in the unit table is still easy and possible. Might you still want to extract the methods to a separate table? To clarify, methods will always be added by myself as the developer, not by users of the system. And adding/changing a method would mean a lot more changes to the whole application would be needed, so it's not something I would do frequently.
But if it is indeed better to extract them to a separate table, what about
`category` ENUM('strength','resistance','endurance') NOT NULL,
One could say that this belongs in a separate table as well. So that's table no 3 and another join. Getting a simple list of trainings for a person out of the db now means having to join 5 tables (users, trainings, units, methods, categories). Besides the possible performance issues, the logic needed for this is more complex as well.
In short, when do you stop? How far would you go? And when do you go back?