How far to go with normalization?
Posted: Thu Jan 05, 2006 10:52 am
After reading some tutorials I have a basic understanding of the why and how of database normalization. As an exercise I've designed a database scheme with some tables. I have created multiple tables in the normalization process and have been able to get rid of most redundant data and get a good (?) relational model.
As my tables are now, there's still some data duplication. But I'm not sure if making more seperate tables is necessary or wanted. I'll first show what I have.
As you can see, some columns have duplicate data.
In the table climbs, the climbing style is duplicated. When should I decide if a new table for the available styles is better? I know in advance there's only a couple of styles and they wont change.
In the table routes the grades are duplicated sometimes. As there are only 30 different possible grades which wont change very soon, creating a new table for the grades would seem too much to me.
In both tables climbers and areas there's a column country. A user using the system (searching or putting in data) would select a country from a dropdown menu, so I'm pretty sure a valid country will be selected (to prevent data inconsistencies/anomalies). But still, what if I want to change the name of a country in the future. For example, due to a civil war the name of some country is changed. In that case a seperate table for countries would be easier for updating.
I know that for a further normalization I would have to create more tables. However, the downside will be that the queries will be more complicated and (from what I understand) slower.
Any advice in this matter would be appreciated.
- Are there problems with this scheme?
- What could be improved?
- Are there general rules to decide how far to go with the normalization?
As my tables are now, there's still some data duplication. But I'm not sure if making more seperate tables is necessary or wanted. I'll first show what I have.
Code: Select all
Climbs
-------------------------------------------------------------------
climb_id(PK) route_id(FK) climber_id(FK) climb_date climb_style climb_tries climb_notes
1 1 2 12/12/05 style1 2 blabla
2 2 1 21/12/05 style2 6 blabla
3 5 1 15/12/05 style2 4 blabla
4 4 2 17/11/05 style1 2 blabla
5 4 4 17/11/05 style1 6 blabla
..
Routes
------------------------------------------------------------------
route_id(PK) area_id(FK) route_name route_grade
1 2 somename1 4
2 2 somename2 5b
3 1 somename3 3
4 3 somename4 7a
5 5 somename5 7a
6 5 somename5 7a
7 5 somename6 6b
..
Climbers
-----------------------------------------------------------------
climber_id(PK) climber_nickname climber_realname climber_sex climber_country
1 nic Ann F USA
2 red Chris M Spain
3 boy Bart M USA
4 bull Jan M Russia
..
Areas
-----------------------------------------------------------------
area_id(PK) area_name area_country area_notes
1 areaA USA blabla
2 areaB USA blablabla
3 areaC Spain blabla
..In the table climbs, the climbing style is duplicated. When should I decide if a new table for the available styles is better? I know in advance there's only a couple of styles and they wont change.
In the table routes the grades are duplicated sometimes. As there are only 30 different possible grades which wont change very soon, creating a new table for the grades would seem too much to me.
In both tables climbers and areas there's a column country. A user using the system (searching or putting in data) would select a country from a dropdown menu, so I'm pretty sure a valid country will be selected (to prevent data inconsistencies/anomalies). But still, what if I want to change the name of a country in the future. For example, due to a civil war the name of some country is changed. In that case a seperate table for countries would be easier for updating.
I know that for a further normalization I would have to create more tables. However, the downside will be that the queries will be more complicated and (from what I understand) slower.
Any advice in this matter would be appreciated.
- Are there problems with this scheme?
- What could be improved?
- Are there general rules to decide how far to go with the normalization?