Database too normalized?

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Database too normalized?

Post by matthijs »

feyd | Please use

Code: Select all

,

Code: Select all

and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read:  [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url] to learn how to do it too.[/color]


I have the following tables
[syntax="sql"]CREATE TABLE `mc_routeclimbs` (
  `climb_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `route_id` INT UNSIGNED NOT NULL,  # FK
  `user_id` INT UNSIGNED NOT NULL,  # FK
  `climb_date` DATE NOT NULL,
  `climb_style` ENUM('OS','FL','RP') NOT NULL,
  `climb_tries` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
  `climb_value` TINYINT UNSIGNED,
  `climb_notes` TEXT NOT NULL,
  `climb_dateposted` DATETIME NOT NULL,
PRIMARY KEY (`climb_id`)
)
CREATE TABLE `mc_routes` (
  `route_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `area_id` INT UNSIGNED NOT NULL,
  `route_name` VARCHAR(255) NOT NULL,
  `grade` ENUM('3','4','5a','5a+','5b','5b+','5c','5c+','6a','6a+','6b','6b+','6c','6c+','7a','7a+','7b','7b+',
'7c','7c+', '8a','8a+','8b','8b+','8c','8c+','9a','9a+','9b','9b+') NOT NULL,
  `route_notes` TEXT NOT NULL,
PRIMARY KEY (`route_id`)
)
CREATE TABLE `mc_areas` (
  `area_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `country_id` INT UNSIGNED NOT NULL,  # FK
  `area_name` VARCHAR(255) NOT NULL,
  `area_notes` TEXT NOT NULL,
PRIMARY KEY (`area_id`)
)
CREATE TABLE `mc_countries` (
  `country_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `country_name` VARCHAR(255) NOT NULL,
PRIMARY KEY (`country_id`)
)
So, this is all normalized neatly (sort of, I'm no expert). But now it comes down to the logic of inserting data.

This is for an app in which people can manage their routeclimbs. The main action for the user will be to add routes. Each route is in a certain area, each area is in a certain country. So a user should also be able to manage the area's and countries. But that's not his main action.

User story:
1. user registers. Insert userdata in user table. user gets user_id. store in session.
2. user logs in. Wants to add a routeclimb. Fills in form with:
climb_date
route_name
route_grade
area_name
climb_style
climb_tries
climb_value
climb_notes

To be able to do this I need to:
- check in area table if area already exists
- if yes, retrieve area_id
- if not, insert area in areas table. Then retrieve area_id
- check in routes table if route already exists
- if yes, retrieve route_id.
- if not, insert route_name, route_grade together with area_id into routes. Retrieve route_id
- get user_id from session
- insert route_id, user_id, climb_date, climb_style, climb_tries, climb_value, climb_notes into routeclimbs

Bu there are a few issues
- these are a lot of queries for what is in fact one entry. Isn't there a way I could redesign my tables to do this differently? I know, premature optimization, but I'm still asking. Maybe there's a clean and simple alternative. Or is having 5-7 queries for one data entry ok? (I don't know what's usual)
Retrieving the climbs is a lot easier, in one big joint I can join all tables and retrieve all necessary data.

- data integrity. If for example the area doesn't exist yet, I insert an area, without having a country. Then the area table is not complete anymore. Of course I can let the user go to the area section at some other time and let him manage area's (and insert missing countries).

Maybe I can shuffle around with primary keys a bit or something (now all tables have INT primary keys, but I could also use areaname for example). On the other side, for the data integrity, my guess is that I did design the tables fairly well. There could be more area's with the same name, but in different countries, for example. So therefore it wouldn't be wise to use the areaname as PK.

Also, putting all data in one table (for example, the routename, routegrade, routearea) leads to a lot of duplication.

Sorry for the no-so-precise question, but I would just like to have some feedback on this, before I write a lot of code I can throw away in a few months..


feyd | Please use[/syntax]

Code: Select all

,

Code: Select all

and [syntax="..."] tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read:  [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url] to learn how to do it too.[/color]
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

these are a lot of queries for what is in fact one entry. Isn't there a way I could redesign my tables to do this differently? I know, premature optimization, but I'm still asking. Maybe there's a clean and simple alternative. Or is having 5-7 queries for one data entry ok? (I don't know what's usual)
A normalized database that is anything but trivial usually results in several queries. You could probably redesign your tables so you did less, but this would likely come at the expesnse of being normalized.

I'm thinking your tables look fine. :)
data integrity. If for example the area doesn't exist yet, I insert an area, without having a country. Then the area table is not complete anymore. Of course I can let the user go to the area section at some other time and let him manage area's (and insert missing countries).
That is tricky but more dependent on how you design your application logic than database scehma. It's similar to the object-relational impedance mismatch whcih spurred the development of ORM, etc. Fortunately it's easier to solve.

You just need to add the application logic instances where one is created and the other not. If the user registers and creates a 'route' but doesn't associate it with an 'area/country/whatever' show an error message and delete the record or forward them onto the 'location' page.
Maybe I can shuffle around with primary keys a bit or something (now all tables have INT primary keys, but I could also use areaname for example). On the other side, for the data integrity, my guess is that I did design the tables fairly well. There could be more area's with the same name, but in different countries, for example. So therefore it wouldn't be wise to use the areaname as PK.

Also, putting all data in one table (for example, the routename, routegrade, routearea) leads to a lot of duplication.
Never use string in primary keys they change to much. Even in a situation where you think it'll be globally unique such as an 'email' address I'ev been stung. Use integer ID's. :)

You said it best with the tons of duplication...just follow your first instinct and bite the slightly more complex bullet. It will lead to a more intuitive design anyways.

Cheers :)
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Thanks for the reply Hockey. It was just that I started working on the logic and saw what I had to do .. while all the examples in the books are neatly done with one table ;)

But thinking and posting about this and your reply has given me some good thoughts. It is also the possibilities of using the data in the future which might indicate that a good normalized db is the way to go.

About the data integrity: I think I will design the area table in such a way that it's possible to add an area without a country. Then when I retrieve a list of the areas, I will just show "unknown" for the country in the results. (and people will be able to add those later).

It's funny how complex a few database tables can be. You have to think about all possible situations, the kind of data being allowed and not, etc
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

But thinking and posting about this and your reply has given me some good thoughts. It is also the possibilities of using the data in the future which might indicate that a good normalized db is the way to go.
Absolutely. Thats why I use the community. Most questions I ask, I typically have several scenarios planned out in my head, I just need a gentle nudge in the right direction.

When you do something right, it's just obvious that it's the best approach. If you can reuse a table or two in the future, thats a great sign!!!
About the data integrity: I think I will design the area table in such a way that it's possible to add an area without a country. Then when I retrieve a list of the areas, I will just show "unknown" for the country in the results. (and people will be able to add those later).
Only you can answer that as your application unfolds. Whether it works technically and whether end users have to jump through rings of fire to make the application work. The latter force is what drives most software into "junkware".
It's funny how complex a few database tables can be. You have to think about all possible situations, the kind of data being allowed and not, etc
It plays a major role in application development so I can see how that happens. :)
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Hockey wrote:Absolutely. Thats why I use the community. Most questions I ask, I typically have several scenarios planned out in my head, I just need a gentle nudge in the right direction.
Indeed. thanks again
Hockey wrote:
About the data integrity: I think I will design the area table in such a way that it's possible to add an area without a country. Then when I retrieve a list of the areas, I will just show "unknown" for the country in the results. (and people will be able to add those later).
Only you can answer that as your application unfolds. Whether it works technically and whether end users have to jump through rings of fire to make the application work. The latter force is what drives most software into "junkware".
Yes, these things are very important to consider. That's why I want to focus on the "adding routes" part, because that's what the user wants to do. And that is what I want to make as easy and simple as possible.

The alternative would be, that people should first add an area (with country) and only after that being able to add a route in that area. But it's a bit like when you post a weblog entry and you want to add a category. You don't want to first think about the categories. You just want to write the post first, and then maybe think about the categories. If you don't add a category immediately, you get the default category "uncategorized".

I could also, at some time, add some ajaxy stuff (progressive enhanced of course) to let people add the area's, while on the page of adding the route.
jmut
Forum Regular
Posts: 945
Joined: Tue Jul 05, 2005 3:54 am
Location: Sofia, Bulgaria
Contact:

Post by jmut »

I guess we speak about mysql here in particular. I see you didn't set the engine in your create queries which on most servers defaults to MyISAM.
Using that many queries to do on whole action...cries for using transactions...else you might come up with pretty inconsistant db state...which in my opinion could be worse than having non-normalized db. So my point is, don't forget to use transactions, for this you need InnoDB engine though.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Good point jmut. Knew about them, but haven't looked into them yet. Thanks.
Post Reply