Page 1 of 1

DB Normalization question

Posted: Sun Feb 08, 2004 6:23 pm
by ilovetoast
I'll be the first to admit, I have often relied on others to handle db setup for my projects and merely conformed my code to work effectively with their physical model. Now I'm working on my own project, and I want to make sure that I have properly normalized my db. My target is 4NF.

Here's the tables in question. There are actually 2 sets like this, but one example will suffice (I have eliminated some non-relevant attributes):

Code: Select all

Article
    ArticleID (PK)       BIGINT
    Title                VARCHAR (100)
    Text                 CLOB

Author
    AuthorCode (PK)      CHAR (3)
    Name                 VARCHAR (50)
    Email                VARCHAR (150)
So, here's the question... How do I set up the join table correctly for 4NF, given that it is a Many-to-Many relationship (articles can be written by multiple authors, authors can write multiple articles). I'm thinking that this would be the proper join table for 4NF, but I'm not sure.

Code: Select all

ArticleAuthor
    AuthorArticleID (PK) BIGINT
    ArticleID (FK)       BIGINT
    AuthorCode (FK)      CHAR (3)
If it was a Many-to-One relationship, I would eliminate the AuthorArticleID attribute, but I'm thinking that since it's Many-to-Many, the attribute is needed. An example of the data would be:

Code: Select all

ArticleAuthor
    101     1    ABC
    102     1    DEF
    103     2    SSS
    104     3    DEF
    105     4    XYZ
I have read through the O'Reilly Java DB Best Practices ch2 pdf and several other sources at hand. However, I admit I'm still uncertain. I think the PK has to be there since neither of the other two field is necessarily unique, but I want to be certain.

For those who have the O'Reilly chapter:
I see that in the O'Reilly example - the ReviewerGenre join table to be specific - they don't use a new PK. Instead, they call both the reviewerID and the code as PKs. It is my understanding that this mmust be an error. They make acceptable FKs, but PKs have to be unique, so neither would be acceptable to use as PKs.

Thanks in advance for any help.

peace

Go Leafs Go!!!!

Posted: Mon Feb 09, 2004 12:18 am
by microthick
Hi,

The primary key that you've specified is not needed.

The primary key would in fact be the set containing both authorcode and articleid. They are both individually also foreign keys as you've specified.

Because space is typically not really an issue right now, most people nowadays will specify the primary key as you have (authorarticleid), then also ensure that the set containing both authorcode and articleid are also a key. The relationships may not be properly normalized (I usually normalize to bcnf not 4nf so my points may be skewed slightly because of it) but it'll save you some headaches later on by possibly making record selection and updating easier.

Posted: Mon Feb 09, 2004 8:47 am
by ilovetoast
Ahh, I understand. Thanks. One further question though. Consider a third table I have:

Code: Select all

Language
    LanguageCode (PK)         CHAR (2)
    Name                      VARCHAR (50)
There is a one-to-many relationship between this table and the article table (many articles in each language/one language per article row), resulting in this join table:

Code: Select all

ArticleLanguage
    ArticleID (PK) (FK)       BIGINT
    LanguageCode (FK)         CHAR (3)
My question is, should the PK for the join table be ArticleID (as shown) or the set of ArticleID and LanguageCode? ArticleID is unique for each row, so I had thought that was the way to go. Looking through some reading material though, I see that others might choose to go with the set.

So the question is should I go with the set and if so why does that make a better/proper choice as compared to just the ArticleID?

thanks again

Posted: Mon Feb 09, 2004 9:05 am
by microthick
With one-to-many relationships, you don't need a third table. In your case, ArticleLanguage is not needed.

The primary key of the Language table is LanguageCode. Add that as a foreign key in your Article table and you're set.

Otherwise, your tables aren't properly normalized and you have redundancy.

Posted: Mon Feb 09, 2004 9:20 am
by McGruff
A further snippet: EXPLAIN can be handy to see if your join queries are being carried out efficiently. See manual for full explanation.

For example, cols used to join tables must be exactly the same type.

Posted: Mon Feb 09, 2004 11:37 am
by ilovetoast
Thanks again. I feel that I'm starting to get my head around the normalization process.

Fixed the language code typo so that both cols are the same. Thanks for pointing that out.

Off to explore the mysteries of the EXPLAIN query.

peace

Posted: Mon Feb 09, 2004 11:45 am
by ilovetoast
Oh. Quick plug here. I am using OmniGraffle Pro to do the diagrams. I love that program, absolutely love it. If you're a Mac user and need top quality charting program, check it out.