How much information to store in tables
Moderator: General Moderators
How much information to store in tables
Regarding attributes in rows, what is good practive when it comes to how many attributes should be placed within a table.
for example, I currently place firstName Lastname and Username in one table called members.
Is it generally preferable to divide elements which are often going to be querried into their own tables for faster searches?
for example, I currently place firstName Lastname and Username in one table called members.
Is it generally preferable to divide elements which are often going to be querried into their own tables for faster searches?
I don't have a strict count for them. I usually leave them the way i found the attributes while i was drawing an ERD..
Thus, if i find 20 attributes that belong to an entity, there will be 20 columns in that table. Although, i've found that after normalization there usually aren't more than +/- 5 attributes that are not in a repeating group.
If you end up performing a lot of joins to get that one column value from another table you might consider adding it ni the original table. Thus, introducing a little of controled redundancy. But, if you are using a decent DBMS you can create a view so you don't have to care about where the data really is coming from..
Thus, if i find 20 attributes that belong to an entity, there will be 20 columns in that table. Although, i've found that after normalization there usually aren't more than +/- 5 attributes that are not in a repeating group.
If you end up performing a lot of joins to get that one column value from another table you might consider adding it ni the original table. Thus, introducing a little of controled redundancy. But, if you are using a decent DBMS you can create a view so you don't have to care about where the data really is coming from..
I like my tables to be "narrow" that normally means fewer than 10 columns.
I will commonly split of "satellite" tables (creating a 1:1 mapping) when there is a logical collection of columns that belong together, but aren't commonly needed with the rest of the table. I think this is the same thing as feyd is saying.
My canonical example regards a login table:
Many people have it something like
I would normally split this up to:
In practice I actually use the username as the primary key for all the user tables as I dislike surrogate keys and user_name fits the bill nicely. I push first/last name out to "people" as I often need to have records for non-users. Email isn't left in the users field as people often have alternate emails, some might be "no-mail" if its just for accepting mail requests from that address, etc. Or sometimes instead of "mail_to" I'll use 'main_addr' and its the only one that recieves mail "official" mail from the site.
The users table is basically hit on every single webpage request for authentication. Therefore its a big win if the DBMS decides to cache it. Keeping the table narrow helps to encourage that and helps to ensure that keeping it cached doesn't push too many other important tabless out of the cache.
I will commonly split of "satellite" tables (creating a 1:1 mapping) when there is a logical collection of columns that belong together, but aren't commonly needed with the rest of the table. I think this is the same thing as feyd is saying.
My canonical example regards a login table:
Many people have it something like
Code: Select all
CREATE TABLE users (
userid SERIAL PRIMARY KEY, -- SERIAL is PostGreSQL "auto increment" both a type and a generator
username VARCHAR(15) NOT NULL UNIQUE,
hashed VARCHAR(47) NOT NULL,
email VARRCHAR(40) NOT NULL,
firstname VARCHAR(30) NOT NULL,
lastname VARCHAR(40) NOT NULL,
status_name VARCHAR(10) NOT NULL REFERENCES user_statuses ON UPDATE CASCADE ON DELETE RESTRICT,
sig TEXT NOT NULL DEFAULT '',
location TEXT NOT NULL DEFAULT '',
... // more profile type columns
);Code: Select all
CREATE TABLE people (
peopleid SERIAL PRIMARY KEY,
firstname VARCHAR(30) NOT NULL,
lastname VARCHAR(40) NOT NULL
);
CREATE TABLE users (
userid SERIAL PRIMARY KEY,
personid INT NOT NULL REFERENCES people ON UPDATE CASCADE ON DELETE RESTRICT
username VARCHAR(15) NOT NULL UNIQUE,
hashed VARCHAR(47) NOT NULL,
status_name VARCHAR(10) NOT NULL REFERENCCES user_status ON UPDATE CASCADE ON DELETE RESTRICT
);
CREATE TABLE user_emails (
userid INT NOT NULL REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
email_address VARCHAR(40) NOT NULL,
mail_to BOOLEAN NOT NULL DEFAULT TRUE,
PRIMARY KEY(userid, email_address)
);
CREATE TABLE user_profile (
userid INT NOT NULL REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE
-- all the profile type columns
);The users table is basically hit on every single webpage request for authentication. Therefore its a big win if the DBMS decides to cache it. Keeping the table narrow helps to encourage that and helps to ensure that keeping it cached doesn't push too many other important tabless out of the cache.
normalized
So long as the tables are normalized,
it should not matter. 3rd normal form is good enough
it should not matter. 3rd normal form is good enough
Re: normalized
I disagree. First I tend to go for full 5NF, migrating to 6NF as I get enough tools to generate all the constraints/views automatically.AnarKy wrote:So long as the tables are normalized,
it should not matter. 3rd normal form is good enough
Second, first you normalize, then you tune. When tuning I do NOT de-normalize. However I will split a table (which doesn't break normalization).
All databases have a concept or "Work Memory" or "Sort Memory" etc. If the tables involved in a join fit into the "Work/Sort" memory (across all simultaneous queries) then you "win" the DB doesn't have to swap to disc. That advocates keeping the row "width" low. Yes don't want the preformence overhead of having to do three extra joins every query (even if hiding the decomposition behind a view), but if you don't commonly need those extra columns, separate them out.
You might be interested in Java Database Best Practices.
I'd be very, very wary of any "Java Database Best Practices" the Java community as a whole doesn't tend to "understand" Relational Databases. (Mainly because of the different models -- Object versus Relational.) The Java community tends to blame the Relational Model for everything.McGruff wrote:You might be interested in Java Database Best Practices.
On a quick glance, that chapter makes at least three glaring mistakes from a Relational model view point: Primary Key!="special index", advoacates using NULLs, and it is strongly against 1:1.
Re: normalized
I meant for small apps and inexperienced developers...nielsene wrote:I disagree. First I tend to go for full 5NF, migrating to 6NF as I get enough tools to generate all the constraints/views automatically.AnarKy wrote:So long as the tables are normalized,
it should not matter. 3rd normal form is good enough
going above 3NF is usually too much for some people to handle.
If you go for 5NF wouldn't this rule out 1to1 relationships? I thought this actually is ruled out by 3Nf already.
As I am only using MySQL I am mostly using no 1 to 1 relationships. MySQL is pretty fast when it comes to use a lot of tables and data. I just try to use static tables as much as possible as this is a lot faster than dynamic and hd space is cheap.
As I am only using MySQL I am mostly using no 1 to 1 relationships. MySQL is pretty fast when it comes to use a lot of tables and data. I just try to use static tables as much as possible as this is a lot faster than dynamic and hd space is cheap.
No, nothing in normalzation talks about when to combine relations. Thus splitting an otherwise already normalized relation won't yield a situation where you are directed to re-combine it. You might have to further normalize the splits if you have non-key dependencies in the satellite after splitting -- but that's only possible if you didn't start from a 4NF starting point.AGISB wrote:If you go for 5NF wouldn't this rule out 1to1 relationships? I thought this actually is ruled out by 3Nf already.