How much information to store in tables

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
taldos
Forum Commoner
Posts: 39
Joined: Mon Aug 23, 2004 8:47 am
Location: Philadelphia

How much information to store in tables

Post by taldos »

Regarding attributes in rows, what is good practive when it comes to how many attributes should be placed within a table.

for example, I currently place firstName Lastname and Username in one table called members.

Is it generally preferable to divide elements which are often going to be querried into their own tables for faster searches?
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I will, quite often do something where fields queried often would be a seperate table from those directly related, but not queries so often. Although I'm not absolutely sure it's the "best" route, but it definitely is a route. :)
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

I don't have a strict count for them. I usually leave them the way i found the attributes while i was drawing an ERD..
Thus, if i find 20 attributes that belong to an entity, there will be 20 columns in that table. Although, i've found that after normalization there usually aren't more than +/- 5 attributes that are not in a repeating group.

If you end up performing a lot of joins to get that one column value from another table you might consider adding it ni the original table. Thus, introducing a little of controled redundancy. But, if you are using a decent DBMS you can create a view so you don't have to care about where the data really is coming from.. ;)
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

I like my tables to be "narrow" that normally means fewer than 10 columns.

I will commonly split of "satellite" tables (creating a 1:1 mapping) when there is a logical collection of columns that belong together, but aren't commonly needed with the rest of the table. I think this is the same thing as feyd is saying.

My canonical example regards a login table:
Many people have it something like

Code: Select all

CREATE TABLE users (
  userid SERIAL PRIMARY KEY, -- SERIAL is PostGreSQL "auto increment" both a type and a generator
  username VARCHAR(15) NOT NULL UNIQUE,
  hashed VARCHAR(47) NOT NULL,
  email VARRCHAR(40) NOT NULL,
  firstname VARCHAR(30) NOT NULL,
  lastname VARCHAR(40) NOT NULL,
  status_name VARCHAR(10) NOT NULL REFERENCES user_statuses ON UPDATE CASCADE ON DELETE RESTRICT,
  sig TEXT NOT NULL DEFAULT '',
  location TEXT NOT NULL DEFAULT '',
  ... // more profile type columns
);
I would normally split this up to:

Code: Select all

CREATE TABLE people (
  peopleid SERIAL PRIMARY KEY,
  firstname VARCHAR(30) NOT NULL,
  lastname VARCHAR(40) NOT NULL
);

CREATE TABLE users (
  userid SERIAL PRIMARY KEY,
  personid INT NOT NULL REFERENCES people ON UPDATE CASCADE ON DELETE RESTRICT
  username VARCHAR(15)  NOT NULL UNIQUE,
  hashed VARCHAR(47) NOT NULL,
  status_name VARCHAR(10) NOT NULL REFERENCCES user_status ON UPDATE CASCADE ON DELETE RESTRICT
);  

CREATE TABLE user_emails (
  userid INT NOT NULL REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
  email_address VARCHAR(40) NOT NULL,
  mail_to BOOLEAN NOT NULL DEFAULT TRUE,
  PRIMARY KEY(userid, email_address)
);

CREATE TABLE user_profile (
  userid INT NOT NULL REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE
  -- all the profile type columns
);
In practice I actually use the username as the primary key for all the user tables as I dislike surrogate keys and user_name fits the bill nicely. I push first/last name out to "people" as I often need to have records for non-users. Email isn't left in the users field as people often have alternate emails, some might be "no-mail" if its just for accepting mail requests from that address, etc. Or sometimes instead of "mail_to" I'll use 'main_addr' and its the only one that recieves mail "official" mail from the site.

The users table is basically hit on every single webpage request for authentication. Therefore its a big win if the DBMS decides to cache it. Keeping the table narrow helps to encourage that and helps to ensure that keeping it cached doesn't push too many other important tabless out of the cache.
User avatar
AnarKy
Forum Contributor
Posts: 119
Joined: Tue Nov 02, 2004 1:49 am
Location: South Africa

normalized

Post by AnarKy »

So long as the tables are normalized,
it should not matter. 3rd normal form is good enough
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Re: normalized

Post by nielsene »

AnarKy wrote:So long as the tables are normalized,
it should not matter. 3rd normal form is good enough
I disagree. First I tend to go for full 5NF, migrating to 6NF as I get enough tools to generate all the constraints/views automatically.

Second, first you normalize, then you tune. When tuning I do NOT de-normalize. However I will split a table (which doesn't break normalization).

All databases have a concept or "Work Memory" or "Sort Memory" etc. If the tables involved in a join fit into the "Work/Sort" memory (across all simultaneous queries) then you "win" the DB doesn't have to swap to disc. That advocates keeping the row "width" low. Yes don't want the preformence overhead of having to do three extra joins every query (even if hiding the decomposition behind a view), but if you don't commonly need those extra columns, separate them out.
McGruff
DevNet Master
Posts: 2893
Joined: Thu Jan 30, 2003 8:26 pm
Location: Glasgow, Scotland

Post by McGruff »

You might be interested in Java Database Best Practices.
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

McGruff wrote:You might be interested in Java Database Best Practices.
I'd be very, very wary of any "Java Database Best Practices" the Java community as a whole doesn't tend to "understand" Relational Databases. (Mainly because of the different models -- Object versus Relational.) The Java community tends to blame the Relational Model for everything.

On a quick glance, that chapter makes at least three glaring mistakes from a Relational model view point: Primary Key!="special index", advoacates using NULLs, and it is strongly against 1:1.
User avatar
AnarKy
Forum Contributor
Posts: 119
Joined: Tue Nov 02, 2004 1:49 am
Location: South Africa

Re: normalized

Post by AnarKy »

nielsene wrote:
AnarKy wrote:So long as the tables are normalized,
it should not matter. 3rd normal form is good enough
I disagree. First I tend to go for full 5NF, migrating to 6NF as I get enough tools to generate all the constraints/views automatically.
I meant for small apps and inexperienced developers...
going above 3NF is usually too much for some people to handle.
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

If you go for 5NF wouldn't this rule out 1to1 relationships? I thought this actually is ruled out by 3Nf already.

As I am only using MySQL I am mostly using no 1 to 1 relationships. MySQL is pretty fast when it comes to use a lot of tables and data. I just try to use static tables as much as possible as this is a lot faster than dynamic and hd space is cheap.
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

AGISB wrote:If you go for 5NF wouldn't this rule out 1to1 relationships? I thought this actually is ruled out by 3Nf already.
No, nothing in normalzation talks about when to combine relations. Thus splitting an otherwise already normalized relation won't yield a situation where you are directed to re-combine it. You might have to further normalize the splits if you have non-key dependencies in the satellite after splitting -- but that's only possible if you didn't start from a 4NF starting point.
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

But a 1 to 1 relationship gives you redudant data and even if it is the primary key. And redudant data is talked about in Normalisation. At least how I would see it.
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

its not redundant data is the satellite table is maintained vai a foreing key update/delete constraint.
Post Reply