database normalization

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

database normalization

Post by s.dot »

I've been hearing a lot about this. On most of my databases on my sites I use a userid to identify users.

However, I'm creating a new site, where there will be lots of relational tables. Would it be a bad idea to use "username" as the relational field in tables instead of a "userid".

If I use the username, I can avoid lots of queries looking up the username for that userid.

Thoughts?
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
veridicus
Forum Commoner
Posts: 86
Joined: Fri Feb 23, 2007 9:16 am

Your users

Post by veridicus »

If you need the user name very often and it'll prevent many joins, then I would use username. Otherwise I'd use id. For example, for a forum I would probably use user name. My applications tend to not need user name often, so I go with user id.

Normalization is always a trade-off. I only avoid joins for performance reasons. For readability, maintenance, and other reaons it's generally best to go with ids.
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I honestly feel since usernames will be a big deal in this application, that using usernames will be best. On one of my forums I built, I use userid, and it's a pain to get all the usernames for those ids.

Of course, each user will still have a unique userid, and a unique username. So I can use the userid where I need to (such as storing files /uploads/12345/file.jpg) and the username where i need to (ie - view-photos.php?user=jimbob)
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Are users allowed to change their name (directly or otherwise)? If so, I would suggest against using names as bookmarks will immediately break if they are altered (unless you can future-proof them somehow.)
User avatar
jayshields
DevNet Resident
Posts: 1912
Joined: Mon Aug 22, 2005 12:11 pm
Location: Leeds/Manchester, England

Post by jayshields »

What feyd said. Aslong as usernames are unique and cannot be changed by the user then I would go with the username as the primary key.

On a side note, what has this got to do with normalisation? :?
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Yeah, the users won't be able to change names.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
mikeq
Forum Regular
Posts: 512
Joined: Fri May 03, 2002 3:33 am
Location: Edinburgh, Scotland

Post by mikeq »

where possible you should always use a "natural" primary key, i.e. one that appears within the data, rather than some arbitrary ID generated.

However, when doing joins Integers are the most efficient way for the database.

Database design is more art than science
User avatar
jolinar
Forum Commoner
Posts: 61
Joined: Tue May 24, 2005 4:24 pm
Location: in front of computer

Post by jolinar »

mikeq wrote:Database design is more art than science
I'd say it's a dark art more than anything else :lol:
User avatar
dude81
Forum Regular
Posts: 509
Joined: Mon Aug 29, 2005 6:26 am
Location: Pearls City

Post by dude81 »

I'd say use ID for the better becuase it reduces the database overhead.

I used to see this problem very often with improper database design I used to do when I'm not aware of normalization.

I assume this what it happens.
The each time you use varchar in the tables instead of an integer, the memory occupied by varchar is more than integer. So as you keep using this username varchars in every other table as a foreign key. So the memory occupied increases. Ultimately it results in database overhead and there can be possibly memory leakage problems. And thus your design gets affected if a hacker notices this leakage. :)
hrubos
Forum Contributor
Posts: 172
Joined: Sat Oct 07, 2006 3:44 pm

Post by hrubos »

I too study it in school and I realize that if we understand normalisation, we can design a perfect database.

If you have time, you should read it :
http://en.wikipedia.org/wiki/Database_normalization
User avatar
mikeq
Forum Regular
Posts: 512
Joined: Fri May 03, 2002 3:33 am
Location: Edinburgh, Scotland

Post by mikeq »

normalisation, we can design a perfect database
In the true sense of a perfect database, yes you can design to 3rd normal form or even Boyce/Codd normal form.

But in the real world perfect may not always be desireable. It very much depends on the application of the database, for instance when designing a database for reporting/data warehousing you would denormalise much of the database for performance reasons.

This where the art (and as joliner said sometimes a dark art :lol: ) comes into play.

In one project I was working on someone had designed an address database and each possible entity in an address had a table, so there were tables for house number/name, street, district, town, country, post code etc etc

Now when trying to return an address you were joining 5/6 tables. One of the reasons for normalisation is for updates/inserts/deletes etc. if you want to update a Town name for instance you only do it in one record in one table. But in reality how often does a town name change? not very. In this instance that level of normalisation was overkill, this is where the art (or common sense) comes into play.

I would have had 2 tables

one with "house number/name, post code" (in this house number/name, post code is a primary key)
one with "street, district, town, country" (in this street, district, town, country is a primary key).

Tables with a 1 - to - 1 join might be seen as breaking the 'rules', say you had a table with 60 fields in it. 10 are mandatory and 50 get filled in very occasionally but not very often and are not queried a lot, it would be perfectly acceptable to split this into 2 tables, one with 10 fields and one with 50 fields creating a 1 - to - 1 join.

This would be better for performance and memory usage of the database.
Post Reply