Page 1 of 1
database normalization
Posted: Sun Feb 25, 2007 4:06 pm
by s.dot
I've been hearing a lot about this. On most of my databases on my sites I use a userid to identify users.
However, I'm creating a new site, where there will be lots of relational tables. Would it be a bad idea to use "username" as the relational field in tables instead of a "userid".
If I use the username, I can avoid lots of queries looking up the username for that userid.
Thoughts?
Your users
Posted: Sun Feb 25, 2007 4:42 pm
by veridicus
If you need the user name very often and it'll prevent many joins, then I would use username. Otherwise I'd use id. For example, for a forum I would probably use user name. My applications tend to not need user name often, so I go with user id.
Normalization is always a trade-off. I only avoid joins for performance reasons. For readability, maintenance, and other reaons it's generally best to go with ids.
Posted: Sun Feb 25, 2007 4:57 pm
by s.dot
I honestly feel since usernames will be a big deal in this application, that using usernames will be best. On one of my forums I built, I use userid, and it's a pain to get all the usernames for those ids.
Of course, each user will still have a unique userid, and a unique username. So I can use the userid where I need to (such as storing files /uploads/12345/file.jpg) and the username where i need to (ie - view-photos.php?user=jimbob)
Posted: Sun Feb 25, 2007 6:50 pm
by feyd
Are users allowed to change their name (directly or otherwise)? If so, I would suggest against using names as bookmarks will immediately break if they are altered (unless you can future-proof them somehow.)
Posted: Sun Feb 25, 2007 7:01 pm
by jayshields
What feyd said. Aslong as usernames are unique and cannot be changed by the user then I would go with the username as the primary key.
On a side note, what has this got to do with normalisation?

Posted: Sun Feb 25, 2007 7:06 pm
by s.dot
Yeah, the users won't be able to change names.
Posted: Mon Feb 26, 2007 3:30 am
by mikeq
where possible you should always use a "natural" primary key, i.e. one that appears within the data, rather than some arbitrary ID generated.
However, when doing joins Integers are the most efficient way for the database.
Database design is more art than science
Posted: Wed Feb 28, 2007 8:57 am
by jolinar
mikeq wrote:Database design is more art than science
I'd say it's a dark art more than anything else

Posted: Wed Feb 28, 2007 9:37 am
by dude81
I'd say use ID for the better becuase it reduces the database overhead.
I used to see this problem very often with improper database design I used to do when I'm not aware of normalization.
I assume this what it happens.
The each time you use varchar in the tables instead of an integer, the memory occupied by varchar is more than integer. So as you keep using this username varchars in every other table as a foreign key. So the memory occupied increases. Ultimately it results in database overhead and there can be possibly memory leakage problems. And thus your design gets affected if a hacker notices this leakage.

Posted: Thu Mar 01, 2007 6:08 am
by hrubos
I too study it in school and I realize that if we understand normalisation, we can design a perfect database.
If you have time, you should read it :
http://en.wikipedia.org/wiki/Database_normalization
Posted: Thu Mar 01, 2007 8:50 am
by mikeq
normalisation, we can design a perfect database
In the true sense of a perfect database, yes you can design to 3rd normal form or even Boyce/Codd normal form.
But in the real world perfect may not always be desireable. It very much depends on the application of the database, for instance when designing a database for reporting/data warehousing you would denormalise much of the database for performance reasons.
This where the art (and as joliner said sometimes a dark art

) comes into play.
In one project I was working on someone had designed an address database and each possible entity in an address had a table, so there were tables for house number/name, street, district, town, country, post code etc etc
Now when trying to return an address you were joining 5/6 tables. One of the reasons for normalisation is for updates/inserts/deletes etc. if you want to update a Town name for instance you only do it in one record in one table. But in reality how often does a town name change? not very. In this instance that level of normalisation was overkill, this is where the art (or common sense) comes into play.
I would have had 2 tables
one with "house number/name, post code" (in this house number/name, post code is a primary key)
one with "street, district, town, country" (in this street, district, town, country is a primary key).
Tables with a 1 - to - 1 join might be seen as breaking the 'rules', say you had a table with 60 fields in it. 10 are mandatory and 50 get filled in very occasionally but not very often and are not queried a lot, it would be perfectly acceptable to split this into 2 tables, one with 10 fields and one with 50 fields creating a 1 - to - 1 join.
This would be better for performance and memory usage of the database.