database normalization
Moderator: General Moderators
database normalization
I've been hearing a lot about this. On most of my databases on my sites I use a userid to identify users.
However, I'm creating a new site, where there will be lots of relational tables. Would it be a bad idea to use "username" as the relational field in tables instead of a "userid".
If I use the username, I can avoid lots of queries looking up the username for that userid.
Thoughts?
However, I'm creating a new site, where there will be lots of relational tables. Would it be a bad idea to use "username" as the relational field in tables instead of a "userid".
If I use the username, I can avoid lots of queries looking up the username for that userid.
Thoughts?
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
Your users
If you need the user name very often and it'll prevent many joins, then I would use username. Otherwise I'd use id. For example, for a forum I would probably use user name. My applications tend to not need user name often, so I go with user id.
Normalization is always a trade-off. I only avoid joins for performance reasons. For readability, maintenance, and other reaons it's generally best to go with ids.
Normalization is always a trade-off. I only avoid joins for performance reasons. For readability, maintenance, and other reaons it's generally best to go with ids.
I honestly feel since usernames will be a big deal in this application, that using usernames will be best. On one of my forums I built, I use userid, and it's a pain to get all the usernames for those ids.
Of course, each user will still have a unique userid, and a unique username. So I can use the userid where I need to (such as storing files /uploads/12345/file.jpg) and the username where i need to (ie - view-photos.php?user=jimbob)
Of course, each user will still have a unique userid, and a unique username. So I can use the userid where I need to (such as storing files /uploads/12345/file.jpg) and the username where i need to (ie - view-photos.php?user=jimbob)
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
- jayshields
- DevNet Resident
- Posts: 1912
- Joined: Mon Aug 22, 2005 12:11 pm
- Location: Leeds/Manchester, England
Yeah, the users won't be able to change names.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
I'd say use ID for the better becuase it reduces the database overhead.
I used to see this problem very often with improper database design I used to do when I'm not aware of normalization.
I assume this what it happens.
The each time you use varchar in the tables instead of an integer, the memory occupied by varchar is more than integer. So as you keep using this username varchars in every other table as a foreign key. So the memory occupied increases. Ultimately it results in database overhead and there can be possibly memory leakage problems. And thus your design gets affected if a hacker notices this leakage.
I used to see this problem very often with improper database design I used to do when I'm not aware of normalization.
I assume this what it happens.
The each time you use varchar in the tables instead of an integer, the memory occupied by varchar is more than integer. So as you keep using this username varchars in every other table as a foreign key. So the memory occupied increases. Ultimately it results in database overhead and there can be possibly memory leakage problems. And thus your design gets affected if a hacker notices this leakage.
I too study it in school and I realize that if we understand normalisation, we can design a perfect database.
If you have time, you should read it :
http://en.wikipedia.org/wiki/Database_normalization
If you have time, you should read it :
http://en.wikipedia.org/wiki/Database_normalization
In the true sense of a perfect database, yes you can design to 3rd normal form or even Boyce/Codd normal form.normalisation, we can design a perfect database
But in the real world perfect may not always be desireable. It very much depends on the application of the database, for instance when designing a database for reporting/data warehousing you would denormalise much of the database for performance reasons.
This where the art (and as joliner said sometimes a dark art
In one project I was working on someone had designed an address database and each possible entity in an address had a table, so there were tables for house number/name, street, district, town, country, post code etc etc
Now when trying to return an address you were joining 5/6 tables. One of the reasons for normalisation is for updates/inserts/deletes etc. if you want to update a Town name for instance you only do it in one record in one table. But in reality how often does a town name change? not very. In this instance that level of normalisation was overkill, this is where the art (or common sense) comes into play.
I would have had 2 tables
one with "house number/name, post code" (in this house number/name, post code is a primary key)
one with "street, district, town, country" (in this street, district, town, country is a primary key).
Tables with a 1 - to - 1 join might be seen as breaking the 'rules', say you had a table with 60 fields in it. 10 are mandatory and 50 get filled in very occasionally but not very often and are not queried a lot, it would be perfectly acceptable to split this into 2 tables, one with 10 fields and one with 50 fields creating a 1 - to - 1 join.
This would be better for performance and memory usage of the database.