Page 1 of 1

Am I overloading the row?

Posted: Tue Sep 02, 2003 5:19 pm
by Aaron
Hi, I have a table with users stored in, the table is used by 2 sites - so users dont have to sign up on lotsa sites.

The problem comes when the second site needs more rows to store relavent information...I think Im overloading the row.

Code: Select all

uid  int(11)
   username  varchar(30)
   password  varchar(50)           
   email  varchar(100)           
   status  int(2)   
   site  varchar(150)           
   avatar  varchar(4)           
   sex  char(1)           
   sexuality  char(3)           
   likes  varchar(60)           
   dislikes  varchar(60)           
   habits  varchar(60)           
   tv  varchar(60)           
   book  varchar(60)           
   music  varchar(60)           
   hobbies  varchar(60)           
   msn  varchar(75)   
   bday  date 
   country  char(2)    
   sig  text
   tag  text   
   msn_hide  char(3)      
   email_hide  char(3)      
   people_voted  int(8)     
   respect_total  int(8)    
   babe  int(8)     
   minger  int(8)     
   regdate  int(11)    
   last_login  int(11)     
   login  int(8)    
   logout  int(8)     
   views  int(8)    
   ip  varchar(75)           
   host  varchar(100)           
   pcount  int(8)
You will note that many of the fields are small limited varchars or ints - really does it look like Im overloading a row? How far can I push a row?

Posted: Tue Sep 02, 2003 6:21 pm
by Stoker
there is no real limit on how many columns you can have
Normalization breaks anything apart that could exist in nore than one record, so bacially your sexuality column is likely a violation since it is a char(3), it could be a single byte instead (like tinyint), where the "lookup" is either hardcoded in your app (common with short lists that never change) or listed in another table. (Like your status column)

So the unusual aspects of your table def I would say is the column definitions / storage usage, e.g. why is ip a varchar(75) when it could be an unsigned int(11)? The column babe, does it need 8 digits? Avatar is a char(4), not sure if I understand what would be stored there, likely it should point to an id of a list of avatars.

In many cases it makes sense to break it into two tables, put all the data that is most frequently requested in its own table, add indexes that match common queries and leave the less-used data in its own table..

Posted: Tue Sep 02, 2003 10:15 pm
by nielsene
Yeah I agree with stoker about splitting to multiple tables. Probalby something like users_common, users_site1, users_site2 at least. Where the primary key of users_site[1|2] are foreign key references back to users_common. If your DBMS system supports views than you can even create views that handle the natural join of common and site[1|2]

I don't agree with stoker completely about changing things like status/sexuality to lookup tables. I used to take the approach -- most of my tables ended up being long lists of integers references other tables. This made doing everything very cumbersome AND wasn't required by normalization, in the first place.

These are places with it would make sense to use either domain/column constraints or a user defined type if you DBMS system supports it. If your database doesn't support these concepts ou can simulate it using a single column table that holds the legal values, in their "natural" representation -- no reason to have an auto increment. Then you use a foreign key reference to this lookup table, but are able to use the meaninngful "natural key" instead of a pure numeric key AND still be in compliance with normalisation theory.

This approach will mean that database will take more space, but lookups will be significantly faster and inserts/deletes/updates are no slower than an "artificial" key based system as I think stoker was mentioning.

Posted: Tue Sep 02, 2003 11:22 pm
by Stoker
hehe, different souls different meanings :)

To me storing a long identical string multiple times in the same column is not "normal" and afaik the first normal form eliminates it:

A. Eliminate repeating groups in individual tables.
B. Create a separate table for each set of related data.
C. Identify each set of related data with a primary key.

While quering flat storage is faster than multiple joins, it sort of takes away the concept of data relation and the reason of using an RDBMS, and the storage usage is often significant when many fields are 10 or 20 or 30 times more spacewasting than they need to.. I believe that the use of app/hardcoded static lists along with sensible queries (instead of one complex query make 2, a query to find id's first and one to fetch the few records with their joins afterwards is often useful) and indexes fit to each query can increase speed so much that the storage-savings outweighs the speedloss..

One sample of a "bad app" PHPShop (v 0.6.2), it uses a varchar32 with md5-sums for user_id throughout (instead of a numeric), this application is extremely ineficient at many things in many places, unfortunately I didnt know or see this when I started using it, I have added about 40 something indexes (There was none, some even without a PK).. I just did a quarterly purge of old orderdata for a client of mine, the purging and table optimization took about 8 minutes (on a dual P3/1.13Ghz 1Gb scsi server), the total database size was reduced from 47 to 13MB, about 19000 old orders removed..

Posted: Wed Sep 03, 2003 9:06 am
by nielsene
Well "repeating groups" refers to array like attributes in a row, not repeated values in a column. Some people also consider attributes like "favorite1", "favorite2", etc repeating groups. Technically according to relational theory they are no (they are poor design, but thats not identical to not normalized.) Again, technically speaking all SQL tables are in 1NF. The classic repeating group problem comes from IMS/hierarhic databases and isn't even possible in its original form in SQL databases. A modern day example would be something like storing a comma delimited list of favorites in a field AND always operating on those favorites on a individual item level. (If you only operate on groups of favorites, the list would be considered atomic and therefore not a repeating group.)

I'll agree that the decision as to when to use numeric primary keys versus "natural" keys is often a tough one. However, I've tried both extremes (all numerics and all "naturals"(where possible)) and I find that the all "natural" extreme is much better than the all numerics. Of course a compromise is best. I just get antsy when I see people pushing the former (especially trying to blame in on normalization theory.)

For instance, I would much rather use username than userid as the primary key of my users table -- and not even have a userid. Usernames are already unique and therefore identify the row in question. When used in other tables I would rather see "nielsene" than "2412" for instance. Multiple appearance of "nielsene" is no more and no less redundant that "2412". It is less efficient in terms of storage space, but more efficient in terms of lookup time. The foreign key constraints protects again typos and allows for the same cascaded updates in either case.