Page 1 of 1

data with versions.

Posted: Wed Jul 14, 2004 1:26 pm
by alanchinese
i am currently designing a relational database based on the non-relational database schema. i had tried to look for some books, i.e. database systems, or database product trainings, found about nothing to help solving my problems.
the following has two parts. i listed the requirement of the design and some solutions i have had in mind. i need some comments and suggestions for better solutions.
Part A: the customer scenario:
a) a customer has its data stored in a customer table.
b) this piece data is valid within a period, including the current access datetime.
c) a user proposes some changes of this piece of data with an effective date.
d) the outdated customer data needs to be saved as customer history and some program can look it up.
solutions:
1) create 3 customer tables: cust-hist, customer, cust-new.
pros: easy to understand, easy to select (for customer only), small table.
cons: 3 tables with the same design which means need to make 3 changes,
need to select 3 tables, need to copy the data from one to another.
2) create one customer table: customer.
pros: easy to maintain. it's neat to have this design since other data objects has similar design as customer.
cons: slower to select (need to compare the effective date and terminated date), a lot larger files.

Part B: the manifest scenario:
a) a master manifest data table stores attributes that are the same everyday, again, it has manifest-hist, manifest, and manifest-new.
b) users can assign various operations on each date on a particular manifest.
c) for some reasons each date on each manifest there would be more than one versions of data and the users can choose what to use in actual operation.
solutions:
i have only one solution:
create a master manifest file with similar design as customer; create a manifest-date-version file for each date, each manifest, and each version.
pros: this is the only solution i have.
cons: the manifest-date-version table size is huge! and selection takes quite long.
i am looking for a better solution for this scenario.

hope to get some feedbacks on these complicate problems, and would you please recommand me some good books about actually database design (other than those theories or database product trainings)? and where to download some sample database (like the movie database, or employee database demo, etc.)
thankx a lot.
from alan.

Posted: Wed Jul 14, 2004 2:54 pm
by lostboy
You need to evaluate what kind of data will change. Take your customer data, let's assume the following fields for the sake of arguement

cust_id
first name
last name
address
city
state
zip
country
phone
fax
email

Now, most cases not all the data will change, customers don't often change names for example. If we were to break the table down into a more object oriented approach we can create a person object, an address object and a contact object, like this:

Code: Select all

person                  address                  contact
cust_id                  addr_id                    contact_id
first name             address                    phone
last name              city                          fax
addr_id                 state                       email
contact_id             zip                          date_entered
date_entered        country                    last_updated_id
last_updated_id     last_updated_id
Now when a an address change occurs you only affect the address object. The last_updated_id refers to the changes table below. Your need to keep the history presents a more interesting problem. A logical outgrowth of this is to add a table, called for example, 'changes'

changes
change_id
user_id
date_change_entered
object_id
old value
new value

Now you have a seried of clean tables, other than the 'change' table and it doesn't slow the application much, even though you need, in mysql anyway, more queries to complete the operation. You avoid the cardinal sin of data duplication and space wasted by duplication of that data. Plus the object oriented approach gives you a little more flexibilty. Say a man liviing at an address becomes a customer, you enter his data as normal. Should his wife become a customer, you can then link the same address record to the new customer record saving even more space. A simple check for duplicates before the data is entered allows this flexibility.

hth