Page 1 of 1
How to Maintain the Revesion History
Posted: Wed Dec 31, 2008 3:10 am
by mudgil.gaurav
Hi All
I am developing an application , and i need to implement the revision history mechanism so that any user can view the record for any particular content by whom this content modified last time and when and what was the content before modifying.
So according to my knowledge we can use a database table to keep the track of this.
I want to share this Idea and know your views what is the best way to achieve the same with good programming practice and resource utilization.
Thanks
Gaurav
Re: How to Maintain the Revesion History
Posted: Wed Dec 31, 2008 3:39 am
by onion2k
Moved to theory and design.
Yeah, it's easy enough. Just save a copy of the content to a second table with enough information to figure out what it is before you update any record. You could actually do it with a BEFORE UPDATE trigger if you're using MySQL 5.0.
Re: How to Maintain the Revesion History
Posted: Wed Dec 31, 2008 4:11 pm
by allspiritseve
The best way I've heard of (and keep in mind I've never implemented this myself) is to have a history table for each table you want to keep backups of. For every change to a row, you add a row to the history table containing ONLY the fields that have changed. The original table's row is updated, so the very latest version is kept in its entirety in that table, and you can revert to any version of the row by retrieving all the latest non-empty fields in the history table for that row. Does that make sense?
Re: How to Maintain the Revesion History
Posted: Sat Jan 03, 2009 8:56 am
by webaddict
allspiritseve wrote:The best way I've heard of (and keep in mind I've never implemented this myself) is to have a history table for each table you want to keep backups of. For every change to a row, you add a row to the history table containing ONLY the fields that have changed. The original table's row is updated, so the very latest version is kept in its entirety in that table, and you can revert to any version of the row by retrieving all the latest non-empty fields in the history table for that row. Does that make sense?
I'd consider that to be bad normalisation. The difference between old versions and the newest version is a newer date, no more, no less. Then how would you justify a new table? I don't think that one single column is a good reason for a new table. My suggestion would be to create two new columns in the normal table, something like "startdate" and "enddate", and fill that on each edit. The newest version is then the one without an enddate.
Put an index on startdate and enddate for performance, create a view if you don't want to type the "WHERE enddate IS NULL" each and every time. For creating records, you can suffice with an ON INSERT trigger and deletes aren't used any more. UPDATE is also just another INSERT, and if you want to, you might want to have an ON UPDATE trigger which gets the content of the old record (the one that should be updated), and just insert it as a new version.
That makes sense?
Re: How to Maintain the Revesion History
Posted: Sat Jan 03, 2009 3:12 pm
by kaszu
so that any user can view the record for any particular content by whom this content modified last time and when and what was the content before modifying
@webaddict Content also changes and those changes also needs to be saved, not only date when those changes were made.
I agree with onion. Having other table will benefit performance.
But should all fields be saved or only fields which have changed? I think it's performance VS size question and depends on your case and can't be answered that easily.
Re: How to Maintain the Revesion History
Posted: Sun Jan 04, 2009 9:58 am
by webaddict
kaszu wrote:@webaddict Content also changes and those changes also needs to be saved, not only date when those changes were made.
Well, uhm, duh. Maybe I haven't been clear at all, I'll try to explain again: each and every edit is a record. Each time something is edited, the new version will be saved. You can know it's the new version, because it doesn't have an enddate. Of course the changes have to be saved, but there is no reason your approach does something my approach can't. Worse yet, you'll violate the best practise of normalisation, taking the "history table" approach.
kaszu wrote:I agree with onion. Having other table will benefit performance. But should all fields be saved or only fields which have changed? I think it's performance VS size question and depends on your case and can't be answered that easily.
Figuring out what fields have changed etc. will also hit upon performance. Doing a simple SELECT on a table, based on columns that have indexes on them is peanuts for
any database. Also, I consider this to be premature optimalization. Performance is important, I'll give you that much, but violating best practises because there
might be issues if there are trillions of records is not the way I like to work. Besides, if you work with a view as I suggested, there is always oppertunity for refactoring in case the approach doesn't scale well enough.
Re: How to Maintain the Revesion History
Posted: Sun Jan 04, 2009 12:24 pm
by John Cartwright
You should only store the changes, not the entire row's data. Doing otherwise would be a violation of normalization principles because of data duplication.
Lets say your table has 20 fields, and you change a single field, there is no reason to duplicate the rest of the data that has not changed. Plus, it makes things far more dificult to track exactly what has changed.
Re: How to Maintain the Revesion History
Posted: Sun Jan 04, 2009 2:42 pm
by alex.barylski
should only store the changes, not the entire row's data. Doing otherwise would be a violation of normalization principles because of data duplication.
I haven't really followed this thread so pardon me if I misunderstand, but this caught my attention and I figured I would chime in.
I agree, to some extent, but...there are times where a complete snapshot does make sense...it's a primitive,trivial but effective UNDO and far easier to implement that a diff.
Obviously if your going to be storing hundreds of changes, then just the patch is good enough, but if your only storing a dozen or so snapshots, the duplication of data isn't that big a deal.
Personally I wouldn't really consider this a violation of normalization...although I agree it's duplication...it could also be called revision history.
Things like, storing the city name instead of a city ID in a table IMHO is what normalization is about. I dunno I just see that as slightly different than storing several near identical records.
Normalization I guess (for me) is more targeted at normalizing the schema than it is normalization the data, if that makes any sense.
Cheers,
Alex
Re: How to Maintain the Revesion History
Posted: Sun Jan 04, 2009 2:50 pm
by allspiritseve
PCSpectra wrote:Normalization I guess (for me) is more targeted at normalizing the schema than it is normalization the data, if that makes any sense.

Really? I'd say its more about normalizing the data... at least that's how I've always understood it.
Say, for example, you're storing a field "color" in a table... and each field has "red", "blue", "green", etc. The more item of each color you have, the more duplication you have. Later, if you want to go back and rename "red" to "crimson", you'd have to edit every row that has "red" as its color. What if somebody spelled it "rde"? You're SOL. However, if you move those colors into a separate table, where red has id 1, blue has id 2, green has id 3, then your original table can store a color_id rather than a color string. Now you're free to change any color to anything you want, once, and that change is reflected throughout. It's definitely a contrived example, but you get my point

Re: How to Maintain the Revesion History
Posted: Sun Jan 04, 2009 3:18 pm
by alex.barylski
Really? I'd say its more about normalizing the data... at least that's how I've always understood it.
Well...yes and no...by normalizing the schema you effectively normalize the data.
Secondly, normalization has several forms:
http://en.wikipedia.org/wiki/Database_normalization
According to 1NF, you could in theory have several records in a database table:
Code: Select all
articles:
pkid, versionid, completetext
And have multiple records like so:
Code: Select all
1, 1.0, This is the fuull text description
2, 1.1, This is the full text description (corrected)
That database table, although it would have
"some" duplication is still considered normalized according 1NF.
It's a primitive technqiue to implement a version history/snapshot, no dought, but it's effective for most article/content management systems (especially on shared servers) where you might not have access to a diff/patch tool and you were not interested in implementing LCS (Longest Common Subsequence) algorithm.
Cheers,
Alex
Re: How to Maintain the Revesion History
Posted: Thu Jan 08, 2009 3:52 am
by webaddict
jcart wrote:You should only store the changes, not the entire row's data. Doing otherwise would be a violation of normalization principles because of data duplication.
allspiritseve wrote:Really? I'd say its more about normalizing the data... at least that's how I've always understood it. It's definitely a contrived example, but you get my point

I do understand where you're coming from, really. I'd say normalizing is that you have data and try to create a flexible structure to work with that data. But seriously, repeated structures in a database is a smell (and a nasty one at that

).
Re: How to Maintain the Revesion History
Posted: Thu Jan 15, 2009 5:24 am
by onion2k
The problem with only storing the updated fields, if you use a structure that mirrors the database table rather than a 'one field change per row'* structure, is what to do with records that are allowed to be NULL or empty. For example, if you have a field that can be NULL, and in one record it has a value of 100 and the next it has a value of NULL, does that mean it's not changed or does that mean it's changed to be NULL? You'd need to record which fields change as well as the changes which is another layer of complexity. Recording a snapshot of the data at the time of the update might be denormalised but it's a hell of a lot simpler.
* EG a table like "Record Id | Field Name | New Value | Author Id | Update Timestamp".