Page 1 of 1

common approaches on record history/revisions

Posted: Tue Jan 30, 2007 4:00 pm
by dbevfat
Hi,

I'm sure a lot of you have at some point had to implement some sort of record revisions in your project. I'm interested in what approaches did you take?

By record revisions I mean some sort of history, where you can view data for every change in a record, when and who did it and what did he do. So, basically this means no deleting or updating the tables, just inserting new records with new values. The last record in a revision-thread is the "right one" at a given moment.

I hope you know what I mean, I'd be glad if you'd share your experiences and resources.

best regards

Posted: Tue Jan 30, 2007 4:30 pm
by feyd
I'm going to guess you're not talking files but database records and such. It's rare that I've needed such a thing. The stuff I've designed has either been similar to Subversion using differentials or used auditing.

Posted: Tue Jan 30, 2007 4:33 pm
by Luke
feyd, do you know of any good resources to read about differentials/auditing? I was going to ask this same question. What search terms would I use to find information on differentials/auditing? Those terms come up with too many other things.

Posted: Tue Jan 30, 2007 4:46 pm
by dbevfat
feyd wrote:I'm going to guess you're not talking files but database records and such.
Yes, I completely forgot to mention that.
feyd wrote:The stuff I've designed has either been similar to Subversion using differentials or used auditing.
Could you elaborate on that or provide links and stuff? I'm very much interested in this.

In one project I used the database itself to do the revision work. Triggers took care of the updates and such. But it was on PostgreSQL and now I'm developing on mysql, where I don't have triggers, so I must implement this with php application logic.

Thanks and best regards

Posted: Tue Jan 30, 2007 5:01 pm
by feyd
I don't recall where I read the materials.. it's probably from a wide range of sources.

Take a look at Delta Encoding and Xdelta (Subversion's diff algorithm.)

The following may be informative too:
http://en.wikipedia.org/wiki/Database_audit
http://en.wikipedia.org/wiki/Audit_%28disambiguation%29 may lead to additional information.

Posted: Tue Jan 30, 2007 5:09 pm
by Kieran Huggins
I'm implementing this feature now - my approach is to write every update to the database, along with the timestamp. Then when I want to select a record I ask for the last one that was modified before "now". This way I can "wayback" my data by making "now" some time in the past.

Deleted (not simply updated) records have a timestamp written to the deleted column so I can ignore those. It seems to work so far, and it also supports "future" publishing. (if that's even a term)

Another option if you're maintaining a tree of documents is to use a third party app like SVN as your storage engine. It's not lightning quick, but with a local cache of the HEAD your app shouldn't suffer. Plus, it's likely less work to implement and you'll be able to browse the repository with many 3rd party tools.

Posted: Tue Jan 30, 2007 9:43 pm
by Christopher
I do something similar to what Kieran does, but update old records as no longer available when I insert the new record (which is marked available). That way I only have to select all the available records to get the current versions.

Posted: Wed Jan 31, 2007 4:31 am
by onion2k
Kieran Huggins wrote:I'm implementing this feature now - my approach is to write every update to the database, along with the timestamp. Then when I want to select a record I ask for the last one that was modified before "now". This way I can "wayback" my data by making "now" some time in the past.

Deleted (not simply updated) records have a timestamp written to the deleted column so I can ignore those. It seems to work so far, and it also supports "future" publishing. (if that's even a term)
That's an interesting approach. Might have to borrow that. :)

Posted: Wed Jan 31, 2007 8:14 pm
by Kieran Huggins
It's all yours, just have it back by midnight! ;-)

Posted: Tue Feb 06, 2007 4:05 am
by dbevfat
Hi, thanks for the responses.

I decided not to go with a delta (diff) encoding approach, because it seems (to a layman) much more suitable for a stream of data than for a discrete set of small chunks of data (database fields). It would also complicate the database design and the developer would not be able to easily retrieve a certain record for any given revision; each revision would have to be calculated from the base one. This would also make searching a PITA.

So, I guess I'm going with replicating full records, as a few of you have suggested.

I see a few issues here. (Here below I speak of a record as a "each database row that is the last and the only valid revision")

1) primary key; I want to be able to retrieve a record by a unique ID, no matter the revision it's in. This means I'd have to either have another table that would hold primary keys (and use foreign-key relations inside the revision table) or implement my own auto-incrementing for these keys.

2) number of rows in the "revision" table would grow much faster than the amount of valid records, given that users would often update/delete records (which is very much expected in my project).

3) provided that I want to keep track of revisions, I would have to introduce a revision number. This would have to be some sort of auto incremental, but local to a certain record. Not that it's hard, but it's (unneccessary?) complicating because of race-conditions; transactions and table locking would have to be involved all over, overall decreasing the maintainability and performance of the system.

What are your opinions on these issues?

I hope I was clear enough, I know my english isn't brilliant. :)