common approaches on record history/revisions
Moderator: General Moderators
common approaches on record history/revisions
Hi,
I'm sure a lot of you have at some point had to implement some sort of record revisions in your project. I'm interested in what approaches did you take?
By record revisions I mean some sort of history, where you can view data for every change in a record, when and who did it and what did he do. So, basically this means no deleting or updating the tables, just inserting new records with new values. The last record in a revision-thread is the "right one" at a given moment.
I hope you know what I mean, I'd be glad if you'd share your experiences and resources.
best regards
I'm sure a lot of you have at some point had to implement some sort of record revisions in your project. I'm interested in what approaches did you take?
By record revisions I mean some sort of history, where you can view data for every change in a record, when and who did it and what did he do. So, basically this means no deleting or updating the tables, just inserting new records with new values. The last record in a revision-thread is the "right one" at a given moment.
I hope you know what I mean, I'd be glad if you'd share your experiences and resources.
best regards
Yes, I completely forgot to mention that.feyd wrote:I'm going to guess you're not talking files but database records and such.
Could you elaborate on that or provide links and stuff? I'm very much interested in this.feyd wrote:The stuff I've designed has either been similar to Subversion using differentials or used auditing.
In one project I used the database itself to do the revision work. Triggers took care of the updates and such. But it was on PostgreSQL and now I'm developing on mysql, where I don't have triggers, so I must implement this with php application logic.
Thanks and best regards
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
I don't recall where I read the materials.. it's probably from a wide range of sources.
Take a look at Delta Encoding and Xdelta (Subversion's diff algorithm.)
The following may be informative too:
http://en.wikipedia.org/wiki/Database_audit
http://en.wikipedia.org/wiki/Audit_%28disambiguation%29 may lead to additional information.
Take a look at Delta Encoding and Xdelta (Subversion's diff algorithm.)
The following may be informative too:
http://en.wikipedia.org/wiki/Database_audit
http://en.wikipedia.org/wiki/Audit_%28disambiguation%29 may lead to additional information.
- Kieran Huggins
- DevNet Master
- Posts: 3635
- Joined: Wed Dec 06, 2006 4:14 pm
- Location: Toronto, Canada
- Contact:
I'm implementing this feature now - my approach is to write every update to the database, along with the timestamp. Then when I want to select a record I ask for the last one that was modified before "now". This way I can "wayback" my data by making "now" some time in the past.
Deleted (not simply updated) records have a timestamp written to the deleted column so I can ignore those. It seems to work so far, and it also supports "future" publishing. (if that's even a term)
Another option if you're maintaining a tree of documents is to use a third party app like SVN as your storage engine. It's not lightning quick, but with a local cache of the HEAD your app shouldn't suffer. Plus, it's likely less work to implement and you'll be able to browse the repository with many 3rd party tools.
Deleted (not simply updated) records have a timestamp written to the deleted column so I can ignore those. It seems to work so far, and it also supports "future" publishing. (if that's even a term)
Another option if you're maintaining a tree of documents is to use a third party app like SVN as your storage engine. It's not lightning quick, but with a local cache of the HEAD your app shouldn't suffer. Plus, it's likely less work to implement and you'll be able to browse the repository with many 3rd party tools.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
That's an interesting approach. Might have to borrow that.Kieran Huggins wrote:I'm implementing this feature now - my approach is to write every update to the database, along with the timestamp. Then when I want to select a record I ask for the last one that was modified before "now". This way I can "wayback" my data by making "now" some time in the past.
Deleted (not simply updated) records have a timestamp written to the deleted column so I can ignore those. It seems to work so far, and it also supports "future" publishing. (if that's even a term)
- Kieran Huggins
- DevNet Master
- Posts: 3635
- Joined: Wed Dec 06, 2006 4:14 pm
- Location: Toronto, Canada
- Contact:
Hi, thanks for the responses.
I decided not to go with a delta (diff) encoding approach, because it seems (to a layman) much more suitable for a stream of data than for a discrete set of small chunks of data (database fields). It would also complicate the database design and the developer would not be able to easily retrieve a certain record for any given revision; each revision would have to be calculated from the base one. This would also make searching a PITA.
So, I guess I'm going with replicating full records, as a few of you have suggested.
I see a few issues here. (Here below I speak of a record as a "each database row that is the last and the only valid revision")
1) primary key; I want to be able to retrieve a record by a unique ID, no matter the revision it's in. This means I'd have to either have another table that would hold primary keys (and use foreign-key relations inside the revision table) or implement my own auto-incrementing for these keys.
2) number of rows in the "revision" table would grow much faster than the amount of valid records, given that users would often update/delete records (which is very much expected in my project).
3) provided that I want to keep track of revisions, I would have to introduce a revision number. This would have to be some sort of auto incremental, but local to a certain record. Not that it's hard, but it's (unneccessary?) complicating because of race-conditions; transactions and table locking would have to be involved all over, overall decreasing the maintainability and performance of the system.
What are your opinions on these issues?
I hope I was clear enough, I know my english isn't brilliant.
I decided not to go with a delta (diff) encoding approach, because it seems (to a layman) much more suitable for a stream of data than for a discrete set of small chunks of data (database fields). It would also complicate the database design and the developer would not be able to easily retrieve a certain record for any given revision; each revision would have to be calculated from the base one. This would also make searching a PITA.
So, I guess I'm going with replicating full records, as a few of you have suggested.
I see a few issues here. (Here below I speak of a record as a "each database row that is the last and the only valid revision")
1) primary key; I want to be able to retrieve a record by a unique ID, no matter the revision it's in. This means I'd have to either have another table that would hold primary keys (and use foreign-key relations inside the revision table) or implement my own auto-incrementing for these keys.
2) number of rows in the "revision" table would grow much faster than the amount of valid records, given that users would often update/delete records (which is very much expected in my project).
3) provided that I want to keep track of revisions, I would have to introduce a revision number. This would have to be some sort of auto incremental, but local to a certain record. Not that it's hard, but it's (unneccessary?) complicating because of race-conditions; transactions and table locking would have to be involved all over, overall decreasing the maintainability and performance of the system.
What are your opinions on these issues?
I hope I was clear enough, I know my english isn't brilliant.