History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 6:29 am
by kaisellgren
Hello there,
Every now and then I find web applications that have some sort of article history/versioning where you can select a previous version of the article/content and basically "restore" it. I'm planning on having one like that, but I have a few things on my mind and I would like to get your opinions.
1) Should an autosave add a new entry to the history list? I would like to implement an autosave feature, so I wonder whether each autosave should add a new entry to the version history list. Or should I have a separate "list" (=latest autosave only) for autosaved articles? I think the latter makes more sense. If the browser crashes (or whatever), the user could restore an autosave. The history list is for those articles that he saved by pressing the submit button. Agree?
2) How many versions? If the user keeps modifying the article (let's say he keeps adding new paragraphs) and then saves the article, how many different versions of the article should there be at most? Should I let the user decide (which was my initial thought)? What is usually a reasonable value? 10? Is versioning a bad thing when it comes to disk storage? If each article has 10 versions, it's basically 10x more space for the entire article content... now imagine having 1 MB of article content: it would be 10 MB for the entire DB, and some hosts have limits on the size of a DB. This leads to question 3:
3) Do you ever remove versions? If article versions have stayed intact for long enough, will you remove them? If so, what duration do you set? A user-defined one? Is one week a sufficient default value? If time is not the indicator, then what is? And if time is the indicator, do you run cron jobs to do the cleaning, or what?
4) How to determine a "change"? If the user adds one dot to the end of the article and presses the save button, should I still create a new history entry? How do you handle this? Do you just check whether the article has changed and, if so, create a new entry?
That's plenty of questions, I know, but if you have some opinions or thoughts I'd be glad to hear them.

Re: History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 7:49 am
by alex.barylski
kaisellgren wrote: Should an autosave add a new entry to the history list? I would like to implement an autosave feature, so I wonder whether each autosave should add a new entry to the version history list. Or should I have a separate "list" (=latest autosave only) for autosaved articles? I think the latter makes more sense. If the browser crashes (or whatever), the user could restore an autosave. The history list is for those articles that he saved by pressing the submit button. Agree?
Implement something like a 'restore point' feature that saves a snapshot of the article (easy). Alternatively, you can calculate the diff and save only that; auto-save in this case would only ever be invoked if the text had actually changed.
kaisellgren wrote: How many versions? If the user keeps modifying the article (let's say he keeps adding new paragraphs) and then saves the article, how many different versions of the article should there be at most? Should I let the user decide (which was my initial thought)? What is usually a reasonable value? 10? Is versioning a bad thing when it comes to disk storage? If each article has 10 versions, it's basically 10x more space for the entire article content... now imagine having 1 MB of article content: it would be 10 MB for the entire DB, and some hosts have limits on the size of a DB.
If you store a diff, you only store what actually changes between saves, so really it's not wasteful at all. On the other hand, if you support a snapshot approach, then 10+ versions would become wasteful IMHO.
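To make the "store only the diff" idea concrete, here is a minimal sketch in Python. It is only an illustration, not something anyone in this thread has built: the names make_delta and apply_delta are made up, and it leans on the standard library's difflib.SequenceMatcher rather than a hand-written diff.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Return a compact list of edits that turns `old` into `new`.

    Each edit is (start, end, replacement): replace old[start:end]
    with `replacement`. Unchanged text is not stored at all.
    """
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions have i1 == i2; deletions have an empty replacement.
            delta.append((i1, i2, new[j1:j2]))
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the newer text from `old` plus a stored delta."""
    parts = []
    pos = 0
    for start, end, replacement in delta:
        parts.append(old[pos:start])  # unchanged text before this edit
        parts.append(replacement)     # inserted or replacing text
        pos = end                     # skip the text that was removed
    parts.append(old[pos:])           # unchanged tail
    return "".join(parts)
```

Note that these are forward deltas (older text to newer text), so restoring an old version means keeping one full copy somewhere and replaying deltas from it. Comparing character by character can also get slow on long articles; splitting the text into lines first and diffing the two lists is a common way around that.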
kaisellgren wrote: Do you ever remove versions? If article versions have stayed intact for long enough, will you remove them? If so, what duration do you set? A user-defined one? Is one week a sufficient default value? If time is not the indicator, then what is? And if time is the indicator, do you run cron jobs to do the cleaning, or what?
Depends on the system and your users. Will they ever need to go back to version one? Most CMSs I've worked with that support snapshot versioning only kept a previous and a current version, so as not to waste space. I think most users find an audit trail more helpful than unlimited backups; one previous version is usually enough for most, I think.
kaisellgren wrote: How to determine a "change"? If the user adds one dot to the end of the article and presses the save button, should I still create a new history entry? How do you handle this? Do you just check whether the article has changed and, if so, create a new entry?
Using a diff, you only store what's actually changed. Using a snapshot would mean you overwrote the previous content and ONLY added a 'period'. You might want to store the original buffer in JavaScript, compare it to the updated contents, calculate a % change and, if it is less than say 5% (or whatever value you deem legitimate), notify the user that "not a lot has changed and this will overwrite previous contents -- are you sure you want to continue?"
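The "% change" check could look roughly like the sketch below. It is written in Python for a server-side check rather than in JavaScript, the function names are made up, and the similarity measure (difflib's ratio) is just one possible definition of "how much has changed".

```python
from difflib import SequenceMatcher


def change_percent(old: str, new: str) -> float:
    """Rough percentage of the text that differs between two versions."""
    # ratio() is 1.0 for identical texts and approaches 0.0 for unrelated ones.
    similarity = SequenceMatcher(None, old, new, autojunk=False).ratio()
    return (1.0 - similarity) * 100.0


def classify_save(old: str, new: str, threshold: float = 5.0) -> str:
    """Decide how to treat a save: 'unchanged', 'minor' or 'major'."""
    if old == new:
        return "unchanged"  # nothing to store at all
    if change_percent(old, new) < threshold:
        return "minor"      # e.g. ask the user before overwriting
    return "major"          # e.g. create a new history entry
```

The 5% threshold is the value suggested above; what the application does with a "minor" result (warn the user, overwrite silently, or store it anyway) is a separate design decision.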
Cheers,
Alex
Re: History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 9:33 am
by kaisellgren
Hmm. If I calculate a difference and save the difference into files - that would be quite acceptable, wouldn't it? People have a lot more filesystem space than database space and if I save only the difference, it would become quite efficient. Now I just need to find out a good way to create this "difference"... any ideas/links/resources for that?
Re: History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 9:47 am
by alex.barylski
kaisellgren wrote: Hmm. If I calculate a difference and save the difference into files - that would be quite acceptable, wouldn't it?
They do? I think most shared hosting companies include the DB space in the HDD quota they supply. I would personally use the DB, as a difference is typically quite small...a couple dozen bytes for trivial changes.
I suppose you could (if you can assume your users will run Linux) run the diff & patch tools.
If you want to implement the algorithm yourself, you want to Google "longest common subsequence":
http://en.wikipedia.org/wiki/Longest_co ... ce_problem
It's not an easy algorithm to get right, so it might be worth finding an existing PHP class or something...however, all the ones I have used in the past have been really buggy, and versioning is something you don't want to be buggy, because clients will be as mad at you for scrambling their data as they would be had you just lost everything outright.
This is probably why most developers settle for a snapshot implementation: it's way easier to implement and almost fail-safe.
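For reference, the textbook dynamic-programming version of longest common subsequence fits in a few lines. The sketch below is in Python and works on lists of lines; the function name lcs_lines is made up, and this is the plain quadratic algorithm, not what the diff tool or any particular PHP class actually does.

```python
def lcs_lines(a: list, b: list) -> list:
    """Return one longest common subsequence of two lists of lines."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the lines.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # dropping a[i] loses nothing
        else:
            j += 1  # dropping b[j] loses nothing
    return result
```

Everything in the two inputs that is not part of the returned subsequence is what was removed or added. The table needs time and memory proportional to len(a) * len(b), which is one reason to diff by line instead of by character.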
Cheers,
Alex
Re: History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 10:14 am
by kaisellgren
Cool, an algorithm is all I need

Re: History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 10:41 am
by alex.barylski
kaisellgren wrote: Cool, an algorithm is all I need
Cool...when you get it implemented and bug free be sure to post it on here so I can use it.

Re: History & Versions of Articles/Content
Posted: Fri Jun 19, 2009 3:05 pm
by Darhazer
From a user's perspective:
1) Auto-save should be separate from the history. I mean an entry is created only when I hit save, and every autosave just updates the draft.
2) Unlimited or, better, defined by the administrator. It can be defined not only as a number but also, for example, as: keep versions that are no more than 1 year older than the latest version.
3) See 2.
4) Well, as far as I know, at download.bg a change works like this:
Let's say I edit an article. All the edits I make count as 1 change, unless:
* an editor approves my version
Then my next edit makes a new version.
* someone else changes the article
Then my version becomes one entry, his version another, and if I edit the article again I'm creating a 3rd entry.
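The rule in point 4 can be sketched in a few lines of Python. This is only an in-memory illustration of the rule as described here; the class and field names are made up and say nothing about how download.bg actually implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    author: str
    body: str
    approved: bool = False


@dataclass
class ArticleHistory:
    entries: list = field(default_factory=list)

    def save(self, author: str, body: str) -> Entry:
        """Save an edit, adding a history entry only when the rule says so."""
        latest = self.entries[-1] if self.entries else None
        if latest is None or latest.approved or latest.author != author:
            # First save, an approved version, or somebody else's version:
            # start a new entry.
            self.entries.append(Entry(author, body))
        else:
            # Same author, not yet approved: still the same "change".
            latest.body = body
        return self.entries[-1]
```

With this rule, two saves in a row by the same author leave one entry, an edit by a second author adds another, and the first author editing again creates a third.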
Hope this is helpful.
Re: History & Versions of Articles/Content
Posted: Sat Jun 20, 2009 3:45 pm
by kaisellgren
Darhazer, how would you implement this "unlimited" revisions feature?
Re: History & Versions of Articles/Content
Posted: Sat Jun 20, 2009 8:32 pm
by Weirdan
PCSpectra wrote:
kaisellgren wrote: Cool, an algorithm is all I need
Cool...when you get it implemented and bug free be sure to post it on here so I can use it.

I'd rather use something already tested by a large community - unfortunately MediaWiki is released under the GPL

Re: History & Versions of Articles/Content
Posted: Sun Jun 21, 2009 4:54 am
by kaisellgren
I managed to create a rather good diff script, but it consumes too many resources; maybe MediaWiki could give me some ideas.
Re: History & Versions of Articles/Content
Posted: Sun Jun 21, 2009 7:02 am
by Bruno De Barros
Kai, I think that once you get the diff mechanism working, changes aren't really that expensive in terms of disk space or even database space, which means that it'd be feasible to implement the unlimited history. From what Darhazer said, I think it'd be rather easy to implement this. You have your history, with different history item types (a version, a draft, or a version pending an editor's approval), and then you have the date/time each version was made. This would allow you to keep as many records as you wanted to, and you could purge records older than "X" days, weeks, months, years, whatever, by simply making use of timestamps in the database (WHERE TIMESTAMP < timestamp_of_x_days_ago).
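A timestamp-based purge along those lines might look like the sketch below, using Python's built-in sqlite3 module. The table name article_history and its columns are assumptions made up for the example, not a schema anyone in this thread has posted.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical history table used in this example."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS article_history (
               id INTEGER PRIMARY KEY,
               article_id INTEGER NOT NULL,
               kind TEXT NOT NULL,        -- 'version', 'draft' or 'pending'
               body TEXT NOT NULL,
               created_at REAL NOT NULL   -- Unix timestamp
           )"""
    )


def purge_older_than(conn: sqlite3.Connection, max_age_days: float,
                     now: float = None) -> int:
    """Delete history rows older than max_age_days; return how many went."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM article_history WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount
```

This could be run from a cron job or on every Nth save. One caveat if the rows hold diffs rather than snapshots: deleting the oldest rows breaks the chain, so a full copy of the oldest surviving version would have to be kept.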
Re: History & Versions of Articles/Content
Posted: Sun Jun 21, 2009 9:05 am
by kaisellgren
I know, that's why I am trying to get this done.

Re: History & Versions of Articles/Content
Posted: Sun Jun 21, 2009 1:55 pm
by Bruno De Barros
Are you planning on releasing it as Open Source? Just wondering, because I'm quite interested in this project, and would like to be able to contribute to it.
Re: History & Versions of Articles/Content
Posted: Sun Jun 21, 2009 2:45 pm
by kaisellgren
It's just one part of my project..
Anyway, I found out that MediaWiki saves the whole document each time.

Re: History & Versions of Articles/Content
Posted: Mon Jun 22, 2009 8:50 am
by alex.barylski