Page 1 of 1

'Deleting' Data

Posted: Sat Mar 01, 2008 9:05 am
by Sekka
In the past, I most of the time never deleted data from a content managed website. I always had a field at the end of a record called 'deleted' that is either false or true. So, whenever I want to delete something, I simply set the field to true and then the record becomes hidden from the code (all my queries have a 'WHERE deleted = 0' in them).

Now, this seems the best thing to do as users will often delete data by accident, and I like to be able to recover the data for them.

I have been rethinking all my methods as of late as I have been rebuilding my framework and CMS software, and I want to rethink how I 'delete' data. Do I,
  • Simply run DELETE's and OPTMIZE's. This seems the most logical as you actually delete data.
  • Stick with my original method and have a 'deleted' field that is false or true. The second seems the best as all data is kept in case it is needed in the future for any unforeseen circumstance, but could this bloat a table and slow query performance when there are a lot of records?
  • Have a deleted table, where records are copied to when deleted, and then they are removed from the original table.
  • Or, do I do something completely different?
Thank you.

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 11:38 am
by alex.barylski
I would:

1) Implement some kind of versioning system for the primary content area(s) so you can undo changes.
2) Mark a record as archived and offer that as an option -- making delete more tedious.
3) Store archives in seperate table to prevent table bloat and performance issues.

Absolutely though, you want to implement some kind of version control -- it helps in keeping accoutability a focus of the system. When someone deletes a article by accident, they are far less likely to do so when they know they will be questioned on it later.

Also, as an additional precaution, you could always have 'editors' request for delete or update to be approved by a superior like web masters or admins.

Cheers :)

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 12:45 pm
by Sekka
1) Implement some kind of versioning system for the primary content area(s) so you can undo changes.
Version controlling is one thing I am definitely going to be looking at when I redo my systems. For example, a section within a website will have many 'instances' and a superior admin will choose which one is the 'live' instance.

But I was more taking about, for example, deleting the entire section itself. Do you recommend my 3rd option, moving the data to an archived table for this situation?

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 1:26 pm
by alex.barylski
But I was more taking about, for example, deleting the entire section itself. Do you recommend my 3rd option, moving the data to an archived table for this situation?
It depends. If your delivering web page content, than speed of impetus to your success, so the less records the primary table holds the faster you can find relevant content, in which case, yes, move archived pages into an 'archives' table.

I would create a cron script which queried the primary table and looked for pages flagged as 'archive' and then basically mirror the primary table but move 'flagged' pages over to the mirror/archived table. This way, you can search your archives table using the same query and code as the primary table, with a slight adjustment to the 'table' in the query.

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 1:45 pm
by Sekka
Sounds like the best option to me.

Thanks for the input!

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 8:07 pm
by Ambush Commander
Note that, if properly indexed, access time should not be an issue, and the simplicity of keeping everything in one table may outweigh the maintenance needs of mirroring.

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 8:24 pm
by Sekka
Ambush Commander wrote:Note that, if properly indexed, access time should not be an issue, and the simplicity of keeping everything in one table may outweigh the maintenance needs of mirroring.
Hrm. So, for example, I have a comment table.

When it is queried in the SQL, only the 'comment_id' and 'deleted' fields are ever used in the WHERE clause. If I mark these two as indexes, will I not have any problems in the future?

(I have never properly used indexing, so sorry if the answer is blatently obvious)

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 8:25 pm
by Ambush Commander
Shouldn't even be necessary: just do comment_id as a UNIQUE index.

Re: 'Deleting' Data

Posted: Sat Mar 01, 2008 8:31 pm
by Sekka
I already do that. Well, as long as I know it won't slow down when the table gets big.

Thanks!