Page 1 of 1

Database garbage collecting

Posted: Wed Nov 04, 2009 1:27 am
by kaisellgren
Hello,

There are plenty of situations in web applications where you need to store some data temporarily (e.g. for a week) and then it gets outdated and there's lots of it. This leads to a situation where you have enormous amount of obsolete data in your database. How do you clean it? For sure you could run crons on a daily basis to collect the garbage, but this isn't good for a web application that is supposed to be easy-to-use and work out-of-the-box as it requires the user to setup crons (if that's even possible).

I know about Event scheduler in MySQL, but I want something more portable (5.1.6 isn't widely established yet).

Re: Database garbage collecting

Posted: Wed Nov 04, 2009 2:11 am
by onion2k
In the last few major sites I've built I've just put a "Housekeeping" section in the admin system that has some reports, reset to defaults stuff, and buttons to clear caches and temporary data. In theory the admin users are supposed to go in once a month and click things, but I'm pretty sure they just do it if they think the site has slowed down.

Re: Database garbage collecting

Posted: Wed Nov 04, 2009 2:35 am
by Christopher
I have done auto-cleanup as part of the code that adds those kinds of records. For example, in a database based shopping cart I might have the cart-item insert code run the cleanup code every 1000 records inserted. That is pretty easy to implement. But I only do it when I know there will be no client maintenance.

Re: Database garbage collecting

Posted: Sat Nov 07, 2009 5:04 am
by AntonioCS
In my last job I had to maintain a auctions site.
They had many crons jobs for stuff like that.

I think crons are really the way to go if you want something to be done automatically and have lot's of cleaning up to do. Might not be user-friendly but it is efficient.

Re: Database garbage collecting

Posted: Sat Nov 07, 2009 5:16 am
by kaisellgren
I think I'm just gonna put a section into the admin area to erase all that garbage.

Re: Database garbage collecting

Posted: Sat Nov 07, 2009 10:28 pm
by josh
Having it in a control panel opens it own can of worms, like when the user's hosting kills the process because it was long running. A lot of standard hosting accounts come with crontab access now, its becoming pretty standard IMO. For ex. you could find one that is like $3 a month that has cron, sign up as an affiliate for them and put referal links in your software's FAQ. Most users IMO would be willing to switch if they saved $ and also got "ahead of the technology curve". There's also windows "equivalents" to cron.

I suppose you could also implement multiple strategies for garbage collecting, users with high traffic could use cron while casual users could use a random number generator to run the code every few requests.

Re: Database garbage collecting

Posted: Sun Nov 08, 2009 3:44 am
by kaisellgren
But is there a simple way for the end user to setup crons? No? He needs technical skills I think.

Re: Database garbage collecting

Posted: Sun Nov 08, 2009 5:17 am
by josh
No a lot of hosts have GUIs, hostmonster does if I remember, they usually have a GUI as well as an advanced mode where they can paste the cron-tabs, so using the GUI they get nice drop-downs "daily", "weekly", etc...

Also you can use @daily syntax instead of 0 * * *

Re: Database garbage collecting

Posted: Sun Nov 08, 2009 6:50 am
by onion2k
josh wrote:Having it in a control panel opens it own can of worms, like when the user's hosting kills the process because it was long running.
Garbage collection shouldn't ever take very long. It's things like deleting old data and rebuilding indexes. Those can be optimised so MySQL (or whatever database you're using) does them when no other clients are connected (using the LOW PRIORITY keyword) or so that it doesn't rebuild indexes automatically (using the QUICK keyword). You can also lock the table for a large DELETE which helps, or even wrap it up in a transaction.

Furthermore, all housekeeping functions should be limited to only being allowed to be run in limited circumstances (eg only after 6pm if the site gets most of it's traffic during the day, only if the table size is greater than a million rows, only if the housekeeping hasn't been run for a month, etc). Those are pretty trivial to code in.

If your housekeeping functions are so long running that the host kills the process then something is probably wrong with the site that housekeeping isn't going to fix.

Re: Database garbage collecting

Posted: Sun Nov 08, 2009 7:51 am
by josh
Shouldn't... but conditions aren't always ideal... I've had clients have that exact problem with Magento that's why I mentioned it, that is more of a cache rebuilding then then a simple record dropping script