Database garbage collecting

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
User avatar
kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Database garbage collecting

Post by kaisellgren »

Hello,

There are plenty of situations in web applications where you need to store some data temporarily (e.g. for a week) and then it gets outdated and there's lots of it. This leads to a situation where you have enormous amount of obsolete data in your database. How do you clean it? For sure you could run crons on a daily basis to collect the garbage, but this isn't good for a web application that is supposed to be easy-to-use and work out-of-the-box as it requires the user to setup crons (if that's even possible).

I know about Event scheduler in MySQL, but I want something more portable (5.1.6 isn't widely established yet).
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Database garbage collecting

Post by onion2k »

In the last few major sites I've built I've just put a "Housekeeping" section in the admin system that has some reports, reset to defaults stuff, and buttons to clear caches and temporary data. In theory the admin users are supposed to go in once a month and click things, but I'm pretty sure they just do it if they think the site has slowed down.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Database garbage collecting

Post by Christopher »

I have done auto-cleanup as part of the code that adds those kinds of records. For example, in a database based shopping cart I might have the cart-item insert code run the cleanup code every 1000 records inserted. That is pretty easy to implement. But I only do it when I know there will be no client maintenance.
(#10850)
AntonioCS
Forum Newbie
Posts: 11
Joined: Tue Jan 15, 2008 12:31 pm

Re: Database garbage collecting

Post by AntonioCS »

In my last job I had to maintain a auctions site.
They had many crons jobs for stuff like that.

I think crons are really the way to go if you want something to be done automatically and have lot's of cleaning up to do. Might not be user-friendly but it is efficient.
User avatar
kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Re: Database garbage collecting

Post by kaisellgren »

I think I'm just gonna put a section into the admin area to erase all that garbage.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Database garbage collecting

Post by josh »

Having it in a control panel opens it own can of worms, like when the user's hosting kills the process because it was long running. A lot of standard hosting accounts come with crontab access now, its becoming pretty standard IMO. For ex. you could find one that is like $3 a month that has cron, sign up as an affiliate for them and put referal links in your software's FAQ. Most users IMO would be willing to switch if they saved $ and also got "ahead of the technology curve". There's also windows "equivalents" to cron.

I suppose you could also implement multiple strategies for garbage collecting, users with high traffic could use cron while casual users could use a random number generator to run the code every few requests.
User avatar
kaisellgren
DevNet Resident
Posts: 1675
Joined: Sat Jan 07, 2006 5:52 am
Location: Lahti, Finland.

Re: Database garbage collecting

Post by kaisellgren »

But is there a simple way for the end user to setup crons? No? He needs technical skills I think.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Database garbage collecting

Post by josh »

No a lot of hosts have GUIs, hostmonster does if I remember, they usually have a GUI as well as an advanced mode where they can paste the cron-tabs, so using the GUI they get nice drop-downs "daily", "weekly", etc...

Also you can use @daily syntax instead of 0 * * *
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Database garbage collecting

Post by onion2k »

josh wrote:Having it in a control panel opens it own can of worms, like when the user's hosting kills the process because it was long running.
Garbage collection shouldn't ever take very long. It's things like deleting old data and rebuilding indexes. Those can be optimised so MySQL (or whatever database you're using) does them when no other clients are connected (using the LOW PRIORITY keyword) or so that it doesn't rebuild indexes automatically (using the QUICK keyword). You can also lock the table for a large DELETE which helps, or even wrap it up in a transaction.

Furthermore, all housekeeping functions should be limited to only being allowed to be run in limited circumstances (eg only after 6pm if the site gets most of it's traffic during the day, only if the table size is greater than a million rows, only if the housekeeping hasn't been run for a month, etc). Those are pretty trivial to code in.

If your housekeeping functions are so long running that the host kills the process then something is probably wrong with the site that housekeeping isn't going to fix.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Database garbage collecting

Post by josh »

Shouldn't... but conditions aren't always ideal... I've had clients have that exact problem with Magento that's why I mentioned it, that is more of a cache rebuilding then then a simple record dropping script
Post Reply