Writing a queuing system.
Posted: Tue Feb 07, 2006 3:21 pm
by onion2k
I've knocked together a nice app that makes pretty PDF posters (my own version of Rasterbator, basically). The problem, however, is that it can be pretty intensive on the CPU, so I want to make a queuing system with a nice AJAX "there are N people ahead of you in the queue" counter. Has anyone done anything similar? I have a bunch of ideas for ways to tackle the problem, but if anyone has experience with this sort of thing it'd be invaluable right now.
Posted: Tue Feb 07, 2006 4:51 pm
by Christopher
I've done queues like this in a database before. You end up polling, which can put a lot of load on the server. I added items to the queue with a timestamp and then polled with something like "SELECT * FROM queue ORDER BY time_added LIMIT 1". When a user's entry comes up you process it and then delete the record from the queue. But you will need to implement some checks for stale/abandoned records and clashes/thrashing.
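A minimal sketch of that table-based queue, using Python's built-in sqlite3 here as a stand-in for MySQL (table and column names are illustrative, not from the original app):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE queue (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id     INTEGER NOT NULL,
        time_added  INTEGER NOT NULL   -- unix timestamp
    )
""")

def enqueue(user_id, now):
    conn.execute("INSERT INTO queue (user_id, time_added) VALUES (?, ?)",
                 (user_id, now))

def next_job():
    # Oldest entry first -- the FIFO poll described above.
    return conn.execute(
        "SELECT id, user_id FROM queue ORDER BY time_added LIMIT 1"
    ).fetchone()

def finish(job_id):
    # Remove the record once the job has been processed.
    conn.execute("DELETE FROM queue WHERE id = ?", (job_id,))

enqueue(101, 1000)
enqueue(202, 1001)
job = next_job()      # (1, 101) -- user 101 is at the head of the queue
finish(job[0])
print(next_job())     # (2, 202)
```

The "N people ahead of you" counter would then just be a COUNT(*) of rows with an earlier time_added than yours.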
Posted: Thu Feb 09, 2006 11:53 am
by BDKR
How about shared memory? As long as this is just one server (as opposed to a web farm or cluster) it should work fine. Of course, you'll also need semaphores to make sure that only one process at a time can access that area of memory.
Posted: Thu Feb 09, 2006 4:43 pm
by josh
I would say that instead of ordering on the timestamp you should order on your primary key, and instead use the timestamp as a way of telling that the user is still there. For example, have JavaScript "ping" your script every 15 seconds to say, basically, "I'm still here"; you'd update the timestamp, and you would effectively ignore records with a timestamp over 30 seconds old (meaning those users abandoned the queue). I would also have this thing run on the command line and start it with Linux, and just have it sitting in a loop, waiting 5 seconds or so between hits to the database to check the queue. Once the processing is done, save the result somewhere in a tmp folder, update the queue table with a flag that says it is done, and store the path of the file in another field. The JavaScript that is "pinging" your script every 15 seconds will see the flag that says it is done and take the appropriate action.
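The heartbeat idea above could be sketched like this (again sqlite3 standing in for MySQL; the 15/30-second numbers come from the post, everything else is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE queue (
        id         INTEGER PRIMARY KEY,
        last_seen  INTEGER NOT NULL,   -- updated by the JS "ping"
        done       INTEGER DEFAULT 0,  -- flag set when processing finishes
        result     TEXT                -- path to the finished file
    )
""")

TIMEOUT = 30  # seconds without a ping before we treat the user as gone

def ping(job_id, now):
    # Called every ~15s from the client: "I'm still here."
    conn.execute("UPDATE queue SET last_seen = ? WHERE id = ?", (now, job_id))

def next_live_job(now):
    # Order on the primary key; skip entries whose user has vanished.
    return conn.execute(
        "SELECT id FROM queue WHERE done = 0 AND last_seen > ? "
        "ORDER BY id LIMIT 1",
        (now - TIMEOUT,)
    ).fetchone()

def mark_done(job_id, path):
    conn.execute("UPDATE queue SET done = 1, result = ? WHERE id = ?",
                 (path, job_id))

conn.execute("INSERT INTO queue (id, last_seen) VALUES (1, 100)")
conn.execute("INSERT INTO queue (id, last_seen) VALUES (2, 128)")
# At t=130, job 1 hasn't pinged for 30s, so it is treated as abandoned.
print(next_live_job(130))   # (2,)
```

The command-line worker would just call next_live_job in a loop, sleeping a few seconds between polls when the queue is empty.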
Posted: Thu Feb 09, 2006 5:20 pm
by Christopher
I would agree with jshpro2's logic. The algorithm is probably similar to the "users online" check that BBSs do. If it is high traffic, though, I don't know if you want all those timestamp writes back to the database (even if it's only every 15 seconds).
Posted: Thu Feb 09, 2006 5:30 pm
by josh
If the timestamp updates are a problem, use a delayed LOW_PRIORITY update; the most delay you generally experience is a few seconds... and you are allowing a margin of error of ~15 seconds anyway due to latency.
Posted: Thu Feb 09, 2006 6:33 pm
by Christopher
Yeah, I'd just rather have a system that only writes when there is a state change. And either way we are only dealing with abandoned items, for which we have to handle two cases: abandoned but within the timeframe, and abandoned but timed out. The order of processing would take care of itself whether you used the record key value or a timestamp, because both are sequential with respect to adding items to the queue. It seems like there are three states:
1. pending
2. processed
3. done (and maybe deleted)
A user adds a record that is set to 'pending'. When the converter process gets that record it does the conversion and sets it to 'processed'. And finally, when the user downloads the file, it is set to 'done'. A separate process could scan the list for records that have been 'processed' but whose timestamp is older than some specified timeframe, and clean up / delete these abandoned records (which might add an 'abandoned' state).
One question is whether you want to delete records or keep them as a history/receipt of the downloads.
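That state machine could be sketched in a few lines of Python (names and the one-hour cleanup window are illustrative; the 'abandoned' state is the optional fourth one mentioned above):

```python
# States from the post: a record moves pending -> processed -> done,
# with 'abandoned' for processed jobs nobody ever downloaded.
PENDING, PROCESSED, DONE, ABANDONED = "pending", "processed", "done", "abandoned"

jobs = {}  # job_id -> {"state": ..., "processed_at": ...}

def add_job(job_id):
    # User submits a poster: record starts out 'pending'.
    jobs[job_id] = {"state": PENDING, "processed_at": None}

def convert(job_id, now):
    # Converter process picks the record up and finishes the PDF.
    jobs[job_id].update(state=PROCESSED, processed_at=now)

def download(job_id):
    # User fetches the file.
    jobs[job_id]["state"] = DONE

def cleanup(now, max_age=3600):
    # The separate scanner: processed, but never downloaded in the window.
    for job in jobs.values():
        if job["state"] == PROCESSED and now - job["processed_at"] > max_age:
            job["state"] = ABANDONED

add_job(1); add_job(2)
convert(1, now=0); convert(2, now=0)
download(1)
cleanup(now=4000)
print(jobs[1]["state"], jobs[2]["state"])   # done abandoned
```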
Posted: Thu Feb 09, 2006 6:39 pm
by josh
From what I understand, the processing should only be done if the user is sitting at the page that is telling them to wait. The only way to determine that is to have the user checking in on an interval, telling the server they are still there.
Posted: Thu Feb 09, 2006 6:54 pm
by Christopher
If they are telling the server they are still there, it is a write. If they are just polling to see if their job has been processed, it is only a read.
Posted: Thu Feb 09, 2006 6:59 pm
by josh
It would perform the read each time they call home to the server; if their job is not done it will update the timestamp, so it either READS and WRITES, or just READS. It should be a fairly simple process as far as cost to the server goes: as long as your timestamp field is indexed, it is a select of type "SIMPLE", as EXPLAIN will show in MySQL. Also, seeing as the job it is actually processing is CPU-expensive, a few extra inserts/selects wouldn't make a huge dent, I wouldn't think. If they do, just increase the timeout (at the cost of increasing the chance of processing a job when someone has "aborted").
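As a sketch, the ping handler described here does one indexed read, and a write only while the job is unfinished (sqlite3 stands in for MySQL; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE queue (
        id INTEGER PRIMARY KEY, last_seen INTEGER,
        done INTEGER DEFAULT 0, result TEXT
    )
""")
conn.execute("INSERT INTO queue (id, last_seen) VALUES (7, 100)")

def handle_ping(job_id, now):
    # READ: is the job done yet?
    row = conn.execute(
        "SELECT done, result FROM queue WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return {"status": "unknown"}
    done, result = row
    if done:
        # Finished: no write needed, just hand back the file path.
        return {"status": "done", "result": result}
    # Not done: WRITE the heartbeat so the worker keeps the job alive.
    conn.execute("UPDATE queue SET last_seen = ? WHERE id = ?", (now, job_id))
    return {"status": "waiting"}

print(handle_ping(7, 115))   # {'status': 'waiting'}
```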
Posted: Thu Feb 09, 2006 9:01 pm
by Christopher
I think my goal was to avoid having the polling part do any writes. In MySQL, for example, writes lock the whole table, so lots of writes will slow all other queries.
The web side of this system is also the least dependable. I am assuming that there is a process running on the server that is polling the database, waiting for entries to be added to the queue (or processing entries in the queue). That process is pretty dependable and would only abort on some error. All sorts of problems can occur on the web side, though.
It is kind of difficult to know how to optimize an unknown system though.
Posted: Thu Feb 09, 2006 10:05 pm
by josh
arborint wrote: "writes lock the whole table so lots of writes will slow all other queries"
Care to back that up with a link? Last time I checked, only things like repairing/optimizing tables locked the table (aside from explicitly locking it).
Posted: Thu Feb 09, 2006 10:34 pm
by Christopher
Posted: Thu Feb 09, 2006 10:42 pm
by josh
That is the documentation on how to use table locking; nowhere does it say INSERTs/UPDATEs lock the table. Correct me if I am wrong.
Posted: Thu Feb 09, 2006 11:23 pm
by feyd
Table locking is also disadvantageous under the following scenario:
- A client issues a SELECT that takes a long time to run.
- Another client then issues an UPDATE on the same table. This client waits until the SELECT is finished.
- Another client issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, and for the first SELECT to finish.
Seems fairly straightforward from that.
Although I'm not entirely sure if auto-locking is happening in this particular case... I guess it depends on how you set it up.
Either way, I would say reading only would be preferred over reading and writing.