Writing a queuing system.
I've knocked together a nice app that makes pretty PDF posters (my own version of Rasterbator, basically). The problem, however, is that it can be pretty intensive on the CPU, so I want to make a queuing system with a nice AJAX "there are N people ahead of you in the queue" counter. Has anyone done anything similar? I have a bunch of ideas for ways to tackle the problem, but if anyone has experience with this sort of thing it'd be invaluable right now.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
I've done queues like this in a database before. You end up polling, which can put a lot of load on the server. I added items to the queue with a timestamp and then polled with something like "SELECT * FROM queue ORDER BY time_added LIMIT 1". When a user's entry comes up, you process it and then delete the record from the queue. But you will need to implement some checks for stale/abandoned records and for clashes (two workers grabbing the same record).
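The approach above can be sketched like this, with SQLite standing in for the real database (the table and column names are assumptions for illustration):

```python
import sqlite3
import time

# In-memory SQLite stands in for the real database here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, job TEXT, time_added REAL)")

def enqueue(job, added=None):
    """Add an item to the queue with a timestamp."""
    db.execute("INSERT INTO queue (job, time_added) VALUES (?, ?)",
               (job, added if added is not None else time.time()))

def process_next():
    """Poll: grab the oldest entry, process it, then delete the record."""
    row = db.execute(
        "SELECT id, job FROM queue ORDER BY time_added LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, job = row
    # ... CPU-heavy PDF generation would happen here ...
    db.execute("DELETE FROM queue WHERE id = ?", (job_id,))
    return job

enqueue("poster-1", added=1.0)
enqueue("poster-2", added=2.0)
print(process_next())  # → poster-1 (oldest entry comes off first)
```

The stale-record and clash checks mentioned above are deliberately left out; they come down to either a heartbeat timestamp (as discussed below) or an atomic claim, e.g. an `UPDATE ... SET claimed = 1 WHERE id = ? AND claimed = 0` before processing.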
I would say order on your primary key instead of the timestamp, and use the timestamp as a way of telling that the user is still there. For example, have JavaScript "ping" your script every 15 seconds to say, basically, "I'm still here"; you'd update the timestamp, and you'd effectively ignore records whose timestamp is more than 30 seconds old (meaning those users abandoned the queue). I would also have the worker run on the command line, started at boot on Linux, just sitting in a loop and waiting 5 seconds or so between hits to the database to check the queue. Once the processing is done, save the result somewhere in a tmp folder, update the queue table with a flag that says it is done, and store the path of the file in another field. The JavaScript that is pinging your script every 15 seconds will see the flag that says it is done and take the appropriate action.
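A minimal sketch of that heartbeat scheme, again using SQLite as a stand-in (the 30-second cutoff, column names, and result-path field are assumptions based on the description above):

```python
import sqlite3
import time

STALE_AFTER = 30  # seconds without a ping before an entry counts as abandoned

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE queue (
    id          INTEGER PRIMARY KEY,  -- processing order
    job         TEXT,
    last_ping   REAL,                 -- refreshed by the "I'm still here" ping
    done        INTEGER DEFAULT 0,
    result_path TEXT)""")

def enqueue(job, now=None):
    cur = db.execute("INSERT INTO queue (job, last_ping) VALUES (?, ?)",
                     (job, now if now is not None else time.time()))
    return cur.lastrowid

def ping(job_id, now=None):
    """Hit every ~15s from the browser to keep the entry alive."""
    db.execute("UPDATE queue SET last_ping = ? WHERE id = ?",
               (now if now is not None else time.time(), job_id))

def next_live_job(now=None):
    """Oldest unfinished entry whose owner is still pinging, by primary key."""
    cutoff = (now if now is not None else time.time()) - STALE_AFTER
    return db.execute("SELECT id, job FROM queue WHERE done = 0 "
                      "AND last_ping > ? ORDER BY id LIMIT 1",
                      (cutoff,)).fetchone()

def mark_done(job_id, path):
    """Worker finished: flag the row and record where the file went."""
    db.execute("UPDATE queue SET done = 1, result_path = ? WHERE id = ?",
               (path, job_id))
```

The queue-position counter falls out of the same table: counting rows with a smaller `id` that are not done and not stale gives "N people ahead of you".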
- Christopher
Yeah, I'd just rather have a system that only writes when there is a state change. Either way we are only dealing with abandoned items, for which there are two cases: abandoned but still within the timeframe, and abandoned and timed out. The order of processing would take care of itself whether you used the record key or a timestamp, because both are sequential with respect to adding items to the queue. It seems like there are three states:
1. pending
2. processed
3. done (and maybe deleted)
A user adds a record that is set to 'pending'. When the converter process gets that record it does the conversion and sets it to 'processed'. Finally, when the user downloads the file it is set to 'done'. A separate process could scan the list for records that have been 'processed' but whose timestamp is older than some specified timeframe, and clean up / delete these abandoned records (which might add an 'abandoned' state).
One question is whether you want to delete records or leave them as a history/receipt of the downloads.
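The lifecycle above can be sketched as a small state machine; the `GRACE` window and the function names are assumptions for illustration:

```python
import time

GRACE = 300  # assumed: seconds a 'processed' file waits before counting as abandoned

class QueueEntry:
    """One queue row, walking pending -> processed -> done (or abandoned)."""
    def __init__(self, job):
        self.job = job
        self.state = "pending"
        self.updated = time.time()

    def _move(self, state):
        self.state = state
        self.updated = time.time()

def convert(entry):
    """Converter process: picks up a 'pending' entry and renders the PDF."""
    assert entry.state == "pending"
    # ... CPU-heavy PDF generation happens here ...
    entry._move("processed")

def download(entry):
    """The user fetched the finished file."""
    assert entry.state == "processed"
    entry._move("done")

def cleanup(entries, now=None):
    """Sweeper: flag 'processed' entries nobody downloaded within GRACE."""
    now = now if now is not None else time.time()
    for e in entries:
        if e.state == "processed" and now - e.updated > GRACE:
            e._move("abandoned")
```

Each transition writes exactly once, on a state change, which matches the goal of keeping the polling path read-only.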
- Christopher
It would perform the read each time they call home to the server; if their job is not done it will update the timestamp, so it either reads and writes, or just reads. It should be a fairly cheap operation for the server: as long as your timestamp field is indexed, it is a select of type "SIMPLE", as EXPLAIN will show in MySQL. Also, seeing as the job it is actually processing is CPU-expensive, a few extra inserts/selects wouldn't make a huge dent, I wouldn't think. If they do, just increase the timeout (at the cost of increasing the chance of processing a job when someone has aborted).
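The effect of the index can be checked directly. Here SQLite's EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN (table and index names are assumptions): with the index in place, the timestamp filter becomes an index search instead of a full-table scan.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, last_ping REAL, done INTEGER)")
db.execute("CREATE INDEX idx_last_ping ON queue (last_ping)")

# The plan output names the index it searches rather than reporting a scan.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM queue WHERE last_ping > ?", (0,)
).fetchall()
print(plan)
```

In MySQL the equivalent check is `EXPLAIN SELECT ...` on the real table, confirming the query type and that the timestamp index is listed under `key`.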
- Christopher
I think my goal was to avoid having the polling part do any writes. In MySQL, for example, writes lock the whole table, so lots of writes will slow all other queries.
The web side of this system is also the least dependable. I am assuming that there is a process running on the server that is polling the database, waiting for entries to be added to the queue (or processing entries in the queue). That process is pretty dependable and would only abort on some error. All sorts of problems can occur on the web side, though.
It is kind of difficult to know how to optimize an unknown system, though.
- Christopher
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
Seems fairly straightforward from that. Table locking is also disadvantageous under the following scenario:
- A client issues a SELECT that takes a long time to run.
- Another client then issues an UPDATE on the same table. This client waits until the SELECT is finished.
- Another client issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, and for the first SELECT to finish.
Although I'm not entirely sure whether auto-locking is happening in this particular case; I guess it depends on how you set it up. Either way, I would say reading only would be preferred over reading and writing.