How to efficiently manage multiple threads with PHP/Apache

Posted: Fri Jun 01, 2007 12:22 pm
by the_drizzle
I'm currently writing various aspects of a web application that I can't really release the details of (NDA) :(.

Anyway, let me set the scene. Say I want to write an Ajax-based web front-end that changes regularly based on business logic running on the server. This is usually accomplished by polling the server on some interval, using Ajax. Google does this with its iGoogle widgets. Unfortunately though, in relative terms, we want a *much* shorter polling interval than iGoogle's.

So I figured I'd write a generic Ajax/process management thingy. A client - let's call it Bob - would send an Ajax request to the server, which would log the sessionId and sleep the "thread". Whenever the server decided to communicate with Bob, it would stuff whatever it needed into Bob's session and wake Bob's sleeping server thread. Bob's thread would return the Ajax request to him. The Ajax response would be parsed and Bob's display would be changed appropriately. Bob would then send an Ajax request exactly the same as the first one, and that would be put to sleep, and then ... and the process would repeat.
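The client half of that loop can be sketched roughly as follows. This is only a minimal sketch, not the actual application code: `startLongPoll`, `send`, `onMessage`, and `maxErrors` are hypothetical names, and `send` stands in for whatever Ajax call the page really uses.

```javascript
// Hypothetical client loop: keep exactly one request parked on the server.
// `send(callback)` is an assumed transport wrapper (for example around
// XMLHttpRequest) that calls callback(err, response) when the server answers.
function startLongPoll(send, onMessage, options) {
  const maxErrors = (options && options.maxErrors) || 3;
  let errors = 0;
  let stopped = false;

  function issue() {
    if (stopped) return;
    send(function (err, response) {
      if (stopped) return;
      if (err) {
        // Give up after a few consecutive failures. A real client would
        // probably also wait a little before retrying.
        errors += 1;
        if (errors >= maxErrors) { stopped = true; return; }
      } else {
        errors = 0;
        // A null/undefined response means the server had nothing to say.
        if (response !== null && response !== undefined) onMessage(response);
      }
      issue(); // send the same request again so the server can park it
    });
  }

  issue();
  return { stop: function () { stopped = true; } };
}
```

The transport is passed in so the loop can be exercised without a server; in a page, `onMessage` would be the code that parses the response and updates Bob's display.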

Hopefully, my issue is somewhat apparent by this point. I know that when 50,000 users are logged into the site and are all communicating with the server, Apache is essentially running some kind of thread for each request. How might I go about manipulating these threads so I can sleep, wake, or signal them? And if there are limitations, what's the best way to go about doing what I want to do? I'm not entirely sure if this is the best place for this post. I placed my question here because I figure this could expand into interesting directions, e.g. blocking a thread that needs to hear back from an external process that is managed by the PHP code in Apache. Or maybe there's just some simple way to manage threads.

I'll appreciate any advice on this matter.

Graham

Posted: Fri Jun 01, 2007 1:30 pm
by feyd
You generally can't. PHP does not have thread level control, not to mention there's a finite limit to how many threads any given process can have open. I think the only thing you can look for (outside of massive server farms) is using a socket based approach.

Posted: Fri Jun 01, 2007 1:56 pm
by the_drizzle
Yeah, I figured PHP wouldn't support anything like that.

I was actually just looking into the socket option right now. I've only been programming in PHP for 4 weeks, so forgive me if I have to ask a simple question. What would I have to do to block when listening through a socket?

Posted: Fri Jun 01, 2007 2:09 pm
by the_drizzle
Heh, sorry, it's painfully simple. I've got sockets figured out now.

Posted: Mon Jun 11, 2007 5:28 pm
by BDKR
the_drizzle wrote:Heh, sorry, it's painfully simple. I've got sockets figured out now.
Yeah bro, you may have sockets figured, but that's not the tack to take in an AJAX front end.

1) You need to manage requests via a queue. There are a couple of good JS queue implementations out there.

2) JavaScript has timers (setInterval/setTimeout) that can take care of the polling interval for you. I used just such a thing while implementing a near real time systems monitoring AJAX front end. The nice thing is that you can have multiple timers on a single page all working independently and capable of being started or stopped independent of one another.
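The two points above can be sketched together. This is a minimal sketch under stated assumptions, not one of the existing JS queue implementations: `createRequestQueue` and `createPoller` are hypothetical helpers, and each queued task is assumed to call `done()` when its Ajax response has arrived.

```javascript
// Hypothetical serial request queue: runs one task at a time, in order.
function createRequestQueue() {
  const tasks = [];
  let busy = false;

  function next() {
    if (busy || tasks.length === 0) return;
    busy = true;
    const task = tasks.shift();
    // The task signals completion by calling done(), which frees the
    // queue for the next waiting request.
    task(function done() {
      busy = false;
      next();
    });
  }

  return {
    enqueue: function (task) { tasks.push(task); next(); },
    size: function () { return tasks.length; }
  };
}

// Hypothetical poller: one independent timer per poller, each of which
// can be started or stopped without affecting the others.
function createPoller(queue, request, intervalMs) {
  let timer = null;
  return {
    start: function () {
      if (timer === null) {
        timer = setInterval(function () { queue.enqueue(request); }, intervalMs);
      }
    },
    stop: function () {
      if (timer !== null) { clearInterval(timer); timer = null; }
    },
    running: function () { return timer !== null; }
  };
}
```

Several pollers with different intervals can share one queue, so the page never has more than one request in flight even when timers fire close together.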

Another thing to keep in mind is that Apache isn't really running threads proper, but forked child processes instead. Even still, this may ultimately depend on the version and configuration you are using. However, it's probably best not to worry about how Apache is handling its processes and worry instead about how you are going to manage state information between regular synchronous and asynchronous AJAX calls.

I've thought about this kind of traffic overhead myself and I've toyed with the idea of separate server(s) to handle AJAX calls only. This would of course imply that session management is going to have to be network wide. :o LOL!!!

Cheers,
BDKR

Posted: Mon Jun 11, 2007 6:18 pm
by Ollie Saunders
You can store sessions in a database.

Why are people always asking about threads? Do you know that they are going to help you? Or are they just going to complicate stuff and degrade the performance?

Posted: Mon Jun 11, 2007 7:59 pm
by BDKR
ole wrote: You can store sessions in a database.
Exactly! I was just leaving it open as a mental exercise for the OP to go through. :wink:
ole wrote: Why are people always asking about threads? Do you know that they are going to help you? Or are they just going to complicate stuff and degrade the performance.
In many ways, a busy AJAX front end could very well be considered multi-threaded. Just not in the strictest sense of the term. Keeping that in mind, it's easy to see how someone who hasn't yet had to deal with lots of this type of thing would naturally assume threads are the correct answer here. :D

Posted: Wed Jun 13, 2007 5:46 pm
by the_drizzle
BDKR wrote:
the_drizzle wrote:Heh, sorry, it's painfully simple. I've got sockets figured out now.
Yeah bro, you may have sockets figured, but that's not the tack to take in an AJAX front end.

1) You need to manage requests via a queue. There are a couple of good JS queue implementations out there.

2) Javascript has a timer object that can take care of the polling interval for you. I used just such a thing while implementing a near real time systems monitoring AJAX front end. The nice thing is that you can have multiple timers on a single page all working independently and capable of being started or stopped independent of one another.
You're certainly right about the sockets. There are better ways of doing things, but I'm just trying to write a demo architecture and what I actually use to block and signal threads for the moment doesn't really matter. It's painless to replace sockets with something else later on.

As for polling, I did consider the option (thanks for the tip though). If I poll, at each interval - say 10 seconds - I poll the server for things to do to the client. This method requires sending data across the web, starting an entire Apache thread that figures out if there's data for the client by accessing the database, and then sending a response back to the client. If I want very fast response times on the client when the server says the client needs to change its display (i.e. poll time of 1 sec) and I have a lot of clients using the site simultaneously, then polling isn't much of an option. I need something much faster than polling. Something like sockets only churns up a thread when it needs to, essentially letting the server decide when the client needs to receive data.

Many websites don't depend on fast response times, so polling usually works wonderfully well. A poll time of 10 seconds is way more polls than needed most of the time. The reason websites don't depend on fast response times is that the web development world is still very much in a non-Ajax-oriented mindset. When you effectively throw Ajax into the mix, then having fast updates provides a lot more functionality. Unfortunately though, as it goes with all new technologies, most people have yet to use Ajax properly simply because the innovators have yet to blaze a trail. 37signals (http://www.37signals.com/) should certainly be commended for their effort though. These guys haven't pushed Ajax to its limits quite yet but they're definitely moving in the right direction and they're way ahead of everyone else.

Posted: Wed Jun 13, 2007 11:55 pm
by BDKR
the_drizzle wrote: You're certainly right about the sockets. There are better ways of doing things, but I'm just trying to write a demo architecture and what I actually use to block and signal threads for the moment doesn't really matter. It's painless to replace sockets with something else later on.
OK. Fair enough. :D
the_drizzle wrote: As for polling, I did consider the option (thanks for the tip though). If I poll, at each interval - say 10 seconds - I poll the server for things to do to the client. This method requires sending data across the web, starting an entire Apache thread that figures out if there's data for the client by accessing the database, and then sending a response back to the client.
Reading this, I'm not entirely sure you've got a good grip on it.

1) How much data do you really need to send from the client to check for change? Not much. An ID and perhaps a short and simple string at most (unless of course you insist on wrapping it in XML).

2) Apache doesn't always start another child process or thread (based on version and configuration) at each request. It would be god-awful slow at that rate. And besides, Apache isn't going to actually be doing the work itself here. :wink:

3) You don't always have to access the database either. I can think of one or two potential scenarios for caching data (or data structures of some sort) pertinent to the user in particular. Pulling data from cache is always faster than yanking it from a DB. And you can use another mechanism for maintaining the cache.

4) And likewise the response from the server back to the client: it doesn't have to be huge. If you are really, REALLY worried about performance....
a) Just send plain strings
b) Use innerHTML instead of DOM where possible

That said, the high level algorithm is going to be the same whether you use Sockets or AJAX. Of course though, the question of threads is moot here as none of the technologies mentioned thus far in this thread are thread capable. The closest you are going to get is to use Javascript with individual timers for various events and a queue (or some other client side mechanism) to make sure those events don't go stupid on you.
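Point 3 can be illustrated with a small read-through cache. This is a sketch only, written in JavaScript to match the other examples here rather than in the PHP the server would actually run; `createCache`, `loadFromDb`, `ttlMs`, and the injectable clock are all hypothetical, and the expiry policy is an assumption rather than anything described above.

```javascript
// Hypothetical read-through cache sitting in front of the database.
// `loadFromDb(key)` is an assumed loader; `now` is an optional clock
// function so expiry can be tested without waiting.
function createCache(loadFromDb, ttlMs, now) {
  const clock = now || Date.now;
  const entries = new Map(); // key -> { value, at }

  return {
    get: function (key) {
      const hit = entries.get(key);
      // Serve from cache while the entry is younger than the TTL.
      if (hit && clock() - hit.at < ttlMs) return hit.value;
      // Otherwise fall through to the database and remember the result.
      const value = loadFromDb(key);
      entries.set(key, { value: value, at: clock() });
      return value;
    },
    // The "other mechanism for maintaining the cache" could call this
    // whenever the underlying data changes.
    invalidate: function (key) { entries.delete(key); }
  };
}
```

With something like this in place, most polls that have nothing new to report never touch the database at all.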
the_drizzle wrote: If I want very fast response times on the client when the server says the client needs to change its display (i.e. poll time of 1 sec) and I have a lot of clients using the site simultaneously, then polling isn't much of an option. I need something much faster than polling. Something like sockets only churns up a thread when it needs to, essentially letting the server decide when the client needs to receive data.
Once again, threads are a dead issue here unless you change platforms.

In terms of the potential performance of an AJAX front end, I wrote a systems monitor that can poll at an interval of 1 second with a response time of well under a half a second (timed by firebug). It's so fast that it runs a graph showing system load with a queue 30 entries deep (it's a very fast JS based graph too). It also has a second display which is essentially the output of ps -aux with only the 10 most CPU intensive apps at any one point. The server is fast enough to parse that output and generate a string that renders as a table in well under .5 seconds. It almost looks like the output of top, but with color coding to show particularly CPU intensive processes.
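A rough idea of the two data structures behind such a monitor, as a sketch only: the code for the monitor itself isn't shown in this thread, so `createHistory` and `topCpu` are hypothetical, the parsing is done in JavaScript purely for illustration (the post says the server does it), and the column layout assumed is the usual Linux `ps aux` one (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).

```javascript
// Hypothetical fixed-depth history for the load graph: keeps only the
// most recent `depth` samples (30 in the monitor described above).
function createHistory(depth) {
  const samples = [];
  return {
    push: function (value) {
      samples.push(value);
      if (samples.length > depth) samples.shift(); // drop the oldest
    },
    values: function () { return samples.slice(); }
  };
}

// Hypothetical parser for `ps aux`-style text. Assumes a header line
// followed by rows whose third column is %CPU and whose eleventh column
// onward is the command.
function topCpu(psOutput, limit) {
  return psOutput.trim().split("\n").slice(1)
    .map(function (line) {
      const f = line.trim().split(/\s+/);
      return {
        user: f[0],
        pid: Number(f[1]),
        cpu: parseFloat(f[2]),
        command: f.slice(10).join(" ")
      };
    })
    .filter(function (row) { return !isNaN(row.cpu); })
    .sort(function (a, b) { return b.cpu - a.cpu; })
    .slice(0, limit);
}
```

Calling `topCpu(text, 10)` would give the ten most CPU intensive rows, ready to be turned into a plain string or table for the client.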

Now if there are as many potential users as you say, have you considered the possibility of separate dedicated AJAX servers? I'm not just pulling this idea out of my arse BTW. I've designed, built, and managed 2 separate load balanced / fault tolerant clusters. The first being an LVS software cluster that was essentially a Shared Nothing architecture before the term was coined.

I'm saying all this to say that you have to think outside of the box sometimes. AJAX is still new enough that you are swimming into uncharted waters here. Applying a little creativity here can reap huge benefits. I had to! When I got the Turbo Cluster 6 documentation, they (Turbo Linux) had no documentation for running their system behind a firewall because they hadn't yet figured it out! I did it then gave them the answer.

Speed is dependent on you as an individual developer. If you insist on the band wagon and an alphabet soup of six layer deep libraries, your performance will be correspondingly poor.

Now if you insist on threads, then perhaps you should look at Java. It's a language with native support for threads so perhaps you can use that in a browser.

Posted: Thu Jun 14, 2007 11:05 am
by the_drizzle
Oh sorry, I wasn't trying to pick a fight. I can assure you I think outside of the box (tm) ;). I post things that I'm uncertain about so I can listen to what people have to say, such as yourself. So of course, I highly appreciate your lengthy replies.
BDKR wrote: 3) You don't always have to access the database either. I can think of one or two potential scenarios for caching data (or data structures of some sort) pertinent to the user in particular. Pulling data from cache is always faster than yanking it from a DB. And you can use another mechanism for maintaining the cache.
You're absolutely right. I was just assuming persistence, so the data needs to be in a database. But I'm going to heavily cache right in front of the database anyway so I'll effectively be caching the data like you suggested. The data will be in the database as much as any other data.
BDKR wrote:Once again, threads are a dead issue here unless you change platforms.
I'm really using "thread" management as opposed to thread management. I did a dirty thing in my original post and started talking about thread management in an unconventional way. Apache creates threads, and those threads are blocked for whatever reason. Then another Apache thread can signal the original thread to 'unblock' and continue processing. I effectively have the ability to sleep and wake threads. So even though PHP doesn't support threads directly, I'm using a bit of a hack to get basic thread functionality. Just to be sure which method was faster - because I wasn't sure at all - I wrote functionality similar to what we're discussing and compared it to polling while under a heavy load. Polling at 1 sec with 1000 users turned out to be a little bit slower than doing it the thread way.
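The sleep/wake idea can be shown in miniature. To be clear about the assumptions: this is not the Apache/PHP hack described above (that code isn't shown, and with Apache the parked request and the waker live in different processes, so some form of IPC such as the sockets mentioned earlier is needed). It is a single-process sketch in JavaScript, and `createWaiters`, `park`, and `wake` are hypothetical names.

```javascript
// Hypothetical registry of parked requests, keyed by session id.
// `respond(payload)` is an assumed callback that completes the held
// Ajax request for that session.
function createWaiters() {
  const parked = new Map(); // sessionId -> respond callback

  return {
    // "Sleep": hold the request until there is something to send.
    park: function (sessionId, respond) {
      const older = parked.get(sessionId);
      if (older) older(null); // release any older request with an empty reply
      parked.set(sessionId, respond);
    },
    // "Wake": complete the held request with the payload, if one is parked.
    wake: function (sessionId, payload) {
      const respond = parked.get(sessionId);
      if (!respond) return false; // nobody waiting; caller may store the payload
      parked.delete(sessionId);
      respond(payload);
      return true;
    },
    count: function () { return parked.size; }
  };
}
```

The point of the pattern is that work only happens when the server has something for the client, instead of once per poll interval per client.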

I'll admit that this advantage is gained because I want fast response times on the client but I don't need the client to update very often. That means that only 1 in every 150 polls or so will return data. If I wanted fast response times on the client and I wanted the client to update often then polling would be much more viable. For instance, polling the server for updated low-res satellite images with weather patterns drawn on top would be an excellent way of doing things because your poll/"hit" ratio would be much lower.
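The arithmetic behind that trade-off is easy to sketch, using only the figures already mentioned (1000 users, a 1 second poll, roughly 1 poll in 150 returning data). `pollingLoad` is a hypothetical helper and the model is deliberately crude: it counts requests and ignores how expensive each one is.

```javascript
// Hypothetical back-of-the-envelope model of polling load.
function pollingLoad(clients, intervalSec, pollsPerHit) {
  const requestsPerSec = clients / intervalSec;       // every client polls once per interval
  const usefulPerSec = requestsPerSec / pollsPerHit;  // polls that actually return data
  return {
    requestsPerSec: requestsPerSec,
    usefulPerSec: usefulPerSec,
    emptyPerSec: requestsPerSec - usefulPerSec        // polls that come back empty
  };
}

const load = pollingLoad(1000, 1, 150);
console.log(load.requestsPerSec);          // 1000 requests per second in total
console.log(load.usefulPerSec.toFixed(1)); // about 6.7 of them carry data
```

Under these assumptions well over 99% of polls come back empty, which is where the blocked-request approach gets its advantage; as the hit ratio improves, that advantage shrinks.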

Polling is still a hotly debated issue in hardware design and OS research for the same sorts of reasons. Do we always poll? But wait, that means we're hogging bandwidth on whatever channel we're using (i.e. the bus). So we should signal whatever device wants the data? But that can put too much load on the guy with the data. OK. So we make a separate guy who just deals with polling. That'll work but it's still too slow for some things. All right, so we have this "middle-man" tell listeners to go into polling mode whenever there seems to be a lot of incoming data for them. The middle-man can then take them off polling mode when they aren't receiving nearly as much stuff. So we'll have a middle-man that manages a polling/signaling architecture and the design will be different depending on what two devices are communicating. But, wait! Again! A new kind of bus is now available and we can go back to the polling thing. *something else*. Now we're back on the middle path. Now we're back to signaling. .... You'll see this kind of thing with hardware guys but not as much with traditionally used operating systems. It's when you get into weird, trial, OS design patterns that you only see in very special-purpose businesses and the OS/algorithm academic community that you find this kind of debate about OS design.
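The "middle-man" idea of switching a listener between polling and signaling can be sketched as a tiny decision function. Everything here is an assumption for illustration: `chooseMode`, the mode names, the message-rate measure, and the two thresholds (with a gap between them so a listener doesn't flap back and forth) are not taken from any real design mentioned above.

```javascript
// Hypothetical mode chooser for one listener.
// current:       "signal" or "poll"
// messagesPerMin: recent rate of incoming data for this listener
// enterPolling:  rate at or above which a signaled listener starts polling
// leavePolling:  rate at or below which a polling listener goes back to signaling
function chooseMode(current, messagesPerMin, enterPolling, leavePolling) {
  if (current === "signal" && messagesPerMin >= enterPolling) return "poll";
  if (current === "poll" && messagesPerMin <= leavePolling) return "signal";
  return current; // in between the thresholds, stay as we are
}
```

For example, with thresholds of 30 and 10, a listener receiving 40 messages a minute would be told to poll, and would only be taken off polling once its rate fell to 10 or below.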

I also like the idea of having complete control over the client and when it gets updated. I know I can simulate "full control" through polling just as easily. But controlling priorities of client-bound messages on the server-side of things just seems nicer to me. (Yeah, I could manage priorities server-side and have clients poll into that but ... well now it's just a preference thing ;)).
BDKR wrote: Now if there are as many potential users as you say, have you considered the possibility of separate dedicated AJAX servers? I'm not just pulling this idea out of my arse BTW. I've designed, built, and managed 2 separate load balanced / fault tolerant clusters. The first being an LVS software cluster that was essentially a Shared Nothing architecture before the term was coined.
Haha, you are completely right again. I am building separate Ajax servers already! Glad to see we're on the same page ;).
BDKR wrote: Speed is dependent on you as an individual developer. If you insist on the band wagon and an alphabet soup of six layer deep libraries, your performance will be correspondingly poor.
Well, talking about the value of deep libraries over flat libraries in terms of communication, extensibility etc. is a whooole different discussion. In fact, major CS universities have literal cliques of people defined by their position on this issue. You find those hacker geniuses - the kind of people that heavily contributed to Gentoo at age 15, and a list of other open source projects - are flat-library supporters. Many of the intellectual, theoretically trained, individuals with heavy experience in working on large software projects with large teams are deep-library supporters. Most people lie in the middle though and are just confused by the entire thing. These people, in the two opposing library camps, tend to grow up and become the next line of Profs, so you can see this kind of cliquing at the Prof level as well.

Posted: Thu Jun 14, 2007 11:15 am
by the_drizzle
ole wrote:You can store sessions in a database.
I was looking for a way to change a client's session that the running thread doesn't really have access to. If I store sessions in the database and modify them there, then I need to make sure the client's session and the one in the database are always in-sync. Using a database, I'd just put whatever I wanted changed in the database and have the appropriate thread pick it up and shove it in its session at some appropriate point. That was definitely the backup plan ;).
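That backup plan amounts to a per-session mailbox. A minimal sketch, assuming an in-memory `Map` as a stand-in for the database table and hypothetical names throughout (`createMailbox`, `post`, `drainInto`); a real version would need the read-and-delete step to be atomic in the database.

```javascript
// Hypothetical mailbox of pending session changes.
function createMailbox() {
  const pending = new Map(); // sessionId -> array of change objects

  return {
    // Any request handler can leave a change for another client's session.
    post: function (sessionId, change) {
      if (!pending.has(sessionId)) pending.set(sessionId, []);
      pending.get(sessionId).push(change);
    },
    // The handler that owns the session picks the changes up, in order,
    // and merges them into its own session object.
    drainInto: function (sessionId, session) {
      const changes = pending.get(sessionId) || [];
      pending.delete(sessionId);
      changes.forEach(function (change) { Object.assign(session, change); });
      return changes.length;
    }
  };
}
```

Because only the owning handler ever writes to its session, there is no second copy of the session to keep in sync; the mailbox only holds changes that haven't been applied yet.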

Posted: Fri Jun 15, 2007 12:50 pm
by BDKR
the_drizzle wrote:Oh sorry, I wasn't trying to pick a fight. I can assure you I think outside of the box (tm) ;). I post things that I'm uncertain about so I can listen to what people have to say, such as yourself. So of course, I highly appreciate your lengthy replies.
Sorry for the tone. Sometimes it is a little tough to not sound totally smurf off while on the internet.

And I can now see that you do have your thinking cap on here. My apologies. :D

Since these are still very much uncharted waters, I'd still love to hear how this turns out. :wink:

Posted: Fri Jun 15, 2007 1:08 pm
by BDKR
the_drizzle wrote:
BDKR wrote: Speed is dependent on you as an individual developer. If you insist on the band wagon and an alphabet soup of six layer deep libraries, your performance will be correspondingly poor.
Well, talking about the value of deep libraries over flat libraries in terms of communication, extensibility etc. is a whooole different discussion. In fact, major CS universities have literal cliques of people defined by their position on this issue. You find those hacker geniuses - the kind of people that heavily contributed to Gentoo at age 15, and a list of other open source projects - are flat-library supporters. Many of the intellectual, theoretically trained, individuals with heavy experience in working on large software projects with large teams are deep-library supporters. Most people lie in the middle though and are just confused by the entire thing. These people, in the two opposing library camps, tend to grow up and become the next line of Profs, so you can see this kind of cliquing at the Prof level as well.
Yeah, this is something that I wanted to comment on. :D I personally believe that some of those "deep libraries" as you called them have their place, but the sad fact is that moderation in these areas (not being a deep lib zealot) makes your job search kinda tough these days in the PHP field. Newer, smaller companies are inhabited by these guys and they seem to be rather unintelligently biased at times.

A good example is one interview where I stated that I primarily develop my own stuff with a harvested framework (one that I've built over time and continued to use and tune) that is partially OO and partially Procedural. Wow, that was met with some pretty sour looks. Never mind the fact that it's very much MVC (light on the model part in particular) and the foundation of much of its functionality is based on an OO framework I wrote myself.

I guess there is no way of proving my capability of understanding some of these newer frameworks unless I've used them? I guess there is no way they can believe what I'm saying unless I speak the names of the various Web 2.0 frameworks with care and reverence?

I love what Joel Spolsky calls these guys: Architecture Astronauts. LOL!!!!!

Cheers