burrito wrote:I am currently working on a project that has projected numbers of ~10,000 hpd.
down the road they're thinking, and *hoping* that this number might increase to ~100,000 - 200,000 hpd.
I'm wondering if it's at all possible to set up some sort of load balancing environment between two web servers that are running w2k IIS 5.0 with php 4.2.xx.
please advise, and if this can be done and you know some valuable resources that you could point me to, I'd really appreciate it.
thx,
Burr
I hope you don't mind my posting on this topic. I'm a couple of days past the real action here.
I ultimately have to agree with feyd here and suggest a hardware load balancing solution. The alternatives are commercial software products like Turbo Cluster and OS (as in open source) solutions like Ultra Monkey and many others (http://lcic.org/load_balancing.html).
Now let me qualify why I agree with feyd. If you are not well versed in the many smaller things that make a software solution go, so to speak, and you are not given much time to develop and work with your cluster, then a software solution could easily present some extremely challenging difficulties. Especially an OS solution. I have yet to see any software solution whose documentation doesn't leave one doing some head scratching.
I have to say that the Turbo Cluster documentation is very good in some places. It even takes the time to explain many of the key features that are the foundation of the process in general, IP aliasing for example. But for a commercial solution, it was still a bit of a letdown in other areas. For instance, it only provided example scenarios that required the cluster managers to either run without firewalls or carry the additional burden/load of playing the firewall role. They had no clear examples of how to run a cluster behind a firewall, and didn't until I sent them a block diagram detailing what I did.
Documentation aside, the commercial solution has the burden of cost associated with it. In particular, a given number of nodes is going to come at a price. At the time we used Turbo Cluster ('01-'03), it was 2200.00 for 10 nodes. Then another 2200.00 for more growth potential. Admittedly, you should have an exceptionally powerful cluster at this point, but it is still less than desirable in comparison to...
Hardware load balancing can have the same documentation problems as the software stuff, and some of the solutions aren't as flexible. In the company I work for now, my watered-down recommendation ultimately ended up with us using a couple of Cisco LocalDirectors. Good in their own way, they lack the ability to notify admins of failures. Turbo Cluster at least sent an email.
Beyond that, once the hardware units are paid for, the cost of buying additional licenses in the future is a dead issue. Just keep adding servers as the need presents itself.
So, I say use an OS software solution if you already know much of the foundation for it or you have the time to spend learning it. Otherwise, use a hardware solution. However, a software solution has the huge advantage of running on machines of whatever strength you specify. Those machines are much easier to upgrade (speaking of hardware) than a proprietary hardware solution.
On the topic of the databases, feyd is correct again that a master/slave replication setup is the very least you should have. But let's go further. If you are seeing tons of traffic, then your best bet is to set up multiple slave machines and load balance all of your SELECT-type queries amongst them. The master will take care of everything else (UPDATE, INSERT, DELETE, ALTER...).
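To make the read/write split a little more concrete, here is a minimal sketch in Python of how the application side might pick a server for each query. The class name, the host names, and the idea of deciding by the first keyword of the statement are all my own assumptions for illustration, not something any particular product does. It only chooses a target; opening the actual connection is left out.

```python
import itertools


class QueryRouter:
    """Hypothetical router: SELECTs go to slaves in turn, the rest to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # If no slaves are configured, reads fall back to the master.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            # Round-robin across the slave machines.
            return next(self._slaves)
        # UPDATE, INSERT, DELETE, ALTER and anything unrecognised hit the master.
        return self.master


# Host names below are placeholders.
router = QueryRouter("db-master", ["db-slave1", "db-slave2"])
```

Keep in mind that if replication is asynchronous, a slave can be slightly behind the master, so a SELECT issued right after a write may not see that write yet.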
Be aware that some of the hardware solutions out there are bridges as opposed to gateways/routers. This is the case with the Cisco LocalDirectors.
Lastly, do some digging on the topic of port forwarding. Whatever solution you end up with is bound to use port forwarding in a more or less transparent (to the admin) fashion.
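As a rough illustration of what that forwarding amounts to, here is a toy model in Python. It is only a sketch under my own assumptions (the class name, the addresses, and the round-robin choice are all made up for the example) and it does not touch real packets. It just shows the bookkeeping: traffic addressed to the cluster's public address gets its destination rewritten to one of the real servers, and a given client connection keeps going to the same server.

```python
import itertools


class PortForwarder:
    """Toy model of the destination rewrite a forwarding balancer performs."""

    def __init__(self, virtual, backends):
        self.virtual = virtual                    # (ip, port) the clients connect to
        self._backends = itertools.cycle(backends)
        self._connections = {}                    # client (ip, port) -> chosen backend

    def forward(self, client, destination):
        """Return the (ip, port) a packet from this client should be delivered to."""
        if destination != self.virtual:
            # Not addressed to the cluster, so leave it alone.
            return destination
        if client not in self._connections:
            # New connection: pick the next real server in turn.
            self._connections[client] = next(self._backends)
        return self._connections[client]


# All addresses below are placeholders.
balancer = PortForwarder(("192.0.2.10", 80), [("10.0.0.11", 80), ("10.0.0.12", 80)])
```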
Hope this helps,
BDKR