Scalability
In lieu of a long story, here's the question in a nutshell:
How does PHP stack up in scalability vs. RoR and Java? I know Java is obviously the most scalable, but it's cumbersome and slow, and in the end I'd rather avoid it altogether if possible. I keep hearing mixed things about RoR and its capabilities, and since some people are certain it can't scale, I figured I'd come to the experts. I know PHP isn't exactly a scaling machine, but given that some rather large sites run on it and experience minimal problems, I have to wonder what the breaking point is for either language.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Scalability
Scaling has little to do with the programming language. Both PHP and Java are proven to scale equally well, as demonstrated by many huge sites. RoR, being a rapid development framework, is not focused on scaling and has problems. However, if you are an expert in RoR, I am sure you can get it to scale.
I am actually interested in this subject. Specifically, what are some scalable architectures for PHP that are reasonably easy to install and maintain? And not just scalability, but high(er) availability is probably also a goal.
There are certainly a number of common tools like MySQL replication, memcache, PHP's ability to do database backed session management, etc. that would probably be used for this kind of solution.
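To make the database-backed session idea concrete, here is a minimal sketch in Python rather than PHP. It is not PHP's own session handler API; the class name `DbSessionStore`, the `sessions` table layout, and the use of SQLite as a stand-in for a shared MySQL server are all assumptions made for illustration. The point is only that any web server holding a connection to the same database can load a session created by another one.

```python
import json
import sqlite3
import time
import uuid


class DbSessionStore:
    """Hypothetical store: sessions live in a shared table, not on one web server."""

    def __init__(self, conn, ttl_seconds=1800):
        self.conn = conn
        self.ttl = ttl_seconds
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def create(self, data):
        # Generate a random session id and persist the initial data.
        session_id = uuid.uuid4().hex
        self.save(session_id, data)
        return session_id

    def save(self, session_id, data):
        # Insert or overwrite the row and push the expiry forward.
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
            (session_id, json.dumps(data), time.time() + self.ttl),
        )
        self.conn.commit()

    def load(self, session_id):
        # Return the stored data, or None if the session is unknown or expired.
        row = self.conn.execute(
            "SELECT data, expires_at FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        if row is None or row[1] < time.time():
            return None
        return json.loads(row[0])
```

With this shape, the load balancer no longer needs to pin a visitor to one web server, at the cost of one extra database round trip per request.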
(#10850)
Re: Scalability
simple scaling
- one big db server
- nas (for global files)
- loadbalancer with session support
- some http servers
if the bottleneck becomes your db server you will need to set up replication and implement your data access with heartbeat support (a way to see which db server is available and ready to handle some work). from this point on you scale simply by adding http and mysql servers as needed. that's the nice thing about stateless requests - they scale well.
i am no java expert but as long as you need to use application servers like tomcat you can't scale that easily because the servers have state and that would need to be shared. there might be solutions for this so if someone knows about scaling java sites step up (: i am interested to learn how they do things.
cheers
chris
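The heartbeat idea from the post above could look roughly like this. It is a sketch in Python, not any particular library's API: the function names `is_alive` and `pick_server` are made up for illustration, and the probe is just a TCP connect, which only shows that something is listening on the port. A real check would probably also ask the server whether replication has caught up before calling it "ready to handle some work".

```python
import socket


def is_alive(host, port, timeout=0.5):
    """Hypothetical heartbeat probe: can a TCP connection be opened in time?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(servers, probe=is_alive):
    """Return the first (host, port) pair that answers the probe, or None."""
    for host, port in servers:
        if probe(host, port):
            return (host, port)
    return None
```

The data access layer would call `pick_server` with its list of database hosts before connecting, and treat `None` as "no database available right now".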
Re: Scalability
Also what about PHP's speed when paired with other databases like PostgreSQL?
- alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
Re: Scalability
PHP speed has nothing to do with other databases...
The PHP extensions are just thin wrappers around the original RDBMS API so as fast as you can get MySQL to run using Ruby or ASP.NET or Java, is the same speed you'll get out of PHP as well.
I've been gaining interest in load balancing as our own systems begin to grow...
What we have right now is a powerful computer running the RDBMS (MSSQL and MySQL) and a lesser server running LA_P (where _ would normally be MySQL).
There is an Apache module called mod_proxy which I believe allows you to daisy chain/round robin several servers together to handle user requests; each would ultimately use the same MySQL server though.
I tend to keep SQL queries very simple, operating on a single table only. This results in fast queries and probably slower performance on the PHP side of things (perhaps a little backwards -- but I am far more familiar with optimizing PHP and its various services than I am with SQL tuning). We still don't serve enough traffic to justify implementing mod_proxy but the idea makes sense.
My focus is almost always on application level caching, as you will typically yield greatest results from that.
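As a rough illustration of application-level caching, here is a small in-process cache with a time-to-live, sketched in Python. The class name `TtlCache` and its `get_or_compute` method are invented for this example; in a multi-server setup the dictionary would more likely be replaced by something shared such as memcached, which was mentioned earlier in the thread.

```python
import time


class TtlCache:
    """Hypothetical cache: remembers computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}    # key -> (value, expiry time)

    def get_or_compute(self, key, compute):
        entry = self._items.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            # Fresh entry: skip the expensive work entirely.
            return entry[0]
        # Missing or stale: do the work once and remember the result.
        value = compute()
        self._items[key] = (value, now + self.ttl)
        return value
```

A page handler would wrap its slow query or rendering step in `compute`, so repeated requests within the TTL never reach the database.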
- kaisellgren
- DevNet Resident
- Posts: 1675
- Joined: Sat Jan 07, 2006 5:52 am
- Location: Lahti, Finland.
Re: Scalability
PCSpectra wrote: so as fast as you can get MySQL to run using Ruby or ASP.NET or Java, is the same speed you'll get out of PHP as well.
Not entirely... think about the new MYSQLND library they made for 5.3; it is noticeably faster than the old LIBMYSQL used in <5.3. MySQL on 5.3 does run faster than PostgreSQL on 5.3, but the difference is almost insignificant, and database performance comes from elsewhere, as we know.
PCSpectra wrote: What we have right now is a lesser server running LA_P (where _ would normally be MySQL).
Before spending any money on load balancing, I highly suggest dropping Apache - it's a BIG resource hog. Personally I have found a combination of nginx for static files + Cherokee for PHP files a good combo. On my VPS, a switch from Apache to that setup freed up about 60-90 MB of RAM, the idle process usage went from the usual ~9% to ~21% (approximate values), and the whole site loads faster.
Take a look: http://www.cherokee-project.com/doc/
I highly recommend it.
Re: Scalability
If you're interested in scalability, check out High Scalability - http://highscalability.com/, a site that covers the architecture of high-profile sites and offers insight into the solutions they chose.
Re: Scalability
A friend of mine has done some "Apache vs nginx" benchmarking. While you won't be able to understand the text part, the figures in the article are enough.
http://www.gat3way.eu/index.php?mact=Ne ... eturnid=15
There are 10 types of people in this world, those who understand binary and those who don't
- alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
Re: Scalability
VladSun: Seems Apache (mod_prefork) does alright when compared to the others...no???
Not that anyone cares, but if I were to switch to another server it would probably be lighttpd.
EDIT | Good to see you around VladSun...haven't seen you in a while (not sure if you were just lurking) but I was getting worried you disappeared for good and I'd have no one to help me with my *nix problems anymore.

Re: Scalability
PCSpectra wrote: VladSun: Seems Apache (mod_prefork) does alright when compared to the others...no???
Not that anyone cares, but if I were to switch to another server it would probably be lighttpd.
In two of the comments in the article, system administrators explain that with nginx machine load is 2-4 times less than with Apache.
PCSpectra wrote: EDIT | Good to see you around VladSun...haven't seen you in a while (not sure if you were just lurking) but I was getting worried you disappeared for good and I'd have no one to help me with my *nix problems anymore.
Too many deadlines lately.
Re: Scalability: MySQL 5.4.1?
kaisellgren wrote: MySQL on 5.3 does run faster than PostgreSQL on 5.3, but this is almost insignificant though and the database performance comes from elsewhere as we know.
MySQL released 5.4.1 the other day with "scalability improvements" -- how badly were they needed?
http://www.ramoonus.nl/2009/06/30/mysql-5-4-1-released/
Re: Scalability
This is an interesting topic (about which my knowledge is meager). An acquaintance of mine is a real expert, though, and is co-founder of a small company that specializes in scaling solutions. As a former MySQL AB employee, then the chief MySQL guru at Yahoo!, he's the most knowledgeable guy I know on this topic. http://jcole.us/blog/about-me/
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Scalability
So perhaps we can have a discussion of the different options. There are Master:Slave, Master:Master, Replication Rings, Clusters, etc., etc. What features does each provide, and what downsides does each present? And what do PHP database libraries need to support this stuff?
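As one possible answer to what a database library would need for the Master:Slave case, here is a minimal read/write splitting sketch in Python. The class `ReplicatedRouter` and its `server_for` method are hypothetical names, and the rule "anything that starts with SELECT is a read" is a deliberate simplification: it ignores replication lag (a read right after a write may hit a stale replica) and statements such as `SELECT ... FOR UPDATE` that really belong on the master.

```python
import itertools


class ReplicatedRouter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def server_for(self, sql):
        words = sql.split()
        if words and words[0].upper() == "SELECT":
            # Plain reads are spread round-robin across the replicas.
            return next(self._read_cycle)
        # Everything else (INSERT, UPDATE, DELETE, DDL) must hit the master.
        return self.primary
```

Master:Master or ring setups would need more than this, for example conflict handling on writes, which is beyond what a simple router can decide from the SQL text alone.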
(#10850)
Re: Scalability
What I'm really trying to wrap my head around is whether it's better to have multiple databases or to have one big database server that is both well structured and leveraged by a load-balancing, lightweight web server. The overall architecture is where I'm stuck.
Re: Scalability
It's very dependent on the particular needs of a specific application. One machine is easier to maintain than a replication chain / cluster but ultimately it will saturate - meaning it will not scale beyond a certain point. At that point you will have to split the database across separate machines.
Whether this is a decision to be taken before even approaching such a saturation point is highly debatable. Personally I would recommend going with the simplest and most maintainable approach first, allowing for enough room to improve later when (if) the need arises.
I did some research on database scaling solutions some time ago, the results of which are posted on stackoverflow if you are interested - http://stackoverflow.com/questions/1899 ... clustering