
Re: Scalability

Posted: Thu Jul 02, 2009 10:04 pm
by Christopher
Interesting discussion. I get the sense that replication is used more frequently than clustering. Clustering seems to be more complex and needs a minimum of 4-5 servers to implement. There seem to be a number of different replication schemes, though. I also notice that some projects, like WordPress, Joomla and Drupal, have implemented DB connection libraries that support master/slave and/or multiple DBs. I have been thinking about that as well.

Re: Scalability

Posted: Thu Jul 02, 2009 10:49 pm
by Eran
There is no minimum on the number of servers participating in a cluster (as far as I know). It can start from two and grow from there.
The point is that those options offer different solutions. MySQL Cluster is an in-memory solution and has other considerations, such as network latency. However, since the entire cluster is treated as one server, it is easier to integrate with an existing application.

Master/slave replication is good for reads but bad for writes, since you can't offload writes from the master. Master-Master setup helps with this but is more complicated to set up and is not officially supported.

In short, there is no one-size-fits-all answer; it has to be customized to the problem and the skills at hand.

Re: Scalability

Posted: Fri Jul 03, 2009 12:41 am
by Christopher
pytrin wrote:There is no minimum on the number of servers participating in a cluster (as far as I know). It can start from two and grow from there.
Yes, I know there is no technical minimum but I think configurations realistically start with: 2 MySQL Server Nodes, 2 Data Nodes and 1 Management Node. At least from what I have read. That's where I got 5 servers.
pytrin wrote:The point is those options offer different solutions, MySQL cluster is an in-memory solution and has other considerations such as network latency. However, since the entire cluster is treated as one server, it is easier to integrate with an existing application.

Master/slave replication is good for reads but bad for writes, since you can't offload writes from the master. Master-Master setup helps with this but is more complicated to set up and is not officially supported.

In short, there is no one-solution fits all answer, it has to be customized to the problem and skills at hand.
I think we can probably narrow down the possibilities to what most people here would be interested in, because most of us build small to medium-sized web applications. The simplest and most common small web application architectures are:

1. Single server solution - static content, dynamic content, and database server all on the same server

2. Multiple single server solution - static content, dynamic content, and database server each on its own server. It could be two or three servers. This does not require any real architectural changes, just changing the DB connection host info and the URL to static content. This is really a performance improvement only. If any of the servers die, the site is essentially down.

So my question is, what are the next steps up from that? And how do different arrangements compare in performance, ease of setup/administration, stability, and availability?

Re: Scalability

Posted: Fri Jul 03, 2009 4:42 am
by kaisellgren
An application with Master-Slave replication support would probably detect SELECT queries and use a random slave server for it, and the primary master server for all other queries.

@pytrin: what would an application do if it is supposed to support Master-Master other than having more than one master server in the configuration to connect to?

We could split scaling into four parts:

CPU
Is there anything a PHP application should do to support scaling out due to high CPU usage? As far as I know, there's nothing you need/can do, it would be server/hardware related only?

RAM
A PHP application probably needs to support Memcached. A class that is capable of connecting to different Memcached servers is basically all we need?

Disc space
If we were to build a new RapidShare, what would we need to take care of?

Databases
Our database class must at least support the official and most widely used Master-Slave replication. What about clustering - is there something you need to do?
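On the RAM point, here is a minimal sketch of what "a class that can connect to different Memcached servers" might boil down to: choosing a server per key. It is written in Python for brevity rather than PHP, and CachePool, server_for and the host strings are hypothetical names, not any Memcached client's API. It only picks a server; it does not open connections.

```python
import zlib


class CachePool:
    """Maps each cache key to one of several servers (hypothetical helper)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one cache server is required")
        self.servers = list(servers)

    def server_for(self, key):
        # crc32 is stable across processes, so every web server
        # maps the same key to the same cache server.
        index = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        return self.servers[index]


pool = CachePool(["cache1:11211", "cache2:11211", "cache3:11211"])
target = pool.server_for("user:42:profile")
```

One caveat with plain modulo hashing: adding or removing a server remaps most keys, so the cache is largely cold afterwards. Consistent hashing is the usual way around that.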

Edit: I have built quick Master-Slave support into my database class. Let me explain what it does.

First of all, I have a configuration file where the site admin can list the slave servers and the primary master server. There must be a master server, but slave servers are optional and there may be as many of them as the admin wants. Whenever a query is made, I check whether its first 6 bytes match "SELECT". If they do, I run the query against a randomly selected slave server (mt_rand()). In all other cases, I just use the primary master server. I have told developers never to type anything before "SELECT", but if they do (e.g. comments), nothing bad happens: the query just goes to the master server, making the application slightly slower, and this can easily be "fixed" by removing the stuff before the "SELECT". I also offer a parameter to mark a query as "CRITICAL" so that a SELECT goes to the master and cannot return out-of-date data.

In Master-Slave replication, there is one master server and 1-n slaves. Slaves handle the SELECTs, whereas the master handles all writes. After the master has updated the records, it writes the changes to a binary log, which is sent to all slaves, who then update their own content. For this reason, some slaves might be out of date for a minute or even a few minutes on heavily loaded, big sites like Wikipedia. That's why I have the "CRITICAL" flag: if a query needs up-to-date data, it goes to the master.
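Roughly, the routing described above could look like the sketch below. It is Python rather than the actual PHP class, and QueryRouter, pick_server and the critical argument are made-up names for illustration. The case-insensitive match on "SELECT" is also an assumption; the original check may be case-sensitive.

```python
import random


class QueryRouter:
    """Chooses a server for each query: slaves for plain SELECTs, master otherwise."""

    def __init__(self, master, slaves=(), rng=None):
        self.master = master
        self.slaves = list(slaves)          # slaves are optional
        self._rng = rng or random.Random()

    def pick_server(self, sql, critical=False):
        # Only the first 6 characters are inspected, so a leading comment
        # or whitespace sends the query to the master (slower, but safe).
        is_select = sql[:6].upper() == "SELECT"
        if is_select and not critical and self.slaves:
            return self._rng.choice(self.slaves)
        return self.master


router = QueryRouter("db-master", ["db-slave1", "db-slave2"])
read_host = router.pick_server("SELECT * FROM posts")
write_host = router.pick_server("UPDATE posts SET title = 'x' WHERE id = 1")
fresh_host = router.pick_server("SELECT balance FROM accounts", critical=True)
```

Failing towards the master is the important design choice here: a misclassified read only costs some speed, whereas a write sent to a slave would never reach the master or the other slaves.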

For Master-Master replication, I have no idea. I couldn't get my home M-M setup to work :oops: so I support M-S only at the moment, as it was easier to set up and test. If someone knows what a database class is supposed to handle to support M-M replication, I would like to hear it.

Re: Scalability

Posted: Fri Jul 03, 2009 5:33 am
by onion2k
Never worry about scaling until it's a problem, especially if your website isn't live yet. Businesses fall into 2 groups: those that run out of money before they run out of infrastructure (by far the most common), and those that end up with so much investment capital you can easily recruit people who have already dealt with scaling issues.

Re: Scalability

Posted: Fri Jul 03, 2009 8:37 am
by Eran
@pytrin: what would an application do if it is supposed to support Master-Master other than having more than one master server in the configuration to connect to?
The main problem is contention. If both masters attempt to create/change the same data at the same time, replication will break (for example, editing the same row, or inserting a row with an auto-increment column at the same time). This adds some more overhead to setup and maintenance. The MySQL cluster on the other hand does not share those problems since it is synchronous and changes happen across all nodes at the same time.
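For the auto-increment case specifically, the usual workaround is to give each master its own interleaved ID sequence. MySQL exposes this through the auto_increment_increment and auto_increment_offset server variables. Here is a small Python sketch of the idea only; id_sequence is a hypothetical helper, not anything from MySQL.

```python
def id_sequence(offset, increment, count):
    """IDs one master would hand out: offset, offset + increment, ..."""
    return [offset + i * increment for i in range(count)]


# Two masters, so the step is 2 and each master gets its own offset.
master_a = id_sequence(offset=1, increment=2, count=5)   # odd IDs
master_b = id_sequence(offset=2, increment=2, count=5)   # even IDs

# The two sequences never overlap, so simultaneous inserts cannot collide.
assert not set(master_a) & set(master_b)
```

This only removes the auto-increment collision. Two masters editing the same row at the same time is still a conflict that the application or the setup has to avoid.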
That's because most here build small to medium size web application.
Small web applications don't really have major scaling needs. It's when you cross a certain threshold that those concerns become an issue. Beyond a certain point, all applications will need multiple servers for each major service (httpd, mysql and others).

But as onion and I said, you should only worry about it as you approach the scalability point. At that point you are considered a success, and you will have either VC backing or income to hire professionals to handle the scaling for you.

Re: Scalability

Posted: Fri Jul 03, 2009 9:21 am
by Theory?
onion2k wrote:Never worry about scaling until it's a problem, especially if your website isn't live yet. Businesses fall into 2 groups: those that run out of money before they run out of infrastructure (by far the most common), and those that end up with so much investment capital you can easily recruit people who have already dealt with scaling issues.
I'm not so much worried about it, just curious. There's so much about the server side of things I don't fully grasp yet, so I'm just interested in what's happening.

Re: Scalability

Posted: Fri Jul 03, 2009 11:29 am
by Christopher
onion2k wrote:Never worry about scaling until it's a problem, especially if your website isn't live yet. Businesses fall into 2 groups: those that run out of money before they run out of infrastructure (by far the most common), and those that end up with so much investment capital you can easily recruit people who have already dealt with scaling issues.
pytrin wrote:Small web applications don't really have major scaling needs. It's when you cross a certain threshold that those concerns become an issue. Beyond a certain point, all applications will need multiple servers for each major service (httpd, mysql and others).
I agree, and I think scaling is less of a problem than availability -- at least for me. Most of my clients will pay for more servers when needed, because that usually means they have more clients and therefore more revenue. But availability is a different issue. What are the options for much shorter downtime in the event of a server failure? That probably means multiple database servers with some kind of failover, and multiple application servers behind a load balancer that can detect failures. So what are some solutions? And for the developers here, I think simplicity of installation/maintenance would probably be more important than being able to scale to 1000 servers.
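To make "a load balancer that can detect failures" a bit more concrete, here is a toy Python sketch of round-robin selection that skips backends failing a health check. Balancer, next_backend and the is_alive callback are hypothetical names, and a real setup would use a dedicated load balancer rather than application code like this.

```python
import itertools


class Balancer:
    """Round-robin over backends, skipping any that fail the health probe."""

    def __init__(self, backends, is_alive):
        self.backends = list(backends)
        self.is_alive = is_alive            # callable: backend -> bool
        self._cursor = itertools.cycle(range(len(self.backends)))

    def next_backend(self):
        # Try each backend at most once per call; None means all are down.
        for _ in range(len(self.backends)):
            candidate = self.backends[next(self._cursor)]
            if self.is_alive(candidate):
                return candidate
        return None


down = {"app2"}                             # pretend app2 has crashed
balancer = Balancer(["app1", "app2", "app3"], lambda host: host not in down)
picks = [balancer.next_backend() for _ in range(4)]
```

With app2 marked as down, requests simply alternate between app1 and app3, which is the behaviour you would want from the application-server tier.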

Re: Scalability

Posted: Fri Jul 03, 2009 11:51 am
by Eran
In that case, the cluster is the best choice for high-availability. Not necessarily just a db-cluster though, check out Linux HA (heartbeat) - http://www.linux-ha.org/. Some also advocate running different services on separate VM instances, which can crash and go back up independently. Creating reusable VM images also allows you to quickly create additional instances of the same service for more redundancy / failover.

Backup and recovery is also a big part of high availability.
An article on the VMware site on backup with clustering - http://www.vmware.com/technology/high-a ... ackup.html
An article on using LVM snapshot to backup MySQL - http://www.mysqlperformanceblog.com/200 ... ion-setup/
I also recommend picking up High Performance MySQL, 2nd edition. There's a lot of excellent material there on scalability and high availability that goes way beyond just databases - http://www.amazon.com/dp/0596101716

Re: Scalability

Posted: Fri Jul 03, 2009 12:45 pm
by Theory?
On the note of MySQL, I hear a lot of people migrating to PostgreSQL, why is that?

Re: Scalability

Posted: Fri Jul 03, 2009 1:04 pm
by califdon
Theory? wrote:On the note of MySQL, I hear a lot of people migrating to PostgreSQL, why is that?
Is that what that rumbling sound I've been hearing is??!!

Re: Scalability

Posted: Fri Jul 03, 2009 1:28 pm
by Christopher
pytrin wrote:In that case, the cluster is the best choice for high-availability.
Have you actually implemented clustering?

Re: Scalability

Posted: Fri Jul 03, 2009 2:16 pm
by Eran
I'm not the server guy in our operation, so I personally did not implement clustering. I'm not an expert on the matter, all the information I have comes from my own previous research and talking to experts.

Re: Scalability

Posted: Fri Jul 03, 2009 2:34 pm
by kaisellgren
Is there anything that a PHP application must take care of in order to have a website operating under a clustered system? I bet no and if I'm right, then there are better places to find information about server related stuff..

Re: Scalability

Posted: Fri Jul 03, 2009 2:39 pm
by Christopher
Did your "server guy" (I think I know him ;)) implement a cluster? My problem is that I have read a lot of these articles too, but none strike me as very simple. Ultimately I am a web developer, so I am always in search of solutions that are easy to deploy and maintain. I am looking for levels #3 and #4 beyond the standard #1 and #2 schemes I listed above.