Scalability
Moderator: General Moderators
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Scalability
Interesting discussion. I get the sense that replication is more frequently used than clustering. Clustering seems to be more complex and needs a minimum of 4-5 servers to implement. It seems like there are a number of different replication schemes, though. I also notice that some projects like Wordpress, Joomla and Drupal have implemented DB connection libraries that support master/slave and/or multiple DBs. I have been thinking about that as well.
(#10850)
Re: Scalability
There is no minimum on the number of servers participating in a cluster (as far as I know). It can start from two and grow from there.
The point is that those options offer different solutions. MySQL Cluster is an in-memory solution and has other considerations such as network latency. However, since the entire cluster is treated as one server, it is easier to integrate with an existing application.
Master/slave replication is good for reads but bad for writes, since you can't offload writes from the master. A master-master setup helps with this but is more complicated to set up and is not officially supported.
In short, there is no one-size-fits-all answer; it has to be customized to the problem and skills at hand.
- Christopher
Re: Scalability
pytrin wrote: "There is no minimum on the number of servers participating in a cluster (as far as I know). It can start from two and grow from there."
Yes, I know there is no technical minimum, but I think configurations realistically start with 2 MySQL Server Nodes, 2 Data Nodes and 1 Management Node. At least from what I have read. That's where I got 5 servers.
pytrin wrote: "The point is those options offer different solutions, MySQL cluster is an in-memory solution and has other considerations such as network latency. However, since the entire cluster is treated as one server, it is easier to integrate with an existing application. Master/slave replication is good for reads but bad for writes, since you can't offload writes from the master. Master-Master setup helps with this but is more complicated to set up and is not officially supported. In short, there is no one-solution fits all answer, it has to be customized to the problem and skills at hand."
I think we can probably narrow the possibilities to what most people here would be interested in. That's because most here build small to medium size web applications. The simplest and most common small web application architectures are:
1. Single server solution - static content, dynamic content, and database server all on the same server
2. Multiple single server solution - static content, dynamic content, and database server each on its own server. It could be two or three servers. This does not require any real architectural changes, just changing the DB connection host info and the URL to static content. This is really a performance improvement only. If any of the servers die, the site is essentially down.
So my question is, what are the next steps up from that? And how do different arrangements compare in performance, ease of setup/administration, stability, and availability?
(#10850)
- kaisellgren
- DevNet Resident
- Posts: 1675
- Joined: Sat Jan 07, 2006 5:52 am
- Location: Lahti, Finland.
Re: Scalability
An application with Master-Slave replication support would probably detect SELECT queries and use a random slave server for it, and the primary master server for all other queries.
@pytrin: what would an application do if it is supposed to support Master-Master other than having more than one master server in the configuration to connect to?
We could split scaling into four parts:
CPU
Is there anything a PHP application should do to support scaling out due to high CPU usage? As far as I know, there's nothing you need/can do, it would be server/hardware related only?
RAM
A PHP application probably needs to support Memcached. A class that is capable of connecting to different Memcached servers is basically all we need?
Disc space
If we were to build a new RapidShare, what would we need to take care of?
Databases
Our database class must at least support the official and most widely used Master-Slave replication. What about clustering - is there something you need to do?
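On the RAM point above, the main thing a class that talks to several Memcached servers has to decide is which server owns a given key. Here is a rough sketch of the simplest scheme, hash-modulo, written in Python rather than PHP just to keep it short. The server addresses and the `pick_server` name are made up for illustration, and real client libraries typically handle this for you, often with consistent hashing so that adding a server moves fewer keys.

```python
import hashlib

# Hypothetical server list; in practice this would come from configuration.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key, servers=SERVERS):
    """Map a cache key to one server using hash-modulo distribution."""
    # md5 is used only to spread keys evenly, not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


for key in ("user:1", "user:2", "session:abc"):
    # The same key always maps to the same server.
    print(key, "->", pick_server(key))
```

The drawback of plain modulo is that changing the number of servers remaps most keys, which is effectively a cache flush.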
Edit: I have built quick Master-Slave support into my database class. Let me explain what it does. First of all, I have a configuration file where the site admin can list the slave servers and the primary master server. There must be a master server, but slave servers are optional and there may be as many slaves as the admin wants. Whenever a query is made, I check whether the first 6 bytes match "SELECT"; if they do, I send the query to a randomly selected slave server (mt_rand()). In all other cases, I just use the primary master server. I have told developers never to type anything before "SELECT", but if they do type something (e.g. comments) then nothing bad happens: the query just goes to the master server, which makes the application slightly slower and can easily be "fixed" by removing the stuff before the "SELECT". I also offer a parameter to mark a query as "CRITICAL" so that the SELECT goes to the master and cannot return out-dated data.
In Master-Slave replication, one has one master server and 1-n slaves. Slaves handle the SELECTs whereas the master handles all writes. After the master has updated the records, it writes the changes to a binary log and sends it to all slaves, which then update their content. For this reason, some slaves might sometimes be out-dated for a minute or even a few minutes on heavily loaded, big sites like Wikipedia. That's why I have a "CRITICAL" flag: if a query needs up-to-date data, it goes to the master.
For Master-Master replication, I have no idea. I couldn't get my home M-M setup to work, so I support M-S only at the moment; it was easier to set up and test. If someone knows what a database class is supposed to handle to support M-M replication, I would like to hear it.
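For anyone who wants to see the routing rule spelled out, here is a minimal sketch of the logic described above. It is written in Python instead of PHP to keep it compact, and it is not my actual class: the `QueryRouter` name, the host strings and the case-insensitive comparison are assumptions made for the example.

```python
import random


class QueryRouter:
    """Pick a server for a query: SELECTs go to a random slave, the rest to the master."""

    def __init__(self, master, slaves=None):
        self.master = master              # required
        self.slaves = list(slaves or [])  # optional, any number

    def route(self, sql, critical=False):
        # Only the first 6 characters are inspected, so anything placed
        # before SELECT (a comment, whitespace) sends the query to the master.
        # Upper-casing is an assumption; a strict byte match would skip it.
        is_select = sql[:6].upper() == "SELECT"
        if is_select and self.slaves and not critical:
            return random.choice(self.slaves)
        return self.master


router = QueryRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT * FROM users"))                 # one of the slaves
print(router.route("UPDATE users SET name = 'x'"))         # db-master
print(router.route("SELECT * FROM users", critical=True))  # db-master
```

With no slaves configured, everything falls through to the master, so a single-server install needs no special case.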
Re: Scalability
Never worry about scaling until it's a problem, especially if your website isn't live yet. Businesses fall into 2 groups: those that run out of money before they run out of infrastructure (by far the most common), and those that end up with so much investment capital you can easily recruit people who have already dealt with scaling issues.
Re: Scalability
Quote: "@pytrin: what would an application do if it is supposed to support Master-Master other than having more than one master server in the configuration to connect to?"
The main problem is contention. If both masters attempt to create/change the same data at the same time, replication will break (for example, editing the same row, or inserting a row with an auto-increment column at the same time). This adds some more overhead to setup and maintenance. MySQL Cluster, on the other hand, does not share those problems, since it is synchronous and changes happen across all nodes at the same time.
Quote: "That's because most here build small to medium size web application."
Small web applications don't really have major scaling needs. It's when you cross a certain threshold that those concerns become an issue. Beyond a certain point, all applications will need multiple servers for each major service (httpd, mysql and others).
But as onion and I said, you should only worry about it as you approach the scalability point. At that point you are considered a success and you will have either VC backing or income to hire professionals to handle the scaling for you.
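On the auto-increment collision mentioned above: the usual way around it is to have each master hand out IDs from its own interleaved sequence, which MySQL exposes through the `auto_increment_increment` and `auto_increment_offset` server variables. Below is a toy simulation of that arithmetic in Python. The `id_sequence` helper is made up for illustration, and this does nothing for the other case, two masters editing the same existing row.

```python
from itertools import islice


def id_sequence(offset, increment):
    """Yield offset, offset + increment, ... as a master would with
    auto_increment_offset=offset and auto_increment_increment=increment."""
    value = offset
    while True:
        yield value
        value += increment


# Two masters, so increment = 2 and each master gets its own offset.
master_a = list(islice(id_sequence(1, 2), 5))
master_b = list(islice(id_sequence(2, 2), 5))

print(master_a)  # odd IDs
print(master_b)  # even IDs
# The two masters can never generate the same ID.
assert not set(master_a) & set(master_b)
```

With N masters the increment becomes N and the offsets run from 1 to N.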
Re: Scalability
onion2k wrote: "Never worry about scaling until it's a problem, especially if your website isn't live yet. Businesses fall into 2 groups: those that run out of money before they run out of infrastructure (by far the most common), and those that end up with so much investment capital you can easily recruit people who have already dealt with scaling issues."
I'm not so much worried about it, just curious. There's so much about the server side of things I don't fully grasp yet, so I'm just interested in what's happening.
- Christopher
Re: Scalability
onion2k wrote: "Never worry about scaling until it's a problem, especially if your website isn't live yet. Businesses fall into 2 groups: those that run out of money before they run out of infrastructure (by far the most common), and those that end up with so much investment capital you can easily recruit people who have already dealt with scaling issues."
pytrin wrote: "Small web applications don't really have major scaling needs. It's when you cross a certain threshold that those concerns become an issue. Beyond a certain point, all applications will need multiple servers for each major service (httpd, mysql and others)."
I agree, and I think scaling is less of a problem than availability -- at least for me. Most of my clients will pay for more servers when needed because that usually means they have more clients and therefore revenue. But availability is a different issue. What are the options to provide much shorter downtime in the event of a server failure? That probably means multiple database servers with some kind of failover and multiple application servers behind a load balancer that can detect failures. So what are some solutions? And for the developers here, I think simplicity of installation/maintenance would probably be more important than being able to scale to 1000 servers.
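To make the "detect failures" part concrete, here is a minimal sketch of the selection logic only, in Python. The host names and the `is_healthy` callback are hypothetical; a real load balancer or failover tool also deals with timeouts, retries and flapping servers, none of which is shown here.

```python
def pick_backend(backends, is_healthy):
    """Return the first backend that passes the health check, or None."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


# Hypothetical pool, listed in order of preference.
pool = ["db-primary", "db-standby"]
down = {"db-primary"}  # pretend the primary just died

chosen = pick_backend(pool, lambda host: host not in down)
print(chosen)  # falls over to the standby
```

In practice the health check would be something like a TCP connect or a trivial query with a short timeout.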
(#10850)
Re: Scalability
In that case, the cluster is the best choice for high-availability. Not necessarily just a db-cluster though, check out Linux HA (heartbeat) - http://www.linux-ha.org/. Some also advocate running different services on separate VM instances, which can crash and go back up independently. Creating reusable VM images also allows you to quickly create additional instances of the same service for more redundancy / failover.
Backup and recovery is also a big part of high availability.
An article on the VMware site on backup with clustering - http://www.vmware.com/technology/high-a ... ackup.html
An article on using LVM snapshots to back up MySQL - http://www.mysqlperformanceblog.com/200 ... ion-setup/
I also recommend picking up High Performance MySQL, 2nd edition. There's a lot of excellent material there on scalability and high availability that goes way beyond just databases - http://www.amazon.com/dp/0596101716
Re: Scalability
On the note of MySQL, I hear a lot of people are migrating to PostgreSQL. Why is that?
Re: Scalability
Theory? wrote: "On the note of MySQL, I hear a lot of people migrating to PostgreSQL, why is that?"
Is that what that rumbling sound I've been hearing is??!!
- Christopher
Re: Scalability
pytrin wrote: "In that case, the cluster is the best choice for high-availability."
Have you actually implemented clustering?
(#10850)
Re: Scalability
I'm not the server guy in our operation, so I personally did not implement clustering. I'm not an expert on the matter; all the information I have comes from my own previous research and from talking to experts.
- kaisellgren
Re: Scalability
Is there anything that a PHP application must take care of in order to have a website operating under a clustered system? I bet not, and if I'm right, then there are better places to find information about server-related stuff.
- Christopher
Re: Scalability
Did your "server guy" (I think I know him) implement a cluster? My problem is that I have read a lot of these articles too, but none strike me as very simple. Ultimately I am a web developer, so I am always in search of solutions that are easy to deploy and maintain. I am looking for levels #3 and #4 above the standard #1 and #2 schemes I listed above.
(#10850)