arborint wrote:I want to thank you, William; this is exactly the kind of practical information I was looking for.
No problem.

If you only knew how much this forum has helped me. If you look at my original posts from around 2003 (I think), I made such a fool of myself, hehe.
arborint wrote:So you think Master/Master is the next logical step?
I need to re-read the discussion when I have more time. Today is the 4th and I have been really busy; I just checked the security forums to see what advice Kai is giving people, which I find interesting (thanks, Kai). Then I noticed a scalability topic and was instantly interested.

To better answer your question, though: Master/Master is not really about handling more load. It's about making sure that if one server breaks, you can still operate at full capacity. Servers will break eventually, and your database server is the last thing you can afford to have crash. Master/Master gives you time to power up another server, let it catch up on all the latest data, and put it back into service.
arborint wrote:What do you think about active/passive masters? It is sort of a minimal version of #1 with one read/write server and one read server.
Like I said above, Master/Master is for availability, not really scalability. What you're describing is a master and a slave. The master can always serve reads too; it really depends on how much load you're getting. You could have 1% writes and 99% reads and still have a master server that takes the writes and replicates them to all of the other servers. The whole point of a "master" is to accept writes and distribute them to the slave(s). So to answer your question: that's fine, but if your master dies, your slave needs to be able to become the "new" master, and neither server can ever run above 50% capacity, since if one fails the other has to take everything.
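The 50% rule is just arithmetic: whatever load the cluster carries has to fit on the servers that survive a failure. A minimal sketch, assuming identical servers and load spread evenly across them (the function name is hypothetical):

```python
def max_safe_utilization(servers: int, failures_tolerated: int = 1) -> float:
    """Highest per-server load (0..1) that still fits on the survivors.

    Assumes identical servers with load spread evenly (a simplification).
    """
    if servers <= failures_tolerated:
        raise ValueError("need more servers than tolerated failures")
    survivors = servers - failures_tolerated
    # Total load = servers * u must be <= survivors * 1.0, so u <= survivors / servers.
    return survivors / servers


for n in (2, 3, 4):
    print(n, "servers ->", max_safe_utilization(n))
```

With two servers that is 50%; adding a third or fourth server raises the safe ceiling (to about 67% and 75%), which is one reason bigger pools waste less headroom.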
What sort of application support is needed for master/slave architectures? I know that many projects have implemented database code like WordPress's HyperDB to support these architectures.
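At minimum, the application (or a layer like HyperDB) has to send writes to the master and is free to spread reads across the slaves. A minimal sketch of that read/write split, assuming a plain SQL-prefix check and hypothetical host names; it is not HyperDB's actual API:

```python
import itertools
import re
from typing import Sequence

# Simplistic, hypothetical check: statements starting with these verbs modify data.
_WRITE_RE = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|ALTER|DROP)\b", re.IGNORECASE
)


class ReadWriteRouter:
    """Send writes to the master, round-robin reads across the slaves."""

    def __init__(self, master: str, replicas: Sequence[str]):
        self.master = master
        # With no slaves configured, the master serves reads as well.
        self._replicas = itertools.cycle(list(replicas) or [master])

    def pick(self, sql: str) -> str:
        if _WRITE_RE.match(sql):
            return self.master
        return next(self._replicas)


router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
print(router.pick("INSERT INTO posts (body) VALUES ('hi')"))
print(router.pick("SELECT * FROM posts"))
```

A real setup also has to deal with replication lag (a read right after a write may hit a slave that hasn't caught up yet) and with pointing the router at a promoted slave when the master dies; this sketch ignores both.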
arborint wrote:And what sort of information is useful in setting up and managing master and slave servers, and in bringing a failed server back online?
I'll have to get into this at another time.
arborint wrote:I am not sure this stuff is very practical for most programmers here, but it is interesting. It might be interesting to know which of these techniques are useful for smaller solutions.
I agree. I actually try to denormalize a bit in all my database structures, especially for counts like how many posts or comments someone has. Although that's probably considered normal practice nowadays, it's still not really "normalized".
arborint wrote:I think those are all options. I think many of them should be whole separate discussions. Perhaps we should do a series? And it might be interesting to have some discussion on how memcache might fit into these architectures.
Yeah, that would work. Memcache is HUGE in my opinion. It can cut your database load dramatically. Think about it simply like this: you have a main portal page that gets 100 hits per second, and there are 3 queries on that page to get the latest comments, latest discussions, and latest news. That's 300 queries per second (QPS). Now cache those results in Memcache for, let's say, 10 minutes. That means you go from 180,000 queries every 10 minutes (300 QPS × 600 seconds) to 3 queries every 10 minutes (if done right). It's kind of a bad example, since at that much traffic you would already have had to do most of this, but you get the point. APC is also amazing, but I'm sure most people here know that. Scalability is about being able to scale for traffic, but sometimes you need to optimize what you have before you start scaling out to more machines.
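To make the numbers concrete, here is a small simulation of that portal page. It's a sketch under stated assumptions: a tiny in-process `TTLCache` class (hypothetical, not the memcached client API) stands in for Memcache, a fake clock replaces real time, and "running a query" just increments a counter. The page uses the usual cache-aside pattern: check the cache, and only on a miss query the database and store the result with a 10-minute TTL.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """In-process stand-in for memcached: key -> (expires_at, value)."""

    def __init__(self, clock: Callable[[], float]):
        self._clock = clock
        self._store: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # stale entry: treat as a miss
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: object, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def simulate(hits_per_second: int = 100, seconds: int = 600, ttl: float = 600.0):
    """Return (page_views, queries_without_cache, queries_with_cache)."""
    now = [0.0]  # fake clock so the simulation runs instantly
    cache = TTLCache(clock=lambda: now[0])
    db_queries = 0

    def run_query(name: str) -> List[str]:
        nonlocal db_queries
        db_queries += 1  # pretend this is a MySQL round trip
        return [f"{name} row"]

    blocks = ("latest_comments", "latest_discussions", "latest_news")
    page_views = hits_per_second * seconds
    for i in range(page_views):
        now[0] = i / hits_per_second
        for block in blocks:
            rows = cache.get(block)
            if rows is None:  # cache miss: query the DB and repopulate
                rows = run_query(block)
                cache.set(block, rows, ttl)
    return page_views, page_views * len(blocks), db_queries


print(simulate())  # (60000, 180000, 3)
```

Shortening the TTL trades freshness for load: with a 5-minute TTL the same 10 minutes cost 6 queries instead of 3, which is still nothing next to 180,000.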
arborint wrote:Yes, pytrin has given a lot of helpful links above. The Cal Henderson book is 3-4 years old at this point. Is it still current enough for people to buy? Are there newer, better books?
I highly recommend his book. The guy is extremely smart with this kind of thing. It covers so much: internationalization, database scaling (sharding, replication, etc.), version control for teams, bug-tracking software, supporting UTF-8, and the list goes on and on. Plus he gives tons of examples of what Flickr has done, which is great too.
Also, there has been discussion about scaling vertically vs. horizontally when the servers run expensive software. I think it was PlentyOfFish.com that actually spent over 100k on their database server. They found that, once software licensing costs were factored in, it was cheaper to do that than to scale horizontally. Just so people know: I prefer going the open-source route (Linux, Apache, MySQL, PHP, APC, Memcache, SphinxSearch, blah blah blah), but this is for the few users who do work with Windows software, etc.
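The licensing argument can be sketched as a toy cost model. Everything here is an assumption for illustration: the numbers are made up (not PlentyOfFish's actual figures), each server is assumed to need one flat license (real commercial licensing is often per CPU or per core), and six small boxes are assumed to match one big one.

```python
def scale_up_cost(big_server_hw: float, license_per_server: float) -> float:
    """One large machine, one software license."""
    return big_server_hw + license_per_server


def scale_out_cost(servers: int, commodity_hw: float, license_per_server: float) -> float:
    """Many smaller machines, each needing its own license (simplified)."""
    return servers * (commodity_hw + license_per_server)


# Made-up numbers: one 60k box vs. six 8k boxes of roughly equal total capacity.
for license_fee in (0, 20_000):
    up = scale_up_cost(60_000, license_fee)
    out = scale_out_cost(6, 8_000, license_fee)
    cheaper = "scale up" if up < out else "scale out"
    print(f"license={license_fee}: up={up}, out={out} -> {cheaper}")
```

With free software the commodity cluster comes out cheaper; once every box carries a license fee, the license multiplies with the server count and the single big machine wins, which is the effect described above.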