Page 1 of 1

Shared or isolated database?

Posted: Sun Nov 02, 2008 11:25 pm
by allspiritseve
Spinning off jshpro2's post in this thread: viewtopic.php?f=19&t=90062

I'm about to start on a project that sounds similar to his, that I discussed a little in this thread: viewtopic.php?f=19&t=87218

Basically, my boss and I are discussing the pros and cons of a shared database vs isolated databases for each client. I am posting in this forum because that decision impacts the entire application design, and I'd like to discuss people's thoughts. My boss is leaning towards a shared codebase, because he's trying to reduce fragmentation in db schema. I am leaning towards isolated databases, because I don't want to be limited by the schema. I want complete flexibility to add/edit/delete any number of fields/tables without worrying about destroying data in other sites.

I think that if we develop a strong workflow, fragmentation issues shouldn't be a problem. For the application, I'd like to move us to SVN and liberally use its tagging and merging capabilities. Creating a new site should be able to be done automatically using a script by checking out the latest build from the repository. For the database, I'd like to keep all updates in .sql files numbered in chronological order, and keep a flag in a config file with the database version each site is at.

Theoretically that should mean we can keep old sites up to date, while leaving complete flexibility for development on new sites. Thoughts?

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 8:02 am
by josh
My thoughts are its a great theory, but the fact that you can leave old sites out of date opens up the possibility that you will refactor so much, and some day come out with a better version, and your old customers will sit stagnant ( although happily ) on the old system. This will shoot yourself in the foot in the long run when you news of your new features gets to them ( by the grapevine perhaps ) and they ask to be upgraded, or need some new feature thats only in the new system. The longer they've sat stagnant the more the coupling of the system has shifted around, and as a result their little account will probably end up getting grouped with a bunch of other old accounts in some "legacy" sandbox, as you start creating more sandboxes, etc.. things really get to be a mess.

Regarding the shared vs isolated database, IMO everything should be shared ( which I made evident in the previous paragraph ). If there is something custom like custom fields IMO you should find the common denominator and come up with a normalized schema ( EAV based perhaps ) that allows you to store the varying schemas consistently. If I write code to build off one client's customizations, I want that code to be as cohesive as possible with other clients varying customizations.

I'll sum up my argument: "configuration, not convention."

There's no need to check out duplications of the repository for each provision, unless you want things to be able to get out of synch

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 8:35 am
by allspiritseve
And what if a single bug slips past our tests when we release a new site? That means every single one of our sites is going to go down. The same with the database. While I think it's an admirable goal to want to remove any and all duplicated code, I'd rather have each and every site in its own sandbox.

The reason I'm so confident in keeping all of our sites up to date is primarily SVN. Using eclipse I can branch, tag, and merge all of my code, so keeping an old site up to date would be as simple as checking out the new version on our dev server, upgrading the database to current, running some tests, and switching it to live.

EAV has already been discussed in other threads as being suited to a purpose where you need a large quantity of custom fields that are all of similar type. Using an EAV so that I can add a new table for a client's new feature does not fit into that purpose, and a normal relational table would fit much better.

In general... I'm not going to sacrifice the unlimited flexibility of isolated clients for the slim benefits of shared code and database. I think it can be done, it just requires some standards set in place for how to upgrade and when.

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 8:52 am
by josh
allspiritseve wrote:And what if a single bug slips past our tests when we release a new site? That means every single one of our sites is going to go down. The same with the database.
yep. you need extra test harnesses for sure
allspiritseve wrote:While I think it's an admirable goal to want to remove any and all duplicated code, I'd rather have each and every site in its own sandbox.
I understand why, theres no best solution
allspiritseve wrote:The reason I'm so confident in keeping all of our sites up to date is primarily SVN. Using eclipse I can branch, tag, and merge all of my code, so keeping an old site up to date would be as simple as checking out the new version on our dev server, upgrading the database to current, running some tests, and switching it to live.
You could achieve the some goals by having one code base per version of the application, and having all clients that are on that one codebase share the same code
allspiritseve wrote:EAV has already been discussed in other threads as being suited to a purpose where you need a large quantity of custom fields that are all of similar type. Using an EAV so that I can add a new table for a client's new feature does not fit into that purpose, and a normal relational table would fit much better.
EAV data modeling includes any tables that have a "thin" schema, where each row in a table represents one piece of fact about an entity, you can have a hybrid schema that is actually way more relational then a traditional "global triple store". The idea of having the physical tables is what I was discussing in the other thread, in a shared schema my idea was to put client customizations in their own database. In an ideal situation each client could override things from the shared schema, but then ultimately when apply the principles of refactoring and try to abstract everything out I always end up back at one shared schema no problem
allspiritseve wrote:In general... I'm not going to sacrifice the unlimited flexibility of isolated clients for the slim benefits of shared code and database. I think it can be done, it just requires some standards set in place for how to upgrade and when.
A shared schema is not any more inflexible then an isolated approach. Imagine a situation where you had 100 classes that all did the same thing, you fix 1 error now you have to remember to go and fix the other 99, just like your example of having just 1 bug slip into the system the duplication can have the same effect. Sure you could write a script to manage all those isolated schemas but you then fall back into a "generation" based approach, where the same problems code generation has on code applies to your schema. I've so far been unable to find anything I couldn't do in PHP without code generation or without fitting it into a shared schema, I've found, personally that the shared schema gives me a much more factored architecture, eliminating all concerns of schema concurrency.

What happens if client A has website A and website B, website A is running on some hybrid system, website B is running on your newest system, client A merges with client B at the corporate level ( who runs website C ) and client B wants to manage website A, B and C. You will not be able to respond to the changing market quickly enough and hopefully ( in my ideal world ) the client would transfer over to my system and I would eat your lunch :twisted: ( of course I'd have to get my own system based prototype stages and to a level of maturity before I could ever expect to compete with you :wink: )

Also a shared schema does not rule out moving tenants to their own dedicated schema, like the msdn resources talks about. At a certain point you would have to choose whether you need to distributed across your clusted based on row size of the client, volume of data access, etc..

A version of an application on a shared schema could easily be deployed on it's own node where tenant_id was a constant, or was factored out of the code. I come down hard on that idea though, and prefer to try to plan out so that I can scale horizontally, and have the system distribute things based on an algorithm, rather then me randomly shuffle accounts around all day or have to pay someone to do so for me

Keeping things isolated just sounds like more refactoring. Certainly I wouldn't want to build something for a client unless it had some re-use, and if it has re-use then why not make it truly re-usable without creating more work for yourself in the long run? Whenever a client asks for a feature I always ask myself if they're actually asking for an implementation of some broader metaphor. For instance: client asks to be able to list a custom catalog of books customized by isbn number, what I build is an indexible hash table functionality I can then deploy as "book database" feature or a "dog grooming store" listing database, both would be handled by the same component driven by some configurable metadata, much like we as programmers are just configuring a schema when we make databases, not writing database engines

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 9:05 am
by Eran
I would definitely use isolated databases, no question about it. In fact, as long as the customers site are working perfectly and no one is requesting any additional features, I won't touch them at all. Newer clients get the benefit of a newer version but are also unknowingly beta-testers until the features become hardened.

For most costumers, downtime is much more critical than a few extra features or some refactoring to the code (this especially is irrelevant to the customers - it is the developer's privilege only). If any requests upgrading then I would handle them one at a time and with great care. Just my 2-cents.

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 11:10 am
by allspiritseve
jshpro2 wrote:yep. you need extra test harnesses for sure
I'd love to be able to catch every bug before I launch something live, but that's unrealistic. No amount of tests can assure me 100% that a site will work-- at least if my sites are isolated, I only break one site and can introduce a new test to expose the bug, so no other site will suffer the same fate.
jshpro2 wrote:In an ideal situation each client could override things from the shared schema
I was going that route for a while, but then I came to realize I was being very data-centric. If a client wants a new schema, they're probably going to want new behavior too, (which they can't create themselves) so I might as well handle the schema myself.
jshpro2 wrote:Imagine a situation where you had 100 classes that all did the same thing, you fix 1 error now you have to remember to go and fix the other 99, just like your example of having just 1 bug slip into the system the duplication can have the same effect.


Not quite... I fix one error, commit it to the repository, and check out the new site with the bug fix. This is in a scenario when the bug resulted from new development. If I found a bug in the original codebase, I simply make a change, commit it, and check out a new version for each site.
jshpro2 wrote:What happens if...:wink: )
That's quite the contrived scenario you have there... but our admin system will be flexible enough to span several sites/databases. So I'll eat my own lunch thank you very much.

jshpro2 wrote:Keeping things isolated just sounds like more refactoring. Certainly I wouldn't want to build something for a client unless it had some re-use, and if it has re-use then why not make it truly re-usable without creating more work for yourself in the long run?
I think that's a big difference between us-- I live for the custom code and the new features. For the majority of the things we'll add, though, we will end up refactoring custom changes into the actual codebase and offering it to the client as a upgrade. That gives me the freedom to do truly custom work, and then when I don't have deadlines, I can convert that work into something I can reuse. I work so much faster when I don't have to worry about breaking several things at once (TDD helds immensely with this).
pytrin wrote:I would definitely use isolated databases, no question about it. In fact, as long as the customers site are working perfectly and no one is requesting any additional features, I won't touch them at all. Newer clients get the benefit of a newer version but are also unknowingly beta-testers until the features become hardened.

For most costumers, downtime is much more critical than a few extra features or some refactoring to the code (this especially is irrelevant to the customers - it is the developer's privilege only). If any requests upgrading then I would handle them one at a time and with great care. Just my 2-cents.
Thanks Pytrin, I agree with you. That's why I'd like to take advantages of tags, so older clients can be on a certain tag for as long as they need to be. Since we'd like to offer new features when we create them, that's often motivation to upgrade the older client sites. As long as we test them offline, and make sure our new features don't overwrite any old features, we should be ok.

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 11:26 am
by Christopher
I agree with Pytrin on separate databases for each client application. I make sure to clearly document code releases and write upgrade scripts to go from one release to the next. If a client want an upgrade then it is a straight forward process of backing up their site and then doing the upgrade steps.

PS - I also mix central and separate if there is common data that all sites use that has a historically stable schema.

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 11:37 am
by allspiritseve
arborint wrote:PS - I also mix central and separate if there is common data that all sites use that has a historically stable schema.
That's sort of the plan... keep a central codebase (or versions of one) in the repository, that are checked out when the site is updated using svn:externals. That way changes won't ripple out even in the shared code when I make changes, unless I explicitly check them out.

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 12:29 pm
by josh
arborint wrote: If a client want an upgrade then it is a straight forward process of backing up their site and then doing the upgrade steps.
Problem is they don't always ask, sometimes they just switch to a new provider.. Clients with lots of custom work like you have been discussing, call for a shared codebase, the architecture for such a system is different from the architecture in a system that needs to be fully multi tenant. Sure the isolate approach can work but you're spending more time engineering each system to work in isolation, when that could be going towards building more standard features to appease the masses. You might be able to make 5x as much as me by picking up a really great client who needs lot of custom stuff, but basically what I'm saying is the amount of small sites that just need basic features is where its going to be most lucrative. Basically my theory is solve a collective problem before I go try to solve a specific problem.

For instance if I have 10 clients selling niche products, I want to be able to go in and make 100 sites that resell all the products from each of those client sites, automatically selecting whoever has that product listed for the cheapest wholesale cost, for instance something like http://www.csnstores.com where their main site lets you shop 100s of sites all at once. With an isolated approach the information is scattered across your topology and harder to engineer a common form of access.

The point of SaaS is that clients get upgrades automatically, they pay a premium for it so they dont have to ask. For the same reason a person will pay more for a cold 20oz bottle of soda then they would for the warm 2 liter bottle. You are selling convenience. There are always going to be clients who want something specialized, and I'm sure they provide a very nice lunch too ;-) Just because the old way works doesn't mean the new way can't work also ( http://www.netsuite.com http://www.salesforce.com http://www.volusion.com ). None of these platforms do anything a lot of the people on these forums couldn't make. In fact volusion for one is extremely buggy, but they have grown into multi million dollar fortune 500 companies. There is real tangible wealth backing them up, not superfluous just cash flow. Their successes would guide me to focus on re-usability before I focus on a short term goal of pleasing just 1 client. If you can hold out longer to get exponentially more clients in the long run, then great.

I just posted this in another thread in the Enterprise but I'll post it here. http://en.wikipedia.org/wiki/The_Long_Tail Basically my strategy is to provide the cohesive building blocks my users can use to cherry pick functionality ala cart pricing structure, to build themselves an ad hoc software solution, much like dell.com revolutionarized PC retail by allowing customers to "build" their own machines online and then have that machine built and on their doorstep in a short turnaround time, in my case I'm talking about a turnaround time of milliseconds not days. It's the same principles Henry Ford based the production line off of which changed every industry. The same concepts behind interchangeable machinable parts

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 1:59 pm
by Christopher
allspiritseve wrote:That's sort of the plan... keep a central codebase (or versions of one) in the repository, that are checked out when the site is updated using svn:externals. That way changes won't ripple out even in the shared code when I make changes, unless I explicitly check them out.
I am a little confused because you seem to be mixing "codebase" and "database" in your comments. Having a central shared database or a separate database for each client application is one converstation. Having a link to central codebase directory, or providing a snapshot of the code base, or a link to a centrally located directories that contain each released version -- that's a different discussion.

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 2:01 pm
by josh
Right, I view a shared code base as a lower tier, a more basic pattern then a multi tenant architecture, which uses the concept of a shared cod ebase but introduces new design requirements. The concept of a multi tenant application takes the concept of "shared" and tries to apply it to everything, one could reason it is a hopeless endeavor. But then again I have been known to try crazy things

Another "formalized" theory I'm alluding to
http://en.wikipedia.org/wiki/Mass_customization

Re: Shared or isolated database?

Posted: Mon Nov 03, 2008 3:07 pm
by allspiritseve
arborint wrote:I am a little confused because you seem to be mixing "codebase" and "database" in your comments. Having a central shared database or a separate database for each client application is one converstation. Having a link to central codebase directory, or providing a snapshot of the code base, or a link to a centrally located directories that contain each released version -- that's a different discussion.
Sorry about that... originally meant this as a database discussion, but it seems to have drifted more towards the code side. Both are design decisions that I'll be making here pretty quick.