Web "Hosting" design
Moderator: General Moderators
Web "Hosting" design
I'm not exactly doing web hosting, but its the closest analogy to what I'm doing.
I'm on a dedicated server, so I can have any number of databases, etc.
I host lots of events on my my site. Each event currently has a dedicated database with about 80 tables. Over half the tables are small, under 100 rows. A few tables (about 3, right now, expected to grow to 20 as more services come one line) will be several hundred to low thousands rows -- its a relatively small database.
There is a "central" database that helps coordinate administrative actions, user permissions, and tracking event configuration. The latter role has led to most of the "satellite" database's tables being replicated in the central database. It would be hard, but not impossible to push all the sateliite information into the central database. However, more than 90% of the page views do not require a connection to both the central and a satellite database so I'm not taking a preformence hit on the database conenctions.
Does anyone have experience with this type of design to help me weigh the pros/cons?
As I see it:
Central + Satellites
Pros
Small light-weight databases -- fit entirely in RAM
Simpler schema -- don't need to chain a "satellite" identifier though lookups in the satellite database
Less need for sophisticated database tuning
Cons
Some pages need two database connections
Setting up a new event requires creating a new database -- php-user lacks this permission
Replicated information between databases -- however, even in a single database design I would need a way to track/ cache current vs un-commited updates (accounts for maybe 1/3 of the replicated data)
More complicated "db_config.inc"-type file requirements
Single Database
Pros
Only one database to administer, backup, and tune -- easier for others to setup a server based on my software
Less replicated data --2/3 replicated for no "good" reason
Easier to automate new event setup
Cons
More indexing, tuning required to maintain similar levels of preformence
More complex queries
Modifying my existing code to use this method instead of adding new features
I know I'm missing some of the arguements, what else can you think of?
Performence issues aren't really a huge deal, but I like to be aware of them. The dedicated server has 1 GB RAM with 2 GB swap and hasn't had to touch swap space yet (in about 9 months and 10 hosted events), all databases held in RAM.
Thanks
I'm on a dedicated server, so I can have any number of databases, etc.
I host lots of events on my my site. Each event currently has a dedicated database with about 80 tables. Over half the tables are small, under 100 rows. A few tables (about 3, right now, expected to grow to 20 as more services come one line) will be several hundred to low thousands rows -- its a relatively small database.
There is a "central" database that helps coordinate administrative actions, user permissions, and tracking event configuration. The latter role has led to most of the "satellite" database's tables being replicated in the central database. It would be hard, but not impossible to push all the sateliite information into the central database. However, more than 90% of the page views do not require a connection to both the central and a satellite database so I'm not taking a preformence hit on the database conenctions.
Does anyone have experience with this type of design to help me weigh the pros/cons?
As I see it:
Central + Satellites
Pros
Small light-weight databases -- fit entirely in RAM
Simpler schema -- don't need to chain a "satellite" identifier though lookups in the satellite database
Less need for sophisticated database tuning
Cons
Some pages need two database connections
Setting up a new event requires creating a new database -- php-user lacks this permission
Replicated information between databases -- however, even in a single database design I would need a way to track/ cache current vs un-commited updates (accounts for maybe 1/3 of the replicated data)
More complicated "db_config.inc"-type file requirements
Single Database
Pros
Only one database to administer, backup, and tune -- easier for others to setup a server based on my software
Less replicated data --2/3 replicated for no "good" reason
Easier to automate new event setup
Cons
More indexing, tuning required to maintain similar levels of preformence
More complex queries
Modifying my existing code to use this method instead of adding new features
I know I'm missing some of the arguements, what else can you think of?
Performence issues aren't really a huge deal, but I like to be aware of them. The dedicated server has 1 GB RAM with 2 GB swap and hasn't had to touch swap space yet (in about 9 months and 10 hosted events), all databases held in RAM.
Thanks
- massiveone
- Forum Commoner
- Posts: 29
- Joined: Tue Jun 18, 2002 4:39 pm
- Location: Canada
- Contact:
If I understand you right (nielsene) you might like to think about turning it on it's head, what I mean is, have a central dB/server that replicates to your "satellite" servers and then you only have a single place to keep things updated.
Your front-end system will then just talk to your satellite db/servers and the central db/server acts as a spooler.
Load could be an issue in this model, esp. if the databases become quite large, you would have to monitor the traffic on your network to ensure that the replication doesn't get too hungry for resources.
Regards,
Your front-end system will then just talk to your satellite db/servers and the central db/server acts as a spooler.
Load could be an issue in this model, esp. if the databases become quite large, you would have to monitor the traffic on your network to ensure that the replication doesn't get too hungry for resources.
Regards,
That's an interesting idea. I'm already beginning to farm out the satellite databases to seperate hard drives / drive controllers to decrease I/O blocking. If I ever need more load balancing or redundancy that would make sense. However using distributed servers isn't a pro/con in either direction with regards to combining the databases or keeping them seperate. I can replicate the entire master to multiple servers, or could just replicate the satellite databases. So while its a cool, good idea, I don't think it adds weight to either approach.
I do think there is merit in my approach, but in answer to your question I would opt for databases that can fit into RAM (whether they be "central" or "satellite"), but I would only suggest that if the server was dedicated to database operations and running InnoDB.
Asking a server to handle both the OS and any Web operations (PHP/Apache etc) I think would be asking too much, if not near impossible.
Regards,
Asking a server to handle both the OS and any Web operations (PHP/Apache etc) I think would be asking too much, if not near impossible.
Regards,
I agree with everything you say, but I still don't think we're talking about the same problem.
Let me try to explain what I'm asking again. I have multiple databases managed under a single DBMS (PostGreSQL). When the project started each hosted event received a copy of the php source into its web directory (but shared include files) and received a dedicated database -- not a dedicated database mangament system. Throughout the rest of this post when I say "database" I made a collection of relations (tables) managed under a given DBMS. I will specifically say DMBS is I mean the database software product.
I have since been working on the meta-admin tools, basically moving the setup of a new event's site from a system admin task to a "forum admin/web-based" task. In doing so a cohesive filesystem layout was established so that I don't have to replicate source code, etc. The meta-admin tool requires its own database to track which users have permissions in which events and how far along the configuration process each event is. As the configuration tool has become more powerful, more and more of the per-event database schema is replicated in the central database. This is not a master-slave setup.
The question posted in my OP is whether an attempt should be made to "unify" all the per-event databases fully into the central database and only use the one database.
Let me try to explain what I'm asking again. I have multiple databases managed under a single DBMS (PostGreSQL). When the project started each hosted event received a copy of the php source into its web directory (but shared include files) and received a dedicated database -- not a dedicated database mangament system. Throughout the rest of this post when I say "database" I made a collection of relations (tables) managed under a given DBMS. I will specifically say DMBS is I mean the database software product.
I have since been working on the meta-admin tools, basically moving the setup of a new event's site from a system admin task to a "forum admin/web-based" task. In doing so a cohesive filesystem layout was established so that I don't have to replicate source code, etc. The meta-admin tool requires its own database to track which users have permissions in which events and how far along the configuration process each event is. As the configuration tool has become more powerful, more and more of the per-event database schema is replicated in the central database. This is not a master-slave setup.
The question posted in my OP is whether an attempt should be made to "unify" all the per-event databases fully into the central database and only use the one database.
Didn't realise you were running postgres, but that's another matter (see later), I can understand your setup a little better but unsure what you mean by an "event", is an event a single web site ?
If so, I would go for a seperate db approach, other wise your db schema to separate "events" (in a single db) could become too cumbersom to manage effectively, unless you can guarantee that the "events" will generate low amounts of data.
There are also issues (in my experience) with postgres when single databases get too large. When the vacuum process runs it gets a little stuck when databases are above 25 gig. Now I realise this is a lot of data and the "events" may not generate that much but it is a concern that should be noted when using postgres.
I do understand your scenario, but I feel a master/slave model would fit your system, albeit a pain to architect in a production (with active events) environment.
Am I helping
Regards,
If so, I would go for a seperate db approach, other wise your db schema to separate "events" (in a single db) could become too cumbersom to manage effectively, unless you can guarantee that the "events" will generate low amounts of data.
There are also issues (in my experience) with postgres when single databases get too large. When the vacuum process runs it gets a little stuck when databases are above 25 gig. Now I realise this is a lot of data and the "events" may not generate that much but it is a concern that should be noted when using postgres.
I do understand your scenario, but I feel a master/slave model would fit your system, albeit a pain to architect in a production (with active events) environment.
Am I helping
Regards,
Yup you're helping. I didn't deal with events in the OP, because I though that the analogy with web-hosting would be more clear. Guess I was very wrong.
In this application, an event is a ballroom dance competition. The application generates web pages describing the event, and handles on-line registration, scheduling, invoicing, etc. The application as a whole is named "CompInaBox" and at any given time I'll have multiple events open for registration and multple events under configuration.
For reference a my largest event's database is a paltry 5.4 MB
(SELECT SUM(relpages)*8 as "DB Size in KB" from pg_class;) and I'ld expect 10-30 events a year. So I don't think I'll hit the 25 GB problem you mentione anytime soon
This is why I felt the data amount was low enough to possibly justify the combination into a single database. With the main advantage of not neeing to use a cron-job or sudo'd task to create the new databases.... (Which is currently the only new event task that I can't do securely and safely though the web interface I've been writing.)
Thanks for bearing with my poor descriptions.
In this application, an event is a ballroom dance competition. The application generates web pages describing the event, and handles on-line registration, scheduling, invoicing, etc. The application as a whole is named "CompInaBox" and at any given time I'll have multiple events open for registration and multple events under configuration.
For reference a my largest event's database is a paltry 5.4 MB
(SELECT SUM(relpages)*8 as "DB Size in KB" from pg_class;) and I'ld expect 10-30 events a year. So I don't think I'll hit the 25 GB problem you mentione anytime soon
This is why I felt the data amount was low enough to possibly justify the combination into a single database. With the main advantage of not neeing to use a cron-job or sudo'd task to create the new databases.... (Which is currently the only new event task that I can't do securely and safely though the web interface I've been writing.)
Thanks for bearing with my poor descriptions.
Now that makes more sense
It's fine, I was probably reading between the lines...
You NEED a single database with a good schema to separate your "events", such as:
eventname_tablename
Your queries will be simpler too (knid of), you will be able to do joins and the like across events.
The real title for this post should be "Table Naming Conventions"
From what I can gather you will have a big task implementing a logical schema that will allow simple seperation of the tables.
Regards,
You NEED a single database with a good schema to separate your "events", such as:
eventname_tablename
Your queries will be simpler too (knid of), you will be able to do joins and the like across events.
The real title for this post should be "Table Naming Conventions"
From what I can gather you will have a big task implementing a logical schema that will allow simple seperation of the tables.
Regards,
Why do you feel that the single database is the correct choice in this case? Cross database joins aren't needed. If I were to go to a single database design I would definately not use an eventname_ prefix before all the event specific tables. That breaks the Information Principle. (All information is stored in relations. No meaning is "hidden" from the database. While the table name is stored in the catalog, its not directly accessible to the user and the predicate of the table is not as complete as it could be.)
Depending on the keying of the tables I would either end up with a composite key of <eventid,{old primary key}> or make a new surrogate key. My eventids are a Unix-shell friendly names for the event, that are used in urls and as the current break-out database names. (I strive to use "natural keys" whenever possible.)
Depending on the keying of the tables I would either end up with a composite key of <eventid,{old primary key}> or make a new surrogate key. My eventids are a Unix-shell friendly names for the event, that are used in urls and as the current break-out database names. (I strive to use "natural keys" whenever possible.)
Agreed, but I was under the impression (reading between the lines again!) that you needed the separation. I would most definitley agree that you need to have a schema that implements proper rationalisation, which is the whole point of RDBMS 
Because invariably database names give the separation of projects/products/services via their names, I was implementing this anology in separation of your tables/events.
Having a schema that provides you with a few tables to handle all of your events is more of a reason/case for a single database, and will be easier to maintain and rationalise. The obvious other benefit would be a super-simple dbconfig
The only case I would see for a multi-db aproach (in this instance) is if each event requires a vastly different table definition.
BTW, this post is helping my status in the forum, cheers, newbie to scripter in a couple of day, can't be too bad
Regards,
Because invariably database names give the separation of projects/products/services via their names, I was implementing this anology in separation of your tables/events.
Having a schema that provides you with a few tables to handle all of your events is more of a reason/case for a single database, and will be easier to maintain and rationalise. The obvious other benefit would be a super-simple dbconfig
The only case I would see for a multi-db aproach (in this instance) is if each event requires a vastly different table definition.
BTW, this post is helping my status in the forum, cheers, newbie to scripter in a couple of day, can't be too bad
Regards,