When to split or merge tables

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

When to split or merge tables

Post by matthijs »

I've been working on a db design and after normalization ended up with (among others) these two tables:

Code: Select all

CREATE TABLE `mc_routeclimbs` (
  `climb_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `route_id` INT UNSIGNED NOT NULL,  # FK
  `user_id` INT UNSIGNED NOT NULL,  # FK
  `climb_date` DATE NOT NULL,
  `climb_style` ENUM('OS','FL','RP') NOT NULL,
  `climb_tries` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
  `climb_notes` TEXT NOT NULL,
  `climb_dateposted` DATETIME NOT NULL,
PRIMARY KEY (`climb_id`)
)

CREATE TABLE `mc_boulderclimbs` (
  `bclimb_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `boulder_id` INT UNSIGNED NOT NULL , #FK
  `user_id` INT UNSIGNED NOT NULL, #FK
  `bclimb_date` DATE NOT NULL,
  `bclimb_style` ENUM('OS','FL','RP') NOT NULL,
  `bclimb_tries` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
  `bclimb_notes` TEXT NOT NULL,
  `climb_dateposted` DATETIME NOT NULL,
PRIMARY KEY (`bclimb_id`)
)
As you can see both contain the same columns. My question concerns whether that's a good thing. I split the data up in 2 tables because in the application most often there will be queries for either of these two. Most often there will be a SELECT, UPDATE or INSERT query for one or another. So even though "routeclimbs" and "boulderclimbs" share the same columns, they are sort of different things.

To compare it to something else, think about records and cds. Both can be described with the same columns, but are different things. People would be looking for cd's or records, not both at the same time.

However, if I would just merge them in one table and use 1 field `climb_kind` ENUM ('route','boulder') it would eliminate a lot of duplication. Both in the database and in the php logic. I can imagine I can reuse all my queries for one table and just add a WHERE climb_kind = '$var' clause. If I were to use the 2-table design I would have to duplicate all queries (as they are in fact the same for both tables).

So am I right to think that I can better merge these tables in one?

(at first another reason for splitting them up was performance. But as I've found out that will probably not be an issue if the queries are written well and the tables indexed)
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

I would keep them separated.. But i don't see why you would have to duplicate the queries though..

The only difference is the table name, thus write code that uses a variable for the tablename...
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Thanks Tim. If I may ask, why would you keep them seperated? (sorry, I always want to know the why)

As for the queries: yes I could probably use variables for the tablename. But how I would implement that is another question.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

matthijs wrote:If I may ask, why would you keep them seperated? (sorry, I always want to know the why)
I prefer to keep things simple. If i store different entities in the same table i would need to add some logic (usually a column with a typecode) to differentiate between the entity types i feel that things become more complicated than they have to be.
matthijs wrote: As for the queries: yes I could probably use variables for the tablename. But how I would implement that is another question.
It depends on how you implement your data acess logic... (Here's what im thinking right now:)

In this case i would probably end up with a AbstractDac class that implements simply Insert, Delete, Update and Retrieve methods.. This class would also have an abstract function getTableName().

Then i would implement two concrete Dac class that extend the AbstractDac and add the entity-specific methods to them...
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

timvw wrote:I prefer to keep things simple.
Funny that you use this argument, as my argument for merging the 2 tables was also to make things more simple :wink:

However, I do see your point. And your idea for an abstract class which is implemented by two concrete classes makes sense.

Then I'll keep it with the 2 tables for now. Thanks for your insight.
User avatar
califdon
Jack of Zircons
Posts: 4484
Joined: Thu Nov 09, 2006 8:30 pm
Location: California, USA

Post by califdon »

I'll begin by confessing that I'm a retired database consultant and college instructor, so if I seem pedantic, that's why.

Relational database design is a pretty advanced field governed by rather strict rules. It's true that there is room for deviation from the rules and experienced database designers might disagree at times, but you would be well advised to start from a firm foundation, rather than a "feels good" approach.

timvw's reference to entities is important, but let me try to define what that means. An entity is a person or a place or a thing or an event, etc. Every distinct entity should be reflected in the database model as a table. Entities have attributes (properties) that become the fields in each entity's table.

In my mind, the first question to ask is: Are routeclimbs and boulderclimbs expressions of the same entity or are they distinct entities? As is often the case, the answer may not be straightforward. A clue lies in the fact that they have the same attributes (fields). Thus, my first inclination is to say that they are really both examples of the entity climbs. If they are, I would merge them and add a 'TypeOf' field to distinguish which are which, if that is required. That's not necessarily the best answer, because your knowledge of the real world that you're modeling might lead you to a different conclusion. My point, then, isn't to tell you what the best schema is, but to urge you to approach it by thinking about the entities.

Look, if this is all that's going to be in your database--forever and ever--it really won't matter very much which way you go. Your code will be slightly different, but it's not a big deal. However, if you later want to expand the application--perhaps add other tables, link it to another application, whatever--the schema that you choose now will likely influence how well you can do that.

I just answered another poster in this forum and provided the following links to a few relational database tutorials that I recommend. I'll repeat them here:

http://www.surfermall.com/relational/lesson_1.htm
http://www.utexas.edu/its/windows/datab ... rview.html
http://www.extropia.com/tutorials/sql/toc.html
http://www.webmonkey.com/webmonkey/99/13/index1a.html
http://www.databasejournal.com/sqletc/a ... hp/1469521

In the decades during which I designed commercial databases and taught database courses, I've seen a whole lot of people innocently jump into database design without benefit of some guidance in the fundamentals of relational database theory, and they all paid for it later, often by hiring me to step in and redesign their databases.
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Califdon, thanks for your input. Appreciated.

Your thoughts are exactly what has been bothering me while designing the db. The thing is - indeed - that routeclimbs and boulderclimbs are not completely distinct entities. They are both examples of climbs. And do have the same characters (reflected in the same table fields).

I don't know how many people climb here, but to explain a bit: a boulder is a short route on a not-so-high rock (boulder). So when you go out bouldering, that means you take a little matras (called a crashpad) with you, place it below the small rock and climb the few meters (6-12 feet) high rock. Without the safety of a rope. The other kind of sport/rock climbing is on routes on higher rock (10-30m, 30-100ft), where you do use a rope to keep you from falling too far.

Now the app I want to build is one in which you can keep records of what you climbed. So that means you keep track of boulders and routes. The thing is, because boulders and routes are different kind of climbs (short and powerful vs long and sustained) a user will want to work with separate lists of both. It's like keeping track of your running times on the 400m and 3000m runs. It wouldn't make sense to show you one list with both mixed up. So in that sense, routeclimbs and boulderclimbs are different things.

However, I'm almost certain the two will always have the same characteristics (name, grade, area, etc). So in that sense they are expressions of the same entity and it would make sense to put them in one table and add a TypeOf field. In the running example that would mean a TypeOf field with values "400", "800", "1500" etc.

I'll follow the links you gave and see if that offers some more insight. At the moment I think I should keep them in separate tables. As I cannot predict how the app will extend in the future I'd better keep it as normalized as possible.

Thanks again for all input guys.
Post Reply