Performance Question based on Table Count

Questions about MySQL, PostgreSQL, and most other databases, as well as using them with PHP, can be asked here.

Moderator: General Moderators

MrPotatoes
Forum Regular
Posts: 617
Joined: Wed May 24, 2006 6:42 am

Performance Question based on Table Count

Post by MrPotatoes »

personally i prefer everything to be small, compact and dead simple. anything in my code that doesn't really need to be there, slows stuff down, or just makes it complicated either gets rewritten or dumped.

well in light of that i have a serious question about database tables.

does speed decrease with a higher count of tables? i think tables that can be merged should be, and that should increase speed, but now i don't know. i was thinking about it because that would decrease the number of queries, but that isn't actually the truth. you might still need two anyway.

secondly, if i have two tables and i make one query calling two different tables, is that like making two queries anyway?

thanks for your theoretical help :)
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

That depends entirely on your database structure.
GM
Forum Contributor
Posts: 365
Joined: Wed Apr 26, 2006 4:19 am
Location: Italy

Post by GM »

Here you have hit a grey area in relational database theory.

From a certain point of view, it can be useful to have some data redundancy. From another point of view, this goes against normalisation rules.

Looking at things from the maintenance side, it is better to put everything into its own table. Keeps everything neat and tidy, and any changes need to be done only in one place. Also, the database is more flexible like this. From a performance point of view, you lose a little bit, because in this case, in order to get "meaningful" data out (ie: data that the end user can read and understand), you need to create more joins.

Look at this database design:

Code:

CREATE TABLE cd_songs (
id_cd INT(5) NOT NULL,
song_position INT(2) NOT NULL,
song_name VARCHAR(50) NOT NULL,
PRIMARY KEY (id_cd, song_position));
and this one:

Code:

CREATE TABLE cd_songs (
id_cd INT(5) NOT NULL,
song_position INT(2) NOT NULL,
id_song INT(5) NOT NULL,
PRIMARY KEY (id_cd, song_position));

CREATE TABLE songs (
id_song INT(5) NOT NULL AUTO_INCREMENT,
song_name VARCHAR(50) NOT NULL,
PRIMARY KEY (id_song));
According to normalisation rules, the first design is not as good as the second, because you are storing the song name directly in the cd_songs table, and in theory the same song can appear on different CDs (so the same name would be stored redundantly). The second design is better normalised, but we need another join to get meaningful data out to the user.
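To make the cost of that extra join concrete, here is a sketch of the normalised design in action. It uses Python's sqlite3 module purely for illustration (the thread is about MySQL; SQLite ignores the INT display widths, and the sample song names are made up):

```python
import sqlite3

# In-memory database; schemas follow the normalised design above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cd_songs (
    id_cd INTEGER NOT NULL,
    song_position INTEGER NOT NULL,
    id_song INTEGER NOT NULL,
    PRIMARY KEY (id_cd, song_position));

CREATE TABLE songs (
    id_song INTEGER PRIMARY KEY AUTOINCREMENT,
    song_name VARCHAR(50) NOT NULL);

INSERT INTO songs (song_name) VALUES ('Intro'), ('Outro');
INSERT INTO cd_songs VALUES (1, 1, 1), (1, 2, 2);
""")

# The extra join needed to turn id_song back into a readable song name:
rows = conn.execute("""
    SELECT cs.song_position, s.song_name
    FROM cd_songs cs
    JOIN songs s ON s.id_song = cs.id_song
    WHERE cs.id_cd = 1
    ORDER BY cs.song_position
""").fetchall()
print(rows)  # [(1, 'Intro'), (2, 'Outro')]
```

With the first (denormalised) design, the same result would come from a single-table SELECT with no join at all; that is exactly the trade-off being described.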

Personally, in cases like this, you need to use a bit of common sense: how many times am I going to have the same song in my collection? Five times at most? Does that really warrant another table?

In this case, I think you'd be excused for "breaking" the rules of normalisation.
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Re: Performance Question based on Table Count

Post by Roja »

MrPotatoes wrote:does speed decrease with higher count of tables? i think taking tables that can be concatenated should be and should increase speed but now i don't know.
The general rule is "The more columns in a table, the slower the select will be."

Imagine doing a select * on a table that has 2 columns. Then imagine doing the same on a table that has 100. Which do you think will be slower? Test it and you will confirm: more columns = slower.

However, there are a tremendous number of exceptions to that very general rule. If you write narrow queries, the difference can be reduced. If you use a solid query cache setting on some databases, the difference can be reduced. If you use application caching, the difference can be reduced.
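"Writing narrow queries" mostly means selecting only the columns you need instead of SELECT *. A small sketch (again using sqlite3 for illustration; the wide_table name and columns are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A deliberately wide table: id plus 100 extra columns.
cols = ", ".join(f"col{i} INTEGER" for i in range(100))
conn.execute(f"CREATE TABLE wide_table (id INTEGER PRIMARY KEY, {cols})")
conn.execute("INSERT INTO wide_table (id, col0) VALUES (1, 42)")

# Wide: drags all 101 columns back to the client, needed or not.
wide = conn.execute("SELECT * FROM wide_table").fetchone()

# Narrow: only the columns the application actually uses.
narrow = conn.execute("SELECT id, col0 FROM wide_table").fetchone()

print(len(wide), narrow)  # 101 (1, 42)
```

The database still stores the wide row either way, but the narrow query moves far less data per fetch, which is where much of the "more columns = slower" effect can be clawed back.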

If you have to do selects against one table (that could fit in memory) instead of against 2 different tables (that in total couldn't fit in memory), the extra columns in the one table wouldn't be bad.

Anytime you talk performance, you should be testing. And not in abstract terms either. With performance, you are trading design choices for a gain in something. What gain you get, and how important that gain is, matters!

For example, if I said you could triple your performance by making a new separate table for every row of data you had. How much extra work would it be? How hard would it be to code hundreds of SQL calls to use different table names?

What is triple the performance? In memory? In CPU speed? In space on the drive? For just those queries? What if triple the performance only saves 1/1000th of a second, and you need to save 2 seconds?

Performance discussions should begin and end with a clear definition of your goals, and benchmarking.
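A minimal benchmarking sketch along those lines, using Python's timeit against an sqlite3 database (the table, row count, and queries are invented for illustration; the absolute numbers only mean anything on your own hardware and data):

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)",
                 [(f"song {i}",) for i in range(10_000)])

def by_full_scan():
    # Unindexed column: the engine scans all 10,000 rows.
    return conn.execute("SELECT id FROM t WHERE name = 'song 9999'").fetchone()

def by_primary_key():
    # Primary key: a single index lookup.
    return conn.execute("SELECT id FROM t WHERE id = 10000").fetchone()

# Time each query 200 times; compare the numbers, don't guess.
scan = timeit.timeit(by_full_scan, number=200)
pk = timeit.timeit(by_primary_key, number=200)
print(f"full scan: {scan:.4f}s  primary key: {pk:.4f}s")
```

The point is not this particular comparison but the habit: state what you are measuring, run it on realistic data, and only then decide whether a schema change is worth it.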