Searching the database, designing for speed?

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
robster
Forum Contributor
Posts: 360
Joined: Wed Jul 16, 2003 8:28 am
Location: Sunshine Coast, Australia

Searching the database, designing for speed?

Post by robster »

Hi all,

I'm wanting to make sure my design is solid, so I thought I'd ask for feedback here. It particularly pertains to making it fast to search.

Let's assume I have lots of tables in my MySQL database, they could be like this:

Code: Select all

car_cc
car_tyres
car_cylinders
car_stereo
car_speakers
I then have a single table that contains user info, the user has filled in what their car has, and the user table is similar to this

Code: Select all

user_table
-username
-user_age
-user_gender
-user_cc
-user_tyres
-user_cylinders
-user_stereo
-user_speakers
The idea is, that the user has chosen his/her options on signup. The options were created from the earlier tables shown, so everybody's responces (and hence profile) is taken from, and can be searched from the earlier tables.

So far so good (I hope you're still with me ;)).
The actual question is:

(Q)-Should the 'user_table' contain the actual data from within the other tables, or should the 'user_table' contain links to the id's containing the data in the other tables?

The reason for this question is speed. Let's assume the site is VERY popular and I have say, 10,000 people all doing a search at the same time.

Method 1
If I have the actual DATA in the user_table, the search is as simple as "show me all users vehicles with 4 tyres, a big stereo and that have 6 cylinders'. The search should be fast as it only searches the one table, table 'user_data'.

Method 2
If I have links to the DATA in the OTHER tables, then it become more akin to: Search user_table, check user_table and compare user_tyres against car_tyres, compare user_stereo against car_stereo and compare user_cylinders against car_cylinders.

I'm guessing method 2 would be MUCH more intensive? Is this so? Is the hit on processing power etc dramatically different?

I think I'd rather run on method 2, but I'd like feedback, and I really appreciate anything anyone has to say as it's going to shape my whole site and I'm just about ready to move into this aspect of the site.

I look forward to some good discussion :)

Rob
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

Well, I know that I'm going to be different here than a lot of people. But I would advoate attempting to use "natural keys" as much as possible. This gives you both 1 and 2. The key is descriptive and not some boring un-useful integer. This is only possible if your keys aren'tt going to change.

Some of your "dynamic domain tables", _cc, _cylniders should be easily useable as "natural keys". Ie they would look something like:

Code: Select all

CREATE TABLE car_cc (
  cc_label VARCHAR(20) PRIMARY KEY
);  -- I'm not a car expert so please excuse my stupid values)
INSERT INTO car_cc (cc_label) VALUES ('400 cc');
-- more pre-loads

CREATE TABLE car_cylinders (
  num_cylinders INT PRIMARY KEY
);
INSERT INTO car_cylinders (num_cylinders) VALUES (4);
-- more preloads

CREATE TABLE user_table (
 ...
  cc_label VARCHAR(20) NOT NULL REFERENCES car_cc ON UPDATE CASCADE ON DELETE RESTRICT,
  num_cylinders INT NOT NULL REFERENCES car_cylinders ON UPDATE CASCADE ON DELETE RESTRICT
);
Now you won't need to join to car_cylinders nor to car_cc to still get useful displayable data out. The other tables might not be as easy to find a good "natural key" and therefore might still need a join. However as the query planning tends to be O(n!) its very useful to be able to remove a few tables from the join at least.
User avatar
robster
Forum Contributor
Posts: 360
Joined: Wed Jul 16, 2003 8:28 am
Location: Sunshine Coast, Australia

Post by robster »

dang! :(

A feature I really like about what I'm designing is the ability to change the databases table and the WHOLE site updates. I see what you say about the 'natural keys' but then I lose that capability.

I guess then, and please I'd appreciate a 'yay' or 'nay' on this ;). I guess then, I'll have to do method 2, and compare each tables ID against the table I'm referencing. (ie: compare car_cc to user_cc via their ID as an example).

Thanks again.

Rob
User avatar
robster
Forum Contributor
Posts: 360
Joined: Wed Jul 16, 2003 8:28 am
Location: Sunshine Coast, Australia

Post by robster »

Another thought. If I decided that I REALLY didn't want to change the options... ever! ;) Would it be bad to stick the actual DATA into the fields of the 'user_table' rather than the ID of the table I've taken the data to create the data with?

I know it sort of makes the whole point of the database and its features (ie: cross comparing etc) obsolete. But far out.... it'd be fast.


Is this a dangerous idea?
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

robster wrote:dang! :(

A feature I really like about what I'm designing is the ability to change the databases table and the WHOLE site updates. I see what you say about the 'natural keys' but then I lose that capability.
Hmm? You cna still do a full database update. I beleieve you have to use InnoDB tables in MySQL so you get referential integrity on the foreign keys. (That's the the ON UPDATE CASCADE clause referes to. If the primary key end of a foreign key relationship changes then all the referring keys get changed to match.)
User avatar
robster
Forum Contributor
Posts: 360
Joined: Wed Jul 16, 2003 8:28 am
Location: Sunshine Coast, Australia

Post by robster »

Thank you :)

I'm a bit of a database neub really.

Perhaps I need to learn about InnoBD, about referential integrity, about the CASCADE clause and similar things.

I'm off to hunt for such info, as it seems I might need it, and if anyone has any good tutorials or anything similar on cascade etc... (your own advice would be fine) then I'd appreciate that too.

Thanks again.

rob
Post Reply