Need some MySQL advice?!
Posted: Mon Dec 08, 2003 10:36 pm
I have a survey script that puts the results into a MySQL table. There are 107 questions on the survey with different datatypes, so therefore I have a table with 107 columns for the results to go into each time somebody takes the survey.
Now, another man I work with thinks I should just have like 3 columns, one for the question # and one for the answer. So instead of having one record for one person with 107 columns, have 107 records with only 3 columns. He somehow thinks that there won't be any speed difference doing it this way, but I disagree and think it would be much slower to select 107 records, when you just select one with 107 fields.
This is what I told him in an email.
"Because there are over 100 questions in that survey, do you think it would
be faster that way? Because everytime someone takes the survey, like 107
records would be created. And I think MySQL will insert columns faster than it will records (I'll have to check into that). But also, for our survey
stats/reports that I make .. selecting over 100 records just for one survey
will take ages. Imagine if 100 students take this survey, and Jody wants to
see a report of the averages for all, that means my query has to select over 10, 000 records."
And this is what he had to say about it:
"I doubt if there's a significant difference in the data retrieval time --
since the data is normally cached (unless the system runs out of cache
space) all it has to do is the very fast logical/calculational operation to
find where the data is located and go to the data locations in RAM/cache and
get it.
The advantage of the vertical structure rather than the horizontal is
conceptual clarity and flexibilty. That is, if next year there 150
questions, the tables don't need to be modified, or if there are only 75,
again there's no need to modify tables and no waste space if they're not
modified. If additional questions are added, the new set can be handled
without table modification. Generally, the "wide/flat" structure is
inferior to the "tall" structure."
So is he right about all that or am I? Just need some input from some other MySQL geeks on this debate. Of course, if I go his route, I would have to make the 3 columns of type BLOB or something to be able to handle any of the data it stores, as with my way, each field is its own datatype to match the survey question. And I think I read in the MySQL manual that you using binary datatypes for fields is much slower?
Now, another man I work with thinks I should just have like 3 columns, one for the question # and one for the answer. So instead of having one record for one person with 107 columns, have 107 records with only 3 columns. He somehow thinks that there won't be any speed difference doing it this way, but I disagree and think it would be much slower to select 107 records, when you just select one with 107 fields.
This is what I told him in an email.
"Because there are over 100 questions in that survey, do you think it would
be faster that way? Because everytime someone takes the survey, like 107
records would be created. And I think MySQL will insert columns faster than it will records (I'll have to check into that). But also, for our survey
stats/reports that I make .. selecting over 100 records just for one survey
will take ages. Imagine if 100 students take this survey, and Jody wants to
see a report of the averages for all, that means my query has to select over 10, 000 records."
And this is what he had to say about it:
"I doubt if there's a significant difference in the data retrieval time --
since the data is normally cached (unless the system runs out of cache
space) all it has to do is the very fast logical/calculational operation to
find where the data is located and go to the data locations in RAM/cache and
get it.
The advantage of the vertical structure rather than the horizontal is
conceptual clarity and flexibilty. That is, if next year there 150
questions, the tables don't need to be modified, or if there are only 75,
again there's no need to modify tables and no waste space if they're not
modified. If additional questions are added, the new set can be handled
without table modification. Generally, the "wide/flat" structure is
inferior to the "tall" structure."
So is he right about all that or am I? Just need some input from some other MySQL geeks on this debate. Of course, if I go his route, I would have to make the 3 columns of type BLOB or something to be able to handle any of the data it stores, as with my way, each field is its own datatype to match the survey question. And I think I read in the MySQL manual that you using binary datatypes for fields is much slower?