Page 1 of 3
MySQL Query Performance - Things to know
Posted: Tue Jul 15, 2008 6:41 pm
by Dynamis
Last Update: 7/24/2008
I know a lot of people have been talking about mysql performance, how efficiency is based around number of rows or columns, and how large columns are. So I did some tests today that I believe show a decent representation for mostly smaller databases, but I'm sure the trend would continue with larger test data.
All results are based on a table with 26 columns and the fastest iteration of 5000 (to achieve near ideal query as possible) (with no query caching)
Code Used:
http://web.ics.purdue.edu/~jdian/files/ ... t_code.txt
Excel Test Results:
http://web.ics.purdue.edu/~jdian/files/mysql_tests.xls
Summarized Results:
- It appears it is best to only pull the data you need, even if you need nearly all the columns. Avoid using wildcard (*).
- Query time linearly increases as number of rows increases.
- Column length does not greatly affect query time.
- As total number of rows increases, column length seems to play a more substantial role though.
- It is faster to pull data using all column names needed UNTIL you are pulling every column, at which point it is fastest to use wildcard(*). (NEW!!)
Running:
PHP: v5.2.6
MySQL: v5.0.51b
Windows XP
Dell Latitude D820 Laptop
For all of the result data, including an extensive graph of performance, click the link at the top of this post.
Future tests to come....
Re: MySQL Query Performance - Things to know
Posted: Tue Jul 15, 2008 8:46 pm
by Benjamin
When you were pulling 10 columns, did you specify the column names manually or use a wildcard?
It would be good to see the code you used.
Re: MySQL Query Performance - Things to know
Posted: Tue Jul 15, 2008 9:01 pm
by Eran
If you were iterating over the same query with the same parameters, after one call it would be cached in the db cache (unless its a giant query) so keep that in mind.
Also it would be nice to get the timings for 1 iteration (1 column vs. 26 columns)
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 8:04 am
by Dynamis
Astions -> When pulling 10 I entered the columns manually, I only used wildcard for pulling all 26.
Pytrin -> I originally started with 1 iteration, but the results were extremely skewed. To get a good read with just 1 iteration, I believe you would need a very Large table to go through. This is why I used an average. Also, I compared 1 iteration to the 10,000 iteration average and they are very close. I believe the pattern is what is important and that it has not been distorted based on this.
I'll try to put the code up later this afternoon.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 8:11 am
by Eran
If you could put the environment you are using (OS, MySQL and PHP versions) and system configuration it will be great. Thanks for the effort!

Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 8:23 am
by Dynamis
Alright, I updated my post with the code and systems used. Let me know what other information would be helpful, other test results that would be helpful, or any errors I could have made.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 10:55 am
by dml
Column Length:25 Rows:2500 -- 559.3% faster to pull 1 column than 10, 2704% faster to pull 1 column than 26
So when the amount of data gets big enough (so that presumably the header overhead etc becomes negligible), it takes around N times longer to retrieve N equally-sized columns than it does to retrieve one of those columns. Is it actually the number of bytes rather than the number of columns that's significant? Does the speed halve if the column you select from is twice as long as the rest of the columns? Or if you select two columns instead of one?
As pytrin said, those results are likely to be coming from the cache. Is there an appreciable difference if you switch this off... after you set up the connection, execute "set session query_cache_type=0".
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 11:08 am
by Benjamin
As an aside I imported 8.5 Million records into a MyISAM table last night. One of the fields had a cardinality of 326041 and returned 30 rows ordered by that field within 0.03 seconds on the first query. MySQL is extremely fast when configured properly.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 11:21 am
by Dynamis
dml wrote:Is it actually the number of bytes rather than the number of columns that's significant? Does the speed halve if the column you select from is twice as long as the rest of the columns? Or if you select two columns instead of one?
If you look at the link with the excel attachment you can see the results. Both are significant, although number of bytes in columns is more significant.
dml wrote:As pytrin said, those results are likely to be coming from the cache. Is there an appreciable difference if you switch this off... after you set up the connection, execute "set session query_cache_type=0".
As I previously stated a few posts up, I compared the avg time of 10,000 iterations w/ the time for 1 iteration and they were in the same ball park. So my results would not vary much by doing that.
astions wrote:As an aside I imported 8.5 Million records into a MyISAM table last night. One of the fields had a cardinality of 326041 and returned 30 rows ordered by that field within 0.03 seconds on the first query. MySQL is extremely fast when configured properly.
I agree, although that is not the purpose of this post. This post is merely to iterate the significance of only pulling from mysql tables what you need and showing the impact of pulling extra data. Even with MySQL properly configured, I would assume these trends are nearly the same. Although it took 0.03 seconds, for you to pull the data, had you pulled a different amount of data, it is my assumption that the time taken would be comparable to the graph of my results.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 12:08 pm
by dml
You're probably right in your assumption that switching off the query cache won't make much of a difference to the timings in this experiment, but it does entirely change what it is you're measuring. The query cache has the results in a format ready for going on the wire, it doesn't have to hit the disc, and it doesn't have to do any significant CPU work, so what you're measuring is the amount of time it takes to transfer bytes from the memory of the mysql process to the memory of the PHP process.
It's not wrong to measure it that way, as long as you're clear about what it is you're measuring. If somebody put a stopwatch on how long it takes to get a burger and fries at McDonalds, that's a genuine performance measure of customer service at a fast food restaurant, it's just not a measure of how long it takes to cook a burger, since McDonalds eagerly caches burgers, so what the stopwatch has measured is how long it took the clerk to walk over to the cache and fill the order from it.
I can think of three factors that make it better for performance to retrieve only the columns you need, and what you've probably measured is the first factor.
* Reduces the number of bytes transferred between the database and the webserver.
* The result of the query takes up less space in the query cache
* If all of the columns are indexed, there's a possibility of satisfying the query entirely from the indexes.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 12:37 pm
by Dynamis
As previously stated, my test results for 10,000 iterations are nearly identical to 1 iteration. This lead me to believe query caching is disabled by default, and on MySQL 5 it is.
http://dev.mysql.com/doc/refman/5.0/en/ ... ation.html
So the purpose of the test still pertains. My original post has been updated to include this relevant information.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 3:18 pm
by califdon
This is a very useful thread and I join others in thanking Dynamis for the effort. Despite the fact that I have never worked on projects where this level of performance was really a consideration, we all should know how these things work, because some day we may really need to fine-tune an application in a "heavy" environment.
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 4:45 pm
by dml
Dynamis, I share califdon's sentiments. Thanks for taking an experimental approach to something that's too often approached on the basis of hearsay. It's in that spirit of being precise and objective that I'd like to harp on again about the query cache.
Two timings being nearly identical don't imply that the query cache is on. Whether the query cache is on is determined by the state of a couple of system variables, and whether a given query came out of the cache can be inferred from the change in state of a couple of status variables. I reworked your test code to do those checks, see below.
My reading of the documentation is that query caching is
on by default, with a limit of
1M.
The 1M limit for the query cache may be significant for the case with a column length of 25 and 2500 rows. It means that the results for no more than 16 columns can fit in the cache (16 * 25 * 2500=1000000). Therefore it's possible that the 1 column case and the 10 column case came from the query cache, but the 26 column case didn't, and in fact that was the case when I ran the test code on my system. So that's a very good reason to only take the columns you're interested in.
Another measure that might be interesting is the number of bytes in the result. Those have to be copied between process spaces, which presumably can be done only so fast, which presumably means that the more columns in the result, the slower to copy. If this was the only thing that determined the timings, one would expect the 26 column query to take 26 times longer than the one-column query. I'm not getting anything like those results: for me it takes around 2.5 times longer.
Test code below, with checks to system status to count query cache hits, etc:
Code: Select all
<?php
/*
* Measuring performance effect of number of cols
* in SELECT query.
* viewtopic.php?f=2&t=85542
*/
ini_set('display_errors', 1);
error_reporting(E_ALL|E_STRICT);
mysql_connect("localhost","root","root") or die(mysql_error());
mysql_select_db("test") or die(mysql_error());
function e($sql){
$result = mysql_query($sql) or die(mysql_error());
return $result;
}
function q($sql){
$result = array();
$resultset = e($sql);
while($row=mysql_fetch_row($resultset)){
$result[]= $row;
}
return $result;
}
function status(){
$result = array();
foreach(q("show status") as $row){
$result[$row[0]]= $row[1];
}
return $result;
}
function vars(){
$result = array();
foreach(q("show variables") as $row){
$result[$row[0]]= $row[1];
}
return $result;
}
define("TABLE", "testtime");
function colname($n){return "f".$n;}
function cols($n){return array_map("colname", range(1, $n));}
function fx($body){return create_function('$x', $body);}
function populate($num_cols, $num_rows, $bytes_per_col){
e("drop table if exists ".TABLE);
e("create table ".TABLE."(".
implode(",\n", array_map(fx("return \$x.' varchar($bytes_per_col)';"),
cols($num_cols))).
")");
$field_values = array_fill(0, $num_cols,
"'".str_pad("", $bytes_per_col, "x")."'");
for($i=0;$i<$num_rows;$i++) {
e("insert into ".TABLE." values (".implode(", ", $field_values).")");
}
}
function t($num_cols, $num_iterations){
$sql = "select ".implode(", ", cols($num_cols))." from ".TABLE;
$total = 0;
for($count = 0;$count<$num_iterations;$count++){
$start_time = gettimeofday(TRUE);
e($sql);
$total+=(gettimeofday(TRUE)-$start_time);
}
return $total/$count;
}
function g($map, $key, $default){
return array_key_exists($key, $map)?$map[$key]:$default;
}
function p(){
$a = func_get_args();
if(php_sapi_name()=="cli"){
echo implode("\t", $a), "\n";
}else{
echo "<tr>";
foreach($a as $td){echo "<td>".$td."</td>";}
echo "</tr>";
}
}
function main(){
if(php_sapi_name()!="cli"){echo "<table border=1>";}
$num_cols = 26;
$col_length = 25;
$num_rows = 2500;
$num_iterations = 20;
$query_cache_type = 'ON'; // set to ON/OFF
p("num cols", $num_cols);
p("col length (bytes)", $col_length);
p("num rows", $num_rows);
populate($num_cols, $num_rows, $col_length);
e("set session query_cache_type = '$query_cache_type'");
$vars = vars();
$start_status = status();
foreach(array(1, 10, $num_cols) as $c){
$timings[$c] = t($c, $num_iterations);
p("avg timing for $c cols(secs):", $timings[$c]);
}
$total_time = array_sum(array_values($timings))*
$num_iterations;
p("$num_cols cols timing/one col timing",
$timings[$num_cols]/$timings[1]);
p("10 cols timing/one col timing", $timings[10]/$timings[1]);
p("$num_cols cols timing/10 col timing",
$timings[$num_cols]/$timings[10]);
$end_status = status();
foreach($end_status as $k=>$v){
if($diff = $end_status[$k] - $start_status[$k]){
$status_diff[$k] = $diff;
}
}
$mb = 1024*1024;
$measured_bytes_sent = g($status_diff, 'Bytes_sent', 0);
$bytes_per_column = $num_rows*$col_length;
foreach(array(
"bytes per column"=>$bytes_per_column,
"bytes per table"=> $bytes_per_column*$num_cols,
"estimated Mb sent"=> $num_iterations*$col_length*
$num_rows*array_sum(array_keys($timings))/$mb,
"measured Mb sent"=> $measured_bytes_sent/$mb,
"total time (secs)"=> $total_time,
"Mb/sec"=>(($measured_bytes_sent/$total_time)/$mb),
"estimated Mb/sec, $num_cols col query"=>((($bytes_per_column*$num_cols)/$timings[$num_cols])/$mb),
"estimated Mb/sec, 10 col query"=>((($bytes_per_column*10)/$timings[10])/$mb),
"estimated Mb/sec, 1 col query"=>((($bytes_per_column)/$timings[1])/$mb),
"query cache type"=>$vars["query_cache_type"],
"query cache limit"=>$vars["query_cache_limit"],
"num cols that could fit in qc (estimated)"=>
$vars["query_cache_limit"]/($num_rows*$col_length),
"query cache hits"=> g($status_diff, 'Qcache_hits', 0),
"num selects"=> g($status_diff, 'Com_select', 0),
) as $k=>$v){
p("$k:", $v);
}
p("");
p("CHANGES IN STATUS VARS:");
foreach($status_diff as $k=>$v){p($k,$v);}
}
main();
?>
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 6:32 pm
by Dynamis
dml wrote:
My reading of the documentation is that query caching is
on by default, with a limit of
1M.
I believe you just read wrong or skimmed to fast. That second url,
Here,states that query cache size is 0 by default, which means query cache is disabled. I pulled the following quote from that url:
"The amount of memory allocated for caching query results. The default value is 0, which disables the query cache."
If you are not running php 5 though, yours might be on by default. Something to check into. Thank you for being critical though, always good to double check and update test results. If time permits, I want to run some more tests this weekend, on a larger set of data perhaps, and maybe turn on query caching to see how it affects run times.
Any suggestions on tests I should perform (this question goes to anyone reading this forum)?
Re: MySQL Query Performance - Things to know
Posted: Wed Jul 16, 2008 7:24 pm
by dml

I stand corrected. I was looking at the entry for query_cache_limit rather than query_cache_size. Query caching is indeed off by default. To be certain it's the case on your system, can you do a 'select variables like 'query_cache%' to verify?