Page 2 of 3

Re: MySQL Query Performance - Things to know

Posted: Wed Jul 16, 2008 7:38 pm
by Dynamis
dml wrote::) I stand corrected. I was looking at the entry for query_cache_limit rather than query_cache_size. Query caching is indeed off by default. To be certain it's the case on your system, can you do a 'select variables like 'query_cache%' to verify?
I checked. It is officially off! :) Thanks.

Re: MySQL Query Performance - Things to know

Posted: Wed Jul 16, 2008 9:01 pm
by dml
I changed the code to print out the timing on each iteration, and I put the output through gnuplot. Each point on the plot is the timing for an iteration with the number of cols on the x axis and the timing on the y axis. The table is the same always: 2500 rows, column length 25. I did 40 iterations for each of 1..26 columns. The plot suggests a straight line relationship between the number of columns and the timing. I believe there's a way of calculating the closest straight line from plotted data of this kind, so that you could turn it into a formula like timing = a+b*num_cols. If anyone knows how to do that, those would be interesting figures to have.

The other plot has the query cache turned on. It's the same straight line, except that from 1..15 columns there's a straight line underneath the main one. This is the queries being served from the cache.

Re: MySQL Query Performance - Things to know

Posted: Wed Jul 16, 2008 9:07 pm
by Dynamis
dml, you have aim or msn (IM me on here if you do)? I would like to coordinate a big test w/ you that we could graph and come up w/ an equation, but we need to come up w/ a good way of doing it. Do it right the first time.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 12:21 pm
by Dynamis
So I am trying to write an all-in-one program that will take care of all the testing for me and save the results to a file, but I think I've run into an issue. The first query each time after populating the table takes longer than the rest on the same population, even if it is trying to grab the smallest amount of data/columns. This leads me to believe that mysql_query simply sends the query to mysql and returns true/false if mysql accepts it and it does not wait for the query to finish. So for instance, I call mysql_query to populate the table and then call mysql_query to select from the table. I believe the select query is called while the database is still being populated. Anyone know exactly how the mysql_query function works and if this is what is occurring. This situation occurs every first select statement in my loop, even if the other select statements are not the same (not in query cache), plus I double checked and query cache is not on. Ideas?

EDIT
Ok, how to measure this? I've noticed MySQL speeds up for about the first 1000-5000 iterations, and then after that all the queries are close. And when I say speed up, pretty substantially. First iteration: .0004 ish 3000th iteration: .0002 , and around .00019 from then on, for this query I'm running. I do not have caching on, again, double checked, someone know how to explain? Also, with this being the case and if no one has a reasonable explanation, what would you say is the best way to measure the data?
  • Using the lowest of 10,000 iterations isn't the most realistic, but would give the best graph data and equation slope (I would assume).
  • Using average of say... first 10 iterations could be pretty inaccurate if any spikes occur. And it seems it just speeds up straight to the 1000th+ iteration range. So anything before 1000 or 3000 is like... a ticker just speeding up.
Anyway, the best solution would be to figure out why it speeds up for the first few thousand iterations. Any help or knowledge on this subject would be appreciated.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 2:11 pm
by Eran
Ok, how to measure this? I've noticed MySQL speeds up for about the first 1000-5000 iterations, and then after that all the queries are close. And when I say speed up, pretty substantially. First iteration: .0004 ish 3000th iteration: .0002 , and around .00019 from then on, for this query I'm running. I do not have caching on, again, double checked, someone know how to explain? Also, with this being the case and if no one has a reasonable explanation, what would you say is the best way to measure the data?
You have caching in your CPU and in your OS well. Nothing much you could do about that.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 2:21 pm
by Dynamis
Ok, I'm thinking about taking the average of the 500 fastest, I think this will be a pretty good average.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 2:37 pm
by Eran
Don't take just the "good" results, that's not proper testing methodology. Use all the results and try to think why they are what they are - it's a good opportunity to better understand the process. Also, don't forget to do error estimation and calculation and show that in your results.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 3:03 pm
by Dynamis
pytrin wrote:Don't take just the "good" results, that's not proper testing methodology. Use all the results and try to think why they are what they are - it's a good opportunity to better understand the process. Also, don't forget to do error estimation and calculation and show that in your results.
There are thousands of variables there (all your server hardware, etc.). My goal is to merely show the relationship between number of rows, columns, column size, and number of columns being pulled. So based on this, what would you say would be the ideal set of iterations to use? I am leaning toward using the very fastest of the 5000 iterations since it would be the closest to representing a perfect system (although thats not possible obviously). Still debating and taking suggestions.

EDIT
I have decided to use the fastest iteration of 5000. I believe this makes the most sense because it will be the closest to the actual time of the query, with as many variables stripped as possible. I will explain more in my results.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 3:05 pm
by Benjamin
I'd like to see some tests that indicate whether adding "LIMIT 1" to the end of a query increases speed when you're only pulling 1 record from a (large) dataset.

For instance:

Code: Select all

 
SELECT foo FROM TABLE WHERE user_id = 10 LIMIT 1
 
There is only 1 user with a user_id of 10, because it's a primary key. In this case would adding limit 1 increase performance?

What if user_id was not a primary key, but it was indexed and only one user with the id of 10 existed?
What if user_id was not indexed and a table scan needed to be performed? (I think in this case adding LIMIT 1 would definitely increase performance)

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 3:17 pm
by s.dot
astions wrote:I'd like to see some tests that indicate whether adding "LIMIT 1" to the end of a query increases speed when you're only pulling 1 record from a (large) dataset.

For instance:

Code: Select all

 
SELECT foo FROM TABLE WHERE user_id = 10 LIMIT 1
 
There is only 1 user with a user_id of 10, because it's a primary key. In this case would adding limit 1 increase performance?

What if user_id was not a primary key, but it was indexed and only one user with the id of 10 existed?
What if user_id was not indexed and a table scan needed to be performed? (I think in this case adding LIMIT 1 would definitely increase performance)
I've had these thoughts many times. Never bothered to research it. I always add LIMIT 1 if I'm expecting one record (regardless if my where field is a primary key or a unique index).. out of habbit.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 3:30 pm
by Eran
There are thousands of variables there (all your server hardware, etc.). My goal is to merely show the relationship between number of rows, columns, column size, and number of columns being pulled. So based on this, what would you say would be the ideal set of iterations to use?
There aren't as many variables that affect the test results (for example, your network card and graphic adapter aren't relevant). You will be making some assumptions naturally (and it's important to think what assumptions you are making and document them), however I think the reason the queries start out slower is relevant and you should try to get to the bottom of it. Maybe ask around the MySQL performance blog forums - http://forum.mysqlperformanceblog.com/ or other similar places.

Also, try maybe to 'sleep' your script between iterations and see how the results vary.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 3:38 pm
by Dynamis
pytrin wrote:There aren't as many variables that affect the test results (for example, your network card and graphic adapter aren't relevant). You will be making some assumptions naturally (and it's important to think what assumptions you are making and document them), however I think the reason the queries start out slower is relevant and you should try to get to the bottom of it. Maybe ask around the MySQL performance blog forums - http://forum.mysqlperformanceblog.com/ or other similar places.

Also, try maybe to 'sleep' your script between iterations and see how the results vary.
I tried sleep and the results were the same, even sleeping for long periods of time in between. As for variables, you have to think of everything else running on the computer affecting the performance of the MySQL server, your computers cache, processor speed, etc. Anyway, my test isn't really based on any of that, so I will not be including it. I just want to get a basic relationship and assumptions that would be useful for anyone. For this I don't believe I need to understand why the queries speed up for the first 1000 in the loop, or so, the trend in that time should be the same as the trend for the fastest.

Re: MySQL Query Performance - Things to know

Posted: Thu Jul 17, 2008 3:55 pm
by Benjamin
scottayy wrote:I always add LIMIT 1 if I'm expecting one record.
Yeah I do as well. I just type it without giving it a second thought. I've been told that there is no benefit though.

Re: MySQL Query Performance - Things to know

Posted: Fri Jul 18, 2008 10:19 am
by Dynamis
The new test results are up. Enjoy!

Re: MySQL Query Performance - Things to know

Posted: Fri Jul 18, 2008 1:23 pm
by dml
The indication that column size might be less costly than number of rows and columns is interesting. The numbers are a bit small to draw conclusions yet (at 1300 rows, "select a from t" will select 1300 bytes when column length=1 and 33800 bytes when column length is 26), but it would make sense for each row and column processed to have a cost, so that it's quicker to select ten 1000 byte cells than a thousand ten byte cells?

The number of bytes, ie num rows*num cols*col length, has to determine a minimum timing, so that if you have N bytes, the timing can be no less than kN. Here's a plot generated from your data, number of bytes on x, timing on y:
Dynamis's results, bytes in result on x axis, timing on y axis
Dynamis's results, bytes in result on x axis, timing on y axis
bytesandtimings.png (7.27 KiB) Viewed 946 times

All of the timings are indeed above some k*number of bytes, but there's evidently a lot more determining the timings than the mere number of bytes: sometimes it takes longer to retrieve 30K than it does to retrieve 300K.

This technique of taking the minimum of 5000 measurements is an interesting one. If I understand correctly, it's like taking a runner's personal best as more representative of their talent than their average time, because that says something about their inherent capabilities, and eliminates the cases where the weather was bad, where they got blocked by other runners, etc.

I'd like to see the other 4999 measurements somehow used in the plots, though, instead of being thrown away. With that many measurements, it would presumably be a blob with one side of it being well defined by the inherent constraints of the task (and the minimum timings being points along that edge), and the other side being ragged and trailing off into a cloud of outliers and anomalies. If it turns out to be two blobs, or if the minimum timings aren't contiguous with the body of the blob, then there might be something else going on.