Sure, but remember that indexes hardly ever work in isolation in the real world. When you have a number of indexes that can be used in a query (which may come from ON, HAVING, ORDER BY, GROUP BY, and nearly any clause of the query) the order in which the indexes are used is very relevant to the performance of the query. It is not the same to inspect 10,000 out of 100,000 rows (because we have an index on, say status_id) than to inspect every single row because we are indexing a column with high cardinality (such as the user's full name).
MySQL knows this and thus will try to use the smallest possible candidate index first. That way, right off the bat, we may eliminate 3/4 of the records we have to look through.
Furthermore, index lookup isn't a free operation either, and traversing an index with 100,000 entries will take a lot longer than traversing one with 4 entries. In some extreme cases it may even take as long as a full table scan (if the number of rows in the table equals the number of entries in the index), in which case the index is nearly useless or even counterproductive. Using an index also carries the additional overhead of retrieving the rows that the index points to. Luckily, MySQL knows this too, and will simply not use the index if it believes it will be as costly as doing a full table scan (ie: when the cardinality of the index is the same as the table)
Will number of rows degrade performance?
Moderator: General Moderators
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Will number of rows degrade performance?
Actually I don't think that is true. Traversing the btree is much faster than scanning rows. According to the MySQL documentation:
http://dev.mysql.com/doc/refman/5.0/en/ ... dexes.htmlTo eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.
(#10850)
Re: Will number of rows degrade performance?
I'm afraid you are right.
I only quickly scanned the link that Califdon posted (which deserves a closer look some other day), but I probably got confused reading some of the information there. I think that this is the part that stuck with me:
(talking about a pseudo-boolean column) "But the fact that it’s a very small value is important for some queries. For example, select sum(id) from something will scan the is_something index because it’s the smallest available. "
"Sum", of course, is the keyword that makes all the difference in this query. So mySQL will try use a small index first, but only for aggregate queries. For all other queries it is likely to be the opposite, like the documentation explains.
I only quickly scanned the link that Califdon posted (which deserves a closer look some other day), but I probably got confused reading some of the information there. I think that this is the part that stuck with me:
(talking about a pseudo-boolean column) "But the fact that it’s a very small value is important for some queries. For example, select sum(id) from something will scan the is_something index because it’s the smallest available. "
"Sum", of course, is the keyword that makes all the difference in this query. So mySQL will try use a small index first, but only for aggregate queries. For all other queries it is likely to be the opposite, like the documentation explains.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Will number of rows degrade performance?
Which is why the only general performance tips are to use indexes and return the fewest rows possible. Otherwise you need to experiment to see what actually works. And if you upgrade the optimizer may change. 
(#10850)
Re: Will number of rows degrade performance?
I wholeheartedly agree. There are too many variables in the equation of SQL performance to be able to give any general tips: RDBMS vendor, query caches, engine type, table size, index type, query syntax, prepared statements, server hardware... all of these can weigh just as heavily on your particular strategy.arborint wrote:Which is why the only general performance tips are to use indexes and return the fewest rows possible. Otherwise you need to experiment to see what actually works.