
Will number of rows degrade performance?

Posted: Tue Jul 29, 2008 2:15 am
by Stryks
I have a task I'm trying to achieve, but I'm having trouble with the most efficient way of doing it.

I have a main table for core items, and another which holds the components that make up each item. Eventually we'd be looking at, say, 20,000 items with 6-8 components each, meaning 160,000 or so records in that table.

I also have a system for letting users create their own items, using one of the inbuilt items as a guide. This data is kept in an equivalents table, which stores what the custom item is based on, what has been added, and what has been removed. A price is generated for custom items by taking the base price of the inbuilt item plus the cost of added components.

But if a new inbuilt item is added or a price changes, I need to be able to pull out all the custom items, determine which base item is the best match, what additions and removals are needed, and what the new price would be. This part is pretty much complete, though.

My issue is, when it comes time to work all this out, I need to know what components were in the custom item. At the moment I have to pull the base item and its components, then adjust that using the equivalents data, and then try to do a match.

Basically, I'm wondering if my best bet is to keep doing this, or just to write the custom components back to the components table, where they can be accessed easily. With this few rows of data, I'm thinking I probably won't take too much of a performance hit.
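For what it's worth, here's a minimal sketch of the kind of schema and price calculation I mean. Table and column names, prices, and IDs are all just illustrative, and SQLite is standing in for MySQL:

```python
import sqlite3

# Illustrative schema only: table/column names and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        name TEXT,
        base_price REAL
    );
    CREATE TABLE components (
        item_id INTEGER REFERENCES items(id),
        name TEXT,
        cost REAL
    );
    -- One row per difference between a custom item and its base item.
    CREATE TABLE equivalents (
        custom_id INTEGER,
        base_item_id INTEGER REFERENCES items(id),
        change TEXT,      -- 'added' or 'removed'
        component TEXT,
        cost REAL
    );
""")

conn.execute("INSERT INTO items VALUES (1, 'widget', 10.0)")
conn.executemany("INSERT INTO equivalents VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 'added',   'gold trim', 2.5),
    (100, 1, 'removed', 'handle',    1.0),
])

# Custom price = base price of the inbuilt item + cost of added components.
price = conn.execute("""
    SELECT i.base_price + COALESCE(SUM(e.cost), 0)
    FROM items i
    JOIN equivalents e ON e.base_item_id = i.id
    WHERE e.custom_id = 100 AND e.change = 'added'
""").fetchone()[0]
print(price)  # 12.5
```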

Any advice?

Thanks for reading by the way. I know it's a slog. 8)

Re: Will number of rows degrade performance?

Posted: Tue Jul 29, 2008 7:48 am
by Dynamis
Here is a MySQL Performance Study I conducted a week or so ago. It should help you get the answers you are looking for.

Re: Will number of rows degrade performance?

Posted: Tue Jul 29, 2008 9:31 am
by Stryks
Interesting stuff.

One thing I'm not clear on is this definition:
"Query time linearly increases as number of rows increases."
Is that the number of rows returned, or the number of rows in the table? It appears to be rows pulled.

Would pulling the same number of cols/rows from a table with 20,000 rows total be faster than pulling the same from a table with 2,000,000 rows total?

Re: Will number of rows degrade performance?

Posted: Tue Jul 29, 2008 9:36 am
by Dynamis
Stryks wrote:Is that the number of rows returned, or the number of rows in the table? It appears to be rows pulled.
That is number of rows pulled, you are correct.
Stryks wrote:Would pulling the same number of cols / rows from a table with 20,000 rows total be faster than the same from a table with 2,000,000 rows total?
Yes. That is where that linear equation comes into play. Assuming the trend continues, you could extend the line out to 2,000,000 rows to see the time it would take.
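To make the extrapolation concrete, here's a hedged sketch. The timing figure is invented for illustration, not taken from the study:

```python
# Hypothetical measurement: the 0.05 s figure is made up, not from the
# linked benchmark.
rows_measured = 20_000
seconds_measured = 0.05

# Linear model through the origin: query time scales with rows pulled.
rows_target = 2_000_000
seconds_predicted = seconds_measured * (rows_target / rows_measured)
print(seconds_predicted)  # 5.0
```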

Re: Will number of rows degrade performance?

Posted: Wed Jul 30, 2008 6:05 am
by Stryks
I've been having a think about it, and I'm wondering if your tests actually capture what I am talking about.

I mean, fair call, pulling x columns from varying numbers of rows will tell you the query time to pull that data from all rows. But I wonder whether performance improves in cases where you are returning just a section of the dataset. Say you pull just 1 column for 20 rows from each size set: not with LIMIT, but with criteria.

Actually, you could then get an idea of WHERE-clause speed on indexed vs. non-indexed columns.
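Something like this is the experiment I have in mind. It's only a sketch: SQLite stands in for MySQL, and the table, columns, and row count are invented, so the absolute numbers would differ:

```python
import sqlite3
import time

# Sketch only: same criteria-based SELECT against an indexed vs. a
# non-indexed column, same 20 rows returned either way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (indexed_col INTEGER, plain_col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 ((i, i) for i in range(200_000)))
conn.execute("CREATE INDEX idx_t ON t(indexed_col)")

def timed(sql):
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)

t_seek, n_seek = timed(
    "SELECT plain_col FROM t WHERE indexed_col BETWEEN 100 AND 119")
t_scan, n_scan = timed(
    "SELECT indexed_col FROM t WHERE plain_col BETWEEN 100 AND 119")

print(n_seek, n_scan)  # 20 20
print(t_seek, t_scan)  # the seek is typically far faster than the scan
```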

Also, just wondering what table type you used, and whether that would be a performance factor. I work with groups of data mainly, so I use InnoDB for the transactions.

Cheers

Re: Will number of rows degrade performance?

Posted: Wed Jul 30, 2008 8:06 am
by Dynamis
Stryks wrote:I mean, fair call, pulling x columns from varying amounts of rows is going to tell you query time to pull that data from all rows. But I wonder if performance is increased in cases where you are returning just a section of the dataset. Say, you just pull 1 column for 20 rows of each size set. Not with limit, but with criteria.
That is actually what I plan on testing next, hopefully sometime next week. The results on that page will then be updated.
Stryks wrote: Also, just wondering what type of tables you used, and if that would be a performance factor. I work with groups of data mainly, so I use InnoDB for the transactions.
I used InnoDB. That's another thing that would be worth testing. Thanks for the input.

Re: Will number of rows degrade performance?

Posted: Mon Aug 04, 2008 6:29 am
by jayshields
I haven't read this post in its entirety, but any query executed on any size table acts upon every row in the table.

Re: Will number of rows degrade performance?

Posted: Mon Aug 04, 2008 1:39 pm
by califdon
jayshields wrote:I haven't read this post in its entirety, but any query executed on any size table acts upon every row in the table.
While that's almost true, most queries do the selection on indexes, which should be faster (fewer bytes involved) than actually retrieving every row of the table itself. I have heard real experts say that the time is principally dependent on the number of rows returned.

Re: Will number of rows degrade performance?

Posted: Mon Aug 04, 2008 1:52 pm
by Christopher
Yes, the first step to improving query performance is to make sure that all columns used in WHERE, ON, GROUP BY, ORDER BY, etc. are indexed.
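For example, sketched with SQLite syntax (which MySQL largely shares here); the table and column names are hypothetical:

```python
import sqlite3

# Hypothetical table: the point is just that each column used in WHERE,
# JOIN ... ON, or ORDER BY gets its own index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        created_at TEXT
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);  -- JOIN ... ON
    CREATE INDEX idx_orders_status   ON orders(status);       -- WHERE
    CREATE INDEX idx_orders_created  ON orders(created_at);   -- ORDER BY
""")

indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'index' AND tbl_name = 'orders'")]
print(sorted(indexes))
```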

Re: Will number of rows degrade performance?

Posted: Mon Aug 04, 2008 2:11 pm
by ghurtado
And while we are on the topic of theoretical performance improvements... ;)

Keep in mind that not all indexes are created equal. The cardinality of the index will have the biggest impact on how useful the index itself is to improve seek performance. If you are indexing a table by, say, the "status" column, and this column can only have one of 4 values (say, "new", "active", "inactive" and "deleted"), the index can help you cut seek time roughly by 75% (assuming equal distribution). Other indexes (such as indexing by a wildly varying field, like a person's name) can be close to useless when the data is too heterogeneous.

Then there is the fact that your indexes are only useful so long as MySQL is using them. For this, it is a good idea to use "explain" so that you can get an insight into exactly how MySQL intends to go about fetching your data, and make sure that it is using the indexes you want it to use.
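In MySQL that's `EXPLAIN SELECT ...`; here's the same idea sketched with SQLite's `EXPLAIN QUERY PLAN` (names and data are made up), showing the plan flip from a full scan to an index search once the index exists:

```python
import sqlite3

# Sketch using SQLite's EXPLAIN QUERY PLAN; MySQL's EXPLAIN plays the
# same role. Table, column, and index names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, status_id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 ((i, i % 4) for i in range(10_000)))

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE status_id = 2"
before = plan(query)             # full table scan
conn.execute("CREATE INDEX idx_status ON users(status_id)")
after = plan(query)              # now searched via the index

print(before)  # e.g. "SCAN users"
print(after)   # e.g. "SEARCH users USING INDEX idx_status (status_id=?)"
```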

Re: Will number of rows degrade performance?

Posted: Mon Aug 04, 2008 3:32 pm
by Christopher
ghurtado wrote:Keep in mind that not all indexes are created equal. The cardinality of the index will have the biggest impact on how useful the index itself is to improve seek performance. If you are indexing a table by, say, the "status" column, and this column can only have one of 4 values (say, "new", "active", "inactive" and "deleted"), the index can help you cut seek time roughly by 75% (assuming equal distribution). Other indexes (such as indexing by a wildly varying field, like a person's name) can be close to useless when the data is too heterogeneous.
Actually having more heterogeneous data (or cardinality) helps with indexes. If your column only contains 'Y' and 'N' then you may have to search half the table to find your records. Likewise with your example, 1/4 of a million records is still 250k records to search. Obviously key columns use indexes to quickly get to the value even though they have all unique values.
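A quick sketch of that row-count point, with invented numbers: even with an index, a two-value column narrows a lookup to roughly half the rows, while a unique key narrows it to exactly one.

```python
import sqlite3

# Illustration only: 100,000 made-up rows, flag alternating 'Y'/'N'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 ((i, 'Y' if i % 2 else 'N') for i in range(100_000)))
conn.execute("CREATE INDEX idx_flag ON t(flag)")

# The low-cardinality predicate still matches half the table; the
# unique-key predicate matches a single row.
half = conn.execute("SELECT COUNT(*) FROM t WHERE flag = 'Y'").fetchone()[0]
one = conn.execute("SELECT COUNT(*) FROM t WHERE id = 42").fetchone()[0]
print(half, one)  # 50000 1
```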

Re: Will number of rows degrade performance?

Posted: Mon Aug 04, 2008 7:39 pm
by califdon
Here's a pretty informative article:
http://www.xaprb.com/blog/2006/07/04/ho ... mizations/

Re: Will number of rows degrade performance?

Posted: Tue Aug 05, 2008 10:18 am
by ghurtado
Christopher wrote: Actually having more heterogeneous data (or cardinality) helps with indexes. If your column only contains 'Y' and 'N' then you may have to search half the table to find your records. Likewise with your example, 1/4 of a million records is still 250k records to search. Obviously key columns use indexes to quickly get to the value even though they have all unique values.
Maybe I didn't express myself right. What I meant to say was that a query like:

Code: Select all

 SELECT * FROM users WHERE status_id = 2 
is more than likely going to be a lot faster than

Code: Select all

 SELECT * FROM users WHERE name = 'Zandor the Great' 
(Assuming of course, a sizable dataset, only a handful of status IDs, and every row having a different value for the "name" column). To put it simply, MySQL only has 4 (assuming 4 possible status IDs) "buckets" to start looking for your user in the first example, and possibly hundreds of thousands in the second one.

Granted, this is mostly based on non-scientific personal experience and my understanding of how indexes work, which is probably spotty at best.

In either case, it is nearly irrelevant, since the cardinality of the values in a column is almost always outside of the developer's control.

Nevertheless, the biggest gain to be had from indexes is on more or less homogeneous datasets. When the size of the index approaches the size of the dataset, the advantage tapers off to almost no performance increase from using the index.

Re: Will number of rows degrade performance?

Posted: Tue Aug 05, 2008 11:06 am
by ghurtado
By the way, califdon, thank you for that excellent link, very technical and full of details; it has really shed light on how much I do *not* know about MySQL indexes... :)

Re: Will number of rows degrade performance?

Posted: Tue Aug 05, 2008 11:58 am
by Christopher
ghurtado wrote:(Assuming of course, a sizable dataset, only a handful of status IDs, and every row having a different value for the "name" column). To put it simply, MySQL only has 4 (assuming 4 possible status IDs) "buckets" to start looking for your user in the first example, and possibly hundreds of thousands in the second one.
The indexes are usually btrees (or similar), so the cardinality affects how quickly the engine can jump to the first possible record that needs to be checked. The more rows that have to be directly checked, the slower the query. That is why indexes speed up queries. Without an index, every record must be checked. I think, generally speaking, performance is most affected by the time it takes to open the connection, the number of records that have to be checked, and the amount of result data that needs to be transferred to the client.
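A toy model of that "jump", with made-up numbers: a sorted list stands in for the btree, so a binary search reaches the first candidate in O(log n) comparisons, while a scan has to check records one by one.

```python
import bisect

# Toy model only: 500,000 even "indexed" key values in a sorted list.
sorted_keys = list(range(0, 1_000_000, 2))

# Index-style seek: binary search jumps to the first candidate record.
pos = bisect.bisect_left(sorted_keys, 123_456)
print(sorted_keys[pos])  # 123456, reached in ~19 comparisons

# Scan-style lookup: count how many records had to be checked.
scan_checks = next(i for i, k in enumerate(sorted_keys) if k == 123_456) + 1
print(scan_checks)  # 61729 records checked before the match
```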