query optimization
Moderator: General Moderators
query optimization
I have a couple queries where im doing joins on a varchar field and have a where clause on a varchar field... the table has about 200k records and it's going quite slow.. I have an index on the varchar field also.
How would you recommend optimizing this? It's going really slow.
How would you recommend optimizing this? It's going really slow.
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
run an explain plan on your query(s)
That will show you what indexes are being used, how the tables are accessed etc
Code: Select all
EXPLAIN
SELECT field1, field2 FROM table1 WHERE field3 = 'whatever';
Yeah, I know... I think i'll restructure the data and tables...
As far as indexing varchars, I don't see anything wrong with that.. I'm using mod_rewrite so all url's are formatted /product_name/, i don't want to add an ID in there, so I have to get the data based on the product_name (url formatted product name)
As far as indexing varchars, I don't see anything wrong with that.. I'm using mod_rewrite so all url's are formatted /product_name/, i don't want to add an ID in there, so I have to get the data based on the product_name (url formatted product name)
That would only explain the use of the name in the where clause, that shouldn't have any bearing on the joins.GeXus wrote:As far as indexing varchars, I don't see anything wrong with that.. I'm using mod_rewrite so all url's are formatted /product_name/, i don't want to add an ID in there, so I have to get the data based on the product_name (url formatted product name)
The big issue behind using varchars as indexes, and then joining on them, is that normally when you do that, you are in essence going to cause a full stable scan, which means that every single solitary row in your table(s) are going to be queried.
Not to mention, you normally join on indexed fields that are 9 times out of 10 a primary key in another table. Joining on a varchar, as I said, will cause a longer join because it needs to validate text to be equal to text (thus, a much longer process on the processor than comparing tiny binary data such as integers).
again, i'd redesign the tables (= sounds like the relation-based layout you are going for it almost there, it's jst you are trying to link it all together with the wrong fields.
a very good article for you to read is : http://datamodel.org/NormalizationRules.html
it covers the basic concepts of normalization rules which is a very good and strict method of designing nice table layouts...
Not to mention, you normally join on indexed fields that are 9 times out of 10 a primary key in another table. Joining on a varchar, as I said, will cause a longer join because it needs to validate text to be equal to text (thus, a much longer process on the processor than comparing tiny binary data such as integers).
again, i'd redesign the tables (= sounds like the relation-based layout you are going for it almost there, it's jst you are trying to link it all together with the wrong fields.
a very good article for you to read is : http://datamodel.org/NormalizationRules.html
it covers the basic concepts of normalization rules which is a very good and strict method of designing nice table layouts...
Last edited by infolock on Mon Mar 26, 2007 10:24 am, edited 2 times in total.
Not true, that depends on many factors. If the varchar field (or combination of varchar fields) is a primary key it is going to be unique, the DBMS will use an index on this as it would on an Integer field. The only difference is an index lookup on an integer field is a bit quicker.infolock wrote:The big issue behind using varchars as indexes, and then joining on them, is that normally when you do that, you are in essence going to cause a full stable scan, which means that every single solitary row in your table(s) are going to be queried.
.
It is poor database design to stick in some arbitrary integer to represent a primary key when a natural primary key exists within your data.
Not true, that depends on many factors. If the varchar field (or combination of varchar fields) is a primary key it is going to be unique, the DBMS will use an index on this as it would on an Integer field. The only difference is an index lookup on an integer field is a bit quicker.
I think you just said what I was saying (= I didn't say varchar's are never used, I only said they are rarely used (9 times out of 10 does not equal 100%
While it is possible to index these, it's not always good to. If I have a table that has a list of the state of the United States of Amercia, I would definately index the varchar state field as there are not that many rows to index and the index size will not be too much. But when you start to abuse the ability to index a varchar, you are in essence going to destroy your query speed and see a major query result time when you have thousands of records or more.
Now, i have no idea where this came from. If I create a primary key, that primary key is most normally either a UNIQUE key, or it is an autoincremented key. Either way, no one is saying to drop the primary key and replace it with some arbitrary integer. That would definately be absurd and break every rule known to man about proper table design. However, if you have a field in your table called "last name", and table called "user_info", and you are using the last_name as the indexed lookup key for both tables, it would absolutely be stupid to do so when you could have a primary integer-based autoincrementing field in a table to replace last_name with user_id and index/join off that.It is poor database design to stick in some arbitrary integer to represent a primary key when a natural primary key exists within your data.
Unlikely that "last name" would be unique, so yes it would be absolutley stupid to retrieve the "user_info" with this method.infolock wrote:However, if you have a field in your table called "last name", and table called "user_info", and you are using the last_name as the indexed lookup key for both tables, it would absolutely be stupid to do so when you could have a primary integer-based autoincrementing field in a table to replace last_name with user_id and index/join off that.
But lets say we have a user table where their user name is a 10 character unique varchar value
Table user
user_name (unique, varchar, primary key)
first_name
last_name
Table related_user_info
user_name (foreign key)
some_other
info_about
People reading your other post might think "ah using varchars in an index and for joining is bad, must create integer field to join on".
Table user
user_id (unique, integer, primary key)
user_name (unique, varchar)
first_name
last_name
Table related_user_info
user_id
some_other
info_about
So they end up creating integer fields all over the place when a perfectly good "natural" primary key exists.
I dont want to give the wrong impression to novices when it comes to designing databases. The first thing anyone should do when writing queries is run an explain plan against them to see exactly what the query is doing. Rather than having a guess at what it is doing.