Page 1 of 2

query optimization

Posted: Sun Mar 25, 2007 4:24 pm
by GeXus
I have a couple queries where im doing joins on a varchar field and have a where clause on a varchar field... the table has about 200k records and it's going quite slow.. I have an index on the varchar field also.

How would you recommend optimizing this? It's going really slow.

Posted: Sun Mar 25, 2007 5:00 pm
by GeXus
Just an update.. I've delete some empty data and rebuilt indexes... query went from 11 seconds to 2.6 seconds. Not sure if there is anything else I can do aside from the query itself.

Posted: Sun Mar 25, 2007 5:03 pm
by Christopher
You would probably need to post the SQL to get more specific suggestions.

Posted: Mon Mar 26, 2007 1:17 am
by mikeq
run an explain plan on your query(s)

Code: Select all

EXPLAIN
SELECT field1, field2 FROM table1 WHERE field3 = 'whatever';
That will show you what indexes are being used, how the tables are accessed etc

Posted: Mon Mar 26, 2007 1:26 am
by infolock
Also, it sounds like to me that you have a layout issue more than anything else. Sounds like your indexing system is not set up correctly (ie, you have varchar's indexed when you should have tinyints, or you have integer fields when you should be using smallint, etc).

Posted: Mon Mar 26, 2007 4:09 am
by onion2k
I can't think of a situation in the past where I've ever joined something on varchar data before. That's very unusual.

Posted: Mon Mar 26, 2007 8:04 am
by GeXus
Yeah, I know... I think i'll restructure the data and tables...

As far as indexing varchars, I don't see anything wrong with that.. I'm using mod_rewrite so all url's are formatted /product_name/, i don't want to add an ID in there, so I have to get the data based on the product_name (url formatted product name)

Posted: Mon Mar 26, 2007 8:13 am
by onion2k
GeXus wrote:As far as indexing varchars, I don't see anything wrong with that.. I'm using mod_rewrite so all url's are formatted /product_name/, i don't want to add an ID in there, so I have to get the data based on the product_name (url formatted product name)
That would only explain the use of the name in the where clause, that shouldn't have any bearing on the joins.

Posted: Mon Mar 26, 2007 8:29 am
by infolock
The big issue behind using varchars as indexes, and then joining on them, is that normally when you do that, you are in essence going to cause a full stable scan, which means that every single solitary row in your table(s) are going to be queried.

Not to mention, you normally join on indexed fields that are 9 times out of 10 a primary key in another table. Joining on a varchar, as I said, will cause a longer join because it needs to validate text to be equal to text (thus, a much longer process on the processor than comparing tiny binary data such as integers).

again, i'd redesign the tables (= sounds like the relation-based layout you are going for it almost there, it's jst you are trying to link it all together with the wrong fields.

a very good article for you to read is : http://datamodel.org/NormalizationRules.html

it covers the basic concepts of normalization rules which is a very good and strict method of designing nice table layouts...

Posted: Mon Mar 26, 2007 8:36 am
by GeXus
You guys were right on the money... I added the id's to the table im joining on.. new query took 0.0087s vs. 4.8000s

Thank you!

Posted: Mon Mar 26, 2007 4:28 pm
by mikeq
infolock wrote:The big issue behind using varchars as indexes, and then joining on them, is that normally when you do that, you are in essence going to cause a full stable scan, which means that every single solitary row in your table(s) are going to be queried.

.
Not true, that depends on many factors. If the varchar field (or combination of varchar fields) is a primary key it is going to be unique, the DBMS will use an index on this as it would on an Integer field. The only difference is an index lookup on an integer field is a bit quicker.

It is poor database design to stick in some arbitrary integer to represent a primary key when a natural primary key exists within your data.

Posted: Mon Mar 26, 2007 4:43 pm
by infolock
Not true, that depends on many factors. If the varchar field (or combination of varchar fields) is a primary key it is going to be unique, the DBMS will use an index on this as it would on an Integer field. The only difference is an index lookup on an integer field is a bit quicker.

I think you just said what I was saying (= I didn't say varchar's are never used, I only said they are rarely used (9 times out of 10 does not equal 100% ;) ). There are absolutely times to use varchars and times not to, while the times not to significantly outweigh the times to use them. And this is largely due to the SIZE of the indexes that are created by using a varchar field that exceeds a 30 character limit (say someone choosing to use a 50 varchar field size or worse yet a 255 size or even worse still a TEXT field for indexing).

While it is possible to index these, it's not always good to. If I have a table that has a list of the state of the United States of Amercia, I would definately index the varchar state field as there are not that many rows to index and the index size will not be too much. But when you start to abuse the ability to index a varchar, you are in essence going to destroy your query speed and see a major query result time when you have thousands of records or more.


It is poor database design to stick in some arbitrary integer to represent a primary key when a natural primary key exists within your data.
Now, i have no idea where this came from. If I create a primary key, that primary key is most normally either a UNIQUE key, or it is an autoincremented key. Either way, no one is saying to drop the primary key and replace it with some arbitrary integer. That would definately be absurd and break every rule known to man about proper table design. However, if you have a field in your table called "last name", and table called "user_info", and you are using the last_name as the indexed lookup key for both tables, it would absolutely be stupid to do so when you could have a primary integer-based autoincrementing field in a table to replace last_name with user_id and index/join off that.

Posted: Tue Mar 27, 2007 2:36 am
by mikeq
infolock wrote:However, if you have a field in your table called "last name", and table called "user_info", and you are using the last_name as the indexed lookup key for both tables, it would absolutely be stupid to do so when you could have a primary integer-based autoincrementing field in a table to replace last_name with user_id and index/join off that.
Unlikely that "last name" would be unique, so yes it would be absolutley stupid to retrieve the "user_info" with this method.

But lets say we have a user table where their user name is a 10 character unique varchar value

Table user
user_name (unique, varchar, primary key)
first_name
last_name

Table related_user_info
user_name (foreign key)
some_other
info_about

People reading your other post might think "ah using varchars in an index and for joining is bad, must create integer field to join on".

Table user
user_id (unique, integer, primary key)
user_name (unique, varchar)
first_name
last_name

Table related_user_info
user_id
some_other
info_about

So they end up creating integer fields all over the place when a perfectly good "natural" primary key exists.

I dont want to give the wrong impression to novices when it comes to designing databases. The first thing anyone should do when writing queries is run an explain plan against them to see exactly what the query is doing. Rather than having a guess at what it is doing.

Posted: Tue Mar 27, 2007 3:34 am
by onion2k
mikeq wrote:But lets say we have a user table where their user name is a 10 character unique varchar value
If it's always going to be 10 characters then you'd use a char rather than a varchar. But what you say holds true if it's variable length.

Posted: Tue Mar 27, 2007 4:23 am
by mikeq
sorry, should have said a max of 10 characters.