Page 1 of 1

Query takes longer and longer...

Posted: Thu Aug 24, 2006 10:48 am
by Todd_Z
For some reason the following query is fast when t_relation is empty, then when t_relation starts to get populated, it gets slower and slower and slower. For example, when the size of the database is 2500, the query will take 3 minutes. When the size is 5000, it takes over 7 minutes. Is there a way to optimize this?

The reason for the limit 2500 is that I want each iteration of the script to work very quickly, independent of the size of the database. When I first run this script to calculate 5000 address against 120 t-stops [t=Trolley], I want it to take a couple iterations.

The script is on a crontab running every 5 minutes, most times there will be nothing to do, but if an address is added, then obviously, it would calculate it.

Code: Select all

SELECT address.id AS aid, t_station.id AS tid FROM address, t_station WHERE address.status = 'Active' AND 
t_station.status = 'Active' AND ( SELECT COUNT(*) FROM t_relation WHERE t_relation.status = 'Active' AND address_id = address.id AND t_station_id = t_station.id AND MD5(CONCAT_WS('-',address.latitude,address.longitude)) = md5_address AND MD5(CONCAT_WS('-',t_station.latitude,t_station.longitude)) = md5_station ) = 0 LIMIT 2500

Code: Select all

CREATE TABLE `t_relation` (
  `t_station_id` int(2) unsigned default NULL,
  `md5_station` varchar(32) default NULL,
  `address_id` int(36) default NULL,
  `md5_address` varchar(32) default NULL,
  `miles` double default NULL,
  `timestamp` datetime default NULL,
  `status` enum('Active','Disabled') NOT NULL,
  `id` int(16) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Code: Select all

CREATE TABLE `address` (
  `street_number` varchar(8) default NULL,
  `street_name` varchar(64) default NULL,
  `street_type` varchar(8) default NULL,
  `city` varchar(32) default NULL,
  `state` char(2) default NULL,
  `zip` decimal(5,0) unsigned zerofill default NULL,
  `longitude` double default NULL,
  `latitude` double default NULL,
  `exact` enum('Yes','No') NOT NULL default 'Yes',
  `timestamp` datetime default NULL,
  `status` enum('Active','Disabled') NOT NULL,
  `id` int(36) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `street_number` (`street_number`,`street_name`,`street_type`,`city`,`state`,`zip`),
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Code: Select all

CREATE TABLE `t_station` (
  `title` varchar(32) default NULL,
  `line` varchar(8) default NULL,
  `longitude` double default NULL,
  `latitude` double default NULL,
  `timestamp` datetime default NULL,
  `status` enum('Active','Disabled') NOT NULL,
  `id` int(2) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `title` (`title`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Posted: Thu Aug 24, 2006 10:58 am
by feyd
Much better looking now. :) (Highlighted)

hmm, well subqueries aren't the fastest things I've noticed, nor is running large numbers of MD5().

Posted: Thu Aug 24, 2006 11:33 am
by RobertGonzalez
Are the queries slow, or is it the PHP script (or are you weven usign a PHP script)?

Posted: Thu Aug 24, 2006 11:36 am
by feyd
Forgot to mention this earlier: Something to look into is an EXPLAIN on the query to see what columns are being used and such to see if you can add indexes or change up the query to be more specific by ruling out data earlier rather than later.

Posted: Thu Aug 24, 2006 11:59 am
by Todd_Z
There is a php calculation being done after I pull the data, but that's not the bottle neck, it's just the query.

I'm gonna do some research using EXPLAIN and report back.

In terms of replacing md5, got any suggestions? Chances are all the lats will be between 40 and 45, and longs will be -70 to -72. All I need to do is make sure that the values haven't changed, thats the only purpose of the md5.

Posted: Thu Aug 24, 2006 12:02 pm
by feyd
Todd_Z wrote:In terms of replacing md5, got any suggestions? Chances are all the lats will be between 40 and 45, and longs will be -70 to -72. All I need to do is make sure that the values haven't changed, thats the only purpose of the md5.
Straight string compare? You're mashing the fields together as is, so maybe a straight string compare would be faster/easier for MySQL to perform.

Posted: Thu Aug 24, 2006 12:23 pm
by Todd_Z
Fair enough.

Posted: Thu Aug 24, 2006 12:27 pm
by Todd_Z
The result of my explain on the select query:

Code: Select all

+----+--------------------+-----------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type        | table     | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+--------------------+-----------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | PRIMARY            | t_station | ALL  | NULL          | NULL | NULL    | NULL |  112 | Using where |
|  1 | PRIMARY            | address   | ALL  | NULL          | NULL | NULL    | NULL | 5734 | Using where |
|  2 | DEPENDENT SUBQUERY | NULL      | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+--------------------+-----------+------+---------------+------+---------+------+------+-----------------------------------------------------+

Posted: Thu Aug 24, 2006 11:47 pm
by RobertGonzalez
IMPOSSIBLE does not sound good.

Posted: Fri Aug 25, 2006 12:13 am
by AKA Panama Jack
Also, you haven't indexed any of the fields you use in your query. Indexing will speed up your query quite a bit.

As it stands now you are doing MULTIPLE full table scans because of the lack of indexing. This is why it takes so long and gets longer as more records are added.

And you might want to change your query to something like this...

Code: Select all

SELECT address.id AS aid, t_station.id AS tid FROM address, t_station WHERE address.status = 'Active' AND 
t_station.status = 'Active' AND ( SELECT COUNT(*) FROM t_relation WHERE t_relation.status = 'Active' AND t_relation.address_id = address.id AND t_relation.t_station_id = t_station.id AND MD5(CONCAT_WS('-',address.latitude,address.longitude)) = md5_address AND MD5(CONCAT_WS('-',t_station.latitude,t_station.longitude)) = t_relation.md5_station ) = 0 LIMIT 2500