large query going very slow

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

large query going very slow

Post by John Cartwright »

In a nutshell I'm developing some statistics to track our users, based upon their clicks and impressions. Now, this table has gotten extremely large, and is growing very rapidly. It is approaching 30 million rows, and the query has become excessively slow

Code: Select all

SELECT COUNT(`click_views`.`id`) AS `clickviews`, `view_type`, `sites`.`id`, `site_name`
FROM `click_views` 
INNER JOIN `sites` ON `click_views`.`site_id` = `sites`.`id`
WHERE `click_views`.`af_id` > 0
AND `click_views`.`view_type` != ''
GROUP BY `sites`.`id`, `view_type`
Now working on my local server with a reduced amount of data, approximatly 5 million rows, the query takes about 10 seconds. With the 30 million rows, it takes about 300 seconds.

Here is what EXPLAIN of the SQL shows

Code: Select all

id 	select_type 	table 	      type   possible_keys 	         key 	 key_len 	ref 	          rows 	Extra
1 	 SIMPLE 	     sites 	      ALL 	PRIMARY 	               NULL 	NULL 	   NULL 	         10 	  Using temporary; Using filesort
1 	 SIMPLE 	     click_views    ref 	site_id,view_type,af_id   site_id 1 	      spiders.sites.id 316368  Using where
and here is my table structure of the click_views table

Code: Select all

CREATE TABLE `click_views` (
  `id` int(11) NOT NULL auto_increment,
  `ip` varchar(16) collate utf8_unicode_ci default NULL,
  `af_id` int(11) NOT NULL default '0',
  `campaign_id` int(11) NOT NULL default '0',
  `url_id` int(11) default NULL,
  `date` datetime NOT NULL default '0000-00-00 00:00:00',
  `banner_id` mediumint(9) NOT NULL default '0',
  `planID` mediumint(9) NOT NULL default '0',
  `view_type` enum('v','c') collate utf8_unicode_ci NOT NULL default 'v',
  `site_id` tinyint(4) NOT NULL default '0',
  `redirect_id` int(11) NOT NULL default '0',
  `sub` varchar(55) collate utf8_unicode_ci NOT NULL default 'N/A',
  `galleryname` varchar(24) collate utf8_unicode_ci NOT NULL default '',
  PRIMARY KEY  (`id`),
  KEY `redirect_id` (`redirect_id`),
  KEY `ip` (`ip`),
  KEY `site_id` (`site_id`),
  KEY `campaign_id` (`campaign_id`),
  KEY `view_type` (`view_type`),
  KEY `date` (`date`),
  KEY `af_id` (`af_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=2214611 ;
and heres the sites table

Code: Select all

CREATE TABLE `sites` (
  `id` int(5) NOT NULL auto_increment,
  `site_name` varchar(15) collate utf8_unicode_ci NOT NULL default '',
  `commission` varchar(255) collate utf8_unicode_ci NOT NULL default '',
  `active` tinyint(1) default '1',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=11 ;
Now I am no database guru, although for the love of me cannot figure out a way to get this query to run fast. Am I being unrealistic about wanting to COUNT() 30 million rows fast?
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

I'm not 100% sure, but creating an index on `click_views`.`site_id` should help as that's what it's joining on in the second part of the explain. Possibly a 2 column index on `click_views`.`id`and `click_views`.`site_id`?
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

They have already been set to keys.
:(
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

When did you last run "OPTIMIZE TABLE `click_views`"? If the answer is 'never', run it.
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Post by pickle »

300 seconds = 5 minutes. For 30 million rows that doesn't sound all that long. I don't know how old this information is, but I've heard that MySQL has trouble joining tables when at least one of them is > 1 million rows.

If you've got everything indexed, I'm not sure what else you can do. I've noticed that for some tables, it's actually faster to do the join logic in code rather than in MySQL. I can't explain why, but when I changed some of my queries, I saw a > 200% speed up.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
User avatar
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

You are pretty much screwed even with indexing because your query is going to be doing table scans because of the GROUP BY coupled with the INNER JOIN you are using. And performing table scans of a 30 million record table is going to be very slow and CPU intensive.
User avatar
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

according to mysql manual:
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (for example, this is a BTREE index and not a HASH index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.
your best bet would be to group by `click_views`.`id`, `view_type` (not the `sites`.`id`!) and add an index over these two columns. Using count(*) instead of count(`click_views`.`id`) may help too.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

thanks for the input guys, I'll let you know how it goes as soon as I get some positive results :)
User avatar
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

What about changing the table type to InnoDB from MyISAM? Would that help?
User avatar
AKA Panama Jack
Forum Regular
Posts: 878
Joined: Mon Nov 14, 2005 4:21 pm

Post by AKA Panama Jack »

Everah wrote:What about changing the table type to InnoDB from MyISAM? Would that help?
Probably depends upon how much memory he has availible and if it is a dedicated server. Innodb can be faster but that is because it tries to load the entire table into memory. Myisam just keeps the indexes in memory. With 30 million records that may not be possible.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

Just so you know, we are running a dedicated server.

I'm just finishing up the project for the time being as is without making any modifications to the database itself, as there are many other sites reliant on the data as well. Don't really want to risk any problems with the other sites.

If you can think of any more optimization tips I'm all ears, however it will be a couple days before I can let you guys know my findings.
User avatar
dreamscape
Forum Commoner
Posts: 87
Joined: Wed Jun 08, 2005 10:06 am
Contact:

Post by dreamscape »

If you think you are going to be able to efficiently analyze and mine data in an online realtime process, think again.

You will be much better off to generate report data in an offline process, and then use that data to generate your online reports.

If it is taking about 5 minutes to run, updating the report data every 1 hour or so would be doable.
Post Reply