Page 1 of 2

Optimizing a query

Posted: Fri Apr 20, 2012 9:48 am
by shiznatix
[EDIT] I just realized I posted this in the wrong form. Beautiful moderators, will you please move it for me?

Hi everyone,

So, I have this website that searches through a database of companies. The database structure is as follows:

[text]
companies
-companyid
-name
-member
-active
keywords
-keywordid
-word
company_keywords
-companyid
-keywordid
company_addresses
-companyAddressid
-companyid
-cityid
-address
cities
-cityid
-countyid
-name
counties
-countyid
-name
[/text]

Now, companies, company_addresses, keywords, and company_keywords have about 800,000 rows each. My search query uses 2 variables, $name and $location. The query looks like this:

Code: Select all

SELECT
	companies.companyid,
	companies.name,
	companies.phoneNumber
FROM
	companies AS companies
LEFT JOIN
	company_keywords AS companyKeywords
ON
	companies.companyid = companyKeywords.companyid
LEFT JOIN
	keywords AS keywords
ON
	keywords.keywordid = companyKeywordsid.keywordid
LEFT JOIN
	company_addresses AS companyAddresses
ON
	companies.companyid = companyAddresses.companyid
LEFT JOIN
	cities AS cities
ON
	companyAddresses.cityid = cities.cityid
LEFT JOIN
	counties AS counties
ON
	cities.countyid = counties.countyid
WHERE
	(
		companies.name LIKE '%$name%'
		OR
		keywords.word LIKE '%$name%'
		OR
		keywords.word LIKE '%$location%'
		OR
		companyAddresses.address LIKE '%$location%'
		OR
		cities.name = '$location'
		OR
		counties.name = '$location'
	)
AND
	(
		companies.active = '1'
	)
ORDER BY
	companies.member DESC, companies.name ASC
LIMIT 10
now, while the query works just fine, it takes quite a long time. After various tests, it averages slightly over 5 seconds to run on localhost. This would be fine but this is a user facing form so it really has to be quicker than that. Plus, I first have to run a COUNT() query for pagination things so that puts even more time into the situation.

Is there a way to optimize this query? I have tried adding indexes everywhere I could but this only got it down to the current 5 seconds-per-query speed. Any advice is welcome and changes to the database structure are a-ok if it will help with preformance.

Re: Optimizing a query

Posted: Fri Apr 20, 2012 9:57 am
by Celauran
Have you used EXPLAIN to determine where the bottlenecks occur?

Re: Optimizing a query

Posted: Fri Apr 20, 2012 9:59 am
by x_mutatis_mutandis_x
Try inner joins instead of left joins. Which columns did you index? Do you have your foreign keys established on your tables as well?

Re: Optimizing a query

Posted: Sat Apr 21, 2012 2:57 am
by shiznatix
Thanks for the advice. I will try these things out on Monday when I get back to work and will post back the results.

Re: Optimizing a query

Posted: Mon Apr 23, 2012 2:01 am
by shiznatix
Ok, so I tried inner joins instead of left joins and it took the query down by 1 second. Which is good but 4 seconds is still too long.

My query right now is:

Code: Select all

SELECT DISTINCT COUNT( companies.companyid ) AS  `resultsCount` 
FROM  `bs_companies` AS  `companies` 
INNER JOIN  `bs_company_keywords` AS  `companyKeywords` ON companies.companyid = companyKeywords.companyid
INNER JOIN  `bs_keywords` AS  `keywords` ON companyKeywords.keywordid = keywords.keywordid
INNER JOIN  `bs_company_addresses` AS  `companyAddresses` ON companies.companyid = companyAddresses.companyid
INNER JOIN  `bs_cities` AS  `cities` ON companyAddresses.cityid = cities.cityid
INNER JOIN  `bs_counties` AS  `counties` ON cities.countyid = counties.countyid
WHERE (
companies.name LIKE  '%taxi%'
OR keywords.word =  'taxi'
OR companyAddresses.address LIKE  '%taxi%'
OR cities.name =  'taxi'
OR counties.name =  'taxi'
)
AND (
companies.active =  '1'
)
ORDER BY  `companies`.`member` DESC ,  `companies`.`name` ASC 
My keys are:
[text]
PRIMARY on bs_cities.cityid
INDEX on bs_cities.name

PRIMARY on bs_companies.companyid
INDEX on bs_companies.name

INDEX on bs_company_addresses.companyid
INDEX on bs_company_addresses.cityid
INDEX on bs_company_addresses.address

INDEX on bs_company_keywords.companyid
INDEX on bs_company_keywords.keywordid

PRIMARY on bs_counties.countyid
INDEX on bs_counties.name

PRIMARY on bs_keywords.keywordid
INDEX on bs_keywords.word
[/text]

And when I use EXPLAIN I get this result:
[text]
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE keywords ALL PRIMARY,word NULL NULL NULL 533 Using temporary
1 SIMPLE companyKeywords ref companyid,keywordid keywordid 3 bolagskollen.keywords.keywordid 1519 Using where
1 SIMPLE companies eq_ref PRIMARY PRIMARY 3 bolagskollen.companyKeywords.companyid 1 Using where
1 SIMPLE companyAddresses ref companyid,cityid companyid 3 bolagskollen.companies.companyid 1 Using where
1 SIMPLE cities eq_ref PRIMARY,name PRIMARY 2 bolagskollen.companyAddresses.cityid 1 Using where
1 SIMPLE counties eq_ref PRIMARY,name PRIMARY 1 bolagskollen.cities.countyid 1 Using where
[/text]

Any ideas for how to get this query down to like 1 second?

Re: Optimizing a query

Posted: Mon Apr 23, 2012 2:33 am
by Weirdan
uhm, your original query hardly matches the second query you posted. Assuming your second query is what you actually use I don't see why it wouldn't use the index over keywords.word, unless there's less than 3 keywords (but there's more as it scans 533 rows).

As an unrelated note, you don't have to use a separate count query to get total number of items - instead you could use SQL_CALC_FOUND_ROWS modifier + 'select found_rows()'

You could try alternative approach based on UNION DISTINCT, like this:

Code: Select all

(select company.name from company inner join companykeyword ck on ck.company_id=company.id inner join keyword on keyword.id=ck.keyword_id where word="$name")
UNION DISTINCT
(select company.name from company inner join companyaddress ca on ca.company_id=company.id inner join city on city.id=ca.city_id where city.name="$location")
UNION DISTINCT
(select company.name from company inner join companyaddress ca on ca.company_id=company.id inner join county on county.id=ca.county_id where county.name="$location")
Ideally you'd want to switch to prefix-matching the search term, because predicates like 'LIKE "%something%"' cannot use indices, while prefix-matching predicates like 'LIKE "something%"' can.

Re: Optimizing a query

Posted: Mon Apr 23, 2012 3:15 am
by shiznatix
Yes, the first query i posted was just what I could remember after I had gotten home from work. The one I posted today is the one I am using currently but it is changing slightly very often to try to get the time down.

Weirdan: I tried your query but it basically killed my mysql server and I had to shut it down and restart it to bring it back to responsiveness.

I changed the WHERE to this:

Code: Select all

WHERE (
companies.name LIKE  'taxi%'
OR keywords.word =  'taxi'
OR companyAddresses.address LIKE  'stockholm%'
OR cities.name =  'stockholm'
OR counties.name =  'stockholm'
OR keywords.word = 'stockholm'
)
AND (
companies.active =  '1'
)
and I also did some tests with this. Basically, I took everything in the first section (the OR parts) out and only left companies.name LIKE 'taxi%' and this dropped the query time down to like 0.005 seconds. But, with every additional OR statement, the speed dropped and with everything being used, it takes 5 seconds.

I have also switched the joins to LEFT JOINs because a company doesn't also have an address, keywords, etc so as far as I understand the companies without rows in the other tables wont be joined in if I use INNER JOIN.

My current full query is:

Code: Select all

SELECT DISTINCT COUNT( companies.companyid ) AS  `resultsCount` 
FROM  `bs_companies` AS  `companies` 
LEFT JOIN  `bs_company_keywords` AS  `companyKeywords` ON companies.companyid = companyKeywords.companyid
LEFT JOIN  `bs_keywords` AS  `keywords` ON companyKeywords.keywordid = keywords.keywordid
LEFT JOIN  `bs_company_addresses` AS  `companyAddresses` ON companies.companyid = companyAddresses.companyid
LEFT JOIN  `bs_cities` AS  `cities` ON companyAddresses.cityid = cities.cityid
LEFT JOIN  `bs_counties` AS  `counties` ON cities.countyid = counties.countyid
WHERE (
companies.name LIKE  '$name%'
OR keywords.word =  '$name'
OR companyAddresses.address LIKE  '$location%'
OR cities.name =  '$location'
OR counties.name =  '$location'
OR keywords.word = '$location'
)
AND (
companies.active =  '1'
)
ORDER BY  `companies`.`member` DESC ,  `companies`.`name` ASC 
I also tried using full text on the companies.name and companyAddresses.address but this didn't seam to do anything for the speed

Re: Optimizing a query

Posted: Mon Apr 23, 2012 3:49 am
by Weirdan
shiznatix wrote: Weirdan: I tried your query but it basically killed my mysql server and I had to shut it down and restart it to bring it back to responsiveness.
That's interesting. What did EXPLAIN look like?

Re: Optimizing a query

Posted: Mon Apr 23, 2012 4:19 am
by shiznatix
Ok, I tried it again and I think what killed everything was the fact that this was returning something like 10,000 rows into phpmyadmin. I added a limit 10 and the query is quite fast now.

Some things though. How do I get it to return only 10 rows? If I add the LIMIT 10 to every select query, it still gives me 20 rows or so. Also, how do I get the results for all of the queries and then run an ORDER BY on it? Something like a global order clause?

Edit: Also, how could I do pagination with this without returning the entire result set (LIMIT $amout, $page)

My current query is:

Code: Select all

(
	SELECT
		companies.name
	FROM
		bs_companies AS companies
	WHERE
		companies.name LIKE "$name%"
	LIMIT 10
)
UNION DISTINCT
(
	SELECT
		companies.name
	FROM
		bs_companies AS companies
	INNER JOIN
		bs_company_keywords AS companyKeywords
	ON
		companyKeywords.companyid = companies.companyid
	INNER JOIN
		bs_keywords AS keywords
	ON
		keywords.keywordid = companyKeywords.keywordid
	WHERE
		keywords.word = "$name"
	LIMIT 10
)
UNION DISTINCT
	(
		SELECT
			companies.name
		FROM
			bs_companies AS companies
		INNER JOIN
			bs_company_addresses AS companyAddresses
		ON
			companyAddresses.companyid = companies.companyid
		INNER JOIN
			bs_cities AS cities
		ON
			cities.cityid = companyAddresses.cityid
		INNER JOIN
			bs_counties AS counties
		ON
			counties.countyid = cities.cityid
		WHERE
			cities.name = "$location"
		OR
			counties.name = "$location"
		LIMIT 10
	)
UNION DISTINCT
	(
		SELECT
			companies.name
		FROM
			bs_companies AS companies
		INNER JOIN
			bs_company_addresses AS companyAddresses
		ON
			companyAddresses.companyid = companies.companyid
		WHERE
			companyAddresses.address = "$location"
		LIMIT 10
	)
Unrelated: how do I use the syntax BB code tag?

Re: Optimizing a query

Posted: Mon Apr 23, 2012 5:59 am
by Weirdan
shiznatix wrote:Also, how do I get the results for all of the queries and then run an ORDER BY on it? Something like a global order clause?

Edit: Also, how could I do pagination with this without returning the entire result set (LIMIT $amout, $page)
MySQL manual wrote: To use an ORDER BY or LIMIT clause to sort or limit the entire UNION result, parenthesize the individual SELECT statements and place the ORDER BY or LIMIT after the last one. The following example uses both clauses:

(SELECT a FROM t1 WHERE a=10 AND B=1)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2)
ORDER BY a LIMIT 10;
Unrelated: how do I use the syntax BB code tag?
[text]

Code: Select all

...

Code: Select all

...
[/text]

Re: Optimizing a query

Posted: Mon Apr 23, 2012 10:24 am
by mikosiko
using your last posted query:

Code: Select all

SELECT DISTINCT COUNT( companies.companyid ) AS  `resultsCount` 
FROM  `bs_companies` AS  `companies` 
LEFT JOIN  `bs_company_keywords` AS  `companyKeywords` ON companies.companyid = companyKeywords.companyid
LEFT JOIN  `bs_keywords` AS  `keywords` ON companyKeywords.keywordid = keywords.keywordid
LEFT JOIN  `bs_company_addresses` AS  `companyAddresses` ON companies.companyid = companyAddresses.companyid
LEFT JOIN  `bs_cities` AS  `cities` ON companyAddresses.cityid = cities.cityid
LEFT JOIN  `bs_counties` AS  `counties` ON cities.countyid = counties.countyid
WHERE (
companies.name LIKE  '$name%'
OR keywords.word =  '$name'
OR companyAddresses.address LIKE  '$location%'
OR cities.name =  '$location'
OR counties.name =  '$location'
OR keywords.word = '$location'
)
AND (
companies.active =  '1'
)
ORDER BY  `companies`.`member` DESC ,  `companies`.`name` ASC
did you test it taking out just the ORDER BY?... post back the resulting explain plan

Re: Optimizing a query

Posted: Mon Apr 23, 2012 11:06 am
by shiznatix
mikosiko: That query is taking 5+ seconds like my original ones were. Also, I did try taking out the ORDER BY but it didn't change the speed at all.

It seams that Weirdan has the right idea but I am getting some strange results. I have this query:

Code: Select all

(SELECT `companies`.`companyid`, `companies`.`childCategoryid`, `companies`.`registrationNumber`, `companies`.`name`, `companies`.`phoneNumber`, `companies`.`member`, `childCategories`.`name` AS `childCategory` FROM `bs_companies` AS `companies` INNER JOIN `bs_child_categories` AS `childCategories` ON childCategories.childCategoryid = companies.childCategoryid WHERE (companies.name LIKE '%Ättestupans Café%') AND (companies.active = "1"))
UNION
(SELECT `companies`.`companyid`, `companies`.`childCategoryid`, `companies`.`registrationNumber`, `companies`.`name`, `companies`.`phoneNumber`, `companies`.`member`, `childCategories`.`name` AS `childCategory` FROM `bs_companies` AS `companies` INNER JOIN `bs_child_categories` AS `childCategories` ON childCategories.childCategoryid = companies.childCategoryid INNER JOIN `bs_company_keywords` AS `companyKeywords` ON companyKeywords.companyid = companies.companyid INNER JOIN `bs_keywords` AS `keywords` ON keywords.keywordid = companyKeywords.keywordid WHERE (keywords.word LIKE '%Ättestupans Café%') AND (companies.active = "1")) ORDER BY `member` DESC, `name` ASC LIMIT 10 OFFSET 20
Now, if I run just the first select query there, I get the proper result (there is a company name that is exactly equal to that and this is returned just fine). But, if I use the whole posted query (with the union), it doesn't return anything. Just gives me back an empty result set.

Maybe I don't fully understand the UNION statement but this seams odd - shouldn't it take the result from the first query and add in the 0 results from the second query?

Re: Optimizing a query

Posted: Mon Apr 23, 2012 11:14 am
by mikosiko
from the manual:
The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.

Re: Optimizing a query

Posted: Mon Apr 23, 2012 3:57 pm
by shiznatix
So if I understand correctly, doing a regular UNION means any row found in all of the selects will NOT be in the result set.

If I use UNION ALL, it a is found in all of the selects then it will be included in the result set twice or only once?

Re: Optimizing a query

Posted: Mon Apr 23, 2012 4:20 pm
by mikosiko
shiznatix wrote:So if I understand correctly, doing a regular UNION means any row found in all of the selects will NOT be in the result set.
The default behavior for UNION is that duplicate rows are removed from the result.
means that only the duplicate rows are going to be removed, or in other words... if a row is found twice it will be included only once in the result set

shiznatix wrote:If I use UNION ALL, it a is found in all of the selects then it will be included in the result set twice or only once?
With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching row
it will be included as many times as it is found