I've got a reasonably complex SQL query I'm looking to build.
I have 2 related tables (for this question)... companies and tags_data.
tags_data stores the information about which companies are tagged with which tags.
The companies table is related to tags_data by a company_id field. Tags_data has 3 fields in total: tagdata_id (the auto-enter primary key), company_id and tag_id.
I'm looking to get a list of all companies which have the same 3 tags. But I also want the ability to use it for a variable number of tags, so to list companies that have the same 2 tags or the same 10 tags. That's why I have initially set it up using INNER JOIN to compute the company_ids that have the required tags, then return those companies from the companies table.
Here's what I've got so far to list companies that have all 3 of the following tag ids... 713, 718, 724
Code: Select all
SELECT `companies`.`company_id`, `companies`.`company_name`
FROM `companies`
INNER JOIN (
SELECT `company_id`, COUNT(`tagdata_id`) FROM `tags_data` WHERE `tag_id` IN(713, 718, 724) GROUP BY `company_id` HAVING COUNT(`tagdata_id`)=3
) AS `require_tags` ON `companies`.`company_id`=`require_tags`.`company_id`
GROUP BY `companies`.`company_id`
ORDER BY `companies`.`company_name`Performance of this is adequate, taking about 0.1 seconds to look through 150k tags_data records and 40k companies records to return 99 companies matching the query. The tables have indexes on company_id and tag_id fields.
The great thing about the INNER JOIN on a derived table is that I can easily change it to list companies that don't have any of those tags, or companies that have more than x and less than y number of multiple tags by just modifying the HAVING clause. I'm planning to make a query builder so users can build these complex queries for themselves.
Performance dives when I add a WHERE clause to the parent query (eg: outside the INNER JOIN) to do an even more complex search but I guess the answer to that is to move the WHERE clause inside the INNER JOIN.
Is there a method that would be faster but still give me the same flexibility as using an INNER JOIN in this way?
Or should the INNER JOIN theoretically be the fastest method and any further speed improvements would come from improved indexes and hardware?
Cheers, B