Page 1 of 4

MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 3:05 am
by batfastad
Hi everyone
I'm converting our database system from Filemaker to MySQL and things are going well.
However I've got quite a specific problem that I need to solve now.

There's 3 tables in my database:
companies - stores contact details of our suppliers. One record per company.
classifications - all the available classifications that a company can have. Manufacturer, Distributor, Event Organiser etc. Has 2 fields, classification_id and classification_name
classifications_data - stores which classifications are selected for which company. Only has 2 fields: company_id and classification_id and basically links the companies and classifications databases together

This is all fine and it works well. It allows multiple classifications to be stored for each company, and it's a the sort of standard normalised DB structure I've been looking to achieve for some years.
I hope I've explained it clearly.

But i'm looking to build a query which returns 2 columns: a company_name field from the companies table, and another column which is a comma-separated list of the classification names that each company has from classifications_data.
So something like this:

Code: Select all

ABC Widget Co          MANUFACTURER,DISTRIBUTOR
London Carrier Co         DISTRIBUTOR
WXY Gadgets Ltd          MANUFACTURER,EVENT ORGANISER
It's the comma separated list of the related classifications I'm having trouble with.
Is that possible directly within MySQL?

The alternative is to do a LEFT/RIGHT JOIN and loop through the related records in PHP, building the comma-separated list for that company then moving on to the next company. But that's a bit of a horrible old-school way of doing things.
Just wondering if there are any options to do this within the MySQL query?
Rather than manually restructuring the data with PHP.

Thanks, Ben

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 5:18 am
by VladSun

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 9:01 am
by batfastad
Wow, ok that looks like it will work perfectly :D

The query I'm trying to do is actually a bit more complicated than I described above. I'm having trouble working out what to join and what to group. I thought I might just be able to work it out, but I think I'm missing some fundamental SQL knowledge.

Here's a quick diagram of my 4 linked tables:
Image

The field names in red show the fields that link the tables. The field names in green are some or all of the fields that are in that particular table, the bold being the primary key.

Here's the query I'm starting off with:

Code: Select all

SELECT `contacts`.`contact_name`, `contacts`.`contact_address`, `companies`.`company_name`, `companies`.`company_address`, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR '|')FROM ??????GROUP BY ???WHERE `contacts`.`subscribed`='1'ORDER BY `companies`.`company_country`, `companies`.`company_name`, `contacts`.`contact_name` ASCLIMIT 0, 10000
So I'm looking to return all contacts where contacts.subscribed=1, regardless of whether the parent company has any tags stored in the tagdata table.

I've tried various different joins but I'm getting confused about the order I in which I need to be making the joins.
I guess first off I need to have a simple LEFT JOIN between contacts and companies, but then I'm a bit confused about how I join the other 2 tables in there as well.
I've not even gone near thinking about the GROUP BY yet, I was planning on saving that until I've got the joins worked out - to avoid even more confusion.

I hope I've explained it ok, and I'd really appreciate some help on this.
Once I've got a working example of this 3-way join then it will really help me understand better how these work together. I've read through the syntax examples on here http://www.wellho.net/solutions/mysql-l ... ables.html but I can't work out how to apply it to my DB structure.

Thanks, Ben

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 9:28 am
by VladSun
Well, I'm not going to write it for you - you are doing pretty well ;) So, you should be able to write it ;)
Just forget for a while for GROUPing and write down a SQL query which will list all the data you need (that is - all the fields from select area you already wrote) - you should get multiple rows per company/constact.
So ... let us see it ;)

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 11:29 am
by batfastad
Ok I think I'm getting there.
This gets the correct company and contact data at least, I think. Will do a verification shortly.

Code: Select all

SELECT `contacts`.`contact_name`, `companies`.`company_name`, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR '|') AS `classifications`FROM `contacts`LEFT JOIN `companies` USING(`company_id`)LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`WHERE `contacts`.`subscribed`='1'GROUP BY `contacts`.`contact_id`ORDER BY `companies`.`company_name`, `contacts`.`contact_name` ASCLIMIT 0, 10000
But the query takes about 80 seconds to return 777 records. Removing the ORDER BY clause only shaved off 1-3 seconds.
So I'm guessing this is down to the indexing of the tables, and the fact that it's loads of data to process!
From looking at this query, which fields should I make sure are indexed?
contact_id, company_id, tag_id, tag_name are all either primary keys or are indexed. Are there any other fields you can from this query that I should be indexing?

If I get rid of the GROUP_CONCAT and the GROUP BY clauses, the query still takes about 80 seconds. But it obviously returns far more records (>10000) without the GROUP BY.

Is there anything obvious I can do to speed this up?

Maybe running 1 query which sets a temporary field containing the classifications (see query above) made by the GROUP CONCAT. Then run another query which just does the single join between companies and contacts. But it's a bit of a horrible way of doing things.

@ VladSun, I'm getting there! :D

EDIT: Thinking about it, people must be doing far more complicated operations than this and on larger datasets. Here's my record counts:
companies: 36,000
contacts: 38,000
tags: 980
tags_data: 41,000
Pretty tame by MySQL standards!
So there must be some optimisation I can do in that query, and improvement to my tables.

Cheers, B

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 11:36 am
by VladSun
You should make indexies on all fields you use in:
- WHERE clause
- JOIN ON caluse
- ORDER BY clause

You could even make idexies based on two or more columns depending on you queries.
Also, use EXPLAIN (your full query here) to see what's slowing it.

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 3:34 pm
by batfastad
Every field in WHERE, JOIN and ORDER BY is indexed, and still it's around the 82 seconds mark to process the query.
From using EXPLAIN it seems that the query is reading through all 41000 rows from tags_data.

I'd never heard of composite indexes before, but added an index on tags_data with company_id and tag_id and got it down to consistently 31 seconds. Do you think it's still possible to get that quicker?
Or have I reached a limit of MySQL?
Could the query be written more efficiently?

Cheers, B

Re: MySQL, get joined records separated by commas

Posted: Thu Nov 20, 2008 4:04 pm
by VladSun
Have you run:
[sql]EXPLAINSELECT `contacts`.`contact_name`, `companies`.`company_name`, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR '|') AS `classifications`FROM `contacts`LEFT JOIN `companies` USING(`company_id`)LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`WHERE `contacts`.`subscribed`='1'GROUP BY `contacts`.`contact_id`ORDER BY `companies`.`company_name`, `contacts`.`contact_name` ASCLIMIT 0, 10000[/sql]

You may need to tune your MySQL configuration parameters. I had a query which required 60 sec to run - I've been able to make it 10 sec only by increasing the effective_cache_size parameter value (it's PGSQL).

But I have to agree with you, your query is comapretively simple and your tables are not so big - it shouldn't take so much time to execute this query.

Try these and take a notice on their execution time:

[sql]SELECT `contacts`.`contact_name`, `companies`.`company_name`, `tags`.`name` FROM `contacts`LEFT JOIN `companies` USING(`company_id`)LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`WHERE `contacts`.`subscribed`='1'[/sql]

[sql]SELECT `contacts`.`contact_name`, `companies`.`company_name`, `tags`.`name` FROM `contacts`LEFT JOIN `companies` USING(`company_id`)LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`WHERE `contacts`.`subscribed`='1'GROUP BY `contacts`.`contact_id`[/sql]

[sql]SELECT `contacts`.`contact_name`, `companies`.`company_name`, `tags`.`name` FROM `contacts`LEFT JOIN `companies` USING(`company_id`)LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`WHERE `contacts`.`subscribed`='1'ORDER BY `companies`.`company_name`, `contacts`.`contact_name` ASC[/sql]

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 4:17 am
by batfastad
Hi

I ran EXPLAIN on the full query and this was my output:

Code: Select all

1   SIMPLE  contacts    ALL     NULL    NULL    NULL    NULL    37639   Using where; Using temporary; Using filesort
1   SIMPLE  companies   eq_ref  PRIMARY     PRIMARY     4   dwcintranet.contacts.company_id     1    
1   SIMPLE  tags_data   ALL     company_id  NULL    NULL    NULL    41930    
1   SIMPLE  tags    eq_ref  PRIMARY     PRIMARY     4   dwcintranet.tags_data.tag_id    1    
This query took about 82 seconds.

So I added a multiple index to the tags_data table, on tag_id and company_id and the query speed improved to 32 seconds.
So I ran EXPLAIN on the query again and this was my output:

Code: Select all

1   SIMPLE  contacts    ALL     NULL    NULL    NULL    NULL    37639   Using where; Using temporary; Using filesort
1   SIMPLE  companies   eq_ref  PRIMARY     PRIMARY     4   dwcintranet.contacts.company_id     1    
1   SIMPLE  tags_data   index   company_id,companies_tags   companies_tags  26  NULL    41930   Using index
1   SIMPLE  tags    eq_ref  PRIMARY     PRIMARY     4   dwcintranet.tags_data.tag_id    1  
the companies_tags index mentioned above is the new multiple index I created above.

Those 3 sample queries you gave me to check, all took about 32 seconds

I tried creating some more multiple indexes, tag_id and name in tags table, contact_id and company_id in contacts table, but the queries remained at around the 32 seconds mark.

Any ideas?
Or is this a limit of our MySQL server already?
When running these queries the CPU usage on one of the cores maxes out, according to htop. It's a Core 2 Duo with 4GB of RAM so it's a pretty recent build.

Cheers, B

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 6:39 am
by VladSun
Could you post the output from:
[sql]SHOW CREATE TABLE contacts;[/sql]
and do it for all of the rest tables.

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 6:50 am
by batfastad
Ok cool. This should make it easier for you to visualise what's going on!
There's a bunch of other fields and stuff in some of these tables that aren't involved with this specific query.

The tags table is just a list of various classifications: DEALER, MANUFACTURER, DISTRIBUTOR, SHOW EVENT ORGANISER, WEBSITE PROVIDER

This query will only be run ocassionally by me, a few times per month, to export our mailing list data into CSV format to process in Excel. So it's not a mission critical query that will be hammered by users every day. But it's just strange, our DB is tiny compared to many MySQL installations! Our total database is 21MB including indexes.

contacts:

Code: Select all

CREATE TABLE `contacts` ( `contact_id` INT(11) NOT NULL AUTO_INCREMENT, `company_id` INT(11) DEFAULT NULL, `address1_contact` VARCHAR(255) DEFAULT NULL, `address2_contact` VARCHAR(255) DEFAULT NULL, `address3_contact` VARCHAR(255) DEFAULT NULL, `business_card` VARCHAR(2) DEFAULT NULL, `contact_id_old` INT(20) DEFAULT NULL, `contact_name` VARCHAR(255) DEFAULT NULL, `county_state_contact` VARCHAR(100) DEFAULT NULL, `create_name` VARCHAR(60) DEFAULT NULL, `create_stamp` INT(10) DEFAULT NULL, `email_contact` VARCHAR(255) DEFAULT NULL, `email_contact_optout` TINYINT(1) DEFAULT NULL, `fax_contact` VARCHAR(255) DEFAULT NULL, `geo_contact` VARCHAR(255) DEFAULT NULL, `job_title` VARCHAR(255) DEFAULT NULL, `mail_contact` TINYINT(1) DEFAULT NULL, `modify_count` INT(7) DEFAULT '0', `modify_name` VARCHAR(60) DEFAULT NULL, `modify_stamp` INT(10) DEFAULT NULL, `note` TEXT, `postcode_zip_contact` VARCHAR(50) DEFAULT NULL, `reg_stamp` INT(10) DEFAULT NULL, `subscribed` TINYINT(1) DEFAULT NULL, `tel_cell` VARCHAR(255) DEFAULT NULL, `tel_contact` VARCHAR(255) DEFAULT NULL, `tel_ext` VARCHAR(255) DEFAULT NULL, `tel_home` VARCHAR(255) DEFAULT NULL, `temp` VARCHAR(50) DEFAULT NULL, `town_city_contact` VARCHAR(100) DEFAULT NULL, `view_count` INT(7) DEFAULT '0', `view_name` VARCHAR(60) DEFAULT NULL, `view_stamp` INT(10) DEFAULT NULL, PRIMARY KEY  (`contact_id`), KEY `company_id` (`company_id`), KEY `contact_name` (`contact_name`)) ENGINE=MyISAM AUTO_INCREMENT=37645 DEFAULT CHARSET=utf8
companies:

Code: Select all

CREATE TABLE `companies` ( `company_id` INT(11) NOT NULL AUTO_INCREMENT, `address_priv` TINYINT(1) DEFAULT NULL, `address1` VARCHAR(255) DEFAULT NULL, `address2` VARCHAR(255) DEFAULT NULL, `address3` VARCHAR(255) DEFAULT NULL, `advisory` VARCHAR(255) DEFAULT NULL, `advisory_name` VARCHAR(60) DEFAULT NULL, `advisory_stamp` INT(10) DEFAULT NULL, `agency_address` VARCHAR(255) DEFAULT NULL, `agency_company` VARCHAR(255) DEFAULT NULL, `agency_contact` VARCHAR(255) DEFAULT NULL, `agency_email` VARCHAR(255) DEFAULT NULL, `agency_fax` VARCHAR(255) DEFAULT NULL, `agency_id` INT(11) DEFAULT NULL, `agency_tel` VARCHAR(255) DEFAULT NULL, `budget_year1` VARCHAR(3) DEFAULT NULL, `budget_year2` VARCHAR(3) DEFAULT NULL, `company_name` TEXT, `county_state` VARCHAR(100) DEFAULT NULL, `create_name` VARCHAR(60) DEFAULT NULL, `create_stamp` INT(10) DEFAULT NULL, `currency` VARCHAR(3) DEFAULT NULL, `email` TEXT, `email_optout` TINYINT(1) DEFAULT NULL, `email_priv` TINYINT(1) DEFAULT NULL, `fax` VARCHAR(255) DEFAULT NULL, `fax_priv` TINYINT(1) DEFAULT NULL, `geo` VARCHAR(2) DEFAULT NULL, `hanging_file` TINYINT(1) DEFAULT NULL, `iso_code` VARCHAR(20) DEFAULT NULL, `locked_name` VARCHAR(60) DEFAULT NULL, `locked_stamp` INT(10) DEFAULT NULL, `master_id` INT(11) DEFAULT NULL, `modify_count` INT(7) DEFAULT '0', `modify_name` VARCHAR(60) DEFAULT NULL, `modify_stamp` INT(10) DEFAULT NULL, `postcode_zip` VARCHAR(50) DEFAULT NULL, `rep` VARCHAR(10) DEFAULT NULL, `tel` VARCHAR(255) DEFAULT NULL, `tel_priv` TINYINT(1) DEFAULT NULL, `town_city` VARCHAR(255) DEFAULT NULL, `vat_number` VARCHAR(20) DEFAULT NULL, `view_count` INT(7) DEFAULT '0', `view_name` VARCHAR(60) DEFAULT NULL, `view_stamp` INT(10) DEFAULT NULL, `website` TEXT, PRIMARY KEY  (`company_id`), KEY `geo` (`geo`), FULLTEXT KEY `company_name` (`company_name`)) ENGINE=MyISAM AUTO_INCREMENT=39200 DEFAULT CHARSET=utf8
tags_data:

Code: Select all

CREATE TABLE `tags_data` ( `tagdata_id` INT(11) NOT NULL AUTO_INCREMENT, `tag_id` INT(11) DEFAULT NULL, `company_id` VARCHAR(6) DEFAULT NULL, `create_name` VARCHAR(60) DEFAULT NULL, `create_stamp` INT(10) DEFAULT NULL, PRIMARY KEY  (`tagdata_id`), KEY `tag_id` (`tag_id`), KEY `company_id` (`company_id`), KEY `companies_tags` (`company_id`,`tag_id`)) ENGINE=MyISAM AUTO_INCREMENT=42205 DEFAULT CHARSET=utf8
tags:

Code: Select all

CREATE TABLE `tags` ( `tag_id` INT(7) NOT NULL DEFAULT '0', `category` VARCHAR(255) DEFAULT NULL, `description` VARCHAR(255) DEFAULT NULL, `name` VARCHAR(255) DEFAULT NULL, PRIMARY KEY  (`tag_id`), KEY `category` (`category`), KEY `name` (`name`)) ENGINE=MyISAM DEFAULT CHARSET=utf8
Thanks so much! :D

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 7:18 am
by VladSun
First - you don't have an index on `contacts`.`subscribed` and you are using it in WHERE clause.

Try to remove the FULLTEXT feature from KEY `company_name` (`company_name`).

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 7:27 am
by batfastad
Index added for contacts.subscribed
Index removed on companies.company_name. Since companies.company_name is a text field, not varchar, phpMyAdmin only allows me to create FULLTEXT indexes on that column. Not a regular index.

Re-ran the query and it was still at the 31s mark. Strange for only 777 records!
Thanks for taking the time to look, I hope we can get this solved :D

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 7:40 am
by VladSun
batfastad wrote:Index removed on companies.company_name. Since companies.company_name is a text field, not varchar, phpMyAdmin only allows me to create FULLTEXT indexes on that column. Not a regular index.
Make it varchar (I don't think you need a text field here) and recreate the index.

Re: MySQL, get joined records separated by commas

Posted: Fri Nov 21, 2008 7:59 am
by VladSun
The main problem we have with this query is that you have two different keys in the ORDER BY clause - so, no indexies could be used.
If you can remove the ORDER BY clause or at ORDER BY only one key it would be far better.

Try also this query:
[sql]SELECT `contacts`.`contact_name`, `companies`.`company_name`, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR '|') AS `classifications`FROM `contacts`    force INDEX (PRIMARY, `contact_name`, `company_id`, `IX_subscribed`)LEFT JOIN `companies` USING(`company_id`)LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`WHERE `contacts`.`subscribed`='1'GROUP BY `contacts`.`contact_id`LIMIT 0, 10000[/sql]