Page 1 of 1

Oh please, some help on this SQL! :(

Posted: Tue Sep 12, 2006 12:58 am
by robster
Hi all, I am desperate here, I have finally hit the limit of my SQL knowlege I believe. I am ending up with a combination of looping queries on my Point of Sale database that just took up over 2.5 Gig of RAM and I had to crash the app to stop it.

Note, the database is MySQL

Let me explain:

I am trying to do a report on the transactions in the database where I do a breakdown by groups. The way I have my database tables setup is roughly this:

transactions table
This table contains EVERY single transaction in the system, even if a client purchases three items, they all go in as individual transaction data, so you can see to the minutest detail.
It has these records:

Code: Select all

id   	int(11) 
type  	varchar(50)
type_id 	varchar(10)
price  	varchar(10)
date
notes  	varchar(255)
cust_id  	int(11)
employee_id  	int(11)
pos_history_id  	int(11)
and contains data similar to this:

Code: Select all

1  	services  	2  	38  	2005-03-31  	First Client, free gift of Ready to Wear  	2  	2  	1
2 	stock 	20 	0 	2005-03-31 	First Client, free gift of Ready to Wear 	2 	2 	1
3 	services 	2 	15 	2005-03-31 	Colour did not take, not charged for it, to rebook... 	3 	2 	1
4 	services 	7 	0 	2005-03-31 	Colour did not take, not charged for it, to rebook... 	3 	2 	1
5 	services 	3 	19 	2005-03-31 	  	9 	2 	1
6 	services 	2 	38 	2005-03-31 	Gift Voucher used ($38) 	16 	2 	1
7 	services 	4 	7 	2005-03-31 	  	6 	2 	1
8 	services 	2 	38 	2005-03-31 	  	6 	2 	1
9 	services 	2 	38 	2005-03-31 	  	12 	2 	1
10 	services 	6 	27 	2005-03-31 	  	12 	2 	1
11 	services 	2 	38 	2005-03-31 	  	13 	2 	1
12 	services 	7 	62 	2005-03-31 	  	13 	2 	1
13 	stock 	32 	22.95 	2005-03-31 	  	13 	2 	1
The Type is a reference that allows me to know if the transaction details are in a service or a stock table. The Type_Id refers to the exact item in either the stock or services table. Not such an issue for my question, but you might be interested :) IE: Type_Id '2' in Type 'Stock' might be Shampoo for example.

There are nearly 7000 transactions in that table.


grouping table
This table contains the NAMES of the groups

It has these records:

Code: Select all

group_id	group_name	group_type
13 	Hulabaloo 		stock
14 	Beauty Products 	stock
15 	Beauty Services 	services
16 	Beauty Rent 		other
17 	Cutting 		services
18 	Colour 			services
19 	Training 		services
20 	Treatments 		services
25	Makeup 			stock
26	Makeup Service 		services
Now these are purely descriptors for the next table...


groups table
This table contains the actual group ITEMS.
the table looks like this:

Code: Select all

id	group_id 	type	type_id
97 	13 		stock 		116
98 	14 		stock 		53
108 	15 		services 	31
99 	15 		services 	24
67 	15 		services 	22
87 	15 		services 	25
72 	15 		services 	27
71 	15 		services 	23
70 	15 		services 	21
69 	15 		services 	20
68 	15 		services 	26
100 	16 		other 		1
107 	17 		services 	14
106 	17 		services 	30
105 	17 		services 	32
104 	17 		services 	1
The group_id here matches the grouping.group_id so we know which item is associated with which group name from the grouping table.


Now that you have the info, here is what I'm trying to do... get ready and I hope you can stay with me :)

I am trying to have a report that looks through the transactions table and displays the totals etc, based on the groups that the user has created. At the end of each days worth of data collection (and there may be months worth of data, looped through) I would expect the report to show each of the groups, ie:

Code: Select all

Hulabaloo, Beauty Products, Beauty Services, Beauty Rent, Cutting, Colour, Training, Treatments, services, Makeup, Makeup Service
they would all have their names, and their total figures beside them. Example might include:

Code: Select all

Beauty Products:  14 Sold, $250 total, 15 July 2006
(as a rough example).

Now you know what I WANT to do, I'd like to show you what I did do and how much of a workout it gives my database (and hence the machine). Using this method below, the machine ground to a halt for minutes on a FOUR day query (which might have been about 120 transactions total).


THE CURRENT CONCEPT:

1 loop through transactions table within a defined daterange (say, 1st of September to Today)

2 within that loop, loop through grouping table, one item at a time

3 for each grouping item, loop through the groups tables find a match of group_id's so we know the individual items we'll be searching for in the transaction table for that day

4 for each matching item in the grouping/groups tables (step 3), Loop through the transactions table for just that day we are in in loop 1 (transactions) and display the results.

Another way to say it:
IE, it might work like this:

Code: Select all

-Loop through transaction table, so at record 1, we:
--jump into the grouping table, so at record 1 of grouping we:
---jump into the groups table, so at record 1 of groups (if a match) we:
----jump into the transactions table for just that day and we
-----draw each record on screen that matches our criterea (ie:  has same type and type_id as the groups table item)
So you can see, for EVERY transaction in the transactions table (and let's say we do a months worth, which could easily be 500 transactions a month) we loop into many subloops below it, considering there are roughly
-10 grouping names (500 x 10 = 5000 database calls)
-about 80 group items (5000 x 80 = 400,000 database calls)
-about 25 transactions in a day say (400,000 x 25 = 10,000,000 database calls)
and you can see why the database is crawling to a halt.


Now maybe my logic is not quite correct with my above math, but something is happening and it's killing it!

How do I go about making it work more efficiently. I am wondering if joins are anything to do with it, and if so, lord help me :) I've tried and tried them and it's like a mental block.

But I'm open to any help as I'm desperate to get this report working. There is a bookkeeper going NUTS for the report, and a taxman going even more nuts :)

Any help GREATLY appreciated and thank you so much for reading this.


Rob

Posted: Tue Sep 12, 2006 2:58 am
by volka
  • Why do both fields, grouping.group_type and groups.type, hold the same information? Seems pretty redundant to me.
  • What indices did you create for the tables? In case of doubt run SHOW CREATE TABLE nameOfTable for all three tables.
  • Does transactions.type_id <->groups.type_id cleary identify an item? Can there be a different item with that type_id (in other words: is only the combined key type+type_id unambigious)?
It can be done much easier with GROUP BY, Sum() and Count(). Do you know about these sql features?

Posted: Tue Sep 12, 2006 3:37 am
by robster
I'll work through your repsonce, and thank you SO much for your responce :)

Code: Select all

SHOW CREATE TABLE grouping
grouping  	[BLOB - 236 Bytes]

SHOW CREATE TABLE groups
groups  	[BLOB - 251 Bytes]

SHOW CREATE TABLE transactions
transactions  	[BLOB - 479 Bytes]

Both fields, grouping.group_type and groups.group_type are only the same for the actual creation and display of grouping. It's co-incidental that they are both there. I use it so when I create the grouping 'group name' it appends the group type (stock or service) so it knows what kind of group it is. We need not concern ourselves with it in this example. It may be redundant? but it worked for me :)


In relation to type and type_id, yes, you need to compare the type (stock or service) to know which table to look up the type (integer that represents the actual item). For example, type 'stock' type_id '2' could return SHAMPOO but type 'service' type_id '2' could return WOMENS HAIRCUT.

I hope that made sense.

Regarding my knowlege of GROUP BY, Sum() and Count(), I vaguely know these, but looking at them I can guess their usage, though am guessing that GROUP BY will be the powerful one, but I don't know what one at all..

In what way would you imagine me using them?

Thank you again so VERY much!!!

:)

Rob

Posted: Tue Sep 12, 2006 8:21 am
by volka
grouping [BLOB - 236 Bytes]
Doesn't answer What indices did you create for the tables?. The contents of these blobs does.

First I would extract all type/type_id/group_type/etc and move it to another table items. This reduces it to a single comparison of numbers for the main data; much easier and faster.

Code: Select all

CREATE TABLE `items` (
  `item_id` int(11) unsigned NOT NULL auto_increment,
  `item_name` varchar(64) NOT NULL default '0',
  `type` int(11) unsigned NOT NULL default '0',
  `type_id` int(11) unsigned NOT NULL default '0',
  PRIMARY KEY  (`item_id`)
);

CREATE TABLE `grouping` (
  `group_id` int(11) unsigned NOT NULL auto_increment,
  `group_name` varchar(64) NOT NULL,
  PRIMARY KEY  (`group_id`)
);

CREATE TABLE `groups` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `item_id` int(11) unsigned NOT NULL default '0',
  `group_id` int(11) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `itemIdx` (`item_id`),
  KEY `groupIdx` (`group_id`)
);

CREATE TABLE `transactions` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `item_id` int(11) unsigned NOT NULL default '0',
  `price` int(11) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `itemIdx` (`item_id`)
);

Code: Select all

SELECT
	grouping.group_name, Sum(transactions.price), Count(transactions.price)
FROM
	transactions
JOIN
	groups
ON
	transactions.item_id=groups.item_id
JOIN
	grouping
ON
	groups.group_id=grouping.group_id
GROUP BY
	groups.group_id

test data:

Code: Select all

INSERT INTO `items` VALUES (1,'shampoo',1,2);
INSERT INTO `items` VALUES (2,'women\'s haircut',2,2);
INSERT INTO `items` VALUES (3,'service A',2,1);
INSERT INTO `items` VALUES (4,'service B',2,3);
INSERT INTO `items` VALUES (5,'stock A',1,1);
INSERT INTO `items` VALUES (6,'stock B',1,9);
INSERT INTO `items` VALUES (7,'service D',2,5);
INSERT INTO `items` VALUES (8,'service E',2,6);
INSERT INTO `items` VALUES (9,'service F',2,7);
INSERT INTO `items` VALUES (10,'stock C',1,11);
INSERT INTO `items` VALUES (11,'stock D',1,12);
INSERT INTO `items` VALUES (12,'stock E',1,13);
INSERT INTO `items` VALUES (13,'stock F',1,20);
INSERT INTO `items` VALUES (14,'stock G',1,21);

INSERT INTO `grouping` VALUES (1,'groupA');
INSERT INTO `grouping` VALUES (2,'groupB');
INSERT INTO `grouping` VALUES (3,'groupC');

INSERT INTO `groups` VALUES (1,1,3);
INSERT INTO `groups` VALUES (2,5,3);
INSERT INTO `groups` VALUES (3,6,3);
INSERT INTO `groups` VALUES (4,10,3);
INSERT INTO `groups` VALUES (5,11,2);
INSERT INTO `groups` VALUES (6,12,2);
INSERT INTO `groups` VALUES (7,13,2);
INSERT INTO `groups` VALUES (8,14,2);
INSERT INTO `groups` VALUES (9,2,1);
INSERT INTO `groups` VALUES (10,3,1);
INSERT INTO `groups` VALUES (11,4,1);
INSERT INTO `groups` VALUES (12,7,1);
INSERT INTO `groups` VALUES (13,8,1);
INSERT INTO `groups` VALUES (14,9,1);

INSERT INTO `transactions` VALUES (1,1,3);
INSERT INTO `transactions` VALUES (2,2,6);
INSERT INTO `transactions` VALUES (3,3,9);
INSERT INTO `transactions` VALUES (4,4,12);
INSERT INTO `transactions` VALUES (5,5,15);
INSERT INTO `transactions` VALUES (6,6,18);
INSERT INTO `transactions` VALUES (7,7,21);
INSERT INTO `transactions` VALUES (8,8,24);
INSERT INTO `transactions` VALUES (9,9,27);
INSERT INTO `transactions` VALUES (10,10,30);
INSERT INTO `transactions` VALUES (11,11,33);
INSERT INTO `transactions` VALUES (12,12,36);
INSERT INTO `transactions` VALUES (13,13,39);
INSERT INTO `transactions` VALUES (14,14,42);
INSERT INTO `transactions` VALUES (15,1,3);
INSERT INTO `transactions` VALUES (16,3,9);
INSERT INTO `transactions` VALUES (17,5,15);
INSERT INTO `transactions` VALUES (18,7,21);
INSERT INTO `transactions` VALUES (19,9,27);
INSERT INTO `transactions` VALUES (20,11,33);
INSERT INTO `transactions` VALUES (21,13,39);
INSERT INTO `transactions` VALUES (22,1,3);
INSERT INTO `transactions` VALUES (23,5,15);
INSERT INTO `transactions` VALUES (24,9,27);
INSERT INTO `transactions` VALUES (25,13,39);
INSERT INTO `transactions` VALUES (26,1,3);
INSERT INTO `transactions` VALUES (27,13,39);

Posted: Tue Sep 12, 2006 9:26 am
by GM
I agree with volka - your database design seems to have a couple of problems, which can be resolved by following his suggestions of removing redundant data.

You say you've got 7000 transactions in your transaction table, but this is not a large amount of rows. 2.5 GB of RAM used indicates something is not good...

I'm finding it difficult to understand (despite your detailed post) what you are trying to achieve...

Do you want simply group name, service, date, total sold?

You need to look at the following things, in detail.

1) Joining database tables - generally it is not a good idea to loop through tables using the results from one query as inputs to the next. This is very processor intensive, and can usually be done in a single query.

2) Indexes on your tables - if you are joining on a non primary key column, often it is a good idea to create an index on that column. Indexes can speed up queries enormously.

Have you tried a query that looks a bit like:

Code: Select all

SELECT grouping.group_name, transactions.`date`, count(transactions.price) as trans_count, sum(transactions.price) as total_price
FROM transactions, grouping, groups
WHERE transactions.type = groups.type
AND transactions.type_id = groups.type_id
AND groups.group_id = grouping.group_id
AND transactions.`date` BETWEEN ***first_input_date*** AND ***last_input_date***
GROUP BY grouping.group_name, transactions.`date`
ORDER BY grouping.group_name, transactions.`date` DESC
Note: I haven't tested this in the slightest, but it looks like it might be roughly what you are after.

Also - this is a pet hate of mine - having columns in your tables called "date" (or any other reserved SQL word for that matter) is asking for grief.

EDIT - changed query slightly.

Posted: Tue Sep 12, 2006 10:46 am
by RobertGonzalez
Will this be done strictly from the database or are you hitting this query through server-side code?