Page 1 of 2
Inventory Conundrum [Solved]
Posted: Fri Aug 31, 2007 3:55 pm
by Begby
I need to store inventory in a warehouse and keep a history of what happens to the 'stuff' and currently us a table something like this (simplified for brevity)
tblItemHistory
-------
itemID
transActionType - The type of transaction
change - The amount the stock changed
inStock - The total amount in inventory after the change
A sample set of records might look like this
itemID transActionType Change inStock
10 Created 0 0 (Created the item)
10 InvReceived +50 50 (received 50 units, so now we have 50 in stock)
10 Order -4 46 (someone placed an order for 4, so now we have 46 in stock)
10 Order -1 45 (someone place an order for 1, so now we have 45 in stock)
10 Canc +1 46 (someone cancelled their order, so now we have 46 in stock)
That is pretty simple, its just like a rolling total like your checkbook. Now here is where the problem begins, we need to store a single product across multiple warehouses. So my plan is to add in a warehouseID field. The stock will need to be separate for each warehouse kinda like this.
itemID warehouseID transActionType Change inStock
10 1 AddedToWarehouse 0 0 (added item to warehouse 1 location)
10 1 InvReceived +20 20 (warehouse 1 received 20 items, so now 20 in stock)
10 1 Order -2 18 (warehouse 1 sold 1 item so now we have 18 in stock at warehouse 1)
10 2 AddedToWarehouse 0 0 (added item to wareshouse 2 location)
10 2 InvReceived +50 50 (warehosue 2 received 50 items, so now we have 50 at warehouse 2 and 68 total)
10 2 Order -4 46 (warehouse 2 sold 4 items so now we have 46 at warehouse 2 and 64 total)
10 1 Order -1 17
10 1 Order -1 16 (and so on)
Ok, so that should hopefully still be pretty clear. If someone at warehouse 1 needs to see the history for an item I can just pull out the warehouse 1 records and display the records. That is all that someone at warehouse will care about is the stuff in that warehouse.
The conundrum is this. The customer doesn't care what the stock or running total is at any given warehouse, all they want to see is the total amount of product that we have of theirs and the orders that were placed etc. So for them the running total should look like this:
transactionType change inStock
----
Created 0 0
InvReceived +20 20
Order -2 18
Inv Received +50 68
Order -4 64
Order -1 63
Order -1 62
Anyone have any idea how to construct a query to obtain the above? I am free to completely redesign the table structure if necessary.
Posted: Fri Aug 31, 2007 9:38 pm
by califdon
As you said, you simplified the table structure, for example I presume you'd have a date field for each transaction, and I'm assuming that the itemID identifies a product/item. So where's the problem? Wouldn't you simply query where itemID=desired item history? Now, if you want to see all items for a particular customer, you'd need to either add a column for customerID or do a Join with the customer-to-items table. If you needed a running balance on each item, you should be able to get that by combining all the warehouse transactions (transfers would balance out), although I haven't tried to construct the query, it seems to me it should be straightforward. Am I missing what it is you want to do?
Posted: Sat Sep 01, 2007 6:31 am
by Begby
It is a little more difficult than that. Try and construct the query. Keep in mind one warehouse might have 10 transactions the other one may have 500. The customer wants to see every transaction and a running total of the inventory from *all* warehouses combined.
This really rears its ugly head when it scales. Right now we have some items with 50,000 history records. We display a page of around 200 at a time. With multiple warehouses it is not efficient to query all 50,000 records to derive a running total, then display only 200 of them. You can determine the total at a all warehouses at a given point in time by adding up the totals of the most recent history items. But what if all the records on the page only come from one warehouse, I would still need to query back in time to find the nearest previous record from another warehouse to get a running total to start with.
I am sure my previous paragraph makes no sense. Lets look at a snapshot.
At a given point in time you might have these records all in a row
itemID warehouseID change total
10 1 -4 50
10 2 -3 10
10 2 -1 9
10 2 -2 7
At the last record we can deduce that the total in inventory is 57 by adding the total from warehouse 2 to the most recent total from warehouse 1. Now lets say we want to do a query and retrieve 4 most recent records with a running total from all warehouses and I get these records
itemID warehouseID change total
10 2 -2 30
10 2 -1 29
10 2 -2 27
10 2 -1 26
Notice no records from warehouse 1 were returned in this query. How can I calculate the running total without querying all the records?
Also this might involve more that 2 warehouses at a time.
Posted: Sat Sep 01, 2007 11:01 am
by josa
I must ask: why is total stored in the transaction table? Shouldn't it be in some kind of inventory table? Are you showing us the whole transaction table or are there more columns? I'm missing a timestamp for example. It would be very hard to find the latest transaction without one, or am I missing something?
/josa
Posted: Sat Sep 01, 2007 11:58 am
by Begby
There are other things in there too. Including a timestamp, hurt inventory, hold inventory, orderID, notes etc. I left fields out though to keep it simpler as i said in the OP.
There is a location table that holds the total stored in each warehouse that is related to the item which in this case is the inventory table (as items can be in more than one warehouse). The total is in the transaction table since it is a running history and there are thousands of records for each item, and we have thousands of items. The total for a given location and a given transaction could be calculated by taking the total on hand and adding it to every transaction back to a specific date, however that might be somewhat intensive as there are 3 million records in the table and its growing by over a million a year. Getting the total for a specific set of history items would involve summing up thousands of records for each record returned. But I am not opposed to doing that if it is the best way.
Another reason that we have the total in the history table though is for consistency checks. Sometimes things go awry and someone writes a bad query or something, using the history we can check for discrepancies in the final inventory total. This should go away however in the rewrite of the system that I am currently doing.
Posted: Sat Sep 01, 2007 12:54 pm
by josa
Ok, I'm going to try to explain it in my own words to see if I understand:
- You have items of a product that is spread across several warehouses.
- In the transaction table you can see the running total for that product for a given warehouse by looking at the last transaction for that warehouse and product.
- You want to get the total from all warehouses for a product by looking at the last transaction for all warehouses for that product.
I'm sorry if I've only confused the matter further.
/josa
Posted: Sat Sep 01, 2007 1:54 pm
by califdon
Hi again, Begby. I think Josa and I are having a similar problem understanding what you're really trying to do. Maybe you're so close to your problem that you're perspective is different than ours. My interpretation of what you have said is that it's a matter of scaling and efficiency, not a matter of logic. Is that correct?
As a general comment, 50,000 records covering transactions at 10 warehouses doesn't strike me as a particularly large-scale database. Although that's a little larger than databases that I deal with, I know dozens of developers who routinely deal with millions of records in queries. Have you actually experienced performance problems trying to employ straightforward queries?
Posted: Sat Sep 01, 2007 5:42 pm
by Begby
Sorry for the confusion.
The reason I brought up efficiency is why I was talking about why I had the total inventory stored in the transaction table. The real problem though is a logic problem. So ignore the efficiency fo rnow.
With the table I have as an example, I don't know how to read out the records to display each transaction with a running grand total across all warehouses. I know how to get a grand total of the in stock simply by reading out the newest record for each item/warehouse and adding them up.
The problem arises when I want to view a set of transactions for a given warehouse. Each transaction needs to show the grand total across all warehouses for after the transaction is complete independent of the warehouse. A customer should be able to view the history page by page, or jump to a date range, and not the entire history on one page (as the history is way to long to display at once). The result set should look like this as an example (but I'll be pulling probably 200 records at a time)
TransactionType DateTime Change Grand Total
Order 10-1-2006 -1 354
Order 10-1-2006 -1 353
Order 10-2-2006 -2 351
Return 10-2-2006 +1 352
Order 10-2-2006 -1 351
Now if I have 50,000 records spanning 3 years of history for one item going all the way back from today, how do I pull out the above 5 records and calculate the grand total across all warehouses for each transaction? This is what I don't know how to do, perhaps I am missing something that is obvious to one of you. It seemed simple enough at first, but as soon as I tried thinking of a query I got stumped.
Keep in mind that the grand total is a sum of all the warehouses.
Thansk for all your help.
Posted: Sun Sep 02, 2007 1:36 am
by califdon
OK, now I think I get it. Yes, because you want a running inventory and you have multiple warehouses, that does make it difficult--given your schema. That may be the key issue. I have a colleague who frequently says, When you just can't see a way clear to extract the data you want, it's usually because the database isn't structured so that you can do it! If your need to make data available in that form is important, which it seems to be, you may find it necessary to consider restructuring your database.
It seems to me the basic issue is that, although inventory and transactions are actually warehouse-specific, your output data requirement is not.
I can imagine a denormalized structure something like this, that might make it possible:
tblInventory:
ID
PartNo
...
OH_Whse1_Bal
OH_Whse2_Bal
OH_Whse3_Bal
...
LastTransDate
and
tblTransactions:
TransID
TransDate
TransType (recvd, sold, scrapped, xfer in, xfer out)
PartNo
Qty
Whse
That would allow you to select a beginning reporting period where you would have balances in all warehouses in one record for a part (the date would need to correspond with the LastTransDate, somehow), then you could select all transactions for the part since that date, or between two dates. In the report, each successive chronological transaction would adjust a balance shown on the detail report line under the appropriate Whse column.
It's ugly, but it might work. Ordinarily I don't like to denormalize tables like that, but I can't think of any better way to do it, right now. What I did, as you can see, was to pool all the Whse inventory into a single, denormalized record, with a last transaction date, at which time those balances were correct. Ughhh! Now I see that that would require a new inventory record for every transaction, besides the transaction record, itself! Not pretty!
OK, but maybe everything could be combined in one monster transaction record, then? In effect, you may have to create a journaling system. There would be a lot of redundant data (every record would carry balances that were not affected by the transaction--I don't like that!), but then the report would be easy.
I'm starting to get rummy on this, so I'll quit. Maybe something I've said might start your thinking along a new line, though, and that might be helpful.
Posted: Sun Sep 02, 2007 1:37 am
by califdon
OK, now I think I get it. Yes, because you want a running inventory and you have multiple warehouses, that does make it difficult--given your schema. That may be the key issue. I have a colleague who frequently says, When you just can't see a way clear to extract the data you want, it's usually because the database isn't structured so that you can do it! If your need to make data available in that form is important, which it seems to be, you may find it necessary to consider restructuring your database.
It seems to me the basic issue is that, although inventory and transactions are actually warehouse-specific, your output data requirement is not.
I can imagine a denormalized structure something like this, that might make it possible:
tblInventory:
ID
PartNo
...
OH_Whse1_Bal
OH_Whse2_Bal
OH_Whse3_Bal
...
LastTransDate
and
tblTransactions:
TransID
TransDate
TransType (recvd, sold, scrapped, xfer in, xfer out)
PartNo
Qty
Whse
That would allow you to select a beginning reporting period where you would have balances in all warehouses in one record for a part (the date would need to correspond with the LastTransDate, somehow), then you could select all transactions for the part since that date, or between two dates. In the report, each successive chronological transaction would adjust a balance shown on the detail report line under the appropriate Whse column.
It's ugly, but it might work. Ordinarily I don't like to denormalize tables like that, but I can't think of any better way to do it, right now. What I did, as you can see, was to pool all the Whse inventory into a single, denormalized record, with a last transaction date, at which time those balances were correct. Ughhh! Now I see that that would require a new inventory record for every transaction, besides the transaction record, itself! Not pretty!
OK, but maybe everything could be combined in one monster transaction record, then? In effect, you may have to create a journaling system. There would be a lot of redundant data (every record would carry balances that were not affected by the transaction--I don't like that!), but then the report would be easy.
I'm starting to get rummy on this, so I'll quit. Maybe something I've said might start your thinking along a new line, though, and that might be helpful.
I'm wide awake now!!!
Posted: Sun Sep 02, 2007 2:42 am
by califdon
OK, so I couldn't get to sleep, churning your puzzle over in my head!
Isn't your real problem getting the starting total balances for each part, from which to calculate running balances as you process transactions? If you could easily get the starting balances, the rest would be straightforward, right?
So "divide and conquer", focus on how to get those efficiently.
How about a cron job that runs once a month (or week), calculating the total balances (or for each warehouse, if you need that) for each part and writing one record per part to an auxiliary table (by that, I mean not one of the normal tables involved with inventory transactions). Your real inventory system could be a standard schema for inventory in multiple warehouses, but you'd have one extra table containing "snapshots" of total inventory on a periodic schedule. That extra table would only be involved once a month to add new records, and whenever a report was needed.
Now, with the starting balances for every part on selected dates, the report should be easy. It's a pain to run a cron job and maintain a separate table, but on the other hand, the rest of your database could be plain vanilla and your report queries could be very simple.
The monthly "snapshot" script could just keep calculating from the previous snapshot, each time.
What do you think?
Posted: Sun Sep 02, 2007 10:13 am
by Begby
Ahh now you see my dillema. Yes this is a difficult problem indeed. I assumed that you would have the answer so it must be extra difficult.
I will take your idea into consideration along with the possibility of changing the schema. Since I am in the midst of a total rewrite of the system I am free to take my time and get it right, I certainly don't want to have to change this again later.
Let me know if you have any other ideas.
Thank you very much for your help and suggestions.
Posted: Sun Sep 02, 2007 10:17 am
by Begby
Or wait, you did come up with a point. The real problem is calculating the total balance at the beginning of the snapshot. I could always do two queries or a subquery.
So if I know that the item is in 3 warehouses, warehouses 1, 3, and 6, and that the first record in my snapshot is on datetime Y, I would do this in pseudo code
select the sum of the inventories for transactions for warehouses 1, 3, and 6 for the most recent record for each that is less than or equal to datetime Y.
If I can figure out that SQL then that is my starting inventory for the snapshot and I could then use that to calculate the running total.
Does that sound correct?
Posted: Sun Sep 02, 2007 2:35 pm
by califdon
Begby wrote:Or wait, you did come up with a point. The real problem is calculating the total balance at the beginning of the snapshot. I could always do two queries or a subquery.
So if I know that the item is in 3 warehouses, warehouses 1, 3, and 6, and that the first record in my snapshot is on datetime Y, I would do this in pseudo code
select the sum of the inventories for transactions for warehouses 1, 3, and 6 for the most recent record for each that is less than or equal to datetime Y.
If I can figure out that SQL then that is my starting inventory for the snapshot and I could then use that to calculate the running total.
Does that sound correct?
I assumed that you would have the answer
My old grandfather always told me "Never assume anything!"
Well, I'm not sure I have absorbed what you said above (it's hell, getting old!), but my most recent suggestion was to actually store, say, a monthly snapshot of
all parts in
all warehouses, in an auxiliary table. The idea is that you could pre-calculate balances in a batch job, creating a "snapshot" table that would greatly simplify the queries to generate customer reports on demand. You would store the balances for, say, the first of every month, so you would only have to select a partnumber and a beginning date, get the balances, and then grab all the transactions for that partnumber subsequent to that beginning date. I don't think that's what you were trying to do in your pseudo code.
Here's my thinking: it looks like a complete retrieval of the data needed for the
ad hoc customer reports would require a
lot of data fetching and filtering, probably resulting in a slow and difficult operation. Since I assume (sorry, grandpa!) that this requirement would be frequent and unpredictable, why not do as much of the work
in advance and store it in another table? My old programming instructor used to call that "divide and conquer." You trade the burden of a once-a-month batch process for the simplicity and speed of the more frequent and more difficult report processing.
Anyway, it's a way of trying to "think outside the box" or avoid getting your mind stuck in more standard approaches. Hope it stimulates some new ideas for you.
Posted: Sun Sep 02, 2007 6:09 pm
by Begby
The answer is like I said I think, let me explain by showing it to you like this. Lets say you have the following records in a the transaction table for the a single item that is stored in three warehouses. I have ommitted the date, but assume they are in order by date
item whse change total
10 1 -2 40
10 1 -1 39
10 3 -1 48
10 3 -1 47
10 6 -1 18
10 1 -1 38
10 1 -2 36
10 3 -2 45 <--
Notice the record at the arrow. Lets say I want to know the grand total at that point in time. All I need to do is select the most recent records from the other warehouses before that one and add them up, in this case those records are
10 6 -1 18
10 1 -2 36
So I add those up and get 54, now I take that and add it the record with the arrow and get 99. So that means my total in stock at the point in time with the arrow is 99.
Now that I can obtain the total in stock from any single point in time I can use that with a snapshot to easily calculate the grand totals by adding in the change values.
I just gotta come up with the query now.