Page 1 of 2

Store data + advice

Posted: Wed Dec 28, 2005 7:09 pm
by Jim_Bo
Hi,

A few questions on the best way to go ..

I have a menu system structured like so which is populated fro 2 tables in the database:

T-Shirts
- ACDC
- Eminem
- D12
Pants
Jackets

t-Shirts is the selected menu in this case ..

Now to add products to these subcat's I assume I need to introduce 3rd table .. so far I have "maincat", "subcat" and I guess I add the third "products"

Whats the best way to go from here?

Thanks

Posted: Wed Dec 28, 2005 7:36 pm
by Chris Corbyn

Code: Select all

CREATE TABLE `products` (
    `id` int,
    `cat_id` int,
    `name` varchar(30),
    `description` text
)
:?:

Posted: Wed Dec 28, 2005 7:45 pm
by Jim_Bo
Yer im fine with that part ..

Im thinking more along the lines of the cms side of it ..

How I go about getting the maincat, subcat and products added.

Do I some how use an all in one form .. or a seperate for for each input?


Cant think it up at the moment :(

Cheers

Posted: Thu Dec 29, 2005 7:03 am
by BDKR
You need to put all of them in ONE table. No need for multiple tables.

Posted: Thu Dec 29, 2005 11:39 am
by josh
actually bdkr, I would like to argue you point in saying that if you have a seperate category table and map products to their category using a foreign key, your site will benefit in that you do not have to cache the distinct categories to make a list of all the categories for your menu. It also saves storage space on larger tables

Posted: Thu Dec 29, 2005 12:58 pm
by BDKR
jshpro2 wrote:actually bdkr, I would like to argue you point in saying that if you have a seperate category table and map products to their category using a foreign key, your site will benefit in that you do not have to cache the distinct categories to make a list of all the categories for your menu. It also saves storage space on larger tables
Actually, I never made a point. :D

What I did do was begin to hint at a solution. If you have a solution he can use, I'm all ears.

But I will say that using seperate tables limits the depth of your sub-categories. So for each sub-category level we add a new table?

How I've overcome this issue is by putting all of the categories, their sub-categories, and their sub-categories, ad-infinitum, into one table. Not only do they each get an ID, but they each have a field called parent_id. Beyond that, building the data structure that will represent the menu and all of it's subcategories is simply a matter of recursion.

If you mention the overhead of recursion, I'll prolly agree that you don't want to incur it all the time. The solution? Serialize the data structure mentioned above. Therefore, to build the menu, all I have to do is unserialize the stored data structure and I'm off to the races. And it's definitely quicker then munging through the overhead of db connection, network latency, the hope that the db engine did a good job of optimizing the query before it runs it, and then finally getting all that back. This way, the only time I venture into the database is if I've changed, added, or removed a category, which in turn will also trigger removal of the serialized data structure.

As for saving space, that serialized data structure is only 2.5k.

Cheers,
BDKR

Re: Store data + advice

Posted: Thu Dec 29, 2005 5:10 pm
by Jeroen Oosterlaar
Jim_Bo wrote:Hi,

A few questions on the best way to go ..

I have a menu system structured like so which is populated fro 2 tables in the database:

T-Shirts
- ACDC
- Eminem
- D12
Pants
Jackets

t-Shirts is the selected menu in this case ..

Now to add products to these subcat's I assume I need to introduce 3rd table .. so far I have "maincat", "subcat" and I guess I add the third "products"

Whats the best way to go from here?

Thanks
Just like have said: implement three tables. For example:

Code: Select all

MainCategory { _mctId_, mctName, ... };

SubCategory { _sctId_, sctMainCategoryId_, sctName, ... };

Product { _prdId_, prdSubCategoryId_, prdName, ... }

Posted: Thu Dec 29, 2005 5:40 pm
by josh
Well by space I mean if your category name is 10 characters long and you have 1,000 rows in it, your table will take up more space then if you are only storing a foreign key to the "category table", and in this way your category table has

id | name | parent_id
--------------------------

and your "products table" has:

id | name | category | [many other fields] ....
--------------------------------------------------------

I do agree the data should be cached somewhere (but I usually don't for small datasets), but even creating that cache alone can take quite some time... do you want it to take 15 seconds to add a new category all the time? This way instead of selecting distinct categories from the product table to build that cache, your are selecting all the names from the category table where parent_id = 0 (root nodes), then for each subcategory you just look at the parent_id.


This would be my advise / solution to the problem

to retrieve products under a certain category (assuming you have the category id) you just send an inner join on the tables


EDIT
You need to put all of them in ONE table. No need for multiple tables.
so we were both debating for the same method, I originally thought you meant just have one products table

id | name | category

:oops:

I was like reading your post "your method limits the sub-category level" i was like wtf no it doesn't!


so yeah, I agree with that then. definitely keep the category table SEPERATE from the products table!

Re: Store data + advice

Posted: Thu Dec 29, 2005 7:14 pm
by BDKR
Jeroen Oosterlaar wrote:
Jim_Bo wrote:Hi,

A few questions on the best way to go ..

I have a menu system structured like so which is populated fro 2 tables in the database:

T-Shirts
- ACDC
- Eminem
- D12
Pants
Jackets

t-Shirts is the selected menu in this case ..

Now to add products to these subcat's I assume I need to introduce 3rd table .. so far I have "maincat", "subcat" and I guess I add the third "products"

Whats the best way to go from here?

Thanks
Just like have said: implement three tables. For example:

Code: Select all

MainCategory { _mctId_, mctName, ... };

SubCategory { _sctId_, sctMainCategoryId_, sctName, ... };

Product { _prdId_, prdSubCategoryId_, prdName, ... }
So what does he do when he (or a boss?) decides they also want shirts with the names of the albums by those bands? Add another table?

So what does he do when he (or a boss?) decides they also want shirts with the names of the songs on those bands? Add another table?

See a pattern here? Now yes, the solution will work, but it's inflexible. Sure, you can keep adding tables as needed, but it's inelegant, inflexible (as mentioned before), and absolutely brain dead if you're constantly changing the menu based on sales, specials, or seasonal sales. Ask me how I know. :wink: LOL

Cheers,
BDKR

Posted: Thu Dec 29, 2005 7:31 pm
by BDKR
Well, it does seem we are very close to doing things in a similar fashion. However, I'm a bit puzzled by the below quote.
jshpro2 wrote: I do agree the data should be cached somewhere (but I usually don't for small datasets), but even creating that cache alone can take quite some time... do you want it to take 15 seconds to add a new category all the time?
I'm sure that "15 seconds" is an exagerration. Here are some questions:

1) how long can it take to write a file with 2.5k of information on it?
2) How long does it take to generate the information to be written to that file?
3) How long does it take to retreive the information from that file?
4) Is retrieval of the information on that file faster then getting it from the database at EVERY PAGE REQUEST?
jshpro2 wrote: This way instead of selecting distinct categories from the product table to build that cache, your are selecting all the names from the category table where parent_id = 0 (root nodes), then for each subcategory you just look at the parent_id.
Something like that. I actually select all of the categories out of the table and put them in an array. From there I start a re-iterative (or recursive or multi-pass or whatever you want to call it) process of building the menu structure (in an array) starting with those with a parent_id of 0. This way I only make one call to the database for this information then re-arrange it in memory.

Is that how you do it. I wasn't sure from what you said. :oops:

Posted: Thu Dec 29, 2005 9:11 pm
by blacksnday
this isnt so much related to the original post
but the problem is similar in regards to how to store data
wether new tables or same table.

I had same problem when i creating a rating system
and needed to track the results from 4 different types of ratings
with one type needing two sub-types:
article rating
quick article rating
expic rating
yourpic rating

So I made one table to handle all ratings and correctly id each of them

Now the expic and yourpic ratings are actually coming from same
photo article, while the article and quick article are both seperate.

My table setup finally became:
rid - main rating record id
bashid - id of the article
rateid - id of the photo article
quickbashid - id of the quick article
type - blank if article or quick article. then either expic or yourpic
ip - addy of voter
vote - vote of voter
ts - new total votes for selection
nv - new number of votes for selection


Now the two ways I can properly select the correct selection
for any type of vote is by the idof the selection and the type

The actual total vote and number of vote tallies are stored in the proper
selection's table, but the above allows me to tie all together at once in
one table when needed.

Just thought I would throw this idea in because it shows
how you can effectivly use one table to contain identifiers for many
of the same type.

Posted: Thu Dec 29, 2005 10:30 pm
by josh
BDKR, the 15 second estimation was under the pretense of not having category names in a separate table. I would say 15 seconds is fairly accurate on a 200MB+ table and a good number of categories, even longer then 15 seconds if the application is under a lot of load.

For applications of this magnitude I usually have a cache table

Code: Select all

CREATE TABLE `cache` (
`id` INT( 10 ) NOT NULL AUTO_INCREMENT ,
`name` VARCHAR( 35 ) NOT NULL ,
`data` BLOB NOT NULL ,
`time` INT( 50 ) NOT NULL ,
PRIMARY KEY ( `id` ) ,
INDEX ( `time` )
)
something like that, and I do basically what you said, rather then storing a serialized array I store the raw html output ( unserialize() is verrrry sloow )

I find the conveineince of having all my cached data in one easy to access table outweighs any performance issues (until you get into larger amounts of data being cached).

Re: Store data + advice

Posted: Fri Dec 30, 2005 3:44 am
by Jeroen Oosterlaar
BDKR wrote:So what does he do when he (or a boss?) decides they also want shirts with the names of the songs on those bands? Add another table?

See a pattern here? Now yes, the solution will work, but it's inflexible. Sure, you can keep adding tables as needed, but it's inelegant, inflexible (as mentioned before), and absolutely brain dead if you're constantly changing the menu based on sales, specials, or seasonal sales. Ask me how I know. :wink: LOL

Cheers,
BDKR
Inflexible??? One of the major conventions in database design is that you separate tables based on their functional dependencies. Of course you will have to add another table in the case you are describing here. You have to use as many tables as needed in order to exclude (or at least minimize) redundancy.

Re: Store data + advice

Posted: Fri Dec 30, 2005 8:17 am
by BDKR
Jeroen Oosterlaar wrote: Inflexible??? One of the major conventions in database design is that you separate tables based on their functional dependencies.
Of course. I suspect nothing I said detracts from that fact.
Jeroen Oosterlaar wrote: Of course you will have to add another table in the case you are describing here. You have to use as many tables as needed in order to exclude (or at least minimize) redundancy.
Where is the redundancy coming from? You can use one table to store a heirachal structure of infinite depth without redundancy. If there is something I'm missing here, please explain.

Posted: Fri Dec 30, 2005 8:24 am
by BDKR
jshpro2 wrote: BDKR, the 15 second estimation was under the pretense of not having category names in a separate table. I would say 15 seconds is fairly accurate on a 200MB+ table and a good number of categories, even longer then 15 seconds if the application is under a lot of load.
Wow! Is this 200mb for just your categories?
jshpro2 wrote: For applications of this magnitude I usually have a cache table
...
something like that, and I do basically what you said, rather then storing a serialized array I store the raw html output ( unserialize() is verrrry sloow )
I agree that serialize is slow. It works for us in this case, but if the size of that data grew, I would go in a different direction. Something that I've messed around with is the Turck MMCache API for saving data in memory.

Cheers