Multi-dimensional table
Moderator: General Moderators
Multi-dimensional table
Hey, let's say I had a table with the following fields: id, parent_id, content_id. The id field is a FOREIGN KEY for another table and the parent_id would be the FOREIGN KEY for id. In other words, parent_id is referencing to that same table it is in.
I know that you can JOIN tables to itself. But what I'm wondering, for a table like this, how do you return a result set of all the rows under a specific id with the fields count and level, and the result set's count column would be the number of rows under another row, and the level column would be the level from the specific id? This way I can select all the rows under a specific id and all the rows under those rows and all the rows under those, etc.
Is there a way to achieve this kind of thing? Would I have to define the table a bit differently to do this? Or is there a query that can do this with the table I defined above?
Thanks for reading. I might of made it clear on what I'm trying to do, but if I didn't, please let me know.
I know that you can JOIN tables to itself. But what I'm wondering, for a table like this, how do you return a result set of all the rows under a specific id with the fields count and level, and the result set's count column would be the number of rows under another row, and the level column would be the level from the specific id? This way I can select all the rows under a specific id and all the rows under those rows and all the rows under those, etc.
Is there a way to achieve this kind of thing? Would I have to define the table a bit differently to do this? Or is there a query that can do this with the table I defined above?
Thanks for reading. I might of made it clear on what I'm trying to do, but if I didn't, please let me know.
Re: Multi-dimensional table
Let's say I had this table:
My table
| id | parent_id |
| 1324.. | 12124... |
where id is the primary key of the row and parent_id is the parent's id for a row, if there is one. So in other words parent_id references to the same table it's in.
How do I return a result set like:
| level | count |
| 1 | 5 |
| 2 | 10 |
| 3 | 100 |
...
where level is the number of levels from specific id? What I mean by specific id is an id concatenated in a SQL statement. For example, I'd like to be able to write a query that counts the rows "under" the id of 23. For example let's say there are 3 rows that have the parent_id of 23, and there are numerous rows that have the parent_id of these rows' ids. I'll express this multi-level relationship with JSON:
As you can see each row can have "children", or rows that have a parent_id of the row's id. Every level is a step down this chain. If I wanted to count the number of rows for each level I would get something like:
| level | count |
| 1 | 3 |
| 2 | 5 |
| 3 | 1 |
The reason I get these figures is because on the first level I have only three child rows of 23: 3, 4, and 5. On the second level I have 5 rows: 55, 34, 21, 2, and 6. And only 1 row for level three: 11. This basically what I'd like to get out of my table.
How do I write such a query that will give my these results?
I appreciate any help on this. Thanks.
My table
| id | parent_id |
| 1324.. | 12124... |
where id is the primary key of the row and parent_id is the parent's id for a row, if there is one. So in other words parent_id references to the same table it's in.
How do I return a result set like:
| level | count |
| 1 | 5 |
| 2 | 10 |
| 3 | 100 |
...
where level is the number of levels from specific id? What I mean by specific id is an id concatenated in a SQL statement. For example, I'd like to be able to write a query that counts the rows "under" the id of 23. For example let's say there are 3 rows that have the parent_id of 23, and there are numerous rows that have the parent_id of these rows' ids. I'll express this multi-level relationship with JSON:
Code: Select all
{
23: {
3: {
55, 34, 21
},
4: {
2: {
11
}
},
5:{
6
}
}| level | count |
| 1 | 3 |
| 2 | 5 |
| 3 | 1 |
The reason I get these figures is because on the first level I have only three child rows of 23: 3, 4, and 5. On the second level I have 5 rows: 55, 34, 21, 2, and 6. And only 1 row for level three: 11. This basically what I'd like to get out of my table.
How do I write such a query that will give my these results?
I appreciate any help on this. Thanks.
Re: Multi-dimensional table
The problem of storing hierarchical data in a relationship database is a common one. There are several approaches, and you implemented what is called an adjacency list. It is very good for retrieving the entire tree or the immediate parents / children but difficult for querying a part of the tree.
This article on the MySQL site gives a nice overview of a this approach and another common one called "nested sets". There is a third option called closure tables, and you can find a good comparison of the three approaches in this presentation by Bill Karwin (trees start at page 48).
This article on the MySQL site gives a nice overview of a this approach and another common one called "nested sets". There is a third option called closure tables, and you can find a good comparison of the three approaches in this presentation by Bill Karwin (trees start at page 48).
Re: Multi-dimensional table
Thanks for your post, that was a lot of good info. But unfortunately, I'm still not so sure on how to accomplish what I need at the most simple level. I know I like the adjacent list model for it's simplicity. The nested set model makes inserting new rows a bit more complicated, and sense my table is going to have a lot of inserting happening, it's probably not the best for my needs. If there was a way to find the depth of nodes for a adjacent list model, like what is said on the mysql.com article, then that would be exactly what I need.pytrin wrote:The problem of storing hierarchical data in a relationship database is a common one. There are several approaches, and you implemented what is called an adjacency list. It is very good for retrieving the entire tree or the immediate parents / children but difficult for querying a part of the tree.
This article on the MySQL site gives a nice overview of a this approach and another common one called "nested sets". There is a third option called closure tables, and you can find a good comparison of the three approaches in this presentation by Bill Karwin (trees start at page 48).
Re: Multi-dimensional table
You really should have a look at that presentation, there are two approaches there that could be useful to you - a modified adjacency list (with storing the path) and closure tables.
Re: Multi-dimensional table
I looked at the presentation. I'm interested the closure table model. The only questions I have with this model is how to I insert new rows? Say I had a table with some rows like:
If I wanted to insert a few row that have the ancestor_id of 3, then I would have to insert a few other rows with the ancestor_id of 2 and 1. But how would I know to insert these other two rows? What if 3 wasn't a descendant of 2 and 1? What if 2 wasn't a descendant of 1?
It seems like every model other then the adjacency list model makes it harder to insert new rows.
[EDIT] Come to think of it, I don't really need a full hierarchal table model. I just need a way to count the number of rows per level.
I need a way to go through every level and count how many children are in each level. To make this more clear, let me illustrate with a family tree:
Let's say I wanted to be able to write a query to know how many children there are per generation for Bob.
First level would be 2 (Marry and Dan). Second level would be 4 (Bill, Tom, Moe, and Kim). And third level would be 1 (Joe).
So all I really would like to know is which table model would be the easiest one to accomplish this and how would I accomplish it with the model(What's the query)?
[EDIT] I want to go with the closure model, but I don't know how to insert new nodes(why I say nodes instead of rows is because a node is made up of multiple rows in the closure model).
Code: Select all
+-------------+---------------+---------+
| ancestor_id | descendant_id | post_id |
+-------------+---------------+---------+
| 1 | 2 | 1 |
+-------------+---------------+---------+
| 1 | 3 | 1 |
+-------------+---------------+---------+
| 2 | 3 | 1 |
+-------------+---------------+---------+
It seems like every model other then the adjacency list model makes it harder to insert new rows.
[EDIT] Come to think of it, I don't really need a full hierarchal table model. I just need a way to count the number of rows per level.
Code: Select all
+-------+-------+
| count | level |
+-------+-------+
| 2 | 1 |
+-------+-------+
| 5 | 2 |
+-------+-------+
...
Code: Select all
Bob
Marry
Bill
Dan
Tom
Moe
Joe
Kim
Sydney
Ace
Dax
Kile
Fran
Terminator
Code: Select all
+-------+-------+
| count | level |
+-------+-------+
| 2 | 1 |
+-------+-------+
| 4 | 2 |
+-------+-------+
| 1 | 3 |
+-------+-------+
So all I really would like to know is which table model would be the easiest one to accomplish this and how would I accomplish it with the model(What's the query)?
[EDIT] I want to go with the closure model, but I don't know how to insert new nodes(why I say nodes instead of rows is because a node is made up of multiple rows in the closure model).
Re: Multi-dimensional table
Bill gives an example of how to add nodes to the closure table:
The main row insertion in the regular table (for example, a comment with an id = 8 which is a child of a comment with an id = 5)
Creating an ancestor node for that row in the closure table:
Add the nodes to make that comment a decendant of all the ancestors of comment #5 (its parent):
The main row insertion in the regular table (for example, a comment with an id = 8 which is a child of a comment with an id = 5)
Code: Select all
INSERT INTO comments ...Code: Select all
INSERT INTO TreePaths (ancestor, descendant)
VALUES (8, 8);Code: Select all
INSERT INTO TreePaths (ancestor, descendant)
SELECT ancestor, 8 FROM TreePaths
WHERE descendant = 5;Re: Multi-dimensional table
I saw this, but I'm not sure on how it works. In the first statement I see a "...", what's that? And in the third statement I'm not sure what the row 8 is in "SELECT ancestor, 8". Also in the third statement, what does the subquery return? I don't understand how you get the number 5 in the third statement "WHERE descendant = 5".pytrin wrote:Bill gives an example of how to add nodes to the closure table:
The main row insertion in the regular table (for example, a comment with an id = 8 which is a child of a comment with an id = 5)Creating an ancestor node for that row in the closure table:Code: Select all
INSERT INTO comments ...Add the nodes to make that comment a decendant of all the ancestors of comment #5 (its parent):Code: Select all
INSERT INTO TreePaths (ancestor, descendant) VALUES (8, 8);Code: Select all
INSERT INTO TreePaths (ancestor, descendant) SELECT ancestor, 8 FROM TreePaths WHERE descendant = 5;
I'm just getting MySQL down, so this kind of thing is pretty advanced to me (at least for now). Bill didn't really exaggerate on these queries, maybe because his presentation wasn't for newbies.
Please help me to understand.
[EDIT]According to my research, there is a INSERT...SELECT syntax. I guess what the third query is doing is inserting multiple rows based on what is selected. But I still don't know what 8 means as a column name.
Re: Multi-dimensional table
The first statement relates to your original table to which you want to add heirarchy (the closure table is a separate table). In the example it was a comment table, and there is no need to elaborate on additional fields which are not relevant to the subject (fields like user_id, content etc that are part of the comment), so Bill left it out with '...'. Use what's relevant for your case.
Regarding the closure table statements - in the example we are adding a comment which has an id of 8. To mark it in the closure table we create a row with both ancestor and decendant having and id of 8 (since it belongs to it's own subtree).
That comment is a child of a comment with an id of 5. So we need to indicate that in the closure table, as well as indicate that it's a child of all comment #5 ancestors as well. So we select all the nodes where the comment with an id of 5 is a decendant of (that includes its own node, which has both the ancestor_id and descendant_id as 5). We add the returned rows as an ancestors with a decendant_id of 8 - thus marking all of those as ancestors of the row with an id 8.
It's a little complicated to grasp at first, I agree. You might consider path enumeration as an alternative.
Regarding the closure table statements - in the example we are adding a comment which has an id of 8. To mark it in the closure table we create a row with both ancestor and decendant having and id of 8 (since it belongs to it's own subtree).
That comment is a child of a comment with an id of 5. So we need to indicate that in the closure table, as well as indicate that it's a child of all comment #5 ancestors as well. So we select all the nodes where the comment with an id of 5 is a decendant of (that includes its own node, which has both the ancestor_id and descendant_id as 5). We add the returned rows as an ancestors with a decendant_id of 8 - thus marking all of those as ancestors of the row with an id 8.
It's a little complicated to grasp at first, I agree. You might consider path enumeration as an alternative.
Re: Multi-dimensional table
Oh I see. By placing a value as a column name in a SELECT statement simply makes a column with that value. I guess this is how a function like COUNT() can be in place of a column name. I assume that for every iteration that mysql does (not that I'm sure on how mysql does it) mysql evaluates the functions/values. This clears some things up.
I think I'm getting the rest of it now. You need to insert a new row initially before you can insert the rest of the rows for the high levels of a node(ancestors). Could you combine the second and third statement into one somehow? This way a node can be added with only one statement (not including the first statement for the other table (comments)). Just curious if this is possible.
Also I'm wondering if it could work if I modified the model a bit. Could there be a way for it to work without a node being a descendant of itself? That is:
would become
You might think this isn't a good idea, but why not? Maybe it's not as flexible, but then how so? What are some of the reasons for doing it this way? One, less rows. Two, no need for the second statement maybe. But what would I lose if I modify the model this way?
I think I'm getting the rest of it now. You need to insert a new row initially before you can insert the rest of the rows for the high levels of a node(ancestors). Could you combine the second and third statement into one somehow? This way a node can be added with only one statement (not including the first statement for the other table (comments)). Just curious if this is possible.
Also I'm wondering if it could work if I modified the model a bit. Could there be a way for it to work without a node being a descendant of itself? That is:
Code: Select all
+-------+--------+
| ans | desc |
+-------+--------+
| 1 | 1 |
+-------+--------+
| 1 | 2 |
+-------+--------+
| 2 | 2 |
+-------+--------+
| 1 | 3 |
+-------+--------+
| 2 | 3 |
+-------+--------+
| 3 | 3 |
+-------+--------+
Code: Select all
+-------+--------+
| ans | desc |
+-------+--------+
| 1 | 2 |
+-------+--------+
| 1 | 3 |
+-------+--------+
| 2 | 3 |
+-------+--------+
Re: Multi-dimensional table
It's needed for the second insert statement - notice how it selects all the rows that have decendant_id = 5. If the row (5,5) didn't exist, the relation with that row would have to inserted separately (5,8), so you would be replacing one insert query with another. The main difference is that with the first approach the parent row is included in the subtree and in the second it is not. Depending on whether you need this feature or not you might be able to do as you suggested.
Re: Multi-dimensional table
That's true. The reason why I propose this is because I don't need this feature. My model could work just the same only you reverse the initial statement statements. In the first statement rather then making a row with the same ancestor you make it with the parent ancestor (the one directly above it). In fact this is a compact way of doing the job:pytrin wrote:It's needed for the second insert statement - notice how it selects all the rows that have decendant_id = 5. If the row (5,5) didn't exist, the relation with that row would have to inserted separately (5,8), so you would be replacing one insert query with another. The main difference is that with the first approach the parent row is included in the subtree and in the second it is not. Depending on whether you need this feature or not you might be able to do as you suggested.
Code: Select all
INSERT post_referrals
SELECT 29, 3, 5 FROM post_referrals
UNION
SELECT 29, ancestor_id, 5 FROM post_referrals
WHERE descendant_id = 3;The only thing that concerns me though is about select queries. Is there any benefit for having those extra rows when SELECTing? Anything you can think of? I'm sure that if I run into something I could always add the extra rows.
Re: Multi-dimensional table
Well, for common trees I've implemented (threaded comments, menu navigation, folder system) it was often useful to have the parent data when selecting a subtree. You usually display the parent and then branch off the children. That might not be a need in your case.
Re: Multi-dimensional table
I'm not quite so sure on what you mean. What do these rows say, other then the row is the parent of itself (I'm my own grandpa kind of thingpytrin wrote:Well, for common trees I've implemented (threaded comments, menu navigation, folder system) it was often useful to have the parent data when selecting a subtree. You usually display the parent and then branch off the children. That might not be a need in your case.
Re: Multi-dimensional table
Those rows point to the actual rows in the linked table (a comments table for example). Hence, I can use one query to fetch all the information I need to render a subtree. For example, the subtree of comment with an id of 24 would be fetched with:
Code: Select all
SELECT * FROM comments
INNER JOIN comments_tree ON ancestor_id=24