share advice on data/implementation design
Moderator: General Moderators
share advice on data/implementation design
Hi,
The project i'm working on deals with report forms.
There expected to be as much as 10000 instances of such forms.
On one hand, each report form has definite hierarchical structure that can be naturally implemented as xml document. It is not practical to store it in database for at least 2 reasons:
1. the hierarchy of this form is quite complex and using normalized approach is quite a mess that will require 7+ extra tables.
2. the hierarchy elements do not need to be accessed in any special way like sorting.
On the other hand if i store each report as xml there will be huge space overhead...(although i could improve that by compressing xml data)
What do you about this space overhead issue or the decision to use xml format??
The project i'm working on deals with report forms.
There expected to be as much as 10000 instances of such forms.
On one hand, each report form has definite hierarchical structure that can be naturally implemented as xml document. It is not practical to store it in database for at least 2 reasons:
1. the hierarchy of this form is quite complex and using normalized approach is quite a mess that will require 7+ extra tables.
2. the hierarchy elements do not need to be accessed in any special way like sorting.
On the other hand if i store each report as xml there will be huge space overhead...(although i could improve that by compressing xml data)
What do you about this space overhead issue or the decision to use xml format??
Re: share advice on data/implementation design
As you say: the model is a hierarchy. And most people struggle implementing a hierarchical model/tree with SQL.newmember wrote: 1. the hierarchy of this form is quite complex and using normalized approach is quite a mess that will require 7+ extra tables.
But there are a couple usable solutions: Adjacency List, Materialized path, Nested Sets, ..
Imho these are the tasks of a database:newmember wrote: 2. the hierarchy elements do not need to be accessed in any special way like sorting.
- make data accessible
- secure access to data
- avoid redundancy
- guard consistency of data
The fact that you are going to manipulate them or not, doesn't really matter imho.
An XML file is a hierarchical model by design. So it can be convenient to store your hierarchical data in such a model. I don't think that XML's verboseness should be a reason to (not) use it, certainly if you consider that datastorage is extremely cheep these days...newmember wrote: On the other hand if i store each report as xml there will be huge space overhead...(although i could improve that by compressing xml data)
Issues with an XML file:
- no support for concurrent access
- no real guards to maintain data in a consistent state (dtd and xsd can help here)
- less performant than a standard rdbms
Basically, i would look for a solution that can store a tree/xml document in a SQL database. This way you have best of two worlds.
i'll start quoting you back
thanks for the link... very interesting...
particularly the part about nested sets...once i was thinking about using number ranges,i don't recall now what for, but i couldn't figure out the mathematics
two cases:
1. entire xml tree(report) is stored in database
2. storing report form using nested sets approach.
consider the operation of displaying the report form.
in case 1: i retrive entire xml blob from db and using dom api or xsl api to generate html.
in case 2: i need to perfom certain number of non trivial queries to get all the fields (the hierarchy has at least 5 levels of nesting and these levels are all different).
do you believe that database will outperfom the dom api/xsl?
(imho in these particular cases dom will be faster)
yes, i was thinking about this too...Basically, i would look for a solution that can store a tree/xml document in a SQL database
thanks for the link... very interesting...
particularly the part about nested sets...once i was thinking about using number ranges,i don't recall now what for, but i couldn't figure out the mathematics
not less important is the ability to find the data fast...that's why i mentioned 'sorting' earlier...Imho these are the tasks of a database:
- make data accessible
- secure access to data
- avoid redundancy
- guard consistency of data
The fact that you are going to manipulate them or not, doesn't really matter imho.
as you suggested, if i store xml data in database then no problem with that...- no support for concurrent access
- less performant than a standard rdbms
two cases:
1. entire xml tree(report) is stored in database
2. storing report form using nested sets approach.
consider the operation of displaying the report form.
in case 1: i retrive entire xml blob from db and using dom api or xsl api to generate html.
in case 2: i need to perfom certain number of non trivial queries to get all the fields (the hierarchy has at least 5 levels of nesting and these levels are all different).
do you believe that database will outperfom the dom api/xsl?
(imho in these particular cases dom will be faster)
I wouldn't consider that a good idea.. In that case you are simply storing a file.. And files are best stored in a filesystem...newmember wrote: two cases:
1. entire xml tree(report) is stored in database
This is the approach i like most...newmember wrote: 2. storing report form using nested sets approach.
In my experience performing XSL is a CPU hungry process...newmember wrote: in case 1: i retrive entire xml blob from db and using dom api or xsl api to generate html.
With the Nested Sets, or any other Tree "solution" for a SQL product i think you can reduce that number of queries... Might want to have a look at http://pear.php.net/package/DB_NestedSetnewmember wrote: in case 2: i need to perfom certain number of non trivial queries to get all the fields (the hierarchy has at least 5 levels of nesting and these levels are all different).
the host doesn't have DB_NestedSet package installed and doesn't want to install it either
and i don't have the time to write my own implementation...
so what another solution would you use?
and i don't have the time to write my own implementation...
so what another solution would you use?
can you please explain why? (because of waste of space?)newmember wrote:
two cases:
1. entire xml tree(report) is stored in database
I wouldn't consider that a good idea..
Consider these 3 points:
1.
as i already mentioned earlier, is that in my case levels in hierarchy are all completely different from each other...
it not the same as maintaining hierarchy in single table...
various algorithems for maintaing hierarchy, that i saw, are dealing with hierarchy of identical elements(like menu entries for example).
so if i want to use them for structuring reports storage i probably would need additional tables for cross-linking...
2.
another point that comes to mind is that if i use nested intervals kind of approach then it will be very hard later to expand the report format because it is actually is hardcoded into cross-linked table structure...
3.
(ofcourse the user's browser must on of the recent ones...but it is not really much of a problem)
Do you still think that
thanks, very appriciate your answers
1.
as i already mentioned earlier, is that in my case levels in hierarchy are all completely different from each other...
it not the same as maintaining hierarchy in single table...
various algorithems for maintaing hierarchy, that i saw, are dealing with hierarchy of identical elements(like menu entries for example).
so if i want to use them for structuring reports storage i probably would need additional tables for cross-linking...
2.
another point that comes to mind is that if i use nested intervals kind of approach then it will be very hard later to expand the report format because it is actually is hardcoded into cross-linked table structure...
3.
on the contrary,in many cases, i can ease on the server by using the visitor's machine for doing xsl transformation.In my experience performing XSL is a CPU hungry process...
(ofcourse the user's browser must on of the recent ones...but it is not really much of a problem)
Do you still think that
is suitable one for my needs?Nested Sets, or any other Tree "solution"
thanks, very appriciate your answers
You can have a custom/local install of pear packages..newmember wrote:the host doesn't have DB_NestedSet package installed and doesn't want to install it either![]()
Because in that case your xml would only be a bit of blob.. binary data.. And a dbms is usually not specialized for this.. A filesystem on the other hand is..newmember wrote: 1. entire xml tree(report) is stored in database
Imho, a XML document is a hierarchical structure that nodes and/or attributes.newmember wrote: as i already mentioned earlier, is that in my case levels in hierarchy are all completely different from each other...
it not the same as maintaining hierarchy in single table...
various algorithems for maintaing hierarchy, that i saw, are dealing with hierarchy of identical elements(like menu entries for example).
so if i want to use them for structuring reports storage i probably would need additional tables for cross-linking...
So basically, i only see 2 tables at most (nodes and attributes)
I've got the feeling that you might want to rethink your complete "report/data" solution. Because i don't think it's very wise to store a specific format/report of your data anyway. All you need is the actual data, and then retrieve it from your database and decorate it with any markup/xml you want.newmember wrote: another point that comes to mind is that if i use nested intervals kind of approach then it will be very hard later to expand the report format because it is actually is hardcoded into cross-linked table structure...
Well, xml format itself is meant for describing data....so i wouldn't call it just "decorate"...All you need is the actual data, and then retrieve it from your database and decorate it with any markup/xml you want
"decoration" is when you transform your xml data/file into some other target-specific format...
Imho, XML does not describe data. And it never will..newmember wrote: Well, xml format itself is meant for describing data....
from http://www.w3.org/XML/
Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.
bear with me just a little longer
i still thinking about the right approach...and
the thing is that i have the feeling that for my particular needs storing such a report as xml blob is the right solution...
i can't write here the exact hierarchy of report form simply because i'm still don't have all the details.
they showed me some example reports and i basicly looked at the structure...
so imagine heirarchy something like here:

if i'm going for "nested sets" kind of solution then there are two important thing about this hierarchy:
1. each level describes different data so there must be a separate table for each level (i don't see any other way)
2. all this data will never be manupilated (from users point of view) in any way exept viewing it...
now as i mentioned the expected number of such report forms is of 10^4 order.
calculate now how many rows will such a report require... two last levels are expected to have very high branching factor...
so in the end we going to have millions if not more of rows.
add to that all the huge extra space taken by cross-referencing all this data...
and finally recall that all these millions of rows by itself have no information value...
compare all this mess with 10000+ records of xml blobs...
one query to extract the blob and ....a little parsing for display...
tell me what you think about it...
thanks
i still thinking about the right approach...and
the thing is that i have the feeling that for my particular needs storing such a report as xml blob is the right solution...
i can't write here the exact hierarchy of report form simply because i'm still don't have all the details.
they showed me some example reports and i basicly looked at the structure...
so imagine heirarchy something like here:

if i'm going for "nested sets" kind of solution then there are two important thing about this hierarchy:
1. each level describes different data so there must be a separate table for each level (i don't see any other way)
2. all this data will never be manupilated (from users point of view) in any way exept viewing it...
now as i mentioned the expected number of such report forms is of 10^4 order.
calculate now how many rows will such a report require... two last levels are expected to have very high branching factor...
so in the end we going to have millions if not more of rows.
add to that all the huge extra space taken by cross-referencing all this data...
and finally recall that all these millions of rows by itself have no information value...
compare all this mess with 10000+ records of xml blobs...
one query to extract the blob and ....a little parsing for display...
tell me what you think about it...
thanks
Imho, filesystems are optimized for storing files. So why do you think that storing a file in a sql database is better? Where's the added value?newmember wrote:and
the thing is that i have the feeling that for my particular needs storing such a report as xml blob is the right solution...
I've got the feeling you don't understand the model. It's not about the actual data is contains, it's about the relation betweeen the data (tree). Just as in an xml document you have a root node with childnodes.newmember wrote: if i'm going for "nested sets" kind of solution then there are two important thing about this hierarchy:
1. each level describes different data so there must be a separate table for each level (i don't see any other way)
In that case, simply storing your xml file in a filesystem seems most appropriate.newmember wrote: 2. all this data will never be manupilated (from users point of view) in any way exept viewing it...
But i still got the feeling the data analysis smells..
then i don't need to track filenames and paths...i mean everything is put in one place...So why do you think that storing a file in a sql database is better?
i might not understand...I've got the feeling you don't understand the model. It's not about the actual data is contains, it's about the relation betweeen the data (tree).
how would you model the report data hierarchy? how many tables would you use and how?
You still have to track the primary key. Just as the absolute file path is the primary key in a filesystem.newmember wrote: then i don't need to track filenames and paths...i mean everything is put in one place...
And you can store all files in the same directory, so they are all in one place too...
It all depends on the actual data (and what the data means) and the relationships between the elements of data.I've got the feeling you don't understand the model. It's not about the actual data is contains, it's about the relation betweeen the data (tree).
Probably the "report" itself is just a product/process of other data sources.. Fe: You know that the age of a person is 15. But you know that age equals difference between now and birthdate years. Which one do you consider most flexible?
On the other hand if the reports aren't going to change anymore, and you don't need to perform searches in the report data storing them in a static file isn't a bad idea. This way you don't need to (re)calculate stuff that isn't going to change anyway.
earlier i didn't explain correctly why i wanted to use xml blobs instead filesystem...
it is because i need to track some data along with each report(for example order is important)...
so that there will be anyway record in db for each report and i thought that if so then why not store xml file itself in db as blob.
but after some thinking i guess that filesystem is more appropriate...
ok thanks for all your help,
now it is the time to start coding all the stuff...
thanks again
it is because i need to track some data along with each report(for example order is important)...
so that there will be anyway record in db for each report and i thought that if so then why not store xml file itself in db as blob.
but after some thinking i guess that filesystem is more appropriate...
ok thanks for all your help,
now it is the time to start coding all the stuff...
thanks again