Page 1 of 2
schema design, linking
Posted: Sat Jul 28, 2007 5:36 pm
by nathanr
Looking to hear some opinions on this one:
SpecificTableA : SpecificTableB : SpecificTableC
linktable(s)->>
GenericTableA : GenericTableB : GenericTableC
Lets say an single entry(row) in GenericTableA could be linked to 5 rows in SpecificTableA, 3 in SpecificTableB and 4 in SpecificTableC
and likewise for for GenericTableB and C
Conversely a single entry(row) in SpecificTableA could be linked to 5 rows in GenericTableA, 3 in GenericTableB and 4 in GenericTableC
and likewise for for SpecificTableB and C
This obviously needs link tables to tie the data together, however the best format for these links tables would be..?
a:
a link table for every combination.. kind of weighty
linktable_specA_genA
linktable_specA_genB
linktable_specA_genC
linktable_specB_genA (and so on)
table cols: [link_id, spec_rowid, gen_rowid]
b:
a link table for every generic table:
linktable_genA
linktable_genB
linktable_genC
table cols: [link_id, spec_tableid, spec_rowid, gen_rowid]
c:
a single link table for everything
linktable
table cols: [link_id, spec_tableid, spec_rowid, gen_tableid, gen_rowid]
or (d)?? any more ideas?
keep in mind that there could be 10 specific tables, and 20+ generic tables, any table could get upwards of 2 million rows
ideas people?
Posted: Sat Jul 28, 2007 6:46 pm
by nathanr
additionally, this table structure may also start using duplicate table's merged, and then scale up to full NDB cluster, current table type is MyISAM
Posted: Sun Jul 29, 2007 4:56 am
by Ollie Saunders
whoa, that sounds really complex. Let's take a step back here. What are you trying to achieve? What functionality depends on a data structure such as this?
If the project is not going to get lots of traffic immediately, not in the first six months say, you could benefit from using XML as it lends itself to complex data structures much more easily than a databases. If you find it is too slow there are XML based databases you can use, you can store bits of XML in a normal database or you can generate XML from a database. These all function as intermediary steps for moving from XML to high performance DB.
Posted: Sun Jul 29, 2007 8:05 am
by nathanr
Cheers for the reply Ole!
I have thought the same thing about XML being of use in this position, and it will be for much of the front end, as a kind of cache'ing engine, however on closer inspection I found that the db structure would still be needed.
Traffic, its going to be through the roof, it's goign to be sitting on a dedicated quad core, 32gb, mysql server, and I have a feeelign that within 6 months it will be on a cluster of probably 8, and within 2 years we'd be looking at 50+ dedicated mysql cluster machines and abot 180 ui's.
The system is being developed with a core structure that covers the three primary parts of the system, I've already managed to get two of the three smacked out and tested, and they are holding up beautifully, one of them is a "virtual table" system that allows you to strap a full multi row virtual table onto any row of any fixed table, it also allows you to hold a table within a cell.. recursion checks in place and I'm most pleased with it, a fully normalised, optimised effienct 3d database in mysql *joy*
This the third part should have been the most simple, however this linking structure is really killing my head at the moment, sadly I can't give uot any details of the system due to NDA's and the prospect of a scary law suits however after some though I've figured this is the best way to think of the problem... (extremely simplified):
PersonTable - Business Table
[linkstable]
TelephoneNumberTable
1 person can have X phone numbers, similarly one business can have X phone numbers.
1 phone number can be saved against 1 business and 5 individuals, hwoever only a single entry in TelephoneNumberTable
thus in this case linkstable could be:
[linkid, personalORBusinessFlag, rowId, telnumId]
lets add in a couple more tables
PersonTable - Business Table - EstablishmentTable
[linkstable]
TelephoneNumberTable - EmailAddressTable
at this stage personalORBusinessFlag needs to indicate personalORBusinessPRestablishmentFlag | where flag is which table to use.. but we also need to link to email addresses.
thus do we add in a similar table for Email's or..??
ideas?
Posted: Sun Jul 29, 2007 9:32 am
by nathanr
feyd.. love to hear your opinions on this one if your around!
Posted: Sun Jul 29, 2007 9:38 am
by feyd
There's not enough information for me to come to a judgment.
Posted: Sun Jul 29, 2007 10:01 am
by nathanr
Okay we have two tables..
TableA
ta_id, ta_value, tb_id
TableB
tb_id, tb_value
to link 1 row in TableB to 1 Row in TableA we simply add in the TableB Row Id to TableA (using column tb_id) - nice and easy, nice and standard..
to link 5 rows in TableB to 1 Row in TableA we simply add in a "link" table & drop the tb_id table, in the following manner:
TableA
ta_id, ta_value
LinkTable
l_id, ta_id, tb_id
TableB
tb_id, tb_value
again, nice and simple
now suppose we have the following..
TableA
ta_id, ta_value
------------------------------
LinkTable_A2B
l_id, ta_id, tb_id
TableB
tb_id, tb_value
LinkTable_A2C
l_id, ta_id, tc_id
TableC
tc_id, tc_value
TableB and TableC have no relation to each other;
suddenly we need two link tables if we want to link 5 rows in TableB to 1 Row in TableA --[AND]-- link 5 rows in TableC to 1 Row in TableA
we also need to throw into the equation that the reverse is also true, we also want to link 5 rows in TableA to 1 Row in TableB (same for a-c)
if we can figure out the best manner to make a single link table in this situation, then I think we're half way to figuring out the full problem in posts1+2
Posted: Sun Jul 29, 2007 11:02 am
by Ollie Saunders
I'm not a database expert by any means, in fact it's my weakest area when it comes to web development. But I wonder if database normalisation is over-hyped. Would it not be the case that a simplier database structure would do that job with fewer complications. This is what I experienced
here. Here's
a post where people corrected me on what normalisation really means, these days I really don't know.
Posted: Sun Jul 29, 2007 11:26 am
by nathanr
I'm quite aware of normalisation, what we're talkign about here is taking 3rd r 4th normalisation a step further:
in the above example: MemberSoftware and MemberBook are the "link tables"
what I'm hoping to achieve is..
I'm looking for anybody who can suggest alternative methods.. so as to avoid the need for 4 tables (PersonAddress, BusinessAddress, PersonEmail, BusinessEmail)
Posted: Mon Jul 30, 2007 2:45 pm
by Begby
What will be the advantage of having 1 link table instead of 30 link tables? Do you have a solid reason for doing this besides "30 tables seems dumb"?
Posted: Mon Jul 30, 2007 2:59 pm
by nathanr
The scope of the application requires the left hand table to be referenced in the link table. I'm looking to work an alternative method of "variable table names" into the design, the only method of doing this af far as I'm aware is by using variables and a prepare stament that is then executed and deallocated, this is most certainly not an effienct approch, and to be honest is probably getting out of the scope of rdbms and into something a bit more enterprise level such as oracle.. I can even see postgres being capable of this.. possibly.. however not mysql.
What we're looking to do here is create an associative(link) table, that builds on the 4th norm to allow:
linkId | LeftTable | LeftRow | RightTable | RightRow
and obviously the accompanying SQL to actualy make it functional, prefereably without prepare, possibly with procedures, however something that would allow cache hits woukld obviously be ideal.
I'm sure 99% of mysql users could see a use for this

(its also worth noting that the 30 link tables would actually be nearer 300, which isn't the nicest job to php up, thus a series of links tables in the above format shows distinct advantages)
Posted: Mon Jul 30, 2007 3:28 pm
by onion2k
I think you're going to need a link table for every specific relationship otherwise you're going to run into difficulties JOINing them later. So long as you name them sensibly (eg do NOT try to make short codified names) you'll be fine.
Posted: Mon Jul 30, 2007 3:53 pm
by nathanr
Cutting it down then..
LeftLinkAddress
LinkId, LeftTable, LeftRow, AddressId
this is more than possible supposing we always know what the left table is going to be, all we need to do is join left_table on LeftRow WHERE LeftTable='LeftTableName' and obviousl join the address table on AddressId.
Reversing, like fora search, is a bit difficult though!
opinions on a stored proc which uses a prepare to.. (wanders of to check is a stored proc can use a prepare).. if it can though, stored proc uses prepare to create a view..
Posted: Tue Jul 31, 2007 10:11 am
by Begby
The other problem I see with this is foreign key constraints. One thing I like to do is have deletes cascade to join tables. If you have it so everything is in one join table, then you will either need to create triggers that will do the deletes for you or manually manage deleting all the related records from code.
Posted: Tue Jul 31, 2007 10:42 am
by ReverendDexter
I think you're overcomplicating things for the supposed sake of simplifying.
Basic rules of thumb:
1:many relationship - foriegn key
many:many relationship - intersection table
If you try to shortcut that for the sake of less tables, I think it'll just bite you in the ass later. The "problem" with normalization is that it will make your SQL queries more complex, which it sounds (admittedly I just skimmed) like you're trying to avoid; again, I don't think that's a good idea in the long run.