Do you think that such data storage thing has a future?

Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

http://nengine.korsengineering.com/file ... nodes.phps

Needed tables:

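-- One row per attribute of a node. `type` records the original variable
-- type so the stored text can be cast back when it is read.
CREATE TABLE attrs (
  nodeId int(10) unsigned NOT NULL default '0',
  name varchar(30) NOT NULL default '',
  type enum('string','integer','float','boolean','null','array','object','text') NOT NULL default 'string',
  value mediumtext NOT NULL,
  PRIMARY KEY  (nodeId,name),
  KEY valueIndex (value(30)),        -- prefix index for short-value lookups
  FULLTEXT KEY valueFulltext (value) -- full-text search over values
);

-- Named, directed relations between two nodes. The id columns use the same
-- type as nodes.id.
CREATE TABLE links (
  nodeAId int(10) unsigned NOT NULL default '0',
  relation varchar(30) NOT NULL default '',
  nodeBId int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (nodeAId,relation,nodeBId),
  KEY nodeBId (nodeBId,relation)     -- reverse lookups, from B back to A
);

-- The nodes themselves. `parent` defaults to 0, presumably meaning "no parent".
CREATE TABLE nodes (
  id int(10) unsigned NOT NULL auto_increment,
  kind varchar(30) NOT NULL default '',
  alias varchar(30) default NULL,
  parent int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (id),
  UNIQUE KEY aliasKEY (kind,alias),
  KEY parent (parent)
);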

The idea is to build a schema-less data store that does not require writing any SQL and preserves variable types. I'm posting it in the design part of the forum because I'm interested in thoughts on my design.
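
To make "keeps variable types" concrete, here is a minimal sketch of how a typed value might round-trip through an attrs-style table. It is not the code behind the link above: it uses Python with SQLite instead of PHP with MySQL, and the helpers `encode`, `decode`, `set_attr` and `get_attrs` are hypothetical names chosen for illustration.

```python
import json
import sqlite3

# Simplified SQLite stand-in for the attrs table (assumption: SQLite has no
# ENUM, so `type` is plain TEXT here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attrs ("
    " nodeId INTEGER NOT NULL,"
    " name TEXT NOT NULL,"
    " type TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " PRIMARY KEY (nodeId, name))"
)


def encode(value):
    """Return a (type, text) pair for a Python value."""
    if value is None:
        return "null", ""
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean", "1" if value else "0"
    if isinstance(value, int):
        return "integer", str(value)
    if isinstance(value, float):
        return "float", repr(value)
    if isinstance(value, str):
        return "string", value
    if isinstance(value, (list, dict)):
        return "array", json.dumps(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(type_name, text):
    """Cast stored text back to the type recorded next to it."""
    decoders = {
        "null": lambda t: None,
        "boolean": lambda t: t == "1",
        "integer": int,
        "float": float,
        "string": str,
        "array": json.loads,
    }
    return decoders[type_name](text)


def set_attr(node_id, name, value):
    """Create or overwrite one attribute; no schema change is needed."""
    type_name, text = encode(value)
    conn.execute(
        "REPLACE INTO attrs (nodeId, name, type, value) VALUES (?, ?, ?, ?)",
        (node_id, name, type_name, text),
    )


def get_attrs(node_id):
    """Load all attributes of a node as a dict of typed values."""
    rows = conn.execute(
        "SELECT name, type, value FROM attrs WHERE nodeId = ?", (node_id,)
    )
    return {name: decode(type_name, text) for name, type_name, text in rows}


set_attr(1, "title", "Hello")
set_attr(1, "views", 42)
set_attr(1, "public", True)
set_attr(1, "tags", ["php", "design"])
print(get_attrs(1))
```

The caller never writes SQL, and an integer comes back as an integer rather than a string, which is the property the design is after.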
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

I don't have the time to look at it right now, but there is already a PEAR package (nested_sets) that does something similar. Maybe it can inspire you :)
Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

Well, this pear package is different in many aspects, but still... do you think such things have a future? Could they be used like ActiveRecord is used now?
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

I think more and more SQL DBMS will start to support it natively...
Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

I doubt it. Besides, one of the benefits of such storage models is that they are integrated into the language. Actually, I'm thinking about writing a similar thing, but without MySQL. PHP 5 should be powerful enough for such a task, but I'm not sure I am.
BDKR
DevNet Resident
Posts: 1207
Joined: Sat Jun 08, 2002 1:24 pm
Location: Florida

Post by BDKR »

This has all already been done. It's called an OODBMS. The idea of managing data in a hierarchical fashion in a relational storage medium has led to what's been called an "Impedance Mismatch". Look those terms up.

I've been on a project that did just this: mapping tree-like data structures into a relational DB. It had just one data storage table, and the types were stored with the data as well. It was slow as Christmas. The queries needed to rebuild the relationships between bits of data were big and very slow (I was the new guy there, and I knew we were in trouble when they told me they had found the upper limit of how many joins MySQL could do at a time!). My suggestion would be the same one that a lot of OODBMS makers came up with: cache as much of the information as possible, with updates back to the DB at key points.
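
The join growth described above can be sketched as follows. This is an illustration under assumptions, not the project's actual queries: it uses Python with SQLite, a cut-down attrs table, made-up sample rows, and hypothetical helpers `build_query` and `find_nodes`. The point it shows is that matching a node on N attributes takes N - 1 self-joins when every attribute lives in its own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attrs ("
    " nodeId INTEGER, name TEXT, value TEXT,"
    " PRIMARY KEY (nodeId, name))"
)
conn.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "size", "L"), (1, "brand", "Acme"),
        (2, "color", "red"), (2, "size", "M"), (2, "brand", "Acme"),
        (3, "color", "blue"), (3, "size", "L"), (3, "brand", "Acme"),
    ],
)


def build_query(filters):
    """Build SQL that finds nodes matching every name/value pair.

    Each filter after the first adds one more self-join on attrs.
    """
    if not filters:
        raise ValueError("at least one filter is required")
    names = list(filters)
    sql = ["SELECT a0.nodeId FROM attrs AS a0"]
    for i in range(1, len(names)):
        sql.append(f"JOIN attrs AS a{i} ON a{i}.nodeId = a0.nodeId")
    conditions = []
    params = []
    for i, name in enumerate(names):
        conditions.append(f"a{i}.name = ? AND a{i}.value = ?")
        params += [name, filters[name]]
    sql.append("WHERE " + " AND ".join(conditions))
    sql.append("ORDER BY a0.nodeId")
    return "\n".join(sql), params


def find_nodes(filters):
    sql, params = build_query(filters)
    return [row[0] for row in conn.execute(sql, params)]


sql, _ = build_query({"color": "red", "size": "L", "brand": "Acme"})
print(sql)  # three filters produce two JOIN clauses
print(find_nodes({"color": "red", "size": "L", "brand": "Acme"}))
```

In a conventional table with color, size and brand columns, the same lookup would be a single-table SELECT with three conditions.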


It's my suspicion that a lot of newer ERP systems and the newest QuickBooks Enterprise edition may be doing this kind of thing as well. They have HUGE memory requirements for the client systems, and they all talk about being able to "drill down" to specific bits of data, yada, yada, yada... which to me is a hint that it's via some traversal-type process. Of course, it's much faster to traverse a tree in memory than via SQL, right? :wink:
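
A rough sketch of what "traverse a tree in memory" could look like, assuming a nodes table with `id` and `parent` columns like the one at the top of the thread. It is written in Python with SQLite for illustration; `load_children` and `descendants` are hypothetical names, and the sample rows are made up. One query loads the parent links, and the walk itself issues no further SQL.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    " id INTEGER PRIMARY KEY, kind TEXT, parent INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO nodes (id, kind, parent) VALUES (?, ?, ?)",
    [
        (1, "site", 0),
        (2, "section", 1),
        (3, "section", 1),
        (4, "page", 2),
        (5, "page", 2),
        (6, "page", 3),
    ],
)


def load_children():
    """Read the whole table with one query and index it by parent id."""
    children = defaultdict(list)
    rows = conn.execute("SELECT id, parent FROM nodes ORDER BY id")
    for node_id, parent in rows:
        children[parent].append(node_id)
    return children


def descendants(children, root_id):
    """Depth-first, pre-order walk over the in-memory index."""
    found = []
    stack = list(reversed(children.get(root_id, [])))
    while stack:
        node_id = stack.pop()
        found.append(node_id)
        # Push children in reverse so the lowest id is visited first.
        stack.extend(reversed(children.get(node_id, [])))
    return found


children = load_children()
print(descendants(children, 1))
```

The trade-off is the one raised in the post: the index has to fit in memory and has to be refreshed or written back when the underlying rows change.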

Also try reading some of Fabian Pascal's rants. He'll sound very negative, but try to chew past that and you'll find that there is some good data there.

So what is my answer to the question? I doubt it. There is a place for hierarchical data structures, but not in relational databases. If the OODBMS vendors ever get something that people like, things may change, but I don't even see much of a future for those guys in certain markets (large-scale, high-volume web services) based on (the last time I looked) their caching solutions. Especially if the idea of shared nothing continues to gain ground.

Cheers,
BDKR
Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

BDKR wrote: It was slow as Christmas.
Could you please open any page of http://korsengineering.com/ (it uses that DB package), view the page's source code and scroll down to the bottom? There will be a comment with page generation time. Not bad, eh?

Caching is not yet implemented, and my code is compatible with MySQL 3.x and PHP 4.3. There is plenty of room for improving performance. Using MySQL 5 queries would significantly speed things up. Rewriting the whole thing in pure PHP 5 (without the use of MySQL) would eliminate all kinds of overhead, making this thing truly scalable.
BDKR
DevNet Resident
Posts: 1207
Joined: Sat Jun 08, 2002 1:24 pm
Location: Florida

Post by BDKR »

Gambler wrote:
BDKR wrote: It was slow as Christmas.
Could you please open any page of http://korsengineering.com/ (it uses that DB package), view the page's source code and scroll down to the bottom? There will be a comment with page generation time. Not bad, eh?
No, it's not bad, but ....

1) How do I know that's not based on cached queries? (MySQL has had a query cache since 4.0.)
2) How do I know they aren't caching data in memory? (Turck MMCache has an API that allows one to do just that, as just one example.)
3) How do I know that large portions of that page aren't cached somehow, with only small bits being generated dynamically?
4) That page alone doesn't look like that big of a dataset. With the last team I worked for, the performance appeared to be good until they threw a dataset at it that was more representative of the app's intended real-world usage.
Gambler wrote: Caching is not yet implemented, and my code is compatible with MySQL 3.x and PHP 4.3. There is plenty of room for improving performance. Using MySQL 5 queries would significantly speed things up. Rewriting the whole thing in pure PHP 5 (without the use of MySQL) would eliminate all kinds of overhead, making this thing truly scalable.
I don't think MySQL 5's query optimization is going to make THAT big of a difference. Sure, they've prolly found some speed, but the MyISAM tables were already pretty damn fast at simple selects anyway. The real question

As for writing the entire thing in PHP5 with no storage (no database), care to explain in more detail? Are you going to persist this data in memory or something? Is this a web-based app? If so, where is the information going to live between requests? In a file?

There is a reason the relational database was invented. I think we should be careful that we don't forget that. Below is one of my favorite quotes along those lines. Emphasis is mine.
Fabian Pascal in an interview with Tony Shaw from Wilshire Conferences wrote: XML was invented by text publishers, who had no knowledge of data management, purportedly for data exchange. But exchange requires a physical format, not a data model. First, there are tons of formats in the industry and any one could have been used, why invent yet another? And second, XML is actually a bad physical format for exchange; it is highly and unnecessarily inefficient, to the point where it is increasingly violated to get performance out of it.

Now they are adding a data model to it, to be able to do any data management (see Tags Do Not a Language Make) and, as Chris Date points out, the first thing they had to do to define their “model” was to discard the notion of an XML document as the fundamental data object! What can you conclude from this fact? The model they did come up with is the same hierarchic model which we discarded 30 years ago and replaced with SQL, because it was too complex, inflexible and lacked rigor. I call the whole insanity “The Exchange Tail and the Management Dog”, the title of my new seminar. Would such regressions be accepted if practitioners understood data fundamentals? No way.
What you're suggesting is essentially a hierarchical data structure, and to a programmer or biologist it can seem pretty intuitive. But as a data model, it shouldn't be extended into the realm of data management.

Cheers
Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

BDKR wrote: No, it's not bad, but ....
I just know that cases 1, 2 and 3 do not apply here. No caching. (Even though query caching would be fair, because other DB applications could use it as well.)
BDKR wrote: As for writing the entire thing in PHP5 with no storage (no database), care to explain in more detail? Are you going to persist this data in memory or something?
B-Tree files, probably. Not sure how slow that would be, though. But using a database as a backend adds huge overhead to every operation.
BDKR wrote: What you're suggesting is essentially a hierarchical data structure, and to a programmer or biologist it can seem pretty intuitive.
No, not exactly. It's not all about hierarchy; it's mostly about adding attributes and new kinds of nodes on demand. (Also, I'm tired of writing SQL queries.)
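
As a rough idea of what a database-free version might look like, here is a sketch that keeps typed attributes in a keyed file. Everything in it is an assumption made for illustration: it is Python rather than PHP 5, it uses the standard `shelve` module (a pickle-over-dbm store whose on-disk structure depends on the platform and is not necessarily a B-Tree), and `FileNodeStore` is a hypothetical class name.

```python
import os
import shelve
import tempfile


class FileNodeStore:
    """Hypothetical file-backed node store; not the design from this thread."""

    def __init__(self, path):
        self.path = path

    def set_attr(self, node_id, name, value):
        """Add or overwrite one attribute on a node, creating the node if needed."""
        with shelve.open(self.path) as db:
            attrs = db.get(str(node_id), {})
            attrs[name] = value
            db[str(node_id)] = attrs  # reassign so the change is written back

    def get_attrs(self, node_id):
        """Return the node's attributes with their original types."""
        with shelve.open(self.path) as db:
            return db.get(str(node_id), {})


store = FileNodeStore(os.path.join(tempfile.mkdtemp(), "nodes"))
store.set_attr(1, "title", "Hello")
store.set_attr(1, "views", 42)
store.set_attr(2, "price", 9.99)  # a new kind of node, no schema change
print(store.get_attrs(1))
```

Opening the file on every call keeps the sketch short but would be slow under load, and it says nothing about locking between concurrent requests or about searching by attribute value, which is where the questions raised earlier in the thread come back in.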
BDKR
DevNet Resident
Posts: 1207
Joined: Sat Jun 08, 2002 1:24 pm
Location: Florida

Post by BDKR »

Gambler wrote:
BDKR wrote: What you're suggesting is essentially a hierarchical data structure, and to a programmer or biologist it can seem pretty intuitive.
No, not exactly. It's not all about hierarchy; it's mostly about adding attributes and new kinds of nodes on demand. (Also, I'm tired of writing SQL queries.)
:D I hear ya on the query writing part. LOL! Let us know how it works out. Obviously I've taken a particular position on this, but I'd like to hear or see some benchmarks on it. The use of B-Tree files is an interesting idea and I'd love to hear the outcome.

Cheers,
BDKR
Gambler
Forum Contributor
Posts: 246
Joined: Thu Dec 08, 2005 7:10 pm

Post by Gambler »

BDKR wrote: Let us know how it works out.
Will do. That is, if I manage to overcome my laziness and actually start coding the thing. *smiley*