Data storage medium

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Data storage medium

Post by alex.barylski »

For the longest time, I have been confronted by personal demons which could not decide what storage medium was the best for certain kinds of data.

I think most people just go with SQL, but don't really consider other factors...

For instance, member authentication data (user/pass) could very well be better served using LDAP or similar. If you used a specially constructed text file, stored in assumingly a safe place (outside doc root) you could likely write a highly efficient and secure authentication extension.

This has the following benefits and disadvantages:

1) Security is tighter. Whenever you use SQL and also third party products which also require a database, assuming your all under the same DB, you open your application up to attack by virtue of SQL injection...
2) Efficiency is greater. Yes, MySQL and SQLite even more so are blistering fast, but I gaurantee you I could write a simple authentication extension which out performed them.

The problem with this approach, is that MySQL is pretty much standard across the board as a means for storage on almost every server on the planet, custom extensions are obviously limited and worthless in this regard.

Why consider the above in the first place???

Well, I'm am forever coming across libaries which offer some sort of agnostic data store, in that, by using a abstraction layer, you can switch your Session management from file based to SQL to LDAP, etc...

I've considered this immensely in the past as there are advantages to using a different medium outside of somehting fixed like SQL...security being one, like mentioned above...

Sure using a seperate database might assist in solving this issue for untrusted third party scripts, but still, what about shared hosts, etc where you are limited ot a single database?

Another advantage I have always found in offering different storage mediums, such as LDAP, etc...is that some mediums are better suited for certain data.

I won't get into this again, but structured data, such as authentication info and roles (which are by nature most of the time, organzied) would be IMHO better stored in a medium or technology which supports natural oragnization, such as that what LDAP or XML may provide...

Secondly, I personally find comfort in keeping certain data stored in different mediums as almost a means of further abstracting the design of an application...

Hard to explain what I mean, but basically, storing files as files and data as data in a RDBMS helps seperate your application design overall, much like MVC does for application code.

Keeping authentication info in a medium other than RDBMS, helps almost seperate that important section of code from the other sections which are maybe more record based.

Divide and conquer (sp) is something I have always tried to do with any project, constantly so for me it makes sense to always look forbetter suited mediums for certain kinds of data...

However now I am begining to consider (as my ORM class gets more and more complete) that perhaps, RDBMS is the way to go :P

Talk about a biased opinion eh?

Clearly, there are advantages to having all data stored in a RDBMS, especially if I can get my ORM classes to work as expected...making working with most SQL data a breeze...

The idea of centralized data storage also appeals to me, despite the benefit of divide and conquer and that feeling of additional abstraction...

So I'm officially confused as to which argument to support...

What I'd like to start is, not an argument, nor a debate, but disscussion on which you agree with and why, not belittling either side, because I will counter attack both and sound like a hypocrite... :P

Give me your positives and negatives for both and maybe after hearing some other opinions, I can finally ultimately conclude which direction to go in...

Cheers :)
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I tend to need both, however I prefer using a minimum number of storage locations (database and file system, really). This is both for simplicity and sanity.

In my "world," files are mostly kept in the file system, everything else in the database. But being able to support other mediums is advantageous to adoption and adaption. The problem is, you have to support it. A paradox. :)
thiscatis
Forum Contributor
Posts: 434
Joined: Thu Jul 20, 2006 11:00 am

Post by thiscatis »

At my work we use LDAP authentication.
I've also seen that large PHP webapplication give to option for LDAP aut's too lately
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Re: Data storage medium

Post by timvw »

Hockey wrote: 1) Security is tighter. Whenever you use SQL and also third party products which also require a database, assuming your all under the same DB, you open your application up to attack by virtue of SQL injection...
-) Why would that silly developer that needs a database for his application get access rights to the database with credentials too?

-) I'm pretty sure that a deficient developer would write applications that allows 'LDAP-query' injection just as easily as SQL injection...
Hockey wrote: 2) Efficiency is greater. Yes, MySQL and SQLite even more so are blistering fast, but I gaurantee you I could write a simple authentication extension which out performed them.
It all depends on your needs... Eg: writing into an LDAP server usually is much slower than the same operation in an SQL-dbms. As long as there are no performance problems with a solution, i don't care much about the absolute performance of a solution...
Hockey wrote: The problem with this approach, is that MySQL is pretty much standard across the board as a means for storage on almost every server on the planet, custom extensions are obviously limited and worthless in this regard.
I know plenty of companies that refuse to use MySQL... As it corrupts (atleast corrupted) data... Eg: inserting '99-99-9999' as a date should at best return an error that the date was invalid... But MySQL happily accept(ed) it and stored it about '00-00-000'. That's not what i call 'reliable data storage' :p
Hockey wrote: Well, I'm am forever coming across libaries which offer some sort of agnostic data store, in that, by using a abstraction layer, you can switch your Session management from file based to SQL to LDAP, etc...
I'm not so sure if it's useful to abstract these things anyway... An SQL dbms does have much different features than a filesystem or ldap tree...
Hockey wrote: Another advantage I have always found in offering different storage mediums, such as LDAP, etc...is that some mediums are better suited for certain data.
I agree...
Hockey wrote: I won't get into this again, but structured data, such as authentication info and roles (which are by nature most of the time, organzied) would be IMHO better stored in a medium or technology which supports natural oragnization, such as that what LDAP or XML may provide...
If you have enough $$$ you'll see that most sql dbms companies are willing to deliver an extension that allows you to easily query hierarchies too... But since this thread seems to be about 'data storage' and not about 'data modelling' i won't get into this...
Hockey wrote: Secondly, I personally find comfort in keeping certain data stored in different mediums as almost a means of further abstracting the design of an application... Hard to explain what I mean, but basically, storing files as files and data as data in a RDBMS helps seperate your application design overall, much like MVC does for application code.
I can see why you would want to abstract the underlying technology.. On the other hand, there are times i really want to know if i'm using a filesystem (and should expect data to be available very fast) or that i'm using a webservice that might take a while longer to return the data i need...
Hockey wrote: However now I am begining to consider (as my ORM class gets more and more complete) that perhaps, RDBMS is the way to go :P
Offcourse it's not always the way to go :P
Eg: I prefer to store my music, pictures, videos in a filesystem...

Hockey wrote: The idea of centralized data storage also appeals to me, despite the benefit of divide and conquer and that feeling of additional abstraction...
The best performing SQL dbms i know doesn't have a centralized access point, it has information spread over X harddisks... replicates Y times over the network.. But here comes the benefit: It offers an interface that makes it appear as if the data is right there... And the same story goes for eg: Google File System, EMC products, ....
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Data storage medium

Post by alex.barylski »

-) Why would that silly developer that needs a database for his application get access rights to the database with credentials too?

-) I'm pretty sure that a deficient developer would write applications that allows 'LDAP-query' injection just as easily as SQL injection...
Well I was hinting more towards the benefit of a library which dealt with fixed size records, in plain ASCII text files...

With a minimal library written to just do authentication, there would be little chance of data theft, whereas a SQL injection attack if trivial, attacking a buffer overflow is somewhat more complicated and unlikely if you develop the library using memory allocation classes, etc...
It all depends on your needs... Eg: writing into an LDAP server usually is much slower than the same operation in an SQL-dbms. As long as there are no performance problems with a solution, i don't care much about the absolute performance of a solution...
I'm not that speed obsessed either :P

The point was, that by writting a custom extension which authenticated against a trivial data store using a trivial API, security and speed would kind of be a nice bonus...

Just a positive side effect is all...but of course there are negatives...
know plenty of companies that refuse to use MySQL... As it corrupts (atleast corrupted) data... Eg: inserting '99-99-9999' as a date should at best return an error that the date was invalid... But MySQL happily accept(ed) it and stored it about '00-00-000'. That's not what i call 'reliable data storage' :p
Fair enough, but the point was, if I wrote my own custom authentication library as an PHP extension and used it in my apps, I can almost gaurantee (sp?) that MySQL would be more likely supported across the board...

SQL in general, I htink I have seen on every server I have ever seen...shared, virtual, dedicated, etc...
I'm not so sure if it's useful to abstract these things anyway... An SQL dbms does have much different features than a filesystem or ldap tree...
Exactly, but why is it so popular then? I can understand SQL abstraction layer, because of the need to switch systems, but many other kinds of libraries offer abstraction, including my own authentication class I designed...

I have my doughts about it, despite the possible usefulness behind it...

Switching session management from SQL to text files, the file/directory and possibly LDAP doesn't really make sense and it adds another layer of complexity to the library...and for what?
I agree...
Glad to see we completely agree on something :P
If you have enough $$$ you'll see that most sql dbms companies are willing to deliver an extension that allows you to easily query hierarchies too... But since this thread seems to be about 'data storage' and not about 'data modelling' i won't get into this...
Like an OODBMS extension? Much like ORM, but more native to the database?

Not sure if that solves my issue...

Although it offers the illusion of hierarchy and I do really like the idea of OR/M...

I just...I dunno to be honest...I'm 50/50 on this one...
I can see why you would want to abstract the underlying technology.. On the other hand, there are times i really want to know if i'm using a filesystem (and should expect data to be available very fast) or that i'm using a webservice that might take a while longer to return the data i need...
I'm not sure if abstraction is what I really meant...

But logical seperation for sure...

SQL for instance does both, it helps seperate your querying code and seperate that actual code with a 'more' english like statement...

Much like EBNF does for parsing or HTML does for GUI design...

It's possible to design an applicaiton using a single monolithic index.php, but it makes more sense to segment your applicaiton into logical parts...

HTML, PHP, IMAGES, FILES, SQL, etc...

For that reason, if I ever need to modify my authentication classes, data, etc...

Instead of having to filter through possibly 1,000's of MySQL tables finding the authentication table in question (assuming I don't use multiple DB's) I would know that it's the special file/library code, etc and could likely find it easier, faster, etc...

Know what I mean?

Like I said, I'm a huge divide and conquer type programmer, I'm always looking for ways to further abstract, divide, seperate...

In fact I probably spend more time doing that, than I do programming... :P

Hard to get paid for that whilst not being a researcher, but in my spare time...I do it alot...
Offcourse it's not always the way to go
Eg: I prefer to store my music, pictures, videos in a filesystem...
Sure, I agree, but at the same time...files in a database, does actually make sense...despite not being 'really' designed for that purpose...

It's one way to protect files, without needing access to outside of your document root, which many Shared hosts in my experience, dont' allow...
The best performing SQL dbms i know doesn't have a centralized access point, it has information spread over X harddisks... replicates Y times over the network.. But here comes the benefit: It offers an interface that makes it appear as if the data is right there... And the same story goes for eg: Google File System, EMC products, ....
Thats not really what I meant... :P

I was refering to the illusion of centralized data...

I look at tables inside my phpMyAdmin and apparently all the tables are under one or more DB's, at this point I"m not paying attention to the underlying clustering of HDD, etc...

Cheers :)
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Re: Data storage medium

Post by timvw »

Hockey wrote: *snip more about authentication libraries*
The point was, that by writting a custom extension which authenticated against a trivial data store using a trivial API, security and speed would kind of be a nice bonus...
I can agree with that point. On linux there is PAM (Pluggable Authentication Modules) which does exactly what you mentionned... And there are PHP extensions available too...
Hockey wrote: Exactly, but why is it so popular then? I can understand SQL abstraction layer, because of the need to switch systems, but many other kinds of libraries offer abstraction, including my own authentication class I designed...
Sometimes i ask myself the same question why SQL is so popular.. Certainly when i notice that virtually every SQL-dbms vendor has a different dialect for his product.. Which makes me wonder about the 'Standard' in SQL...
Hockey wrote:
If you have enough $$$ you'll see that most sql dbms companies are willing to deliver an extension that allows you to easily query hierarchies too... But since this thread seems to be about 'data storage' and not about 'data modelling' i won't get into this...
Like an OODBMS extension? Much like ORM, but more native to the database?
By adding keywords to their 'Standard'QL product... Eg: http://www.dbazine.com/oracle/or-articles/mishra3.
Hockey wrote: Although it offers the illusion of hierarchy and I do really like the idea of OR/M...
I like some forms of ORM too (eg: Hibernate3 / Java Persistence API)
Hockey wrote:But logical seperation for sure...
I agree with making 'onion layers' in software design... :)
Hockey wrote: SQL for instance does both, it helps seperate your querying code and seperate that actual code with a 'more' english like statement...
In theory a relational dbms has three layers: a representation layer, a logical layer and a physical layer. And yes, each layer has it's own language too... (data manipulation language, data definition language and data store language)

The problem is that because of problems with performance in the physical layer of SQL dbms lots of people have decided to denormalize their model... (And thus they gave up the layering...)
Hockey wrote: Like I said, I'm a huge divide and conquer type programmer, I'm always looking for ways to further abstract, divide, seperate...
In fact I probably spend more time doing that, than I do programming... :P
I agree (once more) with you... I also promote loose coupling :) Imho programming is more than only typing code.. It's also thinking about the (de)coupling of libraries etc..

Hockey wrote:
Offcourse it's not always the way to go
Eg: I prefer to store my music, pictures, videos in a filesystem...
Sure, I agree, but at the same time...files in a database, does actually make sense...despite not being 'really' designed for that purpose...
Actually, given WinFS and ReiserFS4 i think it's about time we get a filesystem that can be accessed like a SQL dbms... And i don't think it's unreasonable to expect that it performs well..
Hockey wrote: It's one way to protect files, without needing access to outside of your document root, which many Shared hosts in my experience, dont' allow...
I agree (third time or so... We need beers :p) that storing pictures (or other binary data) is attractive..

Eg: with PDO these days you don't have to pass the complete blob via memory.. All you need to pass is a filepointer... which improves performance...

(I gave up on shared hosts since they all have different policies, restrictions, supported extensions, configurations, ... )
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Data storage medium

Post by alex.barylski »

Actually, given WinFS and ReiserFS4 i think it's about time we get a filesystem that can be accessed like a SQL dbms... And i don't think it's unreasonable to expect that it performs well..
I've heard of WinFS but have zero understanding of it, so you got me there...

As for ReiserFS4...no idea, i'll have to read up on them...

I was under the impression that Vista (longhorn) was supposed to actually being used a powerful SQLServer storage type file system as opposed to the traditional...

I think the idea was scrapped as it would have taken to long to implement...
I agree (third time or so... We need beers :p) that storing pictures (or other binary data) is attractive..
I'd prefer water :P Last night I drank an estimated 17 beers and then consumed a six pack...and today I feel like...just drinking water :P

Your in Belgium? Aren't Belgium beers really thick like Irish beers, Guiness(sp?) or Kilkenny?

I hate the taste of booze, so I can't appreciate a fine beer from a garbage one (albeit some are easier to drink than others - Thick beers I have a hard time drinking)

*ducks before getting smoked with another apple*

I just drink to get drunk :P

True beer enthusiasts just hate me...

A few good friends of mine are true beer conosuieers (spelled so badly I couldn't even find a relative match on dictionary.com :P ) and the city I live in once a year has a festival called Folklorama where every culture in the city puts on a pavillion where you can vist to observe foriegn culture, foods, beer, dance, etc...

We visit many each year...some are repeats (Irish pavillion is really neat and the Italian/Ukrainian ones have awesome food) but often we goto others...so they've consumed beer from around the world...

This is where i'm getting the "aren't Beligium beers thick" from :P

I find beer has a regional taste to it...

Like some Canadian beers are very...crisp...whereas Yankee beer like Budweiser...has a strong flavour...

Heineken tastes like...actually you know what...this deserves a thread onto it self...so I'll start a new one :P
(I gave up on shared hosts since they all have different policies, restrictions, supported extensions, configurations, ...
It's annoying, thats why I've continued to support the one company I've used for about 5 years at least...

Although sometimes I question their security and actual understanding of system administration...it's why I don't want to post a link...incase their servers are insecure...this would open them up for attack :P and my web site(s) with it...
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Re: Data storage medium

Post by timvw »

Hockey wrote: Your in Belgium? Aren't Belgium beers really thick like Irish beers, Guiness(sp?) or Kilkenny?
We have all sorts of beer :p http://en.wikipedia.org/wiki/Belgian_beer...

Even the 'i drink to get drunk' beers :) We tend to call them 'Pils'... And best of all, any 15 year old one can order (one or more) without any problem... :)
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

15... 8O

I didnt' drink a spot of anything until 19...and the legal age here is 18...

We get a lot of Americans coming up during spring break cuz I'm guessing it's 21 in Michigan, etc...and Ontario is 19 I think...

So they come here and go ballistic...especially when the dolloar was peanuts compared to the USD... :P

Good times had by all...except when the occasional moron thinks he's pro Canadian and they beat up a visitor, end up in the paper and starta feud between Canada and the States...thankfully those idiots are often quickly forgotten about... :P

Youth and Booze = Disaster waiting to happen :P
Post Reply