http://www.sitepoint.com/blogs/2006/10/ ... abase-too/
What do you think?
Opinions on a SitePoint blog post - Storing files in the DB
Moderator: General Moderators
- johno
- Forum Commoner
- Posts: 36
- Joined: Fri May 05, 2006 6:54 am
- Location: Bratislava/Slovakia
- Contact:
Raised some interesting thoughts about transactions and indexes, but then it suggests caching the data on disk for performance? Where did all this start? Ah, right: storing binaries in the DB rather than as files.
As far as I know, DB servers are usually the bottleneck, so I don't think giving them extra work is generally a good idea. But... well, there are always some special cases.
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
When we talk about binary files, the first things that pop to mind are images: png, jpg, gif, etc. For them, it makes very little sense to store them in the database: Apache + filesystem already does a great job of serving these static things, and we don't need to add another PHP + database layer to the whole jig. The next things that pop to mind are mp3, exe, zip; in short, things that are usually very big. Once again, the extended download time is handled gracefully by Apache, but not by PHP (especially if it's running as FastCGI: each download ties up a PHP process for its whole duration and can easily overrun the process limit on shared hosts).
For most practical purposes, binary files will be stored on the filesystem.
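To make the "extra layer" concrete, here is a minimal sketch (not from the thread; file names and contents are made up, and `sqlite3` stands in for whatever RDBMS you'd actually use). On the filesystem route the web server can hand the bytes straight to the client; on the database route the same bytes must first pass through the application + DB layer:

```python
# Illustrative sketch of filesystem serving vs. DB-backed serving.
# All names and payloads here are invented for the example.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()

# Filesystem route: the web server (Apache, etc.) can serve this directly.
path = os.path.join(tmp, "logo.png")
with open(path, "wb") as f:
    f.write(b"\x89PNG fake image bytes")

# Database route: the bytes travel through the app + DB layer first.
db = sqlite3.connect(os.path.join(tmp, "files.db"))
db.execute("CREATE TABLE files (name TEXT PRIMARY KEY, data BLOB)")
with open(path, "rb") as f:
    db.execute("INSERT INTO files VALUES (?, ?)", ("logo.png", f.read()))
db.commit()

blob = db.execute(
    "SELECT data FROM files WHERE name = ?", ("logo.png",)
).fetchone()[0]

with open(path, "rb") as f:
    assert blob == f.read()  # same bytes either way -- just one extra hop
```

The point of the sketch is only that both routes deliver identical bytes; the DB route adds a query, a row fetch, and a full in-memory copy before anything reaches the client.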
-
alex.barylski
- DevNet Evangelist
- Posts: 6267
- Joined: Tue Dec 21, 2004 5:00 pm
- Location: Winnipeg
I was curious about this myself and did some research a while back.
I actually found (of course I cannot find it again) a benchmark, done by MIT I think, which showed that retrieval times from the database were actually faster than from a native Linux ext3 filesystem. The RDBMS wasn't indicated, but when the two were compared, the DB was actually faster.
I'm currently reading up on the *nix kernel, and I am somewhat convinced that using a DB to retrieve files might very well be faster, as an optimized DB with indexing etc. is (from what I can tell so far) less work than a file system. I'm not sure if file indexing and fetching are handled by the VFS (I imagine they are), but that is a complicated layer of abstraction which would undoubtedly slow things down a bit.
Of course this depends on the FS used. I'm sure you could customize the FS to serve files just as fast, which is what Google did, I think:
http://en.wikipedia.org/wiki/Google_File_System
On the typical web server, however, I would argue that because of the VFS layer and the additional journaling overhead of ext3 (which I assume most Linux distros use by default by now, if not already), it's likely faster to use a DB engine.
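The speed question above is easy to poke at with a rough micro-benchmark. This sketch is entirely an assumption on my part (payload size, iteration count, and `sqlite3` as the stand-in RDBMS are all arbitrary), and real results depend heavily on the OS page cache, the filesystem, and the database engine, so no winner is claimed:

```python
# Rough micro-benchmark sketch of DB blob retrieval vs. filesystem reads.
# Sizes and counts are arbitrary; results vary wildly by system.
import os
import sqlite3
import tempfile
import timeit

tmp = tempfile.mkdtemp()
payload = os.urandom(64 * 1024)  # one 64 KB "file"

# Filesystem copy.
fs_path = os.path.join(tmp, "blob.bin")
with open(fs_path, "wb") as f:
    f.write(payload)

# Database copy.
db = sqlite3.connect(os.path.join(tmp, "blobs.db"))
db.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO blobs VALUES (1, ?)", (payload,))
db.commit()

def read_fs():
    with open(fs_path, "rb") as f:
        return f.read()

def read_db():
    return db.execute("SELECT data FROM blobs WHERE id = 1").fetchone()[0]

fs_time = timeit.timeit(read_fs, number=500)
db_time = timeit.timeit(read_db, number=500)
print(f"filesystem: {fs_time:.4f}s  sqlite: {db_time:.4f}s")
```

Note that both paths return identical bytes; which is faster on any given box is exactly the empirical question the MIT-style benchmark was trying to answer.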
Ultimately, I think it depends on how you view what an RDBMS is. Personally, for a long time I saw MySQL as a system for storing linear, albeit related, *records* of information; I'm not sure I can accept MySQL as anything more (i.e. a file storage system).
M$ is heading in that direction with their own SQL-based storage file system (it was supposed to be released with Vista, but I don't think it was)...so that says something, I guess.
DB file storage on the web has a serious problem though: every file retrieved would need to open a *new* connection to the database. Opening a connection is overhead which should be avoided, but that's impossible without caching the files on the native file system; in which case, why store the files in a DB in the first place?
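The caching compromise mentioned above can be sketched in a few lines: keep the canonical copy in the DB, but write it to a local disk cache on first fetch so repeat requests never touch the database. Everything here (the cache directory, table, and file names) is an illustrative assumption:

```python
# Sketch of a filesystem read-through cache in front of DB blob storage.
# Names and schema are illustrative assumptions, not from the thread.
import os
import sqlite3
import tempfile

CACHE_DIR = tempfile.mkdtemp()

def get_file(db, name):
    cached = os.path.join(CACHE_DIR, name)
    if os.path.exists(cached):          # cache hit: no DB connection needed
        with open(cached, "rb") as f:
            return f.read()
    row = db.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(name)
    with open(cached, "wb") as f:       # cache miss: fetch once, keep on disk
        f.write(row[0])
    return row[0]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (name TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO files VALUES ('a.txt', ?)", (b"hello",))

assert get_file(db, "a.txt") == b"hello"  # first call hits the DB
assert get_file(db, "a.txt") == b"hello"  # second call served from disk cache
```

Which rather makes the poster's point: once the cache is warm, the filesystem is doing the serving anyway, and the DB is only the system of record.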
Really, I think this is a matter of opinion and personal taste, as both sides have valid arguments...
Cheers