Page 1 of 1

DB versus flat file performance

Posted: Sun Sep 03, 2006 9:12 pm
by alex.barylski
I've have read numerous forum threads over the last couple years of people debating the performance of storing files inside a database as opposed to the native file system...

http://www.mail-archive.com/struts-user ... 18853.html[^]

Here is one such debate...

I'm wondering if anyone knows any others...?

It seems to me that storing files on the file system as opposed to a database would be more efficient as file indexing, etc is performed at the kernel level...

Some databases might capitalize on RAID, etc...but still I would personally think native file system indexing would win hands down when both tweaked for performance, especially on a linux system with all it's available file systems?

I'd like articles (more credible the better) but just your opinions or experiences would be awesome as well

Cheers :)

Posted: Sun Sep 03, 2006 9:18 pm
by feyd
We've talked about it several times here. I'm not going to dig them out for you this time, however.

Posted: Sun Sep 03, 2006 9:54 pm
by alex.barylski
I've searched...using the wrong keywords I guess, but I get 1,200 results...in which maybe 1% are worth looking into...

How do you search just the title for keywords? :?

Anyways, how about I start a thread (using this) to debate the benefits of DB over file system?

I've found some articles which indicate that retreival (especially in today's RDBMS) is actually just as fast and one benchmark that suggests DB retreival is even faster!!! :P

Who would have thunk it...although it doesn't indicate what file system was used when compared to what DB...

http://www.cis.ksu.edu/~rpalani/MSRepor ... Server.htm

I had another from oracle that said the same thing, but did not show any benchmarks...

So other than:
- Increased security
- Possiblly faster retreival
- Tamper resistent (FTP, etc)

What are some other advantages to using a DB???

For me the only concern is retreival speed as I don't care about additional HDD consumption, etc...

I've personally ofund that images fetched from a database tend to load a helluva lot slower...I guess because each call to a fetching script would incur the cost of connecting to a DB, etc...so I'm not sure if caching would help in that case...

I wonder if those tests/benchmarks took into consideration that a web application cannot effectively capitalize on caching, etc...

Is this still redundant??? Lock it or drop it if it is...I understand...I'll take the hint and keep on searching :P

Edit: It just occured to me...but are images cached on the client side? Even if the HTML page is defined as NON cacheable? Does each individual file need to declare it's cacheability on the client side using header()???

Cheers :)

Re: DB versus flat file performance

Posted: Mon Sep 04, 2006 1:44 am
by timvw
Hockey wrote:I've have read numerous forum threads over the last couple years of people debating the performance of storing files inside a database as opposed to the native file system...
- I guess you would need to define 'database' better... Since imho a file system is a database too.. Specialized for files...

- On the implementation level there is an overlapping area where filesystems use database for faster access to file (eg: indexes to lookup where a file is stored) and databases use filesystems to store data...

- Most filesystems only have a limited permission system. If you want more complicated permission systems you have to add something on top. An average SQL Dbms has a more advanced permission system

Conclusion: it depends on the way you want to access the data.

- From an end-user POV i think we're evolving towards a unified interface to access all sorts of data (File System, SQL Dbms, WebService, WWW, ...) and we can forget discussions like these...

Re: DB versus flat file performance

Posted: Mon Sep 04, 2006 1:53 am
by alex.barylski
I guess you would need to define 'database' better... Since imho a file system is a database too.. Specialized for files...
My bad...DB = RDBMS ;)
From an end-user POV i think we're evolving towards a unified interface to access all sorts of data (File System, SQL Dbms, WebService, WWW, ...) and we can forget discussions like these...
Agreed WinFS from what I remember is based on SQLServer...

Although here is thing that makes my situation sticky...

I want to store all my images for a web site layout in a RDBMS (MySQL, etc) but each time a file is fetched, it would require a new connection to the DB and the connection ontop of indexing, etc...would likely slow things down considerably...at least it's my experience when using apps that store *every* in the DB.

No I do not wish to store layout images on the HDD :P It's either all DB or nothing at all...I justify this by saying I want to keep content, etc in one similar location so any possibly complicated code I need to write is only written once (DB or FS).

At the end of my post to feyd, I made an edit which noted that if images *are* cached on the client side and NOT requested each page refresh, then the DB connection issue won't be an issue... :)

See where i'm going with this and the problem is poses (if it is a problem)?

Cheers :)

Posted: Mon Sep 04, 2006 2:11 am
by timvw
You could always use a cache system.. (if i'm not mistaken adodb has one for example)...

Meaning:
- Only manipulate data via the database.
- When the data is request by the user your cache system checks if there is already a usable version in the cache (filesystem/memory).. And otherwise you perform the 'expensive' query...

And if you implement conditional get requests you can makes this even way more efficient...

Posted: Mon Sep 04, 2006 2:52 am
by alex.barylski
Well I considered implementing a cache myself as files are fetched through a secondary script not through index.php

But if images are already cached by the client, there is no need to implement anything like that, as it's handled by the browser...

However I also wanted to implement some form of image burning protection which invalidates the URL and would cause the client to re-download the image (I think). so perhaps my own home grown caching would be best - thanks for making me think about that one...didn't occur to me until now :P

Secondly I'm using Zend not AdoDB, not sure if it supports caching...none the less either would liekly require using the file system, which is what i'm trying to get away from :P

Some more thinking is required i think :)

Cheers ;)

Posted: Mon Sep 04, 2006 3:51 am
by CoderGoblin
OK only my opinion....

Relational Database Systems are useful for storing things which are relational information. Images are a black box as far as I am concerned, I never need to change them or look in them, just retrieve them so storing them within the database is a waste of resources. I tend to use the filesystem for image storage with a sensible naming convention (for both directory structure and file) to enable retrieval. Things to be aware of are any operating system restrictions (how many files per directory, max levels of directory nesting, max length of fullpath etc).