Storing photos in DB

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
icesolid
Forum Regular
Posts: 502
Joined: Mon May 06, 2002 9:36 pm
Location: Buffalo, NY

Storing photos in DB

Post by icesolid »

I have a server that has over 500,000 photos and growing. They are all located in directories on the server but have reference points to them in a MySQL database.

EX:

Code: Select all

id        title                  file
100       Example Image          P1508404.JPG
Is it better to do it the way I am now, or is it better to store the actual file date in a table, in a database in a BLOB field?
User avatar
Sofw_Arch_Dev
Forum Commoner
Posts: 60
Joined: Tue Mar 16, 2010 4:06 pm
Location: San Francisco, California, US

Re: Storing photos in DB

Post by Sofw_Arch_Dev »

If your system is a one-server environment, and if that's all it'll ever be, then what you have will continue to work. If you ever want to have multiple, separate servers accessing the photos then each server will need to mount the volume that the photos are stored on. You can solve that problem with a network file system, but NFS mounts have their own maintenance issues and they don't scale as easily as storing data in a DB. If you plan to grow to multiple servers accessing the photos, it'll be a lot easier for multiple servers to open a DB connection to a given database host and be done with it.
icesolid
Forum Regular
Posts: 502
Joined: Mon May 06, 2002 9:36 pm
Location: Buffalo, NY

Re: Storing photos in DB

Post by icesolid »

Should I be worried about speed or loosing photo integrity in anyway with storing them all in the database, I am talking over 500,000 photos at 640 X 480
User avatar
Sofw_Arch_Dev
Forum Commoner
Posts: 60
Joined: Tue Mar 16, 2010 4:06 pm
Location: San Francisco, California, US

Re: Storing photos in DB

Post by Sofw_Arch_Dev »

You don't have to worry about losing photo integrity when putting images in your database. Whether on a file-system on a hard drive, or in a database file on a hard drive it makes no difference there. It's all just data.

With respect to speed, I guess it all comes down to how many people you plan on supporting simultaneously. Any web server will allow only the number of simultaneous connections you configure it to. The same goes for a database. MySQL's default is to allow 100 simultaneous connections. They claim that that number is small. I'm not sure how much traffic your photo site is planning on getting but you can up the number of simultaneous connections to whatever makes sense given your expected usage and given how much memory you have on the box. Connections take memory. Just how much is worth a test.

I store 6Mb JPG and PNG files in a database for a large corporation's db-driven food website. No problems.
icesolid
Forum Regular
Posts: 502
Joined: Mon May 06, 2002 9:36 pm
Location: Buffalo, NY

Re: Storing photos in DB

Post by icesolid »

Good to know, I appreciate the words of wisdom.

I would say there would be about 4,000 upload/delete/view operations made per day (roughly). Should be fine?

I have grown tired of trying to match up files in a file-system and pointers in a table. I really want to take advantage of a relation database for this website.
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Storing photos in DB

Post by Eran »

The actual common practice is the opposite of what Sofw_Arch_Dev suggests. As far as using multiple servers, there is no difference whether the images are stored on a database or the file system - both can be accessed remotely and there is no need for complicating matters with NFS mounts. Just as a database connection can be opened, an HTTP request can be made to another server. In fact, if the files are in the file system, they can be served directly from other servers, with no need to use any queries or scripting to achieve it. Bear in mind that the database stores its data ultimately on the file-system, so there's no avoiding that.

Storing files in the file-system has several advantages over storing it in the database -
* Maintainability and flexibility - it is much easier to deal with images in the file-system. You can serve it directly without going through the database and issuing a query and then parsing the results using a script. You can also easily move files around and back them up through file system commands, which are much simpler than SQL. This means that it requires much less technical knowledge and time to deal with images.
* Performance - one of the biggest concerns with databases is to be able to fit the live data in memory. Queries against data in the buffer pool are many times quicker than queries that have to use I/O operations to find / retrieve results. For this purpose the general goal is to control the database size as best as possible, and storing images (or files in general) goes directly against it. Images will usually take up many times more space in the database than other columns, quickly ballooning its size. Also, you can't cache database images (not easily anyway), which means they will be requested again on every request. Add to this the additional strain on the database if the images are requested often / by many users, and the database will be in pretty bad shape.
* Scaling - this might not be an immediate concern for you, but since multiple servers were brought up - scaling database to multiple servers is one of the most complicated scaling issues. Distributing files on multiple servers is much simpler, and you will not be able to use a CDN solution for your images if they are stored in a database.

I would definitely advise against storing images in the database for those reasons.
icesolid
Forum Regular
Posts: 502
Joined: Mon May 06, 2002 9:36 pm
Location: Buffalo, NY

Re: Storing photos in DB

Post by icesolid »

Interesting, I may just stick with my current file-system structure.

It's not like it isn't working for me, I will just curious about my options.

As far as scaling goes, this server could grow to over three million photo files.
User avatar
Sofw_Arch_Dev
Forum Commoner
Posts: 60
Joined: Tue Mar 16, 2010 4:06 pm
Location: San Francisco, California, US

Re: Storing photos in DB

Post by Sofw_Arch_Dev »

Good points, and I can agree with some of them in specific context, but I would be careful to declare what is common practice since most people define that within the scope of their own experience. Sharing a volume in a clustered environment down have its own headaches, and updating mirrors of a volume to reflect changes from a master to a slew of slaves (if one were to have multiple distributed copies of a volume) can be very hairy.

I don't think anyone's arguing that databases don't store its files on a files ystem but the point about maintainability is only valid if moving files around (and why you would do that often in a production environment is yet to be defined) is a requirement.

As for performance, no, you don't want to overload your DB with too many requests. Depending on requirements any one of a number of different caching mechanisms can be worked out. And whether on a file system or in a database, the more files there are the more space will be required.

As for scalability, I'm not sure what the issue is: a) having a database span multiple servers or b) having the database be accessible to multiple servers. In the case of issue a, I don't think we're approaching that type of technical challenge here. I'm not sure why one would architect a database that spanned multiple separate boxes, and I haven't encountered anything that would require such but I don't think that argument is appropriate here. For issue b, I know it is common place to have multiple machines in a cluster all pointing to one database server. That's just web-app 101.

As you're already finding out icesolid, there will be arguments for both sides and folks who claim that one approach is better than the other. You will inevitably find both put into practice. As with any solution, in the end it's the expectation put on the different performance points and the project requirements that will drive the final solution. For a project where the images are stored more for archival purposes, perhaps a database makes more sense. You'll find that EMC/Documentum's Digital Asset Manager uses a database. For a project where the images are served on a public-facing website, a database is probably not the best approach.
Last edited by Sofw_Arch_Dev on Wed Jun 23, 2010 3:56 pm, edited 1 time in total.
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Storing photos in DB

Post by Eran »

I merely mentioned it is common practice, as to say that most of everyone who encounters this dilemma chooses to use the filesystem and here are the reasons. Sometimes there's a reason most people choose a certain solution over another, so you need a very compelling case to do the opposite.

Storing the files on the filesystem is much easier to maintain, I don't see how that can be argued. If you need to resize images, package images, backup images, cache images, use a delivery network - you can use ready solutions and tools that you simply can't use when storing it in the database. You can no longer access the files directly, it's that simple.

My point about the database size is that it affects it's performance, since for good performance the working data needs to fit in memory - which is much more expensive than disk space. Storing images in the database will quickly catapult it to the point it exceeds system memory.

Regarding scalability - as long as you can serve the images from just one server, it's still easier to do it from the filesystem (via HTTP) than via the database (which requires using SQL and scripting).
No one wants to architect a system where the database is served from multiple servers - but it's simply a requirement beyond a certain point. You can only take a single server so far, either by optimization or by adding resources (which is called scaling up) before you need to use several servers behind some kind of load balancing (which is called scaling out). All the major sites today use many, many servers for their database and even more servers to serve their static files. It is much easier to serve files from multiple servers than to spread a database over multiple servers, either via replication, sharding or some middleware solution.
User avatar
Sofw_Arch_Dev
Forum Commoner
Posts: 60
Joined: Tue Mar 16, 2010 4:06 pm
Location: San Francisco, California, US

Re: Storing photos in DB

Post by Sofw_Arch_Dev »

I'm enjoying this discussion, Pytrin, and let me say that I appreciate your experience and insight. I also like your icon.

My argument about ease of maintenance is contextual. Certainly it's going to be easier to package a bunch of images in a tar file if the images are on the file system. Accessibility is clearly better facilitated with files loose on the file-system. Resizing them from a database would require saving the data to the file system first and then replacing the row in the database with the re-sized data. A couple extra steps there, no argument. But the important thing to ask is if these concerns are important in the context of the project at hand. So far all of icesolid's images are the same size. Will they always be and do we need to plan on this ability? We don't know.

Icesolid mentioned that he's talking about 4000 upload/delete/view operations per day. We don't know about any other requirements for the application, what the expected growth rate is on that number or anything else. There's no need to over-engineer a solution in anticipation of something on the order of FaceBook when we don't know if that's necessary. Loading an image from the file system is going to be a little faster than loading from a database, but how much faster. I'm sure it's on the order of milliseconds. If you're really concerned about hitting your database 4000 or even 8000 times a day, cache the images from the database on disk and serve repeat requests from the file system while the master remains in persistence. It's not a difficult task to maintain the cache even if your persistence version of an image changes. There is no need to load your entire database into memory. Serving non-cached database requests not that slow for MOST purposes. I say evaluate the requirements and the future before you over-engineer a solution.

Back to icesolid's original quandry of wanting to alleviate his headache of file paths and to potentially centralize information. Given no other mandates I don't think a DB solution is out of the question.
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Storing photos in DB

Post by Eran »

* I only brought up the issue of scalability since you touched upon it in your first post as a positive for database stored images, and I refuted that.
* Fitting the database in memory is not something you control - it either fits in the space you allocated for the buffer pool, or it doesn't. If it doesn't, you don't have control over what is stored in the cache (it uses the LRU algorithm), so since images take up the most space it stands to reason that other queries will suffer for it.
* Regarding simplicity, there's no question file-system wins on that account. So far I haven't seen one compelling reason to use a database to store images. I don't see where I suggested an over engineered solution - in my opinion I offered the simplest one - use the file system. Comparing file-path management to SQL and schema management - file paths are much simpler, and can be used with much less technical knowledge.

By the way, I'm glad you liked my avatar ;)
icesolid
Forum Regular
Posts: 502
Joined: Mon May 06, 2002 9:36 pm
Location: Buffalo, NY

Re: Storing photos in DB

Post by icesolid »

Interesting stuff, a lot to think about, much appreciated :D
Post Reply