How do the "big boys" do it? --images


groc426
Forum Newbie
Posts: 16
Joined: Tue Oct 28, 2008 4:44 pm

How do the "big boys" do it? --images

Post by groc426 »

I'm curious how large retail websites like Amazon.com, Walmart, and so on store and retrieve their images. I know the debate over storing files in directories versus databases is as old as PHP itself, but what is the general process most large online retailers use? I'm not even sure they use PHP :) Any information or links would be greatly appreciated. Thanks!
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: How do the "big boys" do it? --images

Post by tr0gd0rr »

I worked at a place that had databases replicated between cities. We used database storage since our infrastructure for database replication was much better than our infrastructure for file replication.

Facebook, for example, uses a very complicated system of file-based storage.

My guess is that most really big web sites use file-based CDN storage because tables with hundreds of millions of blob records are slow.

The considerations for any system include number and size of images, data center structure (SANs, databases, replication, etc.) and frequency with which images are accessed.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: How do the "big boys" do it? --images

Post by josh »

Sometimes you want to limit the number of images in a single path. You can store images 0 - 999 in folder 0/, 1,000 - 1,999 in folder 1/, etc...
Generally you will also want to cache, or render ahead of time, multiple sizes, so at upload time image 3153 would generate:

3/100x100/3153.jpg
3/500x500/3153.jpg
etc...

This way you can scale to millions of files without slowing down too much: only a thousand files per folder, and all sizes generated ahead of time. Any solution will depend on lots of factors, though. If you are uploading 500GB of content a day like Facebook, you might not have the disk space for this type of approach.
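
A minimal sketch of that layout in Python (the logic ports directly to PHP). It assumes the folder number is the image ID divided by 1,000 and rounded down, which is what puts image 3153 in folder 3/. The names `FILES_PER_FOLDER`, `SIZES`, `bucket_for` and `variant_paths` are made up for illustration:

```python
FILES_PER_FOLDER = 1000            # assumed bucket size ("a thousand files per folder")
SIZES = [(100, 100), (500, 500)]   # the two sizes from the example; add more as needed


def bucket_for(image_id):
    """Folder number for an image ID, using integer division."""
    return image_id // FILES_PER_FOLDER


def variant_paths(image_id, ext="jpg"):
    """Relative path of every pre-rendered size for one image."""
    bucket = bucket_for(image_id)
    return [f"{bucket}/{w}x{h}/{image_id}.{ext}" for w, h in SIZES]


print(variant_paths(3153))
# ['3/100x100/3153.jpg', '3/500x500/3153.jpg']
```

At upload time you would resize the original once per entry in `SIZES` and write each result to the matching path. The resizing itself is left out here.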
jason
Site Admin
Posts: 1767
Joined: Thu Apr 18, 2002 3:14 pm
Location: Montreal, CA

Re: How do the "big boys" do it? --images

Post by jason »

Image Name: adsfjk40jvn20450ujfv.jpg

Obviously, the name is randomized and unique.

Then, you store it based on the image name.

images/a/ad/dsf/jk4/adsfjk40jvn20450ujfv.jpg

Note that we are using the first 7 characters of the file name in a particular pattern to generate the directory structure. In this case, I reuse a few parts of the file name. You can also use the same method with a different pattern:

images/ad/sf/jk/40/jv/n2/04/adsfjk40jvn20450ujfv.jpg

This just uses the characters from the beginning of the file name in pairs. Again, with just the file name, you can quickly and easily find the file on the server.
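
Both patterns can be sketched as a list of character slices taken from the file name. This is only an illustration in Python, not code from any particular site; `path_from_slices`, `MIXED` and `PAIRS` are hypothetical names:

```python
def path_from_slices(file_name, slices, root="images"):
    """Build a directory path from (start, end) character slices of the file name."""
    longest = max(end for _, end in slices)
    if len(file_name) < longest:
        raise ValueError("file name is too short for this pattern")
    parts = [file_name[start:end] for start, end in slices]
    return "/".join([root, *parts, file_name])


# First pattern: a / ad / dsf / jk4 (reuses some of the first 7 characters)
MIXED = [(0, 1), (0, 2), (1, 4), (4, 7)]
# Second pattern: seven consecutive pairs of characters
PAIRS = [(i, i + 2) for i in range(0, 14, 2)]

name = "adsfjk40jvn20450ujfv.jpg"
print(path_from_slices(name, MIXED))
# images/a/ad/dsf/jk4/adsfjk40jvn20450ujfv.jpg
print(path_from_slices(name, PAIRS))
# images/ad/sf/jk/40/jv/n2/04/adsfjk40jvn20450ujfv.jpg
```

Whichever pattern you pick, the upload code and the lookup code have to use the same one, or files become unreachable by name.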

Basically, you are just building up a tree of directories based on the file name. This also means you only need to store the file name, as you can always find out where it is in the directory structure. You can limit your directory names to 2 characters if you want rather than 3.

The database need only store the file name, as mentioned. As long as you keep a common directory structure that is based on the name (which, most likely, will be generated randomly), you can do all sorts of fun things with the storage without having to keep track of where the image is located.
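
Here is a rough sketch of the "database only stores the name" idea, using Python's built-in `sqlite3` with an in-memory database so it runs anywhere. The table `product_images`, its columns, the three-level depth, and the helper names are all assumptions made for the example:

```python
import secrets
import sqlite3


def new_image_name(ext="jpg"):
    """Random 20-character hex name. Collisions are unlikely but not impossible."""
    return f"{secrets.token_hex(10)}.{ext}"


def path_for(file_name, root="images", levels=3, width=2):
    """Derive the storage path from the name alone (pairs of characters)."""
    parts = [file_name[i:i + width] for i in range(0, levels * width, width)]
    return "/".join([root, *parts, file_name])


db = sqlite3.connect(":memory:")
# UNIQUE makes a name collision fail loudly instead of overwriting a file.
db.execute("CREATE TABLE product_images (product_id INTEGER, file_name TEXT UNIQUE)")

name = new_image_name()
db.execute("INSERT INTO product_images VALUES (?, ?)", (42, name))

# Later: look up the name, then compute where the file lives.
(stored,) = db.execute(
    "SELECT file_name FROM product_images WHERE product_id = ?", (42,)
).fetchone()
print(path_for(stored))
```

Since the path is never stored, changing the directory pattern later means moving files on disk but not touching any rows.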

Hope this helps.