How do the "big boys" do it? --images
I'm curious how large retail websites like Amazon.com, Walmart, and so on store and retrieve their images. I know the debate over storing files in directories versus databases is as old as PHP itself, but what is the general process most large online retailers use? I'm not even sure they use PHP.
Any information or links will be greatly appreciated. Thanks!
Re: How do the "big boys" do it? --images
I worked at a place that had databases that replicate between cities. We used database storage since our infrastructure for database replication was much better than our infrastructure for file replication.
Facebook, for example, uses a very complicated system of file-based storage.
My guess is that most really big web sites use file-based CDN storage because tables with hundreds of millions of blob records are slow.
The considerations for any system include number and size of images, data center structure (SANs, databases, replication, etc.) and frequency with which images are accessed.
Re: How do the "big boys" do it? --images
Sometimes you want to limit the number of images in a single path. You can store images 0 - 999 in folder 0/, 1,000 - 1,999 in folder 1/, and so on, so the folder is the image ID divided by 1,000 and rounded down.
You will also generally want to render multiple sizes ahead of time and cache them, so at upload time image 3153 would generate:
3/100x100/3153.jpg
3/500x500/3153.jpg
etc...
This way you can scale to millions of files without slowing down too much: there are only a thousand files per folder, and all sizes are generated ahead of time. Any solution will depend on lots of factors, though. If you are uploading 500GB of content a day like Facebook, you might not have the disk space for this type of approach.
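As a rough sketch of the layout described above, the path for each pre-rendered size can be computed from the image ID alone. This assumes the folder number is the ID integer-divided by 1,000 (which is what the 3153 example implies). The names `image_path`, `all_paths`, `FILES_PER_FOLDER`, and `SIZES` are made up for illustration, and the actual resizing step is left out.

```python
from pathlib import PurePosixPath

# Assumed bucket size, taken from "a thousand files per folder".
FILES_PER_FOLDER = 1000
# The two sizes shown in the post; a real site would choose its own list.
SIZES = [(100, 100), (500, 500)]


def image_path(image_id: int, width: int, height: int) -> str:
    """Return the relative path for one pre-rendered size of an image."""
    folder = image_id // FILES_PER_FOLDER  # e.g. 3153 -> 3
    return str(PurePosixPath(str(folder), f"{width}x{height}", f"{image_id}.jpg"))


def all_paths(image_id: int) -> list[str]:
    """Return the paths of every size that would be generated at upload time."""
    return [image_path(image_id, w, h) for w, h in SIZES]


print(all_paths(3153))
# ['3/100x100/3153.jpg', '3/500x500/3153.jpg']
```

Because the path is a pure function of the ID and the size, nothing about the location has to be stored alongside the image record.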
Re: How do the "big boys" do it? --images
Image Name: adsfjk40jvn20450ujfv.jpg
Obviously, the name is randomized and unique.
Then, you store it based on the image name.
images/a/ad/dsf/jk4/adsfjk40jvn20450ujfv.jpg
Note that we are using the first 7 characters of the file name in a particular pattern to generate the directory structure. In this case, I reuse a few characters across directory levels. You can also use the same method with a different pattern:
images/ad/sf/jk/40/jv/n2/04/adsfjk40jvn20450ujfv.jpg
This just uses the characters from the beginning of the file name, two at a time. Again, with just the file name, you can quickly and easily find the file on the server.
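The two-characters-per-level layout could be sketched like this. The function name `sharded_path` and its `levels`, `width`, and `root` parameters are hypothetical; the defaults are chosen only to reproduce the example path above.

```python
def sharded_path(file_name: str, levels: int = 7, width: int = 2,
                 root: str = "images") -> str:
    """Build a directory path from the leading characters of a file name.

    Each directory level takes the next `width` characters of the name
    (extension excluded), so the location can always be recomputed from
    the file name alone.
    """
    stem = file_name.rsplit(".", 1)[0]
    if len(stem) < levels * width:
        raise ValueError("file name is too short for the requested depth")
    parts = [stem[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join([root, *parts, file_name])


print(sharded_path("adsfjk40jvn20450ujfv.jpg"))
# images/ad/sf/jk/40/jv/n2/04/adsfjk40jvn20450ujfv.jpg
```

Fewer levels or a different width give a shallower or wider tree; the only requirement is that the same rule is used when writing and when reading.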
Basically, you are just building up a tree of directories based on the file name. This also means you only need to store the file name, as you can always find out where it is in the directory structure. You can limit your directory names to 2 characters if you want rather than 3.
The database need only store the file name, as mentioned. As long as you keep a common directory structure that is based on the name (which, most likely, will be generated randomly), you can do all sorts of fun things with the storage without having to keep track of where the image is located.
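Putting the pieces together, here is a minimal sketch of generating a random name and deriving the first layout (a/ad/dsf/jk4) from it. The helper names, the 20-character length, and the lowercase-plus-digits alphabet are assumptions modelled on the example name. A random name is only very likely to be unique, so in practice you would probably still want a uniqueness check, for instance a unique index on the file-name column.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set
# Character ranges of the name used for each directory level, matching
# images/a/ad/dsf/jk4/... (some characters are reused across levels).
PATTERN = [(0, 1), (0, 2), (1, 4), (4, 7)]


def new_image_name(length: int = 20, extension: str = "jpg") -> str:
    """Generate a random file name; this is the only value the database stores."""
    stem = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{stem}.{extension}"


def path_from_name(file_name: str, root: str = "images") -> str:
    """Recompute where a file lives from nothing but its name."""
    parts = [file_name[start:end] for start, end in PATTERN]
    return "/".join([root, *parts, file_name])


print(path_from_name("adsfjk40jvn20450ujfv.jpg"))
# images/a/ad/dsf/jk4/adsfjk40jvn20450ujfv.jpg
```

If the layout ever changes, only `PATTERN` and the files on disk need to move; the stored names stay the same.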
Hope this helps.