Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 3:47 am
by NaurisSkulme
I'm developing a web page that will allow users to upload files (Word documents, presentations) to the server and then access them. So far the system works like this: the file is saved in one directory and its path is saved in the database; to retrieve the file, you get the path from the DB and download it.

I haven't encountered this problem personally, but I am aware that storing a large number of documents in the same place may cause major slowdowns.

There is a lot of information on the internet saying that this problem exists, but I'm having a hard time finding out exactly:

- What causes the problem?
(Is it too many files in one directory, or file names that are too long?)
- Where does the cause lie?
(Is it the OS, or is it PHP? I'm using Zend Framework, which uses PHP, to develop this page.)
- When does this problem become relevant?
(How many files do I need to have on the system to witness this problem?)
- What are the solutions?
(Renaming files, making subdirectories?)


Is there someone who knows these things? If so, please answer the four questions above (or just give me a link if it's already been answered).


Thanks in advance :)

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 3:53 am
by papa
I had to remove 20,000 files from a directory a couple of weeks back using Windows XP, though the files were stored on a Windows Server 2003 machine. I couldn't delete more than 1,000 at a time, otherwise the Explorer window would freeze.

Not a very good example, but it at least got me thinking. Probably better to store the files in the DB.

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 4:25 am
by NaurisSkulme
OK, I think I figured it out on my own.

So let's say all files are stored in one folder, say 34 567 files.
If you are looking for the file that was saved last, there is a good chance that the OS will go through all 34 566 other files before it finds the one you want, and this will take a long time.

Now my solution would be this:

I store file number 34 566 at the path:
all_files/30000/4000/500/60/6.doc
This way it will first look for directory 30000, and all files that are in directories 20000, 10000 and 00000 will be ignored, and so forth.

The only problem is that I don't know if this is how the search algorithm works. But it should, don't you think? :)
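
For illustration, here is a minimal sketch of the digit-per-directory layout described above. It is written in Python rather than PHP, and the function name `nested_path`, the `all_files` root and the `.doc` extension are assumptions taken from the example path, not from any existing code. Whether such a deep tree is actually faster depends on the filesystem.

```python
from pathlib import PurePosixPath


def nested_path(file_id: int, root: str = "all_files", ext: str = ".doc") -> str:
    """Map a numeric file ID to a path with one directory per decimal place."""
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    digits = str(file_id)
    parts = []
    # Every digit except the last becomes a directory named digit * place value,
    # e.g. the leading 3 of 34566 becomes "30000".
    for i, d in enumerate(digits[:-1]):
        place = 10 ** (len(digits) - 1 - i)
        parts.append(str(int(d) * place))
    # The last digit is used as the file name.
    return str(PurePosixPath(root, *parts, digits[-1] + ext))


print(nested_path(34566))  # all_files/30000/4000/500/60/6.doc
```

With this layout no directory holds more than about ten subdirectories and ten files, at the cost of a tree that gets one level deeper for every extra digit in the ID.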

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 4:38 am
by papa
I would probably go for a directory per user, and perhaps a directory per year or per month as well, depending on how many files you expect to be uploaded.
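
A rough sketch of that layout, in Python for illustration only. The `uploads` root, the numeric user ID and the function name `user_month_path` are all hypothetical; adapt them to whatever the real application stores.

```python
from datetime import date
from pathlib import PurePosixPath


def user_month_path(user_id: int, uploaded: date, filename: str,
                    root: str = "uploads") -> str:
    """Build root/<user>/<year>/<month>/<file> for an uploaded file."""
    # Keep only the final path component so a name like "../../x" cannot
    # escape the upload tree.
    safe_name = PurePosixPath(filename).name
    return str(PurePosixPath(root, str(user_id),
                             f"{uploaded:%Y}", f"{uploaded:%m}", safe_name))


print(user_month_path(42, date(2009, 3, 27), "report.doc"))
# uploads/42/2009/03/report.doc
```

Two users can upload files with the same name here without clashing, but one user uploading the same name twice in a month would still need handling.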

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 5:48 am
by php_east
Normally the file details are stored in MySQL, and all the search logic is done with MySQL/PHP. Only use the file system for uploads and downloads. This way you can handle millions of files.
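
As a sketch of that split, the example below keeps the metadata in a database and treats the file system as plain storage. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, and the table layout, column names and helper functions are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " title TEXT NOT NULL,"
    " path TEXT NOT NULL UNIQUE)"
)


def register_file(conn, owner, title, path):
    """Record an uploaded file's metadata and return its new ID."""
    cur = conn.execute(
        "INSERT INTO files (owner, title, path) VALUES (?, ?, ?)",
        (owner, title, path),
    )
    conn.commit()
    return cur.lastrowid


def find_paths(conn, owner, title_part):
    """Search by metadata in the database; the disk is never scanned."""
    rows = conn.execute(
        "SELECT path FROM files WHERE owner = ? AND title LIKE ? ORDER BY id",
        (owner, f"%{title_part}%"),
    )
    return [row[0] for row in rows]
```

Searching then costs one indexed query, and the file system is only asked to open a path that is already known.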

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 5:53 am
by NaurisSkulme
So as long as I have the path of a file and know exactly where it is, I don't need to make all these subdirectories? Even if there are 100 000 documents, I can put them in one folder?

Is this true? Does everyone agree? :roll:

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 7:24 am
by papa
I would still use directories in some way.

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 10:07 am
by Christopher
You didn't say which OS you are using. For example, Linux has many different filesystem types that can be used for different situations. On many of them, directory lookups are very fast because directory entries are often hashed and cached.

And, of course, the big question is: are you actually having a performance problem, or do you just think you might in the future? If you don't really have a problem, then implementing a non-fix may actually introduce a performance problem.
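
One way to check before changing anything is to measure on the actual server. The sketch below is a rough Python micro-benchmark with a made-up helper name, `time_lookups`. It repeatedly stats the same file, so it mostly exercises cached lookups and is only a starting point, not a realistic load test.

```python
import os
import tempfile
import time


def time_lookups(n_files: int, n_lookups: int = 1000) -> float:
    """Create n_files empty files in one directory; return seconds per os.stat()."""
    with tempfile.TemporaryDirectory() as directory:
        for i in range(n_files):
            open(os.path.join(directory, f"{i}.doc"), "w").close()
        # Look up the file that was created last, by its full path.
        target = os.path.join(directory, f"{n_files - 1}.doc")
        start = time.perf_counter()
        for _ in range(n_lookups):
            os.stat(target)
        return (time.perf_counter() - start) / n_lookups


for count in (100, 10_000):
    print(count, "files:", time_lookups(count), "seconds per lookup")
```

If the two numbers come out about the same, the flat directory is probably not the bottleneck on that filesystem.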

Re: Whats wrong with having a lot of files on web server?

Posted: Fri Mar 27, 2009 10:43 am
by josh
I'd just create a new directory every 5k files, using modulus / floor() on the ID to find the folder name. Another problem you will have if you don't: if you archive a monolithic flat directory like that and then try to extract a single file, you'll find the un-archiving programs might not be as efficient as the original filesystem. It's easy enough to move the files around later on if performance actually becomes an issue, like arborint said, but sometimes, if you're doing client work, you have to put reasonable measures in place in case the client is not prepared to contract you to maintain the system after its launch.
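
A minimal sketch of both variants, in Python for illustration (in PHP, floor() and the % operator would play the same roles). The function names, the `all_files` root and the default of 100 directories for the modulus variant are assumptions, not part of the suggestion above.

```python
def floor_bucket_path(file_id: int, per_dir: int = 5000,
                      root: str = "all_files", ext: str = ".doc") -> str:
    """Start a new directory every per_dir files: floor(id / per_dir)."""
    bucket = file_id // per_dir
    return f"{root}/{bucket}/{file_id}{ext}"


def modulo_bucket_path(file_id: int, n_dirs: int = 100,
                       root: str = "all_files", ext: str = ".doc") -> str:
    """Spread files over a fixed number of directories: id % n_dirs."""
    bucket = file_id % n_dirs
    return f"{root}/{bucket}/{file_id}{ext}"


print(floor_bucket_path(34567))   # all_files/6/34567.doc
print(modulo_bucket_path(34567))  # all_files/67/34567.doc
```

The floor variant keeps each directory at no more than 5,000 files and adds directories as the IDs grow; the modulus variant keeps the number of directories fixed, so each one grows slowly over time.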