I'm writing a file sync script that will basically sync a main source of files to multiple other drives, servers, or folders. To check whether files have been modified, I have 3 options:
use filesize/last modified
md5_file/sha1_file/other hash or checksum
compare the files bit by bit against each other
SHA-1 is out of the question; it takes way too long for a large number of files. MD5 is slightly faster, but still slow.
Bit by bit is definitely out of the question.
File size/last modified is fast enough, but not reliable: it is possible for 2 different files to have the same modified date or file size.
Does anyone know a faster way to get a checksum of a file, or a fast, reliable method of comparing files?
I will need to sync a large collection of files as quickly as possible (about 20-50 GB or more).
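One common compromise the options above suggest is to combine them: use the cheap metadata check (size + mtime) as a first pass, and only fall back to md5_file() when the metadata is ambiguous. Here's a minimal sketch of that idea; needs_sync() is a hypothetical helper name, not an existing function:

```php
<?php
// Hypothetical helper: decide whether $src should be copied over $dst.
// Cheap metadata checks first; hash only when size matches but the
// modification times disagree.
function needs_sync(string $src, string $dst): bool
{
    clearstatcache(); // PHP caches stat results (filesize/filemtime)

    if (!file_exists($dst)) {
        return true; // destination missing: must copy
    }
    if (filesize($src) !== filesize($dst)) {
        return true; // different size: definitely different
    }
    if (filemtime($src) === filemtime($dst)) {
        return false; // same size and mtime: assume unchanged
    }
    // Same size but different mtime: hash both to be sure.
    return md5_file($src) !== md5_file($dst);
}
```

This way the expensive hashing only runs for the small subset of files whose size matches but whose timestamps differ; unchanged files are skipped with two stat calls. The trade-off is that a file modified without changing its size or mtime would be missed, which is the reliability gap you already noted.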
Thanks.
Check multiple files against each other
Maybe you want to take a look at http://www.nongnu.org/duplicity/