
Yep. The main caveat is that file updates are not transactional. If rsync is stopped (crashes, gets disconnected) in the middle of updating a file, then what you get is a corrupted file.


When rsync needs to update a file it saves the contents to a temporary file first and then renames it over the original at the end, which should be an atomic operation on most filesystems. So you shouldn't end up with half-updated files (unless you use the --inplace switch), but you can end up in situations where half the files in a directory are updated and half are not, which can be just as bad.


Interesting, didn't know about the temp file. It doesn't really make updates atomic, but it certainly reduces the chances of ending up with a partially updated file.


No, it DOES make a single file update atomic. What it doesn't do is make multiple updates atomic.

The way rsync works, it CANNOT end up with a partially updated file! (unless you use --inplace or --append which implies it - and it's your fault if you do)


Of course it CAN and it DOES NOT. If I flip two bits in a large file - one at the head and one at the tail - then no matter how clever the algorithm is, the update cannot be atomic without proper support from the OS, because it would involve two separate writes into the file.

On Windows there's Transactional NTFS whereby you can bind an open file to a transaction and then have either all or no changes applied at once. But that's only Vista+ and I am pretty sure rsync doesn't use it anyhow.


Flip those two bits. What rsync will do on the target system is create a copy of the file you want (with a name like .tempxasdiohkshlksdf-filename.ext), which takes most of its data from the local copy plus a few kilobytes of patches transferred over the network. Then, when this file has been created, closed, its attributes properly set, and it is an identical copy of the file on the source system - it will rename ("move") the temporary file into the name that it should have. This move operation is what makes everything atomic.

It does cost another copy of the file on disk, but it does NOT leave the file in an inconsistent state. It is either the original file, or the new file - no in between.

You CAN avoid this behavior, by using the "--inplace" switch or the "--append" switch, which tell rsync to just modify the file in-place. However, this is NOT atomic, and NOT the default (for that exact reason).
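
For anyone who wants to see the pattern in isolation, here is a minimal Python sketch of the write-to-temp-then-rename approach described above. It is not rsync's actual code; the function name `atomic_replace` and the `.tmp-` prefix are made up for illustration. It assumes the temporary file lives in the same directory as the target, so the rename stays on one filesystem, and it relies on `os.replace`, which is an atomic rename on POSIX systems.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the file at `path` with `data` (bytes) via temp file + rename.

    Hypothetical helper for illustration: readers of `path` should see
    either the old contents or the new contents, never a mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename does
    # not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        # The rename is the only step that touches the target name.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace` runs, the target still holds the old contents and only a stray temp file is left behind. As in the rsync case, this only covers a single file; updating several files this way is still not atomic as a group.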


OK, you win. I didn't realize rsync was solving for network-bound scenarios, but in retrospect it makes sense.


I've used rsync many times and never had any problem with corrupted files (though it might just be luck). Would running rsync a second time fix the corrupted files?


Yes.



