faster rsync of huge directories
Shachar Shemesh
shachar at shemesh.biz
Mon Apr 12 11:56:13 IDT 2010
Nadav Har'El wrote:
> On Mon, Apr 12, 2010, Shachar Shemesh wrote about "Re: faster rsync of huge directories":
>
>> Upgrade both ends to rsync version 3 or later. That version starts the
>> transfer even before the file list is completely built.
>>
>
> Maybe I'm missing something, but how does this help?
>
> It may find the first file to copy a little quicker, but finishing the
> rsync will take exactly the same time, won't it?
>
Not at all. If the two are done sequentially, the first transfer only
*begins* after the entire directory tree has been scanned. With older
rsyncs, the total time is tree scan time + transfer time, whereas with
newer versions the two overlap. Exactly how much time that saves
depends on how long the second phase takes (i.e., how much data you
actually need to transfer).
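
To put rough numbers on this, here is a minimal sketch of the two timing
models. The function names and the figures are made up for illustration,
and the overlapped case is an idealized best case (scan and transfer
running fully in parallel), not a measurement of what rsync 3 achieves.

```python
def sequential_total(scan_s: float, transfer_s: float) -> float:
    """Older rsync: the transfer starts only after the full scan."""
    return scan_s + transfer_s


def overlapped_total(scan_s: float, transfer_s: float) -> float:
    """Idealized newer rsync: scan and transfer run fully in parallel,
    so the longer phase bounds the total (a best case, not a promise)."""
    return max(scan_s, transfer_s)


# Hypothetical figures: a 10-minute scan and 5 minutes of actual transfer.
scan_s, transfer_s = 600.0, 300.0
saved = sequential_total(scan_s, transfer_s) - overlapped_total(scan_s, transfer_s)
print(f"saved at most {saved:.0f} seconds")
```

In this model the saving can never exceed the shorter of the two phases,
so it shrinks toward zero as the amount of data to transfer does.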
> Also, if nothing has changed, it will take it exactly the same time to
> figure this out, won't it?
>
Yes. You might still save some time, but this is definitely the case
where newer rsyncs have the smallest advantage over older ones.
> I'm not sure what his problem is, though. Is it the fact that the remote
> rsync takes a very long time to walk the huge directory tree, or the fact
> that sending the whole list over the network is slow?
>
From my experience, it's mostly the former.
> If it's the first problem, then maybe switching to a different filesystem,
>
At the time, we tested ext3, jfs and xfs, and found no significant
differences between them. It was not, however, a scientific test.
> or reorganizing your directory structure (e.g., not to have more than a few
> hundred files per directory) will help.
>
That is likely to actually help (<plug>and is why rsyncrypto has the
--ne-nesting option when encrypting file names</plug>), but is not
always a viable option.
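
As an illustration of what such a reorganization could look like, here is
a small sketch that spreads files over nested subdirectories keyed by a
hash of the file name. This is an assumed scheme for illustration only; it
is not how rsyncrypto's --ne-nesting works, and nested_path, depth and
width are hypothetical names.

```python
import hashlib
from pathlib import PurePosixPath


def nested_path(name: str, depth: int = 2, width: int = 2) -> PurePosixPath:
    """Map a flat file name to a nested relative path.

    Each directory level is named after `width` hex characters of the
    SHA-256 digest of the name, so files spread roughly evenly and the
    same name always lands in the same place.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    levels = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return PurePosixPath(*levels, name)


# Hypothetical usage: where would this file live under the new layout?
print(nested_path("report-2010-04-12.txt"))
```

With the defaults there are 256 possible directories per level and 65536
leaf directories, so even a few million files average well under a hundred
per directory.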
> If it's the second problem, then maybe rsync improvements are due - i.e., to
> use rsync's delta protocol not only on the individual files, but also on the
> file list.
>
It's not the second, typically.
Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com