faster rsync of huge directories

Tom Rosenfeld trosen at bezeqint.net
Tue Apr 13 21:58:53 IDT 2010


On Mon, Apr 12, 2010 at 5:02 PM, Nadav Har'El <nyh at math.technion.ac.il> wrote:

> On Mon, Apr 12, 2010, Tom Rosenfeld wrote about "Re: faster rsync of huge
> directories":
> > I realized that in my case I did not really need rsync since it is a local
> > disk to disk copy. I could have used a tar and pipe, but I like cpio:
> >
>
> Is this quicker?
>

I can't tell, because it is still running, and will be for a few days, but
at least it has started copying instead of just building an index.


> If it is, then the reason for rsync's extreme slowness which you described
> was *not* the filesystem speed. It has to be something else. Maybe rsync
> simply uses tons of memory, and starts thrashing? (But this is just a guess;
> I didn't look at its code.) If this is the case then the
> copy-while-building-the-list that Shachar described might indeed be a big win.
>
> >   find $FROMDIR -depth -print |cpio -pdma  $TODIR
> >
> > By default cpio also will not overwrite files if the source is not newer.
>
> I recommend you use the "-print0" option to find instead of -print, and
> add the -0 option to cpio. These are GNU extensions to find and cpio (and
> a bunch of other commands as well) that use nulls, instead of newlines,
> to separate the file names. This allows newline characters in filenames
> (these aren't common, but nevertheless are legal...).
>
> By the way, while "cpio -p" is indeed a good historic tool, nowadays there
> is little reason to use it, because GNU's "cp" makes it easier to do almost
> everything that cpio -p did: The "-a" option to cp is recursive and copies
> links, modes, timestamps and so on, and the "-u" option will only copy if
> the source is newer than the destination (or the destination is missing). So,
>
>        cp -au $FROMDIR $TODIR
>
> is shorter and easier to remember than find | cpio -p. But please note I
> didn't test this command, so don't use it on your important data without
> thinking first!
>
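Before trusting it on real data, a quick scratch test of the -a and -u
behaviour could look like the following. Again only a sketch, assuming GNU cp;
the directories and file names are invented for the test. Note that it copies
"$FROMDIR/." because with an existing destination, "cp -au $FROMDIR $TODIR"
would create a subdirectory named after the source inside it.

```shell
#!/bin/sh
# Sketch only: assumes GNU cp (-a and -u); all names here are scratch data.
set -eu

FROMDIR=$(mktemp -d)
TODIR=$(mktemp -d)

echo v1 > "$FROMDIR/file.txt"
ln -s file.txt "$FROMDIR/link"

# First pass: copy the contents of FROMDIR, keeping links and timestamps
cp -au "$FROMDIR/." "$TODIR/"

# Edit the destination copy, then make the source look older than it
echo local-edit > "$TODIR/file.txt"
echo v2 > "$FROMDIR/file.txt"
touch -t 200001010000 "$FROMDIR/file.txt"

# Second pass: -u should skip file.txt because the source is not newer
cp -au "$FROMDIR/." "$TODIR/"
cat "$TODIR/file.txt"
```

If -u works as described, the destination keeps its local edit after the
second pass, and "link" is still a symlink rather than a copy of the file.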
Thanks for the tip, Nadav (and everyone else).

While we are on the topic, I use cpio because I am also "historic" :-) In
the past I had to do similar copies on different versions of *NIX (even
before rsync was invented!), and after much testing of hard links, symlinks,
timestamps, etc., I found cpio to be the most portable tool. I guess when I
get a chance I will test 'cp -au'.

Thanks,
-tom

More information about the Linux-il mailing list