<p dir="ltr">You came late to the party, but you&#39;re the only one who brought a cheque!</p>

<p dir="ltr">Thanks, it&#39;s exactly what I was looking for.</p>

<div class="gmail_quote">On May 28, 2013 4:22 PM, &quot;Ori Berger&quot; &lt;<a href="mailto:linux-il@orib.net">linux-il@orib.net</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 05/08/2013 09:22 PM, Elazar Leibovich wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

<br>

I have a software product being built a few times a day (continuous<br>

integration style). The end product is an installable tar.gz with many<br>

java jars.<br>

<br>

Since the content of the tar.gz&#39;s is mostly the same, I want to use a<br>

filesystem that would dedupe the duplicated content.<br>

<br>

As I see it, it&#39;s a FUSE filesystem that:<br>

<br>

</blockquote>

.<br>

.snip<br>

.<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Is there anything like that available?<br>

Is there a smarter solution?<br>

</blockquote>

.<br>

<br>

Apologies for being late to the party.<br>

<br>

The tar.gz makes everything a problem - a zip would work better for what you want (because, unlike a .tar.gz, it will not compress across files - each one will compress individually).<br>

<br>
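A rough illustration of the point above, in Python. It is only a sketch: zlib stands in for both formats (one stream per file for the zip-like case, one stream over everything for the tar.gz-like case), and make_files, per_file_blobs, single_stream and compare_builds are made-up names, not part of any tool mentioned here.<br>

```python
import random
import zlib

def make_files(count=5, words_per_file=5000, seed=1):
    """Build a few compressible fake 'files' from a small vocabulary."""
    rng = random.Random(seed)
    vocab = [b"class", b"import", b"public", b"static", b"return", b"void"]
    return [b" ".join(rng.choice(vocab) for _ in range(words_per_file))
            for _ in range(count)]

def per_file_blobs(files):
    """zip-like stand-in: every file is compressed on its own."""
    return [zlib.compress(f) for f in files]

def single_stream(files):
    """tar.gz-like stand-in: one compressor runs across all files."""
    return zlib.compress(b"".join(files))

def common_prefix(a, b):
    """Number of leading bytes that two byte strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def compare_builds():
    """Change one byte in the first file and see what stays reusable."""
    build1 = make_files()
    build2 = list(build1)
    build2[0] = b"X" + build1[0][1:]  # the only difference between builds

    old_blobs, new_blobs = per_file_blobs(build1), per_file_blobs(build2)
    reusable = sum(1 for a, b in zip(old_blobs, new_blobs) if a == b)

    old_stream, new_stream = single_stream(build1), single_stream(build2)
    return {
        "files": len(build1),
        "reusable_blobs": reusable,
        "streams_equal": old_stream == new_stream,
        "stream_common_prefix": common_prefix(old_stream, new_stream),
        "stream_length": len(new_stream),
    }

if __name__ == "__main__":
    print(compare_builds())
```

In the per-file case every untouched file compresses to a byte-identical blob by construction, so a deduper can match it exactly. The single stream carries no such guarantee; the sketch only reports how many leading bytes of the two streams still agree.<br>

<br>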

However, there is an (essentially) ready-made solution which will work with .zips, but much much much better with the original folders: bup<br>

<br>

<a href="https://github.com/bup/bup" target="_blank">https://github.com/bup/bup</a><br>

<br>

As long as you don&#39;t care about ownership/permissions/modification-time (there&#39;s a branch that has those as well, but IIRC it&#39;s not in the main branch yet), bup:<br>

<br>

a) dedupes at the sub-file level (that is, if you add/delete/change 1 byte in a 100GB file, the additional version will take ~10KB on average). bup breaks each file into &quot;easy to find again&quot; sections and stores those sections. A one-byte change will likely alter just one such section, which has an expected size of ~8KB.<br>


<br>

b) gzips each such section individually (so it won&#39;t be much larger than a .tar.gz except for pathological cases)<br>

<br>

c) is randomly accessible - any version, any time<br>

<br>

d) comes with a command line front end, an FTP front end, a FUSE front end, and possibly more I forgot.<br>

<br>

e) uses git as a storage format. If all else fails, you can poke at the internals using git.<br>

<br>

f) has a &quot;manual mode&quot; (bup split / bup join), in which you supply your own file through stdin, and bup still does its own dedup magic. You&#39;d still want to use .tar (best) or .zip (2nd best) rather than .tar.gz, of course.<br>


<br>
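To make item a) more concrete, here is a minimal Python sketch of content-defined chunking. It is not bup&#39;s code: the polynomial rolling hash, WINDOW, TARGET and the chunk/demo names are all assumptions picked for illustration, and the ~2KB target is smaller than the ~8KB mentioned above only to keep the demo quick. The idea it shows is that a cut point depends only on the last few bytes, so sections far from an edit come out identical and can be stored once.<br>

```python
import random

WINDOW = 48            # bytes the rolling hash looks at (illustrative)
TARGET = 2048          # roughly one cut per TARGET bytes (illustrative)
BASE = 257
MOD = 1_000_000_007
OUT = pow(BASE, WINDOW, MOD)   # weight of the byte leaving the window

def chunk(data: bytes) -> list:
    """Cut data wherever the hash of the last WINDOW bytes hits a fixed value."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # drop the byte that just slid out of the window
            h = (h - data[i - WINDOW] * OUT) % MOD
        if h % TARGET == TARGET - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start != len(data):
        chunks.append(data[start:])   # trailing partial section
    return chunks

def demo(size=200_000, seed=1):
    """Flip one byte in the middle and count the sections that must be stored anew."""
    rng = random.Random(seed)
    original = bytes(rng.getrandbits(8) for _ in range(size))
    modified = bytearray(original)
    modified[size // 2] ^= 0xFF
    old, new = chunk(original), chunk(bytes(modified))
    fresh = set(new) - set(old)       # sections not already in the store
    return len(old), len(new), len(fresh), sum(len(c) for c in fresh)

if __name__ == "__main__":
    n_old, n_new, n_fresh, fresh_bytes = demo()
    print(f"{n_old} sections before, {n_new} after, "
          f"{n_fresh} new ({fresh_bytes} bytes to store)")
```

Because a cut decision only sees the previous WINDOW bytes, only sections overlapping the WINDOW bytes after the edit can change, however large the file is. Everything else hashes to sections that are already stored.<br>

<br>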

bup is the best thing for backup since sliced bread. It&#39;s also reasonably fast, works locally or client/server through ssh, and more. The only thing I&#39;m really missing is built-in encryption, and some people who care more about perms and ctime/mtime/atime in backups miss those - but otherwise, it is teh awesome.<br>


</blockquote></div>