<p dir="ltr">You came late to the party, but you're the only one who brought cheque!</p>
<p dir="ltr">Thanks, it's exactly what I was looking for.</p>
<div class="gmail_quote">On May 28, 2013 4:22 PM, "Ori Berger" <<a href="mailto:linux-il@orib.net">linux-il@orib.net</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 05/08/2013 09:22 PM, Elazar Leibovich wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
I have a software product being built a few times a day (continuous<br>
integration style). The end product is an installable tar.gz with many<br>
Java JARs.<br>
<br>
Since the contents of the tar.gz files are mostly the same, I want to use a<br>
filesystem that would dedupe the duplicated content.<br>
<br>
As I see it, it's a FUSE filesystem that:<br>
<br>
</blockquote>
[snip]<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Is there anything like that available?<br>
Is there a smarter solution?<br>
</blockquote>
<br>
Apologies for being late to the party.<br>
<br>
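The .tar.gz format is what makes this hard. A .zip would work better for what you want because, unlike a .tar.gz, it does not compress across files: each member is compressed individually. A .tar.gz is one continuous gzip stream, so a small change in one file typically alters the compressed bytes from that point onward.<br>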
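<br>
To make the difference concrete, here is a small Python sketch that uses zlib (the DEFLATE codec behind both gzip and zip) rather than the real archive formats: per_file mimics how a .zip compresses its members independently, and single_stream mimics gzip over a .tar. The function names and the toy "jar" contents are made up for illustration.<br>
<br>
```python
import zlib


def per_file(files: list[bytes]) -> list[bytes]:
    """Compress every file independently, the way a .zip compresses its members."""
    return [zlib.compress(f) for f in files]


def single_stream(files: list[bytes]) -> bytes:
    """Compress all files as one continuous stream, the way gzip compresses a .tar."""
    return zlib.compress(b"".join(files))


def common_prefix(a: bytes, b: bytes) -> int:
    """Number of leading bytes that two byte strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical build: 20 compressible "jars"; only jar #3 differs between builds.
    build1 = [f"class Jar{i} {{ int version = 1; }}\n".encode() * 2000 for i in range(20)]
    build2 = list(build1)
    build2[3] = build2[3].replace(b"version = 1", b"version = 2", 1)

    same = sum(a == b for a, b in zip(per_file(build1), per_file(build2)))
    print(f"per-file: {same} of {len(build1)} compressed members are byte-identical")

    s1, s2 = single_stream(build1), single_stream(build2)
    print(f"single stream: first {common_prefix(s1, s2)} of {len(s2)} compressed bytes match")
```
<br>
In the per-file case the other 19 compressed members are untouched; in the single-stream case the output usually stops matching near the changed file, so everything after it looks new to a deduplicating store.<br>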
<br>
However, there is an (essentially) ready-made solution that works with .zip files, and much, much better with the original folders: bup<br>
<br>
<a href="https://github.com/bup/bup" target="_blank">https://github.com/bup/bup</a><br>
<br>
As long as you don't care about ownership, permissions, or modification times (there's a branch that stores those as well, but IIRC it hasn't been merged into the main branch yet), bup:<br>
<br>
a) dedups at the sub-file level: if you add, delete, or change 1 byte in a 100GB file, the additional version will take ~10KB on average. bup splits each file into "easy to find again" sections, whose boundaries are chosen by the content itself rather than by fixed offsets, and stores those sections. A one-byte change will most likely alter just one such section, whose expected size is ~8KB.<br>
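<br>
The "easy to find again" sections can be sketched as content-defined chunking with a rolling hash. This is a minimal sketch, assuming a Rabin-Karp-style polynomial hash over a 64-byte window; it is not bup's actual rollsum, and WINDOW, BASE and the 13-bit mask are illustrative choices (a 13-bit mask gives the ~8KB average section size mentioned above).<br>
<br>
```python
import random

WINDOW = 64               # bytes in the rolling window (illustrative value)
MASK = (1 << 13) - 1      # split where the low 13 bits are all ones: ~8 KiB average chunks
BASE = 16777619           # arbitrary odd multiplier for the polynomial hash
MOD = 1 << 32             # keep the hash in 32 bits


def chunk(data: bytes) -> list[bytes]:
    """Split `data` wherever the hash of the last WINDOW bytes hits the MASK pattern.

    A boundary depends only on the WINDOW bytes ending at it, not on absolute
    offsets, so an edit can only move boundaries in the WINDOW bytes starting at it.
    """
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # remove the oldest byte
        h = (h * BASE + byte) % MOD                  # shift in the newest byte
        if i >= WINDOW - 1 and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])         # chunk ends at (and includes) byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # whatever is left after the last boundary
    return chunks


if __name__ == "__main__":
    rng = random.Random(42)
    old = rng.randbytes(512 * 1024)
    edited = bytearray(old)
    edited[200_000] ^= 0xFF  # flip one byte in the middle
    new = bytes(edited)

    old_chunks, new_chunks = chunk(old), chunk(new)
    changed = set(new_chunks) - set(old_chunks)
    print(f"{len(old_chunks)} chunks; {len(changed)} need storing again after the edit")
```
<br>
Because boundaries are found by content, inserting or deleting bytes does not shift every later boundary the way fixed-size blocks would; the chunks after the edit resynchronize and dedup against the old ones.<br>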
<br>
b) gzips each such section individually, so the result won't be much larger than a .tar.gz except in pathological cases.<br>
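<br>
Compressing each section on its own and storing it once under a content hash could look like the toy store below. It is a sketch under stated assumptions: ChunkStore is a hypothetical name, a dict stands in for bup's git packfiles, and the key is a plain SHA-1 of the chunk (git also prefixes an object header, which this sketch skips).<br>
<br>
```python
import hashlib
import zlib


class ChunkStore:
    """Toy content-addressed store: each chunk is zlib-compressed on its own and kept once."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        """Store `chunk` if it is new and return its key."""
        key = hashlib.sha1(chunk).hexdigest()
        if key not in self.objects:          # a chunk seen before costs nothing extra
            self.objects[key] = zlib.compress(chunk)
        return key

    def get(self, key: str) -> bytes:
        """Decompress and return the chunk stored under `key`."""
        return zlib.decompress(self.objects[key])

    def stored_bytes(self) -> int:
        """Total compressed size of everything kept so far."""
        return sum(len(v) for v in self.objects.values())


if __name__ == "__main__":
    store = ChunkStore()
    v1 = [store.put(c) for c in (b"a" * 8192, b"b" * 8192, b"c" * 8192)]
    v2 = [store.put(c) for c in (b"a" * 8192, b"B" * 8192, b"c" * 8192)]
    print(len(store.objects), "chunks stored for two versions,", store.stored_bytes(), "bytes")
```
<br>
Since every chunk is compressed independently, an unchanged chunk compresses to the same bytes in every version and is stored only once.<br>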
<br>
c) offers random access to any version, at any time.<br>
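<br>
Random access falls out of keeping each version as a list of chunk ids plus their offsets: reading a byte range only touches the chunks it overlaps. This is a simplified, hypothetical model (VersionedStore, save and read are made-up names); bup itself arranges chunk lists into git trees rather than a flat list.<br>
<br>
```python
import bisect
import hashlib
from itertools import accumulate


class VersionedStore:
    """Toy model: every version is a list of chunk ids; each chunk is kept once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.versions: dict[str, tuple[list[str], list[int]]] = {}

    def save(self, name: str, pieces: list[bytes]) -> None:
        """Record a version as chunk ids plus cumulative end offsets."""
        ids = []
        for p in pieces:
            key = hashlib.sha1(p).hexdigest()
            self.chunks.setdefault(key, p)   # shared chunks are stored only once
            ids.append(key)
        ends = list(accumulate(len(p) for p in pieces))  # ends[i] = offset just past chunk i
        self.versions[name] = (ids, ends)

    def read(self, name: str, offset: int, size: int) -> bytes:
        """Return bytes [offset, offset + size) of a version, touching only overlapping chunks."""
        ids, ends = self.versions[name]
        out = bytearray()
        i = bisect.bisect_right(ends, offset)  # first chunk whose end lies past `offset`
        while size > 0 and i < len(ids):
            start = ends[i - 1] if i else 0
            data = self.chunks[ids[i]]
            piece = data[offset - start : offset - start + size]
            out += piece
            offset += len(piece)
            size -= len(piece)
            i += 1
        return bytes(out)


if __name__ == "__main__":
    vs = VersionedStore()
    vs.save("monday", [b"hello ", b"big ", b"world"])
    vs.save("tuesday", [b"hello ", b"small ", b"world"])
    print(vs.read("monday", 6, 9), vs.read("tuesday", 6, 11))
```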
<br>
d) comes with a command-line front end, an FTP front end, a FUSE front end, and possibly more that I've forgotten.<br>
<br>
e) uses git as a storage format. If all else fails, you can poke at the internals using git.<br>
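<br>
Because the storage is plain git, object ids follow git's rule: the SHA-1 of the header "blob &lt;size&gt;\0" followed by the content, with the same bytes zlib-compressed when git stores a loose object. The sketch below computes both; the function names are hypothetical, and the id should match what `git hash-object --stdin` prints for the same input. bup mostly writes packfiles rather than loose objects, but the ids are computed the same way.<br>
<br>
```python
import hashlib
import zlib


def git_blob(data: bytes) -> tuple[str, bytes]:
    """Return (object id, compressed loose-object bytes) for `data` stored as a git blob."""
    raw = b"blob " + str(len(data)).encode() + b"\0" + data  # header "blob <size>\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def loose_object_path(object_id: str) -> str:
    """Where git would keep this object as a loose file: first two hex digits as a directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid, body = git_blob(b"hello world\n")
    print(oid, loose_object_path(oid), len(body))
```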
<br>
f) has a "manual mode" (bup split / bup join), in which you feed your own file through stdin and bup still does its own dedup magic. You'd still want to use .tar (best) or .zip (second best) rather than .tar.gz, of course.<br>
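<br>
For the manual mode, a build step could pack the output directory as an uncompressed tar and pipe it into `bup split -n NAME`, which saves the stream under the branch NAME; `bup join NAME` streams it back out. A minimal sketch, assuming bup is on PATH and the repository was created with `bup init`; tar_directory, bup_split and the command-line arguments are hypothetical names.<br>
<br>
```python
import io
import subprocess
import sys
import tarfile


def tar_directory(path: str) -> bytes:
    """Pack `path` into an *uncompressed* tar in memory.

    Mode "w" (not "w:gz") keeps every file's bytes verbatim in the stream, so
    unchanged files look identical to bup's chunker from one build to the next.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path, arcname=".")
    return buf.getvalue()


def bup_split(data: bytes, name: str) -> None:
    """Feed `data` to `bup split -n NAME` on stdin (assumes bup is installed and initialized)."""
    subprocess.run(["bup", "split", "-n", name], input=data, check=True)


if __name__ == "__main__":
    # Hypothetical usage: python tar_to_bup.py BUILD_DIR BRANCH_NAME
    if len(sys.argv) == 3:
        bup_split(tar_directory(sys.argv[1]), sys.argv[2])
```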
<br>
bup is the best thing for backup since sliced bread. It's also reasonably fast, works locally or client/server through ssh, and more. The only thing I'm really missing is built-in encryption, and some people who care more about perms and ctime/mtime/atime in backups miss those - but otherwise, it is teh awesome.<br>
</blockquote></div>