filesystem capable of deduping tar.gz's content

Elazar Leibovich elazarl at gmail.com
Wed May 8 21:22:59 IDT 2013


Hi,

I have a software product being built a few times a day (continuous
integration style). The end product is an installable tar.gz with many java
jars.

Since the content of the tar.gz's is mostly the same, I want to use a
filesystem that would dedupe the duplicated content.

As I see it, it's a FUSE filesystem that:

1. When a file with a .tar.gz extension is stored, it untars it and stores
the contents in a folder (keeping the file order in a list).
2. When the file is read again, it tars and gzips the underlying folder and
returns the gzip'd result.
3. It keeps a list of file hashes, and replaces a file with a symlink to
an identical file where possible.
4. Bonus: do the same for jars. Java is linked at runtime, so if a .java
file didn't change - neither does its class.
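
To make steps 1-3 concrete, here is a rough sketch in Python of just the
store/rebuild logic, without the FUSE layer. The function names
(store_tar_gz, rebuild_tar_gz) and the in-memory dict used as the blob
store are made up for illustration; a real filesystem would keep blobs on
disk and might use symlinks or hardlinks instead of a hash lookup.

```python
import hashlib
import io
import tarfile


def store_tar_gz(data, blobs):
    """Unpack a .tar.gz held in `data` (bytes).

    Each regular file's content is stored once in `blobs`, keyed by its
    SHA-256 digest. Returns a manifest: a list of (TarInfo, digest) pairs
    in archive order; digest is None for non-regular members (dirs, links).
    """
    manifest = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar:  # iteration follows the order inside the archive
            digest = None
            if member.isreg():
                content = tar.extractfile(member).read()
                digest = hashlib.sha256(content).hexdigest()
                blobs.setdefault(digest, content)  # duplicates stored once
            manifest.append((member, digest))
    return manifest


def rebuild_tar_gz(manifest, blobs):
    """Re-create a .tar.gz (as bytes) from a manifest and the blob store."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as tar:
        for member, digest in manifest:
            if digest is None:
                tar.addfile(member)  # header only, no file data
            else:
                tar.addfile(member, io.BytesIO(blobs[digest]))
    return out.getvalue()
```

One caveat with this approach: the rebuilt archive has the same members in
the same order, but it is not guaranteed to be byte-for-byte identical to
the original, since the gzip header and the compression settings can
differ. That matters if whatever consumes the tar.gz verifies checksums of
the archive itself.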

Is there anything like that available?
Is there a smarter solution?

(It is theoretically possible to save a folder instead of a tar.gz, and
dedupe at a higher level, but it's much easier to use a tar.gz, since it
plays well with existing Java software, i.e., nexus/artifactory, maven etc.)