Matrix inversion tool

Mon Aug 10 22:56:05 IDT 2015

On 10/08/15 20:56, Omer Zak wrote:
> All those discussions about inverting matrices over Z2 make me curious
> to know what kind of problems can be solved by inverting such matrices.
>
> I suppose that the actual problem, with which Shachar is struggling, is
> proprietary information. However, is it possible to indicate the kind of
> problems which can be attacked by inverting a matrix over Z2?
I think I'm not letting on too much by giving the broad area.

When calculating RAID-6 parity, one of the common methods is called
"erasure coding". You have n data blocks in a stripe which you need to
write to n+p actual disks, such that for any number of failures up to p,
you can recover the original n data blocks.

One of the ways to do that is by employing a family of algorithms called
"Erasure codes", the most famous of which is Reed Solomon. The basic
idea is that you write a matrix which is n over n+p, and you multiply
the n data blocks by that matrix, resulting in n+p data blocks, which
you then write to your disks. When p disks failed, you erase from your
matrix the lines that correspond to the disks that failed, and invert
the resulting n*n matrix. Multiplying the n disks you do have by that
matrix result in the origin n data blocks.

The obvious first requirement from your erasure code matrix, therefor,
is that it be inversible after deleting any combination of p lines from
it. This is the property I'm trying to verify in my project at hand.

The usual erasure code matrix is calculated over "ze ozer finite field",
i.e. Galois field. Frankly, I don't feel I understand Galois field, and
in particular, the way it is used in this particular case, enough to
explain why. One of the properties that make it attractive, however, is
that addition over GF is XOR. As a result, almost all RAID erasure code
matrices have their first n+1 lines look like this:

1 0 0 0 ....
0 1 0 0 ....
0 0 1 0 ....
0 0 0 1 ....
........1 0 0
........0 1 0
........0 0 1
1 1 1 ... 1 1

The practical upshot of the above is that the first n disks receive the
same data they would have received without applying erasure coding (i.e.
- if they are available, they can be read directly without going through
the matrix, and more importantly, without reading the rest of the disks
in the stripe). Furthermore, the first protection disk contains the XOR
of all data disks in the stripe. In other words, if we lost only one
disk, we can employ RAID-5 as usual.

A side effect of the above is that the matrix is extremely sparse.
Almost all lines (all but the protection disks) contain just one
non-zero number, already on the diagonal.

Using Z2 instead of GF gives certain performance benefits. It does have
the cost of making the matrix much much much larger (hence the somewhat
insane numbers quoted in my original mail. I am not really building a
stripe of 272+12). Since the matrix is still extremely sparse, however,
this does not cost the full n^3 quoted by Matan.

Look up "evenodd" for an example of an algorithm that is based on XOR
rather than GF. It is not phrased in matrix form, but that is just a
convenience.

Hope this gives enough of an insight to quelch your curiosity :-)

Shachar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.huji.ac.il/pipermail/linux-il/attachments/20150810/a64a7250/attachment-0001.html>