Cloud Backup
Shachar Shemesh
shachar at shemesh.biz
Fri May 17 11:04:50 IDT 2013
On 17/05/13 10:13, Ghiora Drori wrote:
>
> As to reliability: (This is effectively a contract):
No, it isn't (see below).
> https://aws.amazon.com/glacier/#highlights
> Quote: "Amazon Glacier is designed to provide average annual
> durability of 99.999999999% "
> If this is not good enough for you too bad.
>
When you see someone, anyone, saying such a thing, run. As fast and as
far as you can.
This level of assurance is called "nine nines" (henceforth 9*9), although
the quoted figure actually has eleven of them. It amounts to less than one
thousandth of a second of downtime a year. Amazon is talking out of their
asses in offering it.
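For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the percentage is read as availability (the fraction of the year the service is up), which is the reading used in this post; the function name is made up for illustration. Written out, 99.999999999% has eleven nines, which works out to roughly 0.3 ms a year, the same sub-millisecond order as the figure above.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year, in seconds, for an availability with `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


# Print the yearly budget for a few common targets, in milliseconds.
for n in (3, 5, 9, 11):
    print(f"{n:2d} nines: {downtime_budget_seconds(n) * 1000:.4f} ms per year")
```

Every extra nine divides the budget by ten, so nothing changes in kind between nine and eleven nines: both leave no room for a human, or even a retransmitted packet, to be part of the recovery.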
First, even if their service is 100% reliable, you will not get 9*9 of
service. Your home internet connection is not that reliable. The fiber
connecting Israel to the world is not that reliable. The BGP protocol
that is meant to keep the internet alive should a link go down is not
that reliable. No matter what Amazon are doing, nine nines is not the
SLA you will be getting.
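The point about the chain can be sketched in a few lines of Python. This assumes every link must be up at the same time and that failures are independent, so the availabilities simply multiply. The numbers below are invented for illustration and are not measurements of any real link or service.

```python
from math import prod


def end_to_end_availability(components: dict[str, float]) -> float:
    """Availability of a chain in which every component must be up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(components.values())


# Illustrative numbers only (assumptions, not measurements).
chain = {
    "storage service": 1.0,        # the "100% reliable" case from the text
    "home connection": 0.999,
    "international fiber": 0.9999,
    "routing convergence": 0.9999,
}

total = end_to_end_availability(chain)
print(f"end-to-end availability: {total:.6f}")
print(f"weakest link:            {min(chain.values()):.6f}")
```

Under this model the end-to-end figure can never be better than the weakest link, no matter how many nines the service at the far end has.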
Now, you might claim that that is not Amazon's fault. THEY are providing
9*9, and it is the rest of the internet that is not reliable enough.
This claim is bullshit. They are not.
No single server can provide 9*9. Servers fail. Hard disks fail. Memory
fails. NICs fail. Network switches fail. In order to provide a 9*9 SLA,
you must be able to detect each and every one of those failures and
provide an alternative path *in less than 1 millisecond*, plus assure
that only one such failure happens in a year for every customer. It is
not impossible to build such a system, but it will not be affordable.
The very fact that Amazon is affordable means that they are not
providing 9*9, nor anything even close.
Just to give you a taste of how expensive such a system might be, take
heed of the following interesting fact. I just ran a ping between two
computers connected via a crossover Ethernet cable over a 1Gb/s link. The
average ping time was 0.431ms. In other words, just the round-trip time
(including kernel wakeup and related activities) between two computers
connected over a 3 meter cable is half the time you have at your
disposal to react to a downtime *per year*. At this rate, you cannot
afford to ping a second time in the hope that the machine was just
slightly busy, or that the packet was lost. If you do not get a reply
within half a millisecond, you must act. You only have half a
millisecond to set up the actual diversion.
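The budget arithmetic above can be written down as a small Python sketch. It takes the 1 ms yearly allowance and the 0.431 ms ping time from the text, and assumes (as a simplification) that detecting a failure costs one full round trip per unanswered probe. The function name is hypothetical.

```python
def budget_left_after_probes_ms(budget_ms: float, rtt_ms: float, probes: int = 1) -> float:
    """Time left for the failover after waiting one round trip per unanswered probe."""
    return budget_ms - probes * rtt_ms


BUDGET_MS = 1.0   # yearly downtime allowance used in the text
RTT_MS = 0.431    # average ping time quoted above

for probes in (1, 2, 3):
    left = budget_left_after_probes_ms(BUDGET_MS, RTT_MS, probes)
    status = "ok" if left > 0 else "budget already blown"
    print(f"{probes} probe(s): {left:+.3f} ms left ({status})")
```

One probe leaves a bit more than half a millisecond for the diversion, a second probe leaves about 0.14 ms, and a third overruns the whole year's allowance.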
What about further away computers? From my home, pinging a server
located at the server farm of the same ISP I'm connected to takes 17ms.
This means I cannot react to a server downtime in less time than half
that no matter what. If the server is down, it will take me no less than
8ms to even find out about it. That is, by the time I find out that the
server is down, I am already violating my SLA by a factor of 8. The only
way to have redundancy is to be on the same segment and use specialized
low-latency equipment. Since the ISP's link itself might go down, and
since BGP is nowhere near fast enough to recover, *the only way to provide a
9*9 service is to build a duplicate of the internet in order to do so*.
I think we can all agree that Amazon did not do that, or their service
would have been, by several orders of magnitude, more expensive than it
is. However, supposing that money was no object, would that work? The
answer is "no".
The reason the answer is no is that external factors were not taken into
account. A 9*9 SLA means that the chances of a problem are less than
1:10^11. The chances of a Richter 8+ earthquake, tsunami, volcanic
eruption or meteorite strike are all higher than that.
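To get a feel for how small 1:10^11 is, here is a minimal Python sketch. It assumes the figure is read as a chance per year, and converts it into the average number of years between events. The age of the universe is the commonly cited approximate value and is there only for scale; the function name is made up.

```python
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate, commonly cited figure


def return_period_years(annual_probability: float) -> float:
    """Average number of years between events that have the given chance per year."""
    return 1.0 / annual_probability


period = return_period_years(1e-11)
print(f"return period: {period:.1e} years")
print(f"about {period / AGE_OF_UNIVERSE_YEARS:.1f} times the age of the universe")
```

Any external event that is expected more often than once in several ages of the universe is, under this reading, already more likely than the failure rate being promised.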
TLDR version:
The SLA is not a contractual question. Especially when counting nines,
it is a technological infrastructure question. Amazon is not providing
the nine nines it seems to be promising, and is therefore lying on its SLA.
> ( I do not work for Amazon)
I do not work for Amazon either. I did use to run a service that was a
(very humble) competitor to this one (in which we did not offer an SLA for
service availability at all, only for the actual data). I currently work
for Akamai, for which Amazon is a competitor (though not this particular
service).
It should be clear that I do not speak on behalf of my employer. All
opinions are my own, and only my own.
Shachar