remote directory/partition

Amos Shapira amos.shapira at gmail.com
Mon Oct 24 01:51:10 IST 2011


On 23 October 2011 18:27, Yedidyah Bar-David <linux-il at didi.bardavid.org> wrote:
> I used drbd on a LAN, and know that it can theoretically work rather well
> over larger distances when used as read-write on one side only. They also
> have a pay-for tool to do this asynchronously called drbd proxy. This
> implies using a local copy and having drbd sync it. You can choose between
> three of what they call "Protocols" to affect the perceived local latency.

DRBD is indeed a very good tool for block device replication once you
learn your way around it. I'm saying this from extensive personal
experience (we used to use it heavily before we moved to "real"
hardware SAN servers, first EMC and now HDS).
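
For anyone who hasn't played with it: a DRBD resource is defined in a
short config file. This is just a rough sketch of a minimal two-node
setup (the hostnames, devices and addresses here are made up), not a
copy-paste recipe:

  # /etc/drbd.d/r0.res - replicate /dev/sdb1 between two nodes
  resource r0 {
    protocol C;
    on node1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.1.1:7788;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   192.168.1.2:7788;
      meta-disk internal;
    }
  }

You then use /dev/drbd0 instead of the raw disk and DRBD keeps the two
copies in sync underneath.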

DRBD does not care about the file system. It'll support multi-node
read/write if you tell it to, or enforce strict "only one node can
write" if you tell it to. If you want to use it with read/write on
more than one side then you MUST use a cluster-aware filesystem
(e.g. RedHat GFS or Oracle OCFS2; note that these are different from
the "distributed filesystems" below in that they do not replicate data
themselves - the data could be on a shared disk like a SAN, which is
what DRBD can be viewed as). We got GFS up for a test but performance
on top of DRBD sucked. I heard that people do use them, so maybe they
are usable in a different configuration. These are filesystems which
read/write to a regular block device for all they care, but they are
AWARE that someone else might be manipulating the same disk blocks at
the same time and need extra mechanisms to coordinate the changes.
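
In DRBD terms the multi-node read/write mode is called "dual-primary".
If I remember the option names right, it's enabled per resource
roughly like this (again, a sketch, not a tested config):

  resource r0 {
    net {
      allow-two-primaries;    # both nodes may be Primary at once
    }
    startup {
      become-primary-on both;
    }
    # ... "on <host>" sections as in the example above ...
  }

The device then has to carry GFS/OCFS2 or similar; mounting something
like ext3 read-write on both sides of a dual-primary resource will eat
your data.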

About Didi's "protocol" remark - these are not really different
protocols (even though they call them "protocol A/B/C") but actually a
way for you to decide when a writer considers a block replicated -
whether it's enough that the block has left the local node, or that
the remote node has ack'ed receiving it, or that the remote's disk has
ack'ed that it's physically written to the platter (you can relax that
last one if the disk has a Battery-Backed-Up (BBU) write cache).
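
In the config it's a single keyword per resource. Roughly, from memory
(the comments are my own summary of the semantics):

  resource r0 {
    protocol C;   # A = async: "done" once it hit the local disk and
                  #     the local send buffer
                  # B = "done" once the peer acknowledged receiving it
                  # C = "done" only when the peer reports it written
                  #     to its own disk
    # ...
  }

Latency-wise A is the cheapest and C the safest, which is why the
choice matters a lot over a WAN.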

The main limitation of DRBD is that it allows only two nodes to sync
between themselves (each side of the sync can be handled by a cluster
of servers for HA, but still only one logical host at each end). The
commercial add-on allows a third node to listen to the traffic,
potentially over WAN to a remote site, and replace one of the sides if
it goes down. This is meant to be used for DR (Disaster Recovery)
sites and off-site backup.

DRBD also slows down writes to the local disk. Our measurements put
this cost at ~10%, so it's usually not an issue. This is another
factor in deciding between protocols A/B/C.

There are a few "distributed filesystems" floating around (note these
are different from the "cluster-aware FS" above: they replicate the
data at the FS level). Here is a list from Wikipedia:
http://en.wikipedia.org/wiki/Distributed_file_system (BTW, as far as I
know, HDFS shouldn't be considered a general-purpose file system, even
though it would have been cool to use it for that :). Ceph support is
part of the vanilla Linux kernel. AFS
(http://en.wikipedia.org/wiki/Andrew_File_System) is the one I used to
hear about a lot back in the day; I think its successor is Coda, but
I've never had a good enough excuse to try it.

It'll be interesting to hear what you ended up using and how.

Cheers,

--Amos


