[ale] File server synchronization

Wed Jan 27 17:19:27 EST 2010

On Wed, Jan 27, 2010 at 4:58 PM, Jim Kinney <jim.kinney at gmail.com> wrote:
> Agreed. But any process where file synch must occur must have provisions for
> bandwidth to support the load.
>
> Realistically a better choice is to look at a distributed filesystem like
> GFS and its cousins or block-level processes like DRBD. But again it's
> going to eat up bandwidth.
>

I suspect drbd is more graceful than you think.

drbd is reasonably efficient on bandwidth as long as the files are
relatively stable.  In one of the modes it is even designed for a WAN
connection between the nodes.

In normal operation only the individual updated blocks are replicated.
With a typical fileserver use case, rsync will also have to send at
least that minimal data set.  So the bandwidth should be about the
same.

OTOH, if you write the same blocks multiple times between rsync runs,
then rsync only has to bring the other side up to the final state of
those blocks, whereas drbd will still forward every block update to
the other side.

Thus if the files are the more typical ones that get created and left
in place for a period of time, then drbd will likely be more efficient
with bandwidth than rsync.
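
To make the comparison above concrete, here is a rough Python sketch
that just counts blocks sent under each strategy.  It is a toy model,
not a measurement: replication_traffic and sync_every are names made
up for this example, and it ignores rsync's per-file overhead (file
list, checksums), which is where drbd would pull ahead on stable
files.

```python
def replication_traffic(writes, sync_every):
    """Count blocks sent by each strategy (toy model, hypothetical names).

    writes: block numbers in the order they were written.
    sync_every: number of writes between two periodic sync runs.
    """
    # Block-level replication forwards every write as it happens.
    per_write = len(writes)

    # A periodic sync sends each changed block once per interval,
    # no matter how many times it was rewritten in that interval.
    periodic = 0
    for start in range(0, len(writes), sync_every):
        periodic += len(set(writes[start:start + sync_every]))
    return per_write, periodic


# Files written once and left alone: both strategies send the same amount.
print(replication_traffic([1, 2, 3, 4, 5, 6], sync_every=3))  # (6, 6)

# One block rewritten over and over: the periodic sync sends far less.
print(replication_traffic([7, 7, 7, 7, 7, 7], sync_every=3))  # (6, 2)
```

The second case is the only one where the periodic copy wins on raw
block count in this model.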

But if you have files receiving multiple updates to the same blocks, I
would feel uncomfortable using rsync to make my mirror copies.

In case of a cluster failure drbd can use a dirty bitmap that tracks
which blocks are potentially out of sync.  After the cluster
disconnect / reconnect it only has to sync up those specific blocks.
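
The dirty bitmap idea can be sketched in a few lines of Python.  This
is only an illustration of the concept, assuming one flag per block;
the DirtyBitmap class and its methods are invented for the example and
are not drbd's actual data structures or API.

```python
class DirtyBitmap:
    """Toy model of a dirty-block bitmap (hypothetical, not drbd's code)."""

    def __init__(self, num_blocks):
        self.bits = bytearray(num_blocks)  # one byte per block, for clarity
        self.connected = True

    def write(self, block):
        # While the peer is unreachable, just remember the block changed.
        if not self.connected:
            self.bits[block] = 1

    def resync(self):
        # On reconnect, only the marked blocks need to be sent.
        to_send = [i for i, dirty in enumerate(self.bits) if dirty]
        self.bits = bytearray(len(self.bits))  # everything is clean again
        self.connected = True
        return to_send


bm = DirtyBitmap(1000)
bm.connected = False           # simulate the cluster disconnect
for block in (3, 42, 42, 999):
    bm.write(block)
print(bm.resync())             # [3, 42, 999]
```

Out of 1000 blocks only three are resent, and the repeated write to
block 42 costs nothing extra.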

rsync, on the other hand, has to walk the whole tree and compare at
least the metadata (size and mtime, or a checksum if you run it with
--checksum) of every file between the nodes every time it does a full
resync.  I assume you would do a full resync after any cluster
disconnect.
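
For contrast, a file-level resync has to look at every file even when
almost nothing changed.  The Python sketch below imitates that with a
size-and-mtime comparison; build_manifest and files_to_transfer are
hypothetical helpers written for this example, not part of rsync, and
the real tool does considerably more.

```python
import os


def build_manifest(root):
    """Return {relative_path: (size, mtime)} for every file under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat so a broken symlink does not raise
            rel = os.path.relpath(full, root)
            manifest[rel] = (st.st_size, int(st.st_mtime))
    return manifest


def files_to_transfer(source, mirror):
    """Paths missing on the mirror or differing in size or mtime."""
    return sorted(p for p, meta in source.items() if mirror.get(p) != meta)


# Usage: the manifest has one entry per file, changed or not.
source = {"a.txt": (10, 100), "b.txt": (20, 200), "c.txt": (30, 300)}
mirror = {"a.txt": (10, 100), "b.txt": (20, 150)}
print(files_to_transfer(source, mirror))  # ['b.txt', 'c.txt']
```

The work of building and comparing the manifests grows with the number
of files, whereas the bitmap approach only grows with the number of
blocks touched during the outage.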

Greg