[ale] ZFS on Linux

Derek Atkins warlord at MIT.EDU
Tue Apr 2 10:47:20 EDT 2013


Brian,

Brian MacLeod <nym.bnm at gmail.com> writes:

[great answers snipped]
> A vdev is a virtual device inside of a zpool configuration.  It
> consists of a number of drive units in this case.  A plain RAIDz (1
> drive redundant, similar to RAID5) can suffer 1 drive failure _PER
> VDEV_.  Not array, per vdev.  Raidz2 can sustain 2 (like RAID6), and
> raidz3 can withstand 3 (with severe losses in capacity).  This
> is an important distinction, because your larger arrays could (and
> very well should) be comprised of multiple vdevs.
>
[zpool data snipped]
>
> The zpool "data" actually consists of 2 vdevs, called raidz2-0 and
> raidz2-1.  I didn't name these, Solaris did. Generally you don't work
> directly with vdevs. Each of these vdevs consists of 6 drives,
> identified here by SAS address (your system may and probably will
> vary).  These are raidz2 arrays.
>
> Because the vdev is in and of itself its own array, being raidz2, the
> vdev itself can sustain 2 drive failures.  That means, in this case,
> it is POSSIBLE for the zpool "data" to actually sustain 4 drive
> failures, should each vdev sustain two failures.  Should a third unit
> fail in either vdev, the pool is toast.

I wonder if this means you should spread your disks across multiple
controllers?  For example let's say you have three controllers in your
system, would it be better to put two drives from each array on each
controller?  That way if a single controller (or cable) goes bad you
don't lose your array.
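
Something like this is what I have in mind (disk names invented just for
illustration -- say sd[a-f] hang off controller 0, sd[g-l] off controller
1, and sd[m-r] off controller 2):

    # each raidz2 vdev takes two drives from each of the three controllers,
    # so losing one controller only costs each vdev two members
    zpool create data \
        raidz2 sda sdb sdg sdh sdm sdn \
        raidz2 sdc sdd sdi sdj sdo sdp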

> The "punishment" for this of course is reduced capacity -- these are
> 2TB drives, so in this case, each vdev contributes ~8TB, yielding
> about 16TB usable here. If you were thinking 12 drives with raidz2
> (using a calculation similar to RAID6), you might expect 20TB of
> space, so you can see the tradeoff for reliability.

I wouldn't consider this a punishment, per se.  Any error correction by
definition requires space.  In this configuration you have 6 drives in a
raidz2 so you're only "losing" 33% due to overhead.  IMHO that's not too
bad, and is better than taking those 6 drives and forming a raid-10 out
of them, because then you lose 50% to overhead.  So you get 8TB per vdev
instead of only 6.
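
Spelling out the arithmetic with the 2TB drives above:

    6-drive raidz2:   (6 - 2 parity)   x 2TB = 8TB usable  (33% overhead)
    6-drive raid-10:  (6 / 2 mirrored) x 2TB = 6TB usable  (50% overhead)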

> Now, in the case of expansion: you can technically swap in larger
> drives for smaller drives, but you will not get the expanded space
> unless you use partitions/slices instead of whole drives, and then use
> those partitions/slices as units in a vdev, but I would caution you
> against that as you take a significant performance hit doing so.

Are you sure about that?  I did some research and according to
http://forums.overclockers.com.au/showthread.php?t=961125 I should be
able to expand the space in the vdev once all the disks have been
upgraded.  Apparently there is a zpool property called "autoexpand" that
lets you do that, once you've scrubbed.  (I'm not 100% sure what a scrub
does).
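
If I'm reading that thread right, the procedure would be something like
the following (untested on my part, pool name "data" as in your example):

    # let the pool grow automatically once all members are bigger
    zpool set autoexpand=on data

    # swap drives one at a time, letting each resilver finish
    zpool replace data <old-2TB-disk> <new-bigger-disk>

    # if autoexpand was off during the swaps, expand each device by hand
    zpool online -e data <new-bigger-disk>

    # then verify everything is healthy
    zpool scrub data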

> You can add vdevs to a running zpool configuration.  In our case, we
> use a lot of Penguin Computing IceBreaker 4736 and 4745 boxes (36 and
> 45 drive chassis) and fill them as we go along.  You cannot, however,
> resize vdevs once configured (you can replace units).

Define "resize" here?  By "cannot resize" do you mean that if you have a
6-disk raidz2 you cannot restructure it into an 8-disk raidz2, or a
9-disk raidz3?
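
In other words, I gather the only supported way to grow the pool is to
bolt on an entirely new vdev, something like (disk names invented):

    # adds a second 6-drive raidz2 vdev; the existing vdev stays 6 drives
    zpool add data raidz2 sdu sdv sdw sdx sdy sdz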

> There is math involved to prove the following assertions with regard
> to sizing vdevs, but most SAs have settled on the rule of keeping vdevs
> between 5 and 11 units, with 5 providing less storage but high
> IOP counts, and 11 providing higher capacity but at the cost of
> IOPs.  Outside that range the losses become severe.

This is probably due to the number of drives you need to hit to recover
a block of data, or something like that.  On the system I'm currently
designing (based on a NORCO 4224 case) it looks like 6-drive raidz2
vdevs would fit nicely.

> This all said:
> If you want flexibility to change drive sizes on the fly, I would
> caution you against ZFS.  If you can change the equation and be able
> to adjust number of drives, ZFS works very well.

What about rebalancing usage?  Let's say, for example, that I start with
one raidz2 vdev in the zpool.  Now a bit later I'm using 80% of that
space and want to expand my pool, so I get more drives and build a
second raidz2 vdev and add it to the zpool.  Can I get zfs to rebalance
its usage such that the first and second vdevs are each using 40%?  I'm
thinking about this for spindle and controller load balancing on data
reads.
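
I assume I could at least watch how new writes get spread across the two
vdevs with something like:

    # per-vdev capacity and I/O statistics
    zpool iostat -v data

and if the old vdev stays lopsided, I suppose rewriting a dataset (e.g. a
zfs send/receive within the pool) would redistribute it, though I don't
know whether that's the accepted practice.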

[snip]
> Brian

Thanks!

-derek
-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available

