[ale] ZFS on Linux
Michael B. Trausch
mbt at naunetcorp.com
Mon Apr 1 16:26:32 EDT 2013
On 04/01/2013 03:30 PM, Jim Kinney wrote:
> yeah. that looks fun. So snapshots are a double-edged sword.
This is true for LVM, too, though LVM operates at the block level rather
than the filesystem level. If a snapshot fills up (all of its
copy-on-write storage is allocated), it becomes inactive. That is
arguably worse: the origin volume keeps working, but the snapshot can no
longer even be read, so its contents are simply lost.
Here is an example of what happens:
[mbt at aloe ~]$ sudo lvcreate -s --name lv_swap_snap01 --size 100M /dev/vg_aloe/lv_swap
Rounding up size to full physical extent 128.00 MiB
Logical volume "lv_swap_snap01" created
[mbt at aloe ~]$ sudo lvdisplay /dev/vg_aloe/lv_swap{,_snap01}
--- Logical volume ---
LV Path /dev/vg_aloe/lv_swap
LV Name lv_swap
VG Name vg_aloe
LV UUID DuqvpX-s6zJ-3QW1-hbIp-Dmvd-PW0z-1faz0v
LV Write Access read/write
LV Creation host, time aloe.naunetcorp.net, 2013-04-01 16:01:05 -0400
LV snapshot status source of
lv_swap_snap01 [active]
LV Status available
# open 0
LV Size 4.00 GiB
Current LE 128
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Logical volume ---
LV Path /dev/vg_aloe/lv_swap_snap01
LV Name lv_swap_snap01
VG Name vg_aloe
LV UUID Kn7xJn-2vIh-moWm-DkVF-kYgr-m72U-TRlXdf
LV Write Access read/write
LV Creation host, time aloe.naunetcorp.net, 2013-04-01 16:03:09 -0400
LV snapshot status active destination for lv_swap
LV Status available
# open 0
LV Size 4.00 GiB
Current LE 128
COW-table size 128.00 MiB
COW-table LE 4
Allocated to snapshot 0.00%
Snapshot chunk size 4.00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3
Then I wrote zeros:
[mbt at aloe ~]$ sudo dd if=/dev/zero of=/dev/vg_aloe/lv_swap
dd: writing to ‘/dev/vg_aloe/lv_swap’: No space left on device
8388609+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 186.521 s, 23.0 MB/s
Now, after writing a whole bunch of zeros to the swap volume:
[mbt at aloe ~]$ sudo lvdisplay /dev/vg_aloe/lv_swap{,_snap01}
/dev/vg_aloe/lv_swap_snap01: read failed after 0 of 4096 at 4294901760: Input/output error
/dev/vg_aloe/lv_swap_snap01: read failed after 0 of 4096 at 4294959104: Input/output error
/dev/vg_aloe/lv_swap_snap01: read failed after 0 of 4096 at 0: Input/output error
/dev/vg_aloe/lv_swap_snap01: read failed after 0 of 4096 at 4096: Input/output error
--- Logical volume ---
LV Path /dev/vg_aloe/lv_swap
LV Name lv_swap
VG Name vg_aloe
LV UUID DuqvpX-s6zJ-3QW1-hbIp-Dmvd-PW0z-1faz0v
LV Write Access read/write
LV Creation host, time aloe.naunetcorp.net, 2013-04-01 16:01:05 -0400
LV snapshot status source of
lv_swap_snap01 [INACTIVE]
LV Status available
# open 1
LV Size 4.00 GiB
Current LE 128
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Logical volume ---
LV Path /dev/vg_aloe/lv_swap_snap01
LV Name lv_swap_snap01
VG Name vg_aloe
LV UUID Kn7xJn-2vIh-moWm-DkVF-kYgr-m72U-TRlXdf
LV Write Access read/write
LV Creation host, time aloe.naunetcorp.net, 2013-04-01 16:03:09 -0400
LV snapshot status INACTIVE destination for lv_swap
LV Status available
# open 0
LV Size 4.00 GiB
Current LE 128
COW-table size 128.00 MiB
COW-table LE 4
Snapshot chunk size 4.00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3
The dd process finished, but the snapshot is useless:
[mbt at aloe ~]$ sudo xxd /dev/vg_aloe/lv_swap|head -n 5
0000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[mbt at aloe ~]$ sudo xxd /dev/vg_aloe/lv_swap_snap01|head -n 5
xxd: Input/output error
The behavior of btrfs and ZFS is therefore (at least in my opinion) far
more sane: when space runs out, new writes fail, but existing snapshots
stay readable. LVM's behavior means the system won't grind to a halt,
but btrfs's and ZFS's behavior means the system won't lose data. I'll
take storage robustness over uptime for most applications any day of the
week.
On any system where I use snapshots, I make sure they are either very
short-lived or have enough space that they can never run out. With LVM,
the only real guarantee of the latter is allocating the snapshot at 100%
of the size of the original volume, which is unfortunate, as it is very
wasteful.
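For anything in between, a rough guard works: poll how full the snapshot is and act before it flips to INACTIVE. This is my own sketch, not anything LVM ships — check_snap_usage and the 80% threshold are made up for illustration, and the field lvs reports it under is snap_percent on LVM of this vintage (newer releases call it data_percent):

```shell
# Hypothetical guard against an LVM snapshot silently filling up.
# check_snap_usage takes a usage percentage (the kind of number
#   lvs --noheadings -o snap_percent vg_aloe/lv_swap_snap01
# would print) and an optional threshold; it returns nonzero when
# the snapshot is at risk so a cron job can lvextend or drop it.
check_snap_usage() {
    usage=$1
    threshold=${2:-80}
    # lvs prints values like "12.50"; compare the integer part only.
    usage_int=${usage%%.*}
    if [ "$usage_int" -ge "$threshold" ]; then
        echo "WARNING: snapshot at ${usage}% - extend it before it goes INACTIVE"
        return 1
    fi
    echo "snapshot at ${usage}% - OK"
}

check_snap_usage "12.50"
check_snap_usage "95.00" || true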
It would be nice if the behavior were configurable. I'm sure there are
people out there who would prefer LVM's behavior on filesystems like
btrfs and ZFS.
> The data deduplication is very useful until it's backup time. It looks
> like the backup will un-deduplicate and use full-size storage less
> backup compression abilities.
Yes, to back those filesystems up efficiently, you pretty much need
utilities that understand the filesystem very well. I don't know offhand
what utilities support that in zfs-land, but supposedly btrfs exposes
enough functionality that it is possible there. I haven't actually
looked much further into it myself, though, as I'm waiting for btrfs to
become, well, usable. :-)
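For the record, the usual candidate on the ZFS side is the filesystem's own send/receive pair, which serializes a snapshot (or the delta between two snapshots) so the backup end never has to rediscover deduplicated blocks. A sketch only — the pool names tank and backup are mine, and exact behavior varies by ZFS release:

```shell
# Snapshots are the unit that zfs send operates on.
zfs snapshot tank/data@monday

# Full stream: serialize the snapshot and recreate it elsewhere.
zfs send tank/data@monday | zfs receive backup/data

# Later, an incremental stream: only blocks that changed between
# @monday and @tuesday cross the wire.
zfs snapshot tank/data@tuesday
zfs send -i tank/data@monday tank/data@tuesday | zfs receive backup/data
```

Piping the stream through ssh to another host works the same way, which is what makes it attractive for offsite backups.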
--- Mike
--
Michael B. Trausch, President
Naunet Corporation
Telephone: (678) 287-0693 x130
Toll-free: (888) 494-5810 x130
FAX: (678) 287-0693