[ale] read after write verify?, data scrubbing procedures
Phil Turmel
philip at turmel.org
Fri Oct 26 00:01:04 EDT 2012
On 10/25/2012 11:25 PM, mike at trausch.us wrote:
> On 10/25/2012 04:43 PM, Ron Frazier (ALE) wrote:
>> Then, there is the question of the data scrubbing on the source
>> drive. In this case, once I've completed a backup, I will have read
>> each sector on the source drive. Assuming there are no read errors
>> (if there were, I'd have to get out the big guns), this has
>> accomplished half of what my scrubbing does: the read half.
>
> This is only true if you always back up every single block on your
> device. The Linux RAID self-scrubbing process, however, will read every
> single block on every single drive in the RAID unit, whether the block
> is in use or not.
Not quite, as the metadata space on the member devices is not scrubbed.
> If you want to implement the same scrubbing process, but without the
> use of RAID, you can simply have a script that runs once per week,
> dd'ing all of the bits of your HDDs to /dev/null; if any of the dd
> processes returns with an error code, you know that it encountered a
> Bad Thing™ and you investigate then.
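Something like this would do it (untested sketch; the device list is
an example you'd adapt to your hardware):

  #!/bin/sh
  # Read every sector of each drive; dd aborts with a nonzero exit
  # status on the first read error, which is the cue to investigate.
  for dev in /dev/sda /dev/sdb /dev/sdc; do
      if ! dd if="$dev" of=/dev/null bs=1M iflag=direct; then
          echo "scrub: read failure on $dev" >&2
      fi
  done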
>
> You don't get the assurances that come with the RAID self-scrub, since
> the RAID self-scrub also does other higher-level things like verify that
> mirrors are actually mirrored and that parity is correct. This, too, is
> read-only: however, it detects errors ranging from "corrupted data came
> through the cable and landed on the disk" to "there are bad sectors,
> meaning this drive is out of remapping space or is lagging in remapping
> and will therefore soon run out of remapping space".
>
> You'll only get protection from the latter without RAID.
Linux software raid scrubs are not read-only. The "check" scrub is
mostly read-only, except for read errors: those are reconstructed from
mirrors or parity and then re-written. The "repair" scrub rewrites all
but the first mirror of such layouts, and all parity data in raid4/5/6
layouts.
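For reference, both kinds of scrub are started through sysfs (md0
below is just an example array name):

  # Start a "check" scrub (read-mostly; rewrites only failed reads):
  echo check > /sys/block/md0/md/sync_action

  # Start a "repair" scrub (also rewrites inconsistent mirrors/parity):
  echo repair > /sys/block/md0/md/sync_action

  # Mismatches found by the last check are counted here:
  cat /sys/block/md0/md/mismatch_cnt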
> You can *detect* the latter without RAID, but not correct for it, both
> at the block level (for more expensive block devices) and at the
> filesystem level. Or you could even do so by taking md5sums of all your
> files and when you read the files back, comparing the md5sums. However,
> then you have to reconcile the list of changed files with... you know,
> it just becomes a lot easier to use RAID. :-)
par2 is much better than md5sums, as it can reconstruct the bad spots
from the Reed-Solomon recovery files.
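Roughly like so (the 10% redundancy level and file names are only
examples):

  # Create recovery data alongside the files, ~10% redundancy:
  par2 create -r10 backup.par2 /backups/*.img

  # Later: verify, and rebuild damaged spots if any are found:
  par2 verify backup.par2
  par2 repair backup.par2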
> Interestingly enough, the Wikipedia article has a little tutorial that
> shows you how to create a RAID 5 device on Linux.
>
> http://en.wikipedia.org/wiki/Mdadm
>
> All you need are 3 drives, and one command, and you get a single logical
> drive with (slightly) increased robustness.
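The one command is along these lines (device names are examples):

  # Three partitions in, one raid5 device out; usable capacity is two
  # drives' worth, with the third's worth spent on rotating parity.
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1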
>
> Oh, and in a RAID, keep the members down to less than 1 TB; unless
> things have drastically changed in HDD manufacturing, the odds of
> catastrophic failure go way up with size, and the last I remember
> hearing, 720-1000 GB was about the far end of "maybe" in terms of RAID
> use. I've had good luck with 750 GB array members (5 of them, in a RAID
> 6) myself.
Indeed, raid5 cannot be trusted with large devices. But raid6 *can* be
trusted, and is very resilient to UREs (unrecoverable read errors) if
the arrays are scrubbed regularly.
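Scheduling the scrub is a one-line cron job (many distros ship an
equivalent already; Debian's mdadm package runs its checkarray script
from cron, for instance). A hand-rolled example, with md0 again
standing in for your array:

  # /etc/cron.d/mdscrub -- start a check scrub every Sunday at 2am
  0 2 * * 0  root  echo check > /sys/block/md0/md/sync_action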
A discussion on the topic from the linux-raid archives:
http://marc.info/?l=linux-raid&m=130754284831666&w=2
Phil