[ale] read after write verify?, data scrubbing procedures

mike at trausch.us
Fri Oct 26 14:26:18 EDT 2012


On 10/26/2012 09:16 AM, Phil Turmel wrote:
> The catch that some people encounter is that some of the metadata space
> is wasted, and never read or written.  If a URE develops in that area,
> no amount of raid scrubbing will fix it, leaving the sysadmin scratching
> their head.

Eh, yeah, but I pull the member first and ask questions later.  The way 
I see it, if a drive in a RAID has failed, I don't have time to scratch 
my head and figure out why it failed; I only have time to replace it.  
The questions come later, when I dig around in the logs (both the 
system's and the drive's), and usually the answer is clear from the 
drive logs alone...
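
(For the archives, since the mechanics come up now and then: the 
drive's own log is one smartctl invocation away, and pulling the member 
is a one-liner with mdadm.  Device names below are made up, obviously.)

  smartctl -a /dev/sdX            # SMART attributes plus the drive's error log
  mdadm --detail /dev/md0         # confirm which member the array marked faulty
  mdadm /dev/md0 --fail /dev/sdX1 --remove /dev/sdX1   # pull it, ask later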

I did have it happen once, though: I experienced about 10 seconds that 
seemed like an eternity.  I had a RAID 6 that was rebuilding because of 
a single-drive failure.  It was about 98% finished rebuilding the array 
when another drive failed.  Oh, and this was my first failed disk ever 
in a RAID 6.  :-)

The rebuild finished, and then ANOTHER drive failed about 20 seconds 
later, as I was getting ready to shut down the system to replace failed 
drive #2.

Those 30 seconds are 30 seconds I will not forget.

Fortunately, the three drives were the last ones out of the original 
set.  We knew they were going to fail.  But the swap-out schedule got 
held up for some reason I no longer recall, and the drives, which were 
supposed to all be replaced within one year of deployment, had lasted 
about 19 months.  (They were horrible choices for a RAID, but they were 
cheap.  "Re-manufactured", "green" drives.)

In case anyone's curious, the original plan was to swap out 1 drive per 
quarter, except for the last two drives which were to be swapped out a 
month apart.  12 months was supposed to be the longest any of them were 
there...

> [trim /]
>
>> ... are inversely proportional to just how much you actually attempt to
>> protect your data from failure.  :-)  And being that I have backups in
>> place, I'm not terribly worried about that.  Drive fails?  Replace it.
>> Two drives fail?  Replace them.  Three or more drives fail?  Recover it.
>>    I get a much larger paycheck that week, then.
>
> :-)  I'm self-employed.  I get a much *smaller* paycheck when I spend
> too much time on this.

Hrm.  Bill hourly!

Flat-rate is high-risk, and I'll only do it for insane values of "flat 
rate".  Pay me $25,000 per month, and I'll become your dedicated support 
dude, no questions asked, and assign all my other work to someone 
else.  That's about the smallest flat rate I'd take.  :-)

>>> par2 is much better than md5sums, as it can reconstruct the bad spots
>>> from the Reed-Solomon recovery files.
>>
>> Interesting.  Though it looks like it wouldn't work for my applications
>> at the moment.  Something that can scale to, oh, something on the order
>> of two to four terabytes would be useful, though.  :-)
>
> I find it works very well keeping archives of ISOs intact.  The larger
> the files involved, the more convenient par2 becomes.

My reading of the Wikipedia article implied that wasn't really 
possible.  I'm guessing it's subtly inaccurate somehow; my 
understanding was that par2 is limited to 32,768 blocks of recovery 
data.  That doesn't sound like it'd scale to 1 TB or so unless the 
block size is 32 MB or larger.
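
(For what it's worth, the arithmetic behind that guess is just 
1 TB / 32,768 blocks, which comes to roughly 32 MB per block.  And if 
I'm remembering the par2cmdline flags right, you'd hand it a block 
size that large yourself; the file name is made up and the 10% 
redundancy is picked arbitrarily:

  par2 create -s33554432 -r10 archive.par2 big-image.iso

...where 33554432 bytes is 32 MiB.)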

>> I'll keep an eye on the third version of that spec, too.  Learn (about)
>> something new every day!
>>
>>> Indeed, raid5 cannot be trusted with large devices.  But raid6 *can* be
>>> trusted.  And is very resilient to UREs if the arrays are scrubbed
>>> regularly.
>>
>> Well, that depends.  The level of trust in each comes from the number of
>> drives.  For example, would you trust a bank of 24 drives to RAID 6?
>> Only if you're nuts, I suspect.
>
> For near-line light-duty high-capacity storage, I would certainly set up
> such a raid6.  Configuring 24 drives as 22 in raid6 w/ two hot spares
> would be more robust than a pair of twelve-drive raid6 arrays concatenated.
>
> Same capacity, higher unattended fault tolerance, but significantly
> lower performance.  Everything is a tradeoff.
>
>> I'd use RAID 5 for a 3(2)-drive array.  I'd use RAID 6 up to probably
>> 7(5), tops.  If I needed to do anything more than that, I'd start
>> stacking RAID levels depending on the application's requirements.
>
> I don't use raid5 at all nowadays.  Triple mirror on three devices is my
> minimum setup.  Raid10,f3 or raid6 for anything larger.
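
As an aside, mdadm will happily build that layout in one shot if anyone 
wants to try it.  Something like the following, with made-up device 
names, gets you the 22-drive raid6 plus the two hot spares:

  mdadm --create /dev/md0 --level=6 --raid-devices=22 \
        --spare-devices=2 /dev/sd[b-y]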

I hardly use RAID for anything myself: my desktop has a single drive in 
it, and I perform backups of everything that isn't under git very 
regularly.

Everything that is under git exists in at least 3 places for the 
private/internal projects, and usually dozens for the public ones.  So I 
don't worry about those so much.

Also, we don't have hundreds of GB of data ourselves to back up: our 
whole history fits on a CD at the moment, making backup relatively 
convenient for the time being (and for the foreseeable future).  In 
fact, we're looking at using M-Disc for the annual archive discs.

The only reason this client has the RAID array to begin with is that 
it'd take way too long to restore the data after every drive failure.  
It takes about 90 minutes to provision the replacement server, but then 
loading the data onto the system takes nearly 18 hours, assuming you 
change all the discs on time, in part because of inefficiencies in the 
restore process and in part because writing to a new array that is 
still doing its initial scrub is S-L-O-W.

Though it goes down to 17 hours with a little extra code and two drives 
(the custom program can spin up drive #2 with enough lead time to 
ensure that it can begin reading from it right away; the user then has 
a larger window to change the first disc, and the system doesn't block 
on the user unless the user is asleep...).
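
There's nothing clever behind that, by the way; it just overlaps the 
next disc's read with the current restore.  A rough shell sketch of the 
shape of it (not the actual program; the paths are made up and 
restore_from stands in for the real loading step):

  # Pull an image off the disc in drive #2 in the background while
  # drive #1's data is still being loaded, so the user's disc swap in
  # drive #1 never stalls the restore.
  dd if=/dev/sr1 of=/srv/restore/next.img bs=1M &
  restore_from /srv/restore/current.img   # stand-in for the real loading step
  wait                                    # next image is ready by the time we need it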

	--- Mike

-- 
A man who reasons deliberately, manages it better after studying Logic
than he could before, if he is sincere about it and has common sense.
                                    --- Carveth Read, “Logic”

