[ale] read after write verify?, data scrubbing procedures

Thu Oct 25 21:12:07 EDT 2012

Phil Turmel <philip at turmel.org> wrote:

>On 10/25/2012 04:43 PM, Ron Frazier (ALE) wrote:
>> Hi all,
>> 
>> In another thread, I mentioned that I do a manual data scrubbing
>> procedure on my hard drives which involves reading inverting writing
>> reading inverting writing all sectors on the hard drive periodically
>> with Spinrite.  As an alternative, I use the Linux badblocks non
>> destructive read write mode.  This process is certainly effective in
>> reverifying the integrity of the drive barring any mechanical
>> problems or controller problems.  As also mentioned in another
>> thread, once some drives develop bad sectors, and based partly on
>> advice here, I replace them.
>
>Writing inverted data sounds dangerous to me.  What happens if that tool
>is interrupted?  How big a chunk is at risk at any point?
>

There is no risk unless the computer crashes, freezes, loses power, etc. during a write cycle.  This risk exists during any writes whether they're doing diagnostics or not.  The purpose of the tool is to fully test and verify the integrity and capability of the drive.

As far as I know, only one sector is in play at any one time, so that's the most that should be at risk.  First it reads the data, presumably successfully.  If not, it goes into data recovery mode, which I mentioned in my "diagnostics - nirvana NOT" thread.  The good data from the sector is saved away in memory somewhere.  Then an inverted copy of the data is written to the sector and read back.  This verifies that the bits can be written and read successfully and that they all work in the opposite logical state to what they were.  Then the original data is pulled from memory and written back to the sector.  So, every bit of every sector is read and written twice, and every bit is verified to function both as a 1 and a 0.  The badblocks command does something similar in its non-destructive read-write mode, although I'm pretty sure the data patterns used are different.
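
To make the sequence above concrete, here is a minimal Python sketch of a read, invert, write, verify, restore pass.  It is only an illustration of the steps as I described them, not how Spinrite or badblocks are actually implemented.  The 512-byte sector size and the names invert, scrub_sector, and scrub are my own assumptions, and it runs against an in-memory buffer standing in for a drive.  On a real device, the read-back could be served from the OS cache rather than the platter unless caching is bypassed, so this sketch alone would not prove anything about the media.

```python
import io

SECTOR_SIZE = 512  # assumed sector size in bytes (hypothetical default)


def invert(data: bytes) -> bytes:
    """Flip every bit so each bit is exercised in the opposite state."""
    return bytes(b ^ 0xFF for b in data)


def scrub_sector(dev, index: int, sector_size: int = SECTOR_SIZE) -> bool:
    """Read, write inverted, verify, then restore one sector.

    `dev` is any seekable binary file-like object opened for read/write.
    Returns True if both the inverted pattern and the restored original
    read back correctly.
    """
    offset = index * sector_size

    # Step 1: read the original data and keep it in memory.
    dev.seek(offset)
    original = dev.read(sector_size)

    # Step 2: write the inverted copy and read it back.
    flipped = invert(original)
    dev.seek(offset)
    dev.write(flipped)
    dev.flush()
    dev.seek(offset)
    inverted_ok = dev.read(sector_size) == flipped

    # Step 3: always restore the original data, then verify it.
    dev.seek(offset)
    dev.write(original)
    dev.flush()
    dev.seek(offset)
    restored_ok = dev.read(sector_size) == original

    return inverted_ok and restored_ok


def scrub(dev, sector_count: int, sector_size: int = SECTOR_SIZE) -> list:
    """Scrub sectors 0..sector_count-1 and return the indexes that failed."""
    return [i for i in range(sector_count)
            if not scrub_sector(dev, i, sector_size)]


if __name__ == "__main__":
    # A 4-sector in-memory "drive" stands in for a real block device.
    disk = io.BytesIO(bytes(range(256)) * 8)
    before = disk.getvalue()
    failed = scrub(disk, 4)
    print("failed sectors:", failed)
    print("data unchanged:", disk.getvalue() == before)
```

Note that only the one sector held in `original` is exposed if the process dies between the inverted write and the restore, which is the risk window discussed above.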

>> I basically want to have as much evidence as I can that the drive is
>> reliable, periodically, still understanding that spontaneous or semi
>> spontaneous failures are possible.
>> 
>> I am considering methods of reducing the maintenance procedures while
>> still maintaining high confidence in the drives.  Mike T mentioned
>> some automation and raid procedures that I may be able to implement
>> on some machines at a future date.  In the mean time, I'm considering
>> altering my procedures without giving up too much functionality.
>> 
>> My first question is, does a HDD do a read after write verify by
>> default?  So, if the application or OS sends a command to write a
>> sector, will a verify operation automatically be done after the write
>> process, and, if the write failed, will the data be automatically
>> relocated to a spare sector?  I'm assuming that this is the case, but
>> would like to verify that.
>
>No.
>
>This came up on the linux-raid list recently, where a user had
>hand-mirrored onto a used drive (IIRC), placed into mirror duty, and was
>surprised to discover that the writes had not been verified by the
>drive.
>
>If you do a simple streaming read vs. streaming write speed test, you'll
>find the write performance of typical drives is close to the read
>performance.  That clearly indicates that the drive firmware is *not*
>moving the heads back to re-read the freshly written data.  If it were,
>the write speed would be no better than half the read speed.
>
>As I understand it, only sectors that have had read errors (marked
>"pending") will be rechecked after write.  I also understand that reads
>that succeed, but require more than some threshold of error correction
>in the drive, trigger a silent rewrite attempt/relocation.
>

Thus, running the diagnostics I mentioned will force the drive to look at and flag anything it thinks is weak.

The answer you gave is NOT what I wanted to hear.  But, better to know it now than later.

What you are, in effect, saying is that, if I clone a HDD sector by sector from a source to a target, I haven't really verified anything about the target HDD and, in reality, don't know if my cloned data really exists at all.  That is very disconcerting.  Sounds like I may have to keep running my intense diagnostics on my backup drives, since that gives me more certainty that, when I store a backup on them, they will actually successfully write the data.
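
One way to get some of that certainty back without a full diagnostic pass is to read the clone back and compare it to the source.  The following is a rough Python sketch under my own assumptions: the names digest and clone_matches are hypothetical, SHA-256 is just one reasonable choice of hash, and the demo uses ordinary temporary files in place of real devices.  The length argument is there because a target drive is often larger than the source.  On real hardware, the read-back should happen after the cache has been flushed (for example, after unplugging and reattaching the target), otherwise the data may come from RAM instead of the disk.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # read in 1 MiB pieces to keep memory use small


def digest(path, length=None, chunk=CHUNK):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`.

    If `length` is None, the whole file (or device) is hashed.
    """
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            want = chunk if remaining is None else min(chunk, remaining)
            block = f.read(want)
            if not block:
                break  # reached end of file or device
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.hexdigest()


def clone_matches(source, target, length=None):
    """True if the first `length` bytes of source and target are identical."""
    return digest(source, length) == digest(target, length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "source.img")
        dst = os.path.join(d, "target.img")
        payload = b"backup data" * 1000
        with open(src, "wb") as f:
            f.write(payload)
        # The target is larger than the source, like a bigger backup drive.
        with open(dst, "wb") as f:
            f.write(payload + b"\0" * 4096)
        print(clone_matches(src, dst, length=os.path.getsize(src)))
```

This only confirms that the data can be read back correctly right now.  It does not exercise each bit in both states the way the inverted-write test does.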

>> Assuming the read after write verify happens, then, if I write data
>> to every sector of a drive during a backup operation, as I mentioned
>> in the clonezilla thread, then, effectively, I have written and read
>> every sector on the target drive.  Perhaps, then, a successful backup
>> operation can take the place of my data scrubbing activity on the
>> target drive.
>> 
>> Then, there is the question of the data scrubbing on the source
>> drive.  In this case, once I've completed a backup, I will have read
>> each sector on the source drive.  Assuming there are no read errors,
>> (If there were, I have to get out the big guns.)  then, this has
>> accomplished 1/2 of what my scrubbing does, the read half.
>> 
>> So, the question arises as to what about the write test part of my
>> scrubbing operation for the source drive, which didn't get done.
>> Normally, my scrubbing operation would have written to each sector.
>> If the drive automatically detects write problems when a write is
>> attempted, perhaps the write scrubbing operation is not necessary, or
>> is less necessary.
>> 
>> Hope this makes any sense whatsoever.  Any opinions?
>
>Writing over a sector that might be "weaker" than another will hide its
>condition from the drive firmware.  When it is finally unreadable,
>you've lost it.  A spot that degrades over time but is read regularly
>will likely be caught by the ECC threshold and relocated before it is
>lost.
>
>A SMART background self-test (long) on a regular basis is the best
>medicine for solo storage, preferably with Reed-Solomon reconstruction
>files available.  (Such as those created by "par2".)

So, it sounds like the act of reading every sector for the purpose of doing a backup might be enough to scrub the source drive by making it pay attention to every sector.  So, should I still do those SMART long tests?  I think a Spinrite or badblocks diagnostic would accomplish more than the SMART test would, but they both take a VERY long time to run, and I can't use the computer for any other purpose while they're running.

I guess what's really bugging me is that nothing we've discussed, outside of Spinrite or badblocks, actually verifies the drive's ability to write data before we actually TRUST it with data.

Now I know, for SURE, that I'm going to burn in those replacement 1 TB drives with Spinrite.

I'll have to study up on the par2 stuff.

Ron

>
>For raid arrays, add a regular "check" scrub.  Use a "repair" scrub only
>when a "check" scrub finds mismatches.
>
>HTH,
>
>Phil
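
For reference, the "check" then "repair" advice for Linux md arrays can be sketched in Python roughly as follows.  This assumes the md driver's sysfs files sync_action and mismatch_cnt under /sys/block/<array>/md/.  The function names, the array name md0, and the sysfs_root parameter (there only so the sketch can be tried against a fake directory) are my own assumptions.  On a real system this needs root, and mismatch_cnt is only meaningful once the check has finished.

```python
import tempfile
from pathlib import Path


def md_dir(array: str, sysfs_root: str = "/sys") -> Path:
    """Directory holding the md control files for an array such as 'md0'."""
    return Path(sysfs_root) / "block" / array / "md"


def start_scrub(array: str, action: str = "check",
                sysfs_root: str = "/sys") -> None:
    """Request a 'check' or 'repair' scrub by writing to sync_action."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")


def mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Read the mismatch counter reported after a scrub."""
    text = (md_dir(array, sysfs_root) / "mismatch_cnt").read_text()
    return int(text.strip())


def needs_repair(array: str, sysfs_root: str = "/sys") -> bool:
    """Only suggest a 'repair' scrub when a 'check' found mismatches."""
    return mismatch_count(array, sysfs_root) > 0


if __name__ == "__main__":
    # Dry run against a fake sysfs tree, so no real array is touched.
    with tempfile.TemporaryDirectory() as root:
        fake = md_dir("md0", root)
        fake.mkdir(parents=True)
        (fake / "sync_action").write_text("idle\n")
        (fake / "mismatch_cnt").write_text("0\n")
        start_scrub("md0", "check", sysfs_root=root)
        print("repair needed:", needs_repair("md0", sysfs_root=root))
```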

--

Ron Frazier
770-205-9422 (O)   Leave a message.
linuxdude AT techstarship.com