[ale] 9.10 smart errors

Jim Kinney jim.kinney at gmail.com
Mon Nov 2 09:26:29 EST 2009


On Mon, Nov 2, 2009 at 8:39 AM,  <krwatson at cc.gatech.edu> wrote:

> SMART may not be as smart as everyone thinks.
>
> Failure Trends in a Large Disk Drive Population
> Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso
> http://labs.google.com/papers/disk_failures.html
>
> Download: PDF Version
> http://labs.google.com/papers/disk_failures.pdf
>

Predictive failure accuracy is poor (in most fields and hard drives in
particular) because the physics is just not understood well enough. And the
report is from a company that regularly bakes hard drives in daily use.
Any error in SMART is currently viewed as catastrophic, which is
likely overkill. However, my anecdotal evidence is that once a drive
begins to show any errors, there is a recurrence time that begins to
accelerate with each new instance of sector failure in most cases. The
big thing to watch for with the new tools is the failed sector count.
There's a limit to how many the drive can automatically recover from.
Once that limit is reached, data loss is imminent. So I tend to keep a
drive until the count is just below the upper limit of reserved
blocks.
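
A minimal sketch of watching that count, assuming the attribute table
layout printed by smartmontools' `smartctl -A` (attribute 5,
Reallocated_Sector_Ct). The size of the spare pool is generally not
reported directly, so the gap between the normalized VALUE and THRESH
columns is used here as a rough stand-in; that is an assumption, and the
scaling is vendor-specific. The sample text and the helper names
(parse_reallocated, margin, read_device) are made up for illustration.

```python
import subprocess

# Hypothetical sample in the usual `smartctl -A` table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8123
"""


def parse_reallocated(text):
    """Return normalized value, threshold and raw count for attribute 5.

    Returns None if the attribute row is not present in the text.
    """
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; column 0 is the ID.
        if len(fields) >= 10 and fields[0] == "5":
            return {
                "value": int(fields[3]),   # normalized value
                "thresh": int(fields[5]),  # vendor failure threshold
                "raw": int(fields[9]),     # raw count (assumed to be a plain integer)
            }
    return None


def margin(attr):
    """Headroom between normalized value and threshold (assumed proxy)."""
    return attr["value"] - attr["thresh"]


def read_device(device):
    """Run smartctl on a device such as /dev/sda (usually needs root)."""
    # smartctl uses non-zero exit bits to flag conditions, so the return
    # code is deliberately not treated as an error here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_reallocated(result.stdout)


if __name__ == "__main__":
    attr = parse_reallocated(SAMPLE)
    print(attr["raw"], "sectors reallocated, margin", margin(attr))
```

Logging the raw count on a schedule would also show whether the
interval between new reallocations is shrinking, which is the
acceleration described above.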

-- 
James P. Kinney III
Actively in pursuit of Life, Liberty and Happiness


