[ale] 9.10 smart errors

krwatson at cc.gatech.edu
Mon Nov 2 08:39:13 EST 2009


> -----Original Message-----
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of adam
> Sent: Friday, October 30, 2009 20:50
> To: Atlanta Linux Enthusiasts - Yes! We run Linux!
> Subject: Re: [ale] 9.10 smart errors
> 
> J. D. wrote:
> > I did some digging on the smart errors and the bug reports seemed to
> > indicate that there are bad sectors on my Hitachi HDD. Not sure yet if
> > this is a software or hardware issue.
> >
> >
> > On 10/29/09, Jon Reagan <jreagan90 at gmail.com> wrote:
> >> Good to hear someone else had the same issue come up with the disc
> >> checker... I was kinda worried there for a bit. ;)
> >>
> >> Jon
> >>
> >> On Thu, Oct 29, 2009 at 9:15 PM, J. D. <jdonline at gmail.com> wrote:
> >>> just finished the upgrade a few minutes ago. It notified me there are bad
> >>> sectors out the wazoo on sda. :( Neat feature though.
> >>> J. D.
> 
> It's palimpsest, the new disk utility for 9.10. It'll read SMART data
> off your drive and report it to you.
> 
> So far, palimpsest has reported all my SMART errors to me correctly.
> 
> The problem I have is the SMART implementation for my WD hard drives.
> I've had a 250GB drive for over 2 years now, and it started reporting
> imminent failure about 3 months after I got it. It has yet to fail.
> 
> At least for me, the problem is SMART...palimpsest just chooses to
> remind me of that. If you bring up detailed info on a failure, there's a
> checky box to not report SMART info for that particular HDD.
> 
> Adam


SMART may not be as smart as everyone thinks.

Failure Trends in a Large Disk Drive Population
Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso
http://labs.google.com/papers/disk_failures.html

Abstract

It is estimated that over 90% of all new information produced in the world is being stored on magnetic media, most of it on hard disk drives. Despite their importance, there is relatively little published work on the failure patterns of disk drives, and the key factors that affect their lifetime. Most available data are either based on extrapolation from accelerated aging experiments or from relatively modest sized field studies. Moreover, larger population studies rarely have the infrastructure in place to collect health signals from components in operation, which is critical information for detailed failure analysis.

We present data collected from detailed observations of a large disk drive population in a production Internet services deployment. The population observed is many times larger than that of previous studies. In addition to presenting failure statistics, we analyze the correlation between failures and several parameters generally believed to impact longevity.

Our analysis identifies several parameters from the drive's self monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.

Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST 2007),
San Jose, CA, February 2007

Download: PDF Version
http://labs.google.com/papers/disk_failures.pdf
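
If you want to look at the raw sector counters yourself rather than rely on
the palimpsest popup, here is a rough sketch in Python. It assumes the
attribute table layout printed by `smartctl -A` from smartmontools. The
sample text and its numbers are made up for illustration and do not come
from any drive in this thread, and the helper names (`parse_attributes`,
`sector_warnings`) are my own, not part of any tool.

```python
import re

# Illustrative sample only, in the column layout printed by `smartctl -A`.
# The numbers are invented and do not describe any real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       35
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
"""

# One attribute row: id, name, flag, value, worst, thresh, type, updated,
# when_failed, then the leading digits of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: {id, value, worst, thresh, raw}} for each row."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # the header and any other non-attribute lines are skipped
            ident, name, value, worst, thresh, raw = m.groups()
            attrs[name] = {
                "id": int(ident),
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def sector_warnings(attrs):
    """List the sector-related counters whose raw count is non-zero."""
    watched = ("Reallocated_Sector_Ct", "Current_Pending_Sector")
    return [
        f"{name}: raw={attrs[name]['raw']}"
        for name in watched
        if name in attrs and attrs[name]["raw"] > 0
    ]


if __name__ == "__main__":
    for warning in sector_warnings(parse_attributes(SAMPLE)):
        print(warning)
```

To try it on a real disk you would feed it the output of `smartctl -A /dev/sda`
(run as root) in place of SAMPLE. A non-zero count is worth watching, but as the
abstract above says, SMART numbers alone are unlikely to predict whether an
individual drive will fail.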


keith


-- 

Keith R. Watson                        Georgia Institute of Technology
Systems Support Specialist IV          College of Computing
keith.watson at cc.gatech.edu             801 Atlantic Drive NW
(404) 385-7401                         Atlanta, GA  30332-0280


