[ale] Do all new large SATA drives suck?

Neal Rhodes neal at mnopltd.com
Fri Mar 2 16:40:54 EST 2012


Thanks for the information. 

We tend to use iozone on new systems.  Iozone has a diagnostic test
mode where it writes a known file of a specified size in various ways,
then re-reads it and compares.  We hit a really strange failure on a
big Compaq server about 6 years ago where Linux didn't properly handle
over 4GB of RAM, and just plain forgot about certain blocks it had
written to RAM buffers but not to disk.  Iozone was great for
diagnosing that, as Compaq wanted to blame the Progress DBMS.
Pointing out that iozone itself failed got us out of the blame loop.
Iozone is used more for benchmarking, and we've found it handy to run
on a new server to see whether the performance is in line with what
we'd expect.
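
For anyone who wants the same kind of check without iozone, here's a
rough, untested Python sketch of the write-known-pattern-then-verify
idea.  The file path and size are just placeholders; for a real test
you'd want the file bigger than RAM (or drop caches between the write
and the read) so the re-read can't be served from memory.

    #!/usr/bin/env python3
    # Sketch of an iozone-style diagnostic: write a known pattern,
    # force it out to disk, re-read, and compare.
    import os, hashlib

    TEST_FILE = "/mnt/newdisk/check.dat"   # placeholder path
    SIZE_MB = 4096                         # placeholder size
    CHUNK = 1024 * 1024

    def pattern(i):
        # deterministic 1 MB chunk so we know what should come back
        return hashlib.md5(str(i).encode()).digest() * (CHUNK // 16)

    with open(TEST_FILE, "wb") as f:
        for i in range(SIZE_MB):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())   # make sure it hit the disk, not just RAM

    errors = 0
    with open(TEST_FILE, "rb") as f:
        for i in range(SIZE_MB):
            if f.read(CHUNK) != pattern(i):
                errors += 1
                print("miscompare in chunk %d" % i)

    print("done, %d bad chunks" % errors)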

It seems you really have to do both: the low-level hardware burn-in as
you note below, and an OS-level test like iozone to make sure the
filesystem holds together.

On Fri, 2012-03-02 at 14:43 -0500, Ron Frazier (ALE) wrote:

> Hi Neal,
> 
> I don't own any 1.5 TB drives, but I do have a few Seagate or Hitachi
> 1 TB drives that I bought within the last few years.  They've been
> fine.  I would recommend stress testing any new drive you get as
> follows before trusting it with data.  I've almost always bought
> Seagate, and I almost always look for the 5 year warranty.
> 
> Method A) Use a utility to write random data to the entire drive.  The
> Ultimate Boot CD has some such things.  Be careful not to erase your
> main system drive.  Do this at least 6 times.  This forces the drive
> controller to thoroughly evaluate each sector and determine if any are
> weak.  It also, more or less, forces each bit on each sector to be
> written with different values at least a few times.
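> 
> (Just to illustrate that idea, here's a rough Python sketch of one
> random-fill pass, run repeatedly as described above.  It is completely
> destructive, /dev/sdX is only a placeholder, and you must triple-check
> the device name so it isn't your system drive:)
> 
>     #!/usr/bin/env python3
>     # Sketch only: overwrite an ENTIRE drive with random data, one pass.
>     # Destroys everything on the drive.  Check the device with lsblk first!
>     import os
> 
>     DEVICE = "/dev/sdX"        # placeholder target drive
>     CHUNK = 4 * 1024 * 1024    # 4 MB writes
> 
>     with open(DEVICE, "wb", buffering=0) as dev:
>         size = dev.seek(0, os.SEEK_END)   # block devices report their size
>         dev.seek(0)
>         written = 0
>         while written < size:
>             chunk = os.urandom(min(CHUNK, size - written))
>             written += dev.write(chunk)
>         os.fsync(dev.fileno())
> 
>     print("wrote %d GiB of random data" % (written // 2**30))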
> 
> Method B) Write random data to the drive once, just so it's not all
> zeros.  Then, use the SpinRite utility from Gibson Research to run a
> level 4 surface analysis 5-6 times.  The SpinRite utility will read
> each sector, invert it, write it, read it, invert it, and write it
> back.  This accomplishes the same purpose as Method A), but is more
> thorough and predictable in that every single bit is tested both as a
> zero and a one.  Using SpinRite has another advantage as outlined
> below.
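> 
> (Not SpinRite itself, but to make the idea concrete, here's a little
> Python sketch of that read / invert / write / re-read / restore cycle
> on a range of blocks.  DEVICE and the block range are placeholders, it
> rewrites the device in place, and the kernel may satisfy the re-reads
> from cache, so treat it purely as an illustration; a real tool works
> below the OS:)
> 
>     #!/usr/bin/env python3
>     # Invert-and-rewrite sketch: every bit in the region is written and
>     # read back as both a 0 and a 1, then the original data is restored.
>     # Use on scratch media only.
>     import os
> 
>     DEVICE = "/dev/sdX"          # placeholder
>     BLOCK = 4096                 # work in 4 KiB blocks
>     START, COUNT = 0, 1024       # which blocks to exercise (placeholders)
> 
>     def read_block(dev, off):
>         dev.seek(off)
>         return dev.read(BLOCK)
> 
>     def write_block(dev, off, data):
>         dev.seek(off)
>         dev.write(data)
>         os.fsync(dev.fileno())
> 
>     bad = 0
>     with open(DEVICE, "r+b", buffering=0) as dev:
>         for blk in range(START, START + COUNT):
>             off = blk * BLOCK
>             original = read_block(dev, off)
>             inverted = bytes(b ^ 0xFF for b in original)   # flip every bit
>             write_block(dev, off, inverted)
>             if read_block(dev, off) != inverted:
>                 bad += 1
>                 print("block %d failed inverted readback" % blk)
>             write_block(dev, off, original)                # put data back
>             if read_block(dev, off) != original:
>                 bad += 1
>                 print("block %d failed restore readback" % blk)
> 
>     print("done, %d suspect blocks" % bad)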
> 
> After doing A) or B), use Disk Utility in Linux or similar to run a
> long SMART surface test, which is read only, I think.  This assumes
> the computer and drive allow you to access the SMART subsystem.  This
> test should pass with no errors.  There should also be no bad sectors
> reported.  If there are bad sectors, I would consider RMAing the
> drive.
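> 
> (From the command line, smartctl from the smartmontools package will
> do the same job as Disk Utility.  A tiny Python wrapper, with /dev/sdX
> again just a placeholder and root privileges assumed:)
> 
>     #!/usr/bin/env python3
>     # Start a long SMART self-test and then show the results.
>     # Requires smartmontools; the self-test runs inside the drive itself.
>     import subprocess
> 
>     DEVICE = "/dev/sdX"
> 
>     # kick off the long (extended) self-test
>     subprocess.check_call(["smartctl", "-t", "long", DEVICE])
> 
>     # ...wait the number of minutes smartctl estimates above, then:
>     subprocess.check_call(["smartctl", "-l", "selftest", DEVICE])  # self-test log
>     subprocess.check_call(["smartctl", "-A", DEVICE])              # attributes, e.g. reallocated sectors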
> 
> After all this, partition and format the drive and use it for data.
> Now, I would run SpinRite 2-3 times / year on the drive.  This is
> important.  The SpinRite algorithm is non-destructive, so you can run
> it on a drive with data on it.  This actually helps prevent errors by
> refreshing and strengthening all the magnetic domains, so your data
> doesn't fade over time (bit rot).  Also, it gives the
> controller another chance to review each sector both for reading and
> writing and determine if any are going bad.  By doing these
> procedures, I've kept many of my drives running more than 5 years,
> barring any mechanical problems.  Running a long SMART test instead of
> SpinRite will not accomplish the same thing.  While it will test each
> sector to see if it can be read, it will not test each sector to make
> sure every bit can be written and read as both a 0 and a 1.
> 
> If sectors are difficult to read, SpinRite will work as hard as
> possible to recover the data rather than just discarding the entire
> sector.  It tries to read finicky sectors up to 2000 times, as I
> recall.  Note that SpinRite works at the SECTOR level, not the file
> system level.  If the file system is screwed up, reading all the data
> on each sector won't help, because that data is corrupt.  While I have
> been known to run fsck or CHKDSK (Windows) when having problems, it
> is probably better to first run SpinRite to make sure the sectors are
> as readable as possible from a magnetic point of view, then run fsck
> or CHKDSK to correct any file system errors.
> 
> Drives certainly can and do fail later in life.  Sometimes, this
> exhaustive testing will expose pending problems, such as if SpinRite
> just cannot read some sectors, or if the SMART test reveals bad
> sectors.  This will give you a chance to recover the data before the
> drive totally blows up.  SpinRite has a SMART screen, but I don't put
> too much credence in that part of the program.  The reason is that
> every manufacturer does SMART differently and they don't always
> publish their design docs.  At the time he designed SpinRite, Steve
> had to reverse engineer the data on the SMART screen.  It's not always
> set in stone.  I'd trust the Linux SMART test in Disk Utility more for
> that purpose.
> 
> By the way, this advice is for magnetic drives.  Do not use it on SSDs,
> as you will probably accelerate the wear on the unit, and most of the
> positive benefits don't exist.  You can use it on a hybrid SSD /
> magnetic drive.
> 
> Sincerely,
> 
> Ron
> 
> 
> On 3/2/2012 11:18 AM, Neal Rhodes wrote: 
> 
> > I've gone ahead and ordered an HP Core i3 system to be our next
> > CentOS home/office server.
> > 
> > It's got a 1.5TB drive; normally on these off-lease units I'd buy
> > two brand new drives and mirror them.  That's what we've done with
> > the last 3 Linux servers, all of which are still technically
> > functioning, going back to Fedora Core 1.
> > 
> > This drive is likely about a year old, so I'm thinking I'll just buy
> > a new 1.5TB drive and install CentOS to mirror the primary.
> > 
> > When I look at the crop of 1 - 1.5TB drives on TigerDirect and read
> > the reviews, they seem to be uniformly terrible - DOA,  failed after
> > 3 weeks, replacement failed after a week, etc.  Seagate seems to be
> > the worst, although WD is not far behind.
> > 
> > Ummm, isn't one of the primary selling features of a disk drive that
> > it's not supposed to blow up and take down all your data with it?
> > Has there been a massive quality slip in the last couple years since
> > I last bought drives?    Seriously -  I can lose a power supply, a
> > motherboard, a display - you name it, and once I replace it I can
> > expect to still have the data.    Yes, I should do backups, and I
> > do, and yes, I should mirror the drives, and I do.    I should do
> > SMARTD monitoring and I do.  But isn't this like selling tires that
> > tend to shred randomly?    Isn't not blowing up catastrophically
> > with no warning beforehand a basic selling point for disk drives?
> > What's the point of mirroring if the odds are good that both drives
> > will fail completely the same week?   What's the point of SMARTD
> > monitoring if the darn drive quits without warning? 
> > 
> > Does anybody make a decent drive in that size range?     
> > 
> > I'm thinking that, even setting economy aside, my old theory of
> > buying a pair of new identical drives may not be wise anymore, and
> > that keeping the one drive that has lasted over a year plus one new
> > drive is a better plan.
> > 
> > Thoughts? 
> > 
> > Neal
> 
> 
> 
> 
> 
> -- 
> 
> (PS - If you email me and don't get a quick response, you might want to
> call on the phone.  I get about 300 emails per day from alternate energy
> mailing lists and such.  I don't always see new messages very quickly.)
> 
> Ron Frazier
> 
> 770-205-9422 (O)   Leave a message.
> linuxdude AT c3energy.com

