[ale] One possible explanation - Re: 9.10 smart errors

Robert Reese ale at sixit.com
Mon Nov 2 15:33:51 EST 2009


Hello J. D.,

Monday, November 2, 2009, 10:11:05 AM, you wrote:


> In my case I believe the tool reported around 1700 bad sectors. This seems
> like an alarming amount don't you think? I need to check it again and see if
> it has changed.

My experience with these new massive hard drives is that the magnetic layer's density has gotten so high that it is easy for a sector to fail.  Furthermore, I believe these drives are coming with many such failed sectors from the factory, and that the manufacturers have significantly 'padded' the sector count to ensure the number of available useful sectors will easily meet the number required for the specified drive size.

That is, I think manufacturers intentionally make larger hard drives with more bad sectors and label them as smaller drives, rather than make a drive that more closely matches its physical capacity with far fewer bad sectors.

Think of it this way: You need a warehouse that is 14,000 square feet.  You can afford $1400 for the lease, which is paltry and laughable.  Someone comes along and offers you a 14,000 sq.ft. facility for the money - with a hitch: the building is actually 20,000 sq.ft. total, but 2,500 sq.ft. are unusable, and another 3,500 sq.ft. need to be left empty in case some areas become unusable (wiring fails, roof blows off, etc.) and the items in those failed areas need to be relocated.  You get your 14,000 usable square feet for the money... just don't ask a building inspector's opinion.

Ditto for your SMART tools. Unfortunately, most current tools can't tell which sectors were bad from the factory and which have gone bad since.  I can only assume there are tools out there that can compare subsequent scans against a baseline scan.  I also assume there are tools that can read the running list of bad sectors recorded by the drive's controller.
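For what it's worth, smartctl from the smartmontools package will dump the drive's own attribute table, including the reallocated-sector count (attribute 5), which is the controller's tally of sectors remapped since the factory.  A rough sketch - the /dev/sda path and the sample output line below are just examples, not from any particular drive:

```shell
# Dump all SMART attributes for a drive (needs root; device path is an example):
# smartctl -A /dev/sda

# The raw value in the last column of attribute 5 is the remapped-sector tally.
# Pulling it out of a sample smartctl output line with awk:
printf '  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1700\n' \
  | awk '{print $NF}'    # prints 1700
```

Saving that number somewhere and diffing it against later runs gets you the "baseline scan" comparison by hand, even without a dedicated tool.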

[ASIDE] POSSIBLE EXPLANATION
From both manufacturing and business standpoints it makes sense for the manufacturer to make large, error-filled drives and label them as smaller.  This allows them to enter the market with the latest ginormous hard drive size offering without replacing their current manufacturing process.  This further allows the manufacturer additional time to increase their ability to create platters with incrementally fewer and fewer bad sectors, all while increasing the stability and longevity of the drive.  In turn, they can subsequently offer increasingly larger, more stable hard drives using essentially the same parts and manufacturing processes, introducing affordable incremental adjustments and updates to the existing processes and parts as needed.

Basically, if you get a 1TB drive, it may actually have the eventual capacity for 2TB once the manufacturer perfects the processes and parts for that drive.  Meanwhile, the manufacturer gets to offer the drive initially as 1TB, then 1.5TB, then ultimately 2TB as the drive is refined iteratively.  This also keeps the income at a "first-introduction" level for the newly created product, feeding the development cycle.  As a side benefit, this business/manufacturing model allows the manufacturer to update existing older inventory, "refreshing" it rather than discounting the price too steeply; hence the reason inventories of discontinued drives at retailers and wholesalers have been getting smaller and less frequent.

As much of a pain in the arse as this problem of bad sectors seems, you want the manufacturers to continue to make and sell drives this way: Customers get the latest massive storage drive, manufacturers don't spend fortunes on R&D and manufacturing overhauls. Customers get more and more drive for less and less money, and manufacturers create a prolonged premium income stream.  And, perhaps most importantly, manufacturing/technology/physics "brick walls" are seen far enough in advance to be addressed without customers experiencing abnormally inflated drive prices for extended periods. (So far that has held true...)

The bad news is that each drive, even from day to day, can differ sufficiently from preceding drives off the same line, making recovery more challenging than with older drives.  Long ago, you could simply swap a controller board from one drive to another.  In an emergency, you may have success doing this today, but the chances of success are rapidly declining due to current manufacturing processes.

[/ASIDE]

Nonetheless, here is what I do to accommodate the potential for excessive, rapid failure of active sectors in my non-RAID configured systems (meaning most non-server boxes): leave about 10% raw/unpartitioned, keep any storage partitions below 95% full and other partitions under 90% full.  When a drive gets close to using all of its allotted replacement sectors, I simply obtain a new drive (preferably larger) and image/ghost/duplicate the failing drive to the new one.  Then I remove the failing drive, label it, and put it with all the other outdated or failing drives I've saved over the years, to serve as an inexpensive 'hail mary' archive/backup if the need arises.
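The percentages above work out to a quick back-of-the-envelope calculation.  A sketch for a hypothetical 1 TB (1000 GB) drive - the figures are just examples of my rule of thumb, not anything official:

```shell
# Hypothetical 1000 GB drive, applying the 10%-raw and 95%-fill rules of thumb:
DRIVE_GB=1000
PART_GB=$(( DRIVE_GB * 90 / 100 ))   # leave ~10% unpartitioned -> partition 900 GB
FILL_GB=$(( PART_GB * 95 / 100 ))    # keep storage partition under 95% -> cap at 855 GB
echo "${PART_GB} ${FILL_GB}"         # prints: 900 855

# For the image/duplicate step on a failing drive, GNU ddrescue is a good fit
# since it retries and logs around bad sectors (device paths are examples):
# ddrescue -f -n /dev/sdOLD /dev/sdNEW rescue.log
```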

To put it succinctly: DON'T PANIC. :)

R~



