[ale] Failed drives on SuperMicro server backplane

Jeff Hubbs jhubbslist at att.net
Thu Oct 22 11:37:06 EDT 2009


I've had two of eight SATA drives on a 3ware 9550 card fail due to a 
protracted overtemp condition (the HVAC turned out to be a single point 
of failure).

The eight drives are arranged in kernel RAID1 pairs and the four pairs 
are then kernel RAID0ed (yes, it flies).  The two failed drives are in 
different pairs (thank goodness) so the array stayed up.  I've used 
mdadm --fail and mdadm --remove to properly mark and take out the bad 
drives and I've replaced them with on-hand spares. 
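
For reference, the removal looked roughly like this (the md device name 
here is a placeholder from memory, not necessarily what's really on this 
box):

   # mdadm /dev/md2 --fail /dev/sde
   # mdadm /dev/md2 --remove /dev/sde

The plan was then to mdadm --add the replacements once the kernel can 
see them again.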

The problem is that even with the new drives in, I don't have a usable 
sde or sdk anymore.  For instance:

   # fdisk /dev/sde
   Unable to read /dev/sde

[Note: I've plugged the spare drives into another machine and they 
fdisk there just fine.]

In my critical log I've got "raid1: Disk failure on sde, disabling 
device" and another such message for sdk.  Is there a way I can 
re-enable these devices without a reboot?
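
The closest candidate I've turned up is the sysfs hot-remove/rescan 
interface, assuming the 3ware driver presents these drives as ordinary 
SCSI disks (I haven't confirmed that, and host0 below is just a guess; 
the right host number should be findable under /sys/class/scsi_host/):

   # echo 1 > /sys/block/sde/device/delete
   # echo "- - -" > /sys/class/scsi_host/host0/scan

Untested here, so I'd welcome confirmation before I try it on a 
degraded array.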

Two related questions:
This array is in a SuperMicro server with a 24-drive backplane in the 
front.  When the two SATA drives failed, there was no LED indication 
anywhere.  Looking at the backplane manual, I see six I2C connectors 
that are unused, and I only have the defaults for I2C support in the 
kernel.  The manual also says the backplane can use either I2C or 
SGPIO.  Is there a way to get a red-LED-on-drive-failure function?  
(The red LEDs do come on briefly across the whole backplane at 
power-on, so the LEDs themselves work.)

I've set up this array and one other 14-drive array on this machine 
using whole disks - i.e., /dev/sde instead of a /dev/sde1 of type fd.  
How good or bad an idea is that?  One consideration is that I want to 
be able to move the arrays to another, similar machine in case of a 
whole-system failure and have the arrays just come up; so far, that has 
worked fine in tests.
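
The move test itself is nothing fancy; on the second machine I 
basically let mdadm find the superblocks on its own, roughly:

   # mdadm --examine --scan
   # mdadm --assemble --scan

At least with whole-disk members, that has come up cleanly in my tests 
so far.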



