[ale] Failed drives on SuperMicro server backplane
Jeff Hubbs
jhubbslist at att.net
Thu Oct 22 11:37:06 EDT 2009
I've had two of eight SATA drives on a 3ware 9550 card fail due to a
protracted overtemp condition (the HVAC was a single point of failure).
The eight drives are arranged in kernel RAID1 pairs and the four pairs
are then kernel RAID0ed (yes, it flies). The two failed drives are in
different pairs (thank goodness) so the array stayed up. I've used
mdadm --fail and mdadm --remove to properly mark and take out the bad
drives and I've replaced them with on-hand spares.
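For the record, the removal went more or less like this for each bad
drive, with md2 standing in for whichever RAID1 pair held the failed
member:

# mdadm /dev/md2 --fail /dev/sde
# mdadm /dev/md2 --remove /dev/sde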
The problem is that even with the new drives in, I don't have a usable
sde or sdk anymore. For instance:
# fdisk /dev/sde
Unable to read /dev/sde
[note: I've plugged spare drives into another machine and they fdisk
there just fine]
In my critical log I've got "raid1: Disk failure on sde, disabling
device" and another such message for sdk...is there a way I can
re-enable them w/o a reboot?
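One thing I'm wondering about but haven't tried: deleting the stale
device via sysfs and then rescanning the controller's SCSI host,
something like the below - host0 is only a guess at which host the
3ware card is, and I don't know whether the 3w-9xxx driver will
re-export the drive without the controller doing its own rescan:

# echo 1 > /sys/block/sde/device/delete
# echo "- - -" > /sys/class/scsi_host/host0/scan
# mdadm /dev/md2 --add /dev/sde

(md2 again standing in for the right pair.)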
Two related questions:
This array is in a SuperMicro server with a 24-drive backplane in the
front. When the two SATA drives failed, there was no LED indication
anywhere. In looking at the backplane manual, there are six I2C
connectors that are unused, and I only have the defaults for I2C support
in the kernel. The manual also says that the backplane can use I2C or
SGPIO. Is there a way I can get a red-LED-on-drive-failure indication
(the red LEDs do come on briefly across the whole backplane at power-on)?
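One idea I haven't tested: if the backplane's SGPIO side shows up to
the kernel as a SES enclosure, the ses module is supposed to expose
per-slot fault LEDs under /sys/class/enclosure, roughly like this (the
enclosure and slot names below are purely hypothetical for this
chassis):

# modprobe ses
# ls /sys/class/enclosure/
# echo 1 > /sys/class/enclosure/0:0:24:0/Slot01/fault

I have no idea whether the 3ware 9550 passes any of that through,
hence the question.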
I've set up this array and one other 14-drive array on this machine
using whole disks - i.e., /dev/sde instead of /dev/sde1 of type fd. How
good/bad is that idea? One consideration is that I want to be able
to move the arrays to another similar machine in case of a whole-system
failure and have the arrays just come up; so far, that has worked fine
in tests.
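For reference, my understanding is that the portability works because
assembly goes by the md superblock UUIDs rather than by device names or
partition tables; on the target machine I'd expect something like the
following to pick the arrays up (assuming a stock /etc/mdadm.conf
location):

# mdadm --examine --scan >> /etc/mdadm.conf
# mdadm --assemble --scan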