[ale] RAID 5 - OK, now it's hardware

Keith Hopkins hne at hopnet.net
Mon Apr 11 10:38:54 EDT 2005


Hi Greg,

   I see a couple of things, starting with
hdj: unknown partition table

   You're not going to have much luck syncing this hdj5 without a valid partition table on hdj.
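The kernel prints "unknown partition table" when none of its partition parsers recognise sector 0 of the disk. Since the array member is hdj5, hdj is presumably meant to carry an ordinary PC (MBR) partition table. Under that assumption, a quick read-only first check is to look for the 0x55AA boot signature at bytes 510-511. This is a minimal sketch; the function names are hypothetical.

```python
import sys

SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"  # must appear at byte offsets 510-511 of sector 0


def has_mbr_signature(sector0: bytes) -> bool:
    """True if the first sector ends with the 0x55AA boot-record marker."""
    return len(sector0) >= SECTOR_SIZE and sector0[510:512] == MBR_SIGNATURE


def read_first_sector(path: str) -> bytes:
    # Read-only; opening a raw device such as /dev/hdj normally needs root.
    with open(path, "rb") as dev:
        return dev.read(SECTOR_SIZE)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        found = has_mbr_signature(read_first_sector(path))
        print(path, "has an MBR signature" if found else "has NO MBR signature")
```

A present signature only means the sector looks like an MBR. `fdisk -l /dev/hdj` will show whether the partition entries themselves make sense.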

   Next: from the mess below, we see read problems on hdj and hdl.  I'm glad my Maxtors don't cause me that much trouble.  Assuming hdj is not live in your array yet, yank it and run Maxtor's PowerMax on it to see whether it is really bad.  Either PowerMax or MaxBlast (I can't remember which) should have a "Zero Fill" utility that writes zeros to the disk in a way that causes the drive to use its "spare" sectors to remap any problem areas.  Both programs are freebies that can be downloaded from Maxtor's site.  This, of course, wipes any and all data on that disk.  After the zero fill, run the PowerMax diags.  If the drive still shows up as bad after a zero fill, replace it.  Something similar can be done with Seagate and IBM/Hitachi drives.  I won't touch WD, so I can only assume they have equivalent tools.
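At its core, a zero fill overwrites every sector. A write to a sector the drive has already flagged gives its firmware the chance to substitute a spare. The sketch below is a generic stand-in, similar in spirit to `dd if=/dev/zero`, and not a description of how PowerMax or MaxBlast actually work. The function name and the 1 MiB chunk size are assumptions. Pointed at a real device, it destroys everything on it, so the demo only touches an in-memory buffer.

```python
import io

CHUNK = 1 << 20  # write 1 MiB at a time (arbitrary choice)


def zero_fill(dev, size: int, chunk: int = CHUNK) -> int:
    """Overwrite the first `size` bytes of binary file object `dev` with zeros.

    Returns the number of bytes written. On a block device this wipes all data.
    """
    zeros = bytes(chunk)
    written = 0
    dev.seek(0)
    while written < size:
        n = min(chunk, size - written)
        dev.write(zeros[:n])
        written += n
    dev.flush()
    return written


if __name__ == "__main__":
    # Demonstrate on an in-memory buffer instead of a real disk.
    buf = io.BytesIO(b"\xff" * 3000)
    print(zero_fill(buf, 3000, chunk=1024))
    print(buf.getvalue() == bytes(3000))
```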
   Assuming you have the array running in a Clean/Non-degraded mode, do the same for hdl.  If it is running in degraded mode, you have to drop in a spare and let it sync before you can pull hdl.
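Whether an md array is clean or degraded shows up in /proc/mdstat as a `[configured/working]` count followed by one `U` per working member and `_` per missing one (for example `[4/3] [UUU_]`). Here is a minimal parser for that status, assuming the usual /proc/mdstat layout. The sample text, array size, and device names are hypothetical.

```python
import re

# Matches the "[n/m] [UU_U]" status at the end of an array's block line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
HEAD_RE = re.compile(r"^(md\d+)\s*:")

# Hypothetical /proc/mdstat contents for a degraded 4-disk RAID 5.
SAMPLE = """Personalities : [raid5]
md2 : active raid5 hdi5[2] hdg5[1] hde5[0]
      366285312 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
unused devices: <none>
"""


def array_status(mdstat_text):
    """Map md device -> (configured, working, degraded) from /proc/mdstat text."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        head = HEAD_RE.match(line)
        if head:
            current = head.group(1)
            continue
        m = STATUS_RE.search(line)
        if current and m:
            total, working = int(m.group(1)), int(m.group(2))
            result[current] = (total, working, working < total or "_" in m.group(3))
            current = None
    return result


if __name__ == "__main__":
    # On a live system, pass open("/proc/mdstat").read() instead of SAMPLE.
    for name, (total, working, degraded) in array_status(SAMPLE).items():
        print(f"{name}: {working}/{total} working, degraded={degraded}")
```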

   BTW, the resets you are seeing are typically caused by "DMA Read" errors, a.k.a. media defects.  You can get a better picture of what is going on if you install the SMART tools (smartmontools) and use smartctl to read the error logs stored on the drives themselves.
   If you don't have SMART turned on for ALL your IDE drives, you should.  You should also have smartd and mdadm (in monitor mode) running in the background to warn of impending doom (next time :)
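The commands involved look roughly like the following. This is a dry-run sketch that only prints them: `smartctl -s on` enables SMART on a drive, `smartctl -l error` prints the drive's own ATA error log, and `mdadm --monitor --scan` watches the arrays listed in mdadm.conf. The drive list and the `root` mail recipient are assumptions. smartd itself is configured separately through /etc/smartd.conf.

```python
import shlex
import subprocess

# Drives named in the kernel log; adjust for your own system.
DRIVES = ["/dev/hdj", "/dev/hdl"]

# mdadm monitor mode: watch arrays from mdadm.conf and mail on events.
# "root" as the recipient is an assumption.
MDADM_MONITOR = ["mdadm", "--monitor", "--scan", "--daemonise", "--mail=root"]


def smart_commands(dev):
    """smartctl invocations for one drive."""
    return [
        ["smartctl", "-s", "on", dev],     # enable SMART on the drive
        ["smartctl", "-l", "error", dev],  # print the drive's own ATA error log
    ]


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    plan = [c for dev in DRIVES for c in smart_commands(dev)] + [MDADM_MONITOR]
    run(plan)  # dry run: prints the commands only
```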

   I'd be interested in seeing the `mdadm -D /dev/md2` from your system too.

--Keith


Gregory C. Johnsom wrote:
> Keith,
> 
> Thanks for the reply...  Naturally, I forgot to attach the 
> dmesg/mdstat/mdadm dumps I did to my first mail; they are in an 
> immediately subsequent mail.
> 
> I'm reading the md source, but I'm still not sure what, if anything, to 
> do.  I don't/didn't hear a lot of activity from the box; it's been going 
> for nearly 3 days, and how long could/should 500GB take to sync anyway?  
> I can't see any kind of progress indicator.  
> (Naturally the debugging macro that would detail this info is turned off.)
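The back-of-envelope arithmetic for sync time is size divided by sustained rate. The sketch below uses decimal units, and the rates in it are illustrative guesses, not measurements from this box. md resyncs the RAID 5 members in parallel, so treating the full 500 GB as the amount to cover roughly gives an upper bound. md also throttles resync when other I/O competes, within the limits in /proc/sys/dev/raid/speed_limit_min and speed_limit_max.

```python
def resync_hours(size_gb, rate_mb_s):
    """Hours to stream size_gb gigabytes at rate_mb_s megabytes/second (decimal units)."""
    seconds = size_gb * 1000 / rate_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # The rates are illustrative guesses for IDE disks of that era, not measurements.
    for rate in (5, 10, 20):
        print(f"{rate:>2} MB/s -> {resync_hours(500, rate):4.1f} h")
```

Even at 5 MB/s, 500 GB works out to under 28 hours (100,000 s). A resync with no visible progress after nearly 3 days therefore looks stalled rather than merely slow.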
> 
> I'm not getting resets at the moment, so who knows; most likely 
> nothing is happening, so nothing is generating errors.
> 
> <Insert several hours and a mail list bounce here>
> 
> OK, bit the bullet and did a (IIRC) /mdadm --manage --run --force/, 
> which seems to have cleared things up a bit, particularly in that 
> --detail now shows a progress indicator.
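Besides `mdadm --detail`, the kernel reports resync progress in /proc/mdstat on a line such as `resync = 10.0% (done/total) finish=...min speed=...K/sec`. Here is a minimal parser for that line, assuming the usual 2.6-era layout. The sample line is hypothetical, so compare it against your own /proc/mdstat before relying on the pattern.

```python
import re

PROGRESS_RE = re.compile(
    r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%\s*"
    r"\((\d+)/(\d+)\)\s*finish=([\d.]+)min\s*speed=(\d+)K/sec"
)

# Hypothetical progress line in /proc/mdstat format.
SAMPLE = ("      [=>...................]  resync = 10.0% "
          "(12209664/122095104) finish=91.5min speed=20000K/sec")


def parse_progress(line):
    """Return the fields of an md progress line as a dict, or None if absent."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    action, pct, done, total, finish_min, speed_k = m.groups()
    return {
        "action": action,
        "percent": float(pct),
        "done_blocks": int(done),
        "total_blocks": int(total),
        "finish_min": float(finish_min),
        "speed_kb_s": int(speed_k),
    }


if __name__ == "__main__":
    print(parse_progress(SAMPLE))
```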
> 
> Waited for the sync to complete & ran jacksum against all the LVM volumes, 
> which generated the plethora of resets mentioned in the original 
> mail...  jacksum's output was piped to an SMB mount, so (almost) all 
> the local system's activity should have been reads.
> 
> Does anyone have any idea what to do about the following errors, or know 
> where I should look next?
> 
> =============== lspci =================
> <Command hangs>, but it's an nForce3 board with 2x Promise 100TX2 IDE 
> controllers + a Matrox G400 & a video capture card awaiting identification.
> 
> =============== dmesg =================
> 
> dma_timer_expiry: dma status == 0x44
> hdl: dma_timer_expiry: dma status == 0x44
> PDC202XX: Primary channel reset.
> hdj: DMA interrupt recovery
> hdj: lost interrupt
> PDC202XX: Secondary channel reset.
> hdl: DMA interrupt recovery
> hdl: lost interrupt
> hdj: dma_timer_expiry: dma status == 0x44
> hdl: dma_timer_expiry: dma status == 0x44
> PDC202XX: Primary channel reset.
> hdj: DMA interrupt recovery
> hdj: lost interrupt
> PDC202XX: Secondary channel reset.
> hdl: DMA interrupt recovery
> hdl: lost interrupt
[snip]
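To see at a glance which drives the errors cluster on, you can tally the per-drive "lost interrupt" and "dma_timer_expiry" messages. This sketch runs over the excerpt quoted above only, so the counts cover just that excerpt, not the snipped remainder. Lines without an `hdX:` prefix are skipped.

```python
import re
from collections import Counter

# Excerpt of the kernel log quoted in the post.
LOG = """dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
"""

DRIVE_LINE = re.compile(r"^(hd[a-z]):\s*(.+)$")


def tally_drive_errors(dmesg_text):
    """Count timeout/lost-interrupt messages per IDE drive (hdX)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = DRIVE_LINE.match(line.strip())
        if m and ("lost interrupt" in m.group(2) or "dma_timer_expiry" in m.group(2)):
            counts[m.group(1)] += 1
    return counts


def count_channel_resets(dmesg_text):
    """Count controller channel resets (e.g. 'PDC202XX: Primary channel reset.')."""
    return sum(1 for line in dmesg_text.splitlines() if "channel reset" in line)


if __name__ == "__main__":
    print(tally_drive_errors(LOG).most_common())
    print("channel resets:", count_channel_resets(LOG))
```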