[ale] Ugh! Kernel panic when loading ext3 modules in RHAS2.1 w/kernel 2.4.9-e30.smp

Jonathan Glass jonathan.glass at ibb.gatech.edu
Mon Dec 29 17:56:20 EST 2003


That's my best guess.  Of course, after rebooting w/ the 2.4.9-e30
kernel and sending my last email, both boxes kernel-panicked again.  I
have since updated to 2.4.9-e34 and a slew of other packages (gcc,
glibc, etc.).  On the last kernel panic, I got a kernel malloc error
(or something like that).  Go figure.  Both boxes appear to be running
fine now.

hehe.  I turned on my remote management cards on the boxes, so if they
lock up again, I can reboot them from home.  Don't know why I didn't
think of that sooner.  Sheesh!

Thanks

Jonathan Glass

On Mon, 2003-12-29 at 17:43, Dow Hurst wrote:
> So, unless you can force the RAID card to release even when the slave 
> has crashed, you will have to cut power to the slave completely, right?  
> The last operation of the slave locks the RAID card in a write mode 
> which won't let go?  Just wondering what you've found.  I am not 
> familiar with RAID cards but am looking through the O'Reilly book on Linux 
> RAID.
> Dow
> 
> 
> Jonathan Glass wrote:
> 
> >On Mon, 2003-12-29 at 16:42, Dow Hurst wrote:
> >  
> >
> >>That is definitely not HA, for sure.  You'll have to roll back, I 
> >>guess?  Have you tried using ReiserFS or XFS?  Just curious,
> >>Dow
> >>
> >>
> >>Jonathan Glass wrote:
> >>    
> >>
> >>>Great!  I updated my HA-cluster master node to the latest kernel
> >>>(2.4.9-e30.smp) available from RHN, and as soon as it mounts the ext3
> >>>file systems, it kernel panics.  This is the only error message I have:
> >>>
> >>>Message from syslogd at biolab2 at Mon Dec 29 15:20:52 2003 ...
> >>>biolab2 kernel: Assertion failure in unmap_underlying_metadata() at
> >>>buffer.c:1542: "!buffer_jlist_eq(old_bh, 3)"
> >>>
> >>>Sometimes I hate servers!
> >>>
> >>>Jonathan Glass
> >>>      
> >>>
> >
> >It appears that the slave node went down ungracefully and locked up the
> >shared SCSI storage.  When the master node came back online and tried
> >to start the "cluster" service, it couldn't get full access to the
> >partition b/c the slave's RAID card still had control.  I killed the slave
> >machine, rebooted the master, and everything cleared up.  Weird.
> >
> >I was afraid it was related to my trying to configure NFS & NIS to use
> >ports 5000-5004, but I don't think so now.  Time to dig through the log
> >files to see when this happened, and why.
> >
> >Thanks for the tips.
> >  
> >
-- 
Jonathan Glass
Systems Support Specialist II
Institute for Bioengineering & Bioscience
Georgia Institute of Technology
Email: jonathan.glass at ibb.gatech.edu
Office: 404-385-0127
Fax: 404-894-2291


