[ale] Ugh! Kernel panic when loading ext3 modules in RHAS2.1 w/kernel 2.4.9-e30.smp
Jonathan Glass
jonathan.glass at ibb.gatech.edu
Mon Dec 29 17:56:20 EST 2003
That's my best guess. Of course, after rebooting w/the 2.4.9-e30
kernel, and sending my last email, both boxes kernel panicked again. I
have since updated to 2.4.9-e34 and a slew of other packages (gcc,
glibc, etc.). On the last kernel panic I received a kernel malloc error
(or something like that). Go figure. It appears to be running fine
now.
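
Roughly, the update path looked like this (just a sketch from memory; the
exact package names and rpm versions below are illustrative, not copied
from the terminal):

  up2date gcc glibc                         # pull the userland errata from RHN
  rpm -ivh kernel-smp-2.4.9-e.34.i686.rpm   # install the errata kernel alongside
                                            # the running one, so the old kernel
                                            # stays bootable as a fallback

Installing the kernel with -ivh instead of -Uvh is the usual precaution, so
the previous kernel is still there to boot if the new one misbehaves.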
hehe. I turned on my remote management cards on the boxes, so if they
lock up again, I can reboot them from home. Don't know why I didn't
think of that sooner. Sheesh!
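
In case anyone wants to do the same, the remote reboot boils down to
something like this (a sketch only, assuming the management cards speak
IPMI over the LAN; the hostname, user, and password are placeholders):

  ipmitool -I lan -H biolab2-mgmt -U admin -P secret chassis power status
  ipmitool -I lan -H biolab2-mgmt -U admin -P secret chassis power cycle

If the cards use a vendor-specific interface instead of IPMI, the same
power-status/power-cycle operations are there, just behind the vendor's
own CLI or web UI.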
Thanks
Jonathan Glass
On Mon, 2003-12-29 at 17:43, Dow Hurst wrote:
> So, unless you can force the RAID card to release even when the slave
> has crashed, you will have to cut power to the slave completely, right?
> The last operation of the slave locks the RAID card in a write mode
> which won't let go? Just wondering what you've found. I am not
> familiar with RAID cards but am looking thru the O'Reilly book on Linux
> RAID.
> Dow
>
>
> Jonathan Glass wrote:
>
> >On Mon, 2003-12-29 at 16:42, Dow Hurst wrote:
> >
> >
> >>That is definitely not HA, for sure. You'll have to roll back, I
> >>guess? Have you tried using ReiserFS or XFS? Just curious,
> >>Dow
> >>
> >>
> >>Jonathan Glass wrote:
> >>
> >>
> >>
> >>>Great! I updated my HA-cluster master-node to the latest kernel
> >>>(2.4.9-e30.smp) available from RHN, and as soon as it mounts the ext3
> >>>file systems, it kernel panics. This is the only error message I have.
> >>>
> >>>Message from syslogd at biolab2 at Mon Dec 29 15:20:52 2003 ...
> >>>biolab2 kernel: Assertion failure in unmap_underlying_metadata() at
> >>>buffer.c:1542: "!buffer_jlist_eq(old_bh, 3)"
> >>>
> >>>Sometimes I hate servers!
> >>>
> >>>Jonathan Glass
> >>>
> >>>
> >
> >It appears that the slave node went down ungracefully, and locked up the
> >shared SCSI storage. When the master node came back online, and tried
> >to start the "cluster" service, it couldn't get full access to the
> >partition b/c the slave's RAID card had control. I killed the slave
> >machine, rebooted the master, and everything cleared up. Weird.
> >
> >I was afraid it was related to my trying to configure NFS & NIS to use
> >ports 5000-5004, but I don't think so now. Time to dig through the log
> >files to see when this happened, and why.
> >
> >Thanks for the tips.
> >
> >
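
P.S. On the NFS & NIS port question in the quoted message above: pinning
those services to fixed ports looks roughly like this (a sketch; the exact
5000-5004 assignments and where each option ends up are assumptions, not
the actual config on these boxes):

  # /etc/modules.conf -- fix the kernel lock manager's ports
  options lockd nlm_udpport=5000 nlm_tcpport=5000

  # daemon options, e.g. in the init scripts
  rpc.mountd -p 5001
  rpc.statd -p 5002
  ypserv -p 5003

With the ports nailed down, only the portmapper (111), nfsd (2049), and the
chosen 5000-range ports need to be reachable between the nodes.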
--
Jonathan Glass
Systems Support Specialist II
Institute for Bioengineering & Bioscience
Georgia Institute of Technology
Email: jonathan.glass at ibb.gatech.edu
Office: 404-385-0127
Fax: 404-894-2291