[ale] Debugging lockups on a linux system?
Mike Harrison
meuon at geeklabs.com
Mon Nov 28 08:46:12 EST 2005
On Mon, 28 Nov 2005, William Fragakis wrote:
> You may want to make sure it's not hardware related ie dying/funky
> power supply, bad RAM, dying/dead case fans, etc.
> On Nov 28, 2005, at 7:33 AM, tom sawyer wrote:
> > I'm having a problem. I have a couple of linux boxes that I support
> > at a client site. The problem is that one of them keeps locking up
> > for no apparent reason. All I have is SSH access. I have to have
> > them reboot the device so that I can SSH back into it. Nothing shows
> > up in /var/log/messages when it locks up. The next thing I see is the
I agree with William: the only times I've seen a Linux box die hard,
the cause was hardware (power supply, RAM, CPU). The exceptions are
odd things happening to I/O, or a bad swap file/partition when the
machine hits swap.
Which leads to: next time it locks up, note the exact time. Then
slowly and carefully unplug all the I/O, about a minute apart:
keyboard/A20, Ethernet, etc. Then give the machine a few minutes
to free up cycles and write to the log before you power-cycle it.
Then check the log again. If there are entries after you started
unplugging things, it's a clue that something is I/O bound,
like a DDoS against the Ethernet, a fubared web app, or a bad keyboard.
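To make the "check the log again" step less error-prone, something like the following could pull out only the entries stamped at or after the time you started unplugging. This is a rough sketch, not a tested tool: it assumes the traditional syslog line format ("Nov 28 08:46:12 host ..."), which carries no year, so the year has to be supplied. The function name entries_after and the command-line arguments are made up for illustration.

```python
from datetime import datetime


def entries_after(lines, cutoff, year):
    """Return syslog lines stamped at or after `cutoff`.

    Assumes each line starts with a 15-character timestamp such as
    "Nov 28 08:46:12". Lines that do not parse are skipped.
    """
    kept = []
    for line in lines:
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no leading syslog timestamp on this line
        if stamp >= cutoff:
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python after.py /var/log/messages "2005-11-28 08:30:00"
    path, when = sys.argv[1], sys.argv[2]
    cutoff = datetime.strptime(when, "%Y-%m-%d %H:%M:%S")
    with open(path, errors="replace") as fh:
        for entry in entries_after(fh, cutoff, cutoff.year):
            print(entry)
```

If this prints anything, the box was still alive enough to log after you started pulling cables.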
Sometimes a *nix machine will CRAWL under load, but will free up
when you remove the load.
If the log shows nothing new, suspect hardware. And if it's an RH 7.1
machine, it may be time for a complete upgrade anyway...
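Since nothing lands in /var/log/messages at lockup and you only have SSH, one way to get at "the exact time" and to see whether the box was crawling under load beforehand might be a small heartbeat left running on the machine. This is only a sketch under assumptions: the file name heartbeat.log and the 60-second interval are arbitrary, and os.getloadavg() is Unix-only. The last line written before the reboot brackets when the machine stopped.

```python
import os
import time
from datetime import datetime


def heartbeat_line(now=None):
    """Build one line: timestamp plus the 1/5/15 minute load averages."""
    now = now or datetime.now()
    one, five, fifteen = os.getloadavg()
    return f"{now:%Y-%m-%d %H:%M:%S} load {one:.2f} {five:.2f} {fifteen:.2f}"


def run(path="heartbeat.log", interval=60):
    """Append a heartbeat line every `interval` seconds until killed."""
    while True:
        with open(path, "a") as fh:
            fh.write(heartbeat_line() + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push it to disk in case the box dies next
        time.sleep(interval)


if __name__ == "__main__":
    run()
```

If the load numbers climb steadily before the last entry, that points toward load or I/O; if they are flat and the log just stops, that looks more like hardware.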