[ale] lsof and a hung system

Jim Kinney jim.kinney at gmail.com
Tue Oct 20 14:41:05 EDT 2015


And that is a very useful capability. Since this system runs CentOS
5.11, I had to stop each service manually until nothing was left stuck
open except the hung network and the NFS connection.
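
For reference, the driver reset described in the quoted message further
down (stop network, unload the module, pause, load the module, start
network) could be sketched roughly as below. The exact commands were
never posted to the list, so the use of `service network stop/start` and
`modprobe` is an assumption, and the module name is a hypothetical
placeholder. Only the 20-second pause comes from the thread. It defaults
to a dry run that just prints the steps.

```python
import subprocess
import time


def build_reset_steps(module, delay_seconds=20):
    """Return the ordered reset steps for a wedged NIC driver.

    Each step is ("run", argv) or ("sleep", seconds). The commands are
    assumed, not taken from the thread; only the pause length is.
    """
    return [
        ("run", ["service", "network", "stop"]),
        ("run", ["modprobe", "-r", module]),   # unload the driver module
        ("sleep", delay_seconds),              # give the card time to settle
        ("run", ["modprobe", module]),         # load the driver module again
        ("run", ["service", "network", "start"]),
    ]


def reset_nic(module, delay_seconds=20, dry_run=True):
    """Print each step; execute it too when dry_run is False (needs root)."""
    for kind, arg in build_reset_steps(module, delay_seconds):
        if kind == "sleep":
            print("sleep %d" % arg)
            if not dry_run:
                time.sleep(arg)
        else:
            print(" ".join(arg))
            if not dry_run:
                # check_call raises CalledProcessError if a step fails,
                # so later steps are not attempted on a broken state.
                subprocess.check_call(arg)


if __name__ == "__main__":
    # "example_10g_driver" is a made-up name; substitute the real module.
    reset_nic("example_10g_driver")
```

Keeping the pause as a parameter makes it easy to try a shorter or longer
delay if 20 seconds turns out not to suit a different card.
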
On Tue, 2015-10-20 at 17:19 +0000, Lightner, Jeff wrote:
> I assume you're joking but just in case:
> 
> Systemd has services that can start/stop without depending on the
> entire stack of services, unlike init. However, some of the services
> may depend on SOME other services running. The beauty of this is
> that with a hung system you might actually shut down most services
> even if some things like NFS are hung, so that when you power cycle
> you're not pulling the legs out from under as many things as you
> might if your init-based shutdown hung on the first script it tried
> to stop.
> 
> P.S. vim rules!
> 
> -----Original Message-----
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of DJ-Pfulio
> Sent: Tuesday, October 20, 2015 1:09 PM
> To: ale at ale.org
> Subject: Re: [ale] lsof and a hung system
> 
> But isn't systemd supposed to solve these issues?
> 
> BTW, I had to add a similar delay in the startup of a Raspberry Pi
> box that got systemd with the 4.1 kernel in a Debian install.
> 
> On 10/20/2015 12:25 PM, Jim Kinney wrote:
> > Yep. The 10G card driver had oopsed all over itself and wouldn't keep
> > a connection up. I initially tried to stop network, unload the module,
> > load the module, start the network but even that failed to reset the
> > card completely. I needed to add a sleep 20 before loading the module
> > again. Once the connection was actually working the system was cleanly
> > rebooted to lop off the zombies and things were happily OK.
> > On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
> > > On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney <jim.kinney at gmail.com> wrote:
> > > ... 
> > > > Other system with same nfs mounted storage is fine. Storage server
> > > > is connected to both number crunchers by dedicated, unswitched
> > > > 10Gbps fiber ethernet.
> > >
> > > You mean with direct connections? In that case, the other number
> > > cruncher's connection could be fine while the affected system is
> > > unable to reach the NFS server over the network (for some as yet
> > > undetermined reason), which could result in the behavior you
> > > describe if the NFS mount is "hard".
> > 
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
> 
-- 
James P. Kinney III

Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain

http://heretothereideas.blogspot.com/
