[ale] lsof and a hung system

Jim Kinney jim.kinney at gmail.com
Tue Oct 20 14:39:01 EDT 2015


On Tue, 2015-10-20 at 13:09 -0400, DJ-Pfulio wrote:
> But isn't systemd supposed to solve these issues?
Not really. The network was up but unable to connect. If the
connection had been in a bond, a different set of notifications would
have appeared. Currently, systemd doesn't check functionality outside
the system the way a Nagios test would. Give it another couple of
weeks. :-)
Hmm. It should be pretty easy to hack up a ping test to the next hop
for each interface to verify connectivity, and have a failure trigger
a full interface reset.
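
A minimal sketch of that ping-and-reset idea in Python, under stated
assumptions: the interface names and next-hop addresses in NEXT_HOPS are
hypothetical placeholders, the "reset" is only an `ip link` down/up with a
settle delay (a driver in the state described in this thread may also need
its module reloaded), and the script runs as root on Linux with iputils
`ping` and iproute2 `ip` installed.

```python
#!/usr/bin/env python3
"""Ping the next hop on each interface; reset any interface that gets no reply."""
import subprocess
import time

# Hypothetical interface -> next-hop map (documentation addresses).
# Replace with the real peers, e.g. the storage server on a direct link.
NEXT_HOPS = {
    "eth0": "192.0.2.1",
    "eth1": "198.51.100.1",
}


def ping_ok(iface, addr, count=3, timeout=2):
    """Return True if ping, sent out through iface, exits with status 0."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout), "-I", iface, addr]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def reset_iface(iface, settle=20):
    """Take the link down, wait, bring it back up (assumed reset procedure)."""
    subprocess.run(["ip", "link", "set", "dev", iface, "down"], check=False)
    # The settle delay mirrors the "sleep 20" mentioned in this thread.
    time.sleep(settle)
    subprocess.run(["ip", "link", "set", "dev", iface, "up"], check=False)


def check_all(next_hops, ping=ping_ok, reset=reset_iface):
    """Ping every next hop, reset interfaces that fail, return their names."""
    failed = []
    for iface, addr in sorted(next_hops.items()):
        if not ping(iface, addr):
            reset(iface)
            failed.append(iface)
    return failed
```

Calling check_all(NEXT_HOPS) from a cron job or a systemd timer would be one
way to run it periodically. The ping and reset callables are parameters so
the logic can be exercised without touching a real interface.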
> BTW, I had to add a similar delay in the startup of a Raspberry Pi box
> that got systemd with the 4.1 kernel in a Debian install.
> 
> On 10/20/2015 12:25 PM, Jim Kinney wrote:
> 
> > 
> > Yep. The 10G card driver had oopsed all over itself and wouldn't keep a
> > connection up. I initially tried to stop the network, unload the module,
> > load the module, and start the network, but even that failed to reset the
> > card completely. I needed to add a sleep 20 before loading the module
> > again. Once the connection was actually working, the system was cleanly
> > rebooted to lop off the zombies and things were happily OK.
> >
> > On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
> > > On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney <jim.kinney at gmail.com> wrote:
> > > ...
> > >
> > > > The other system with the same NFS-mounted storage is fine. The
> > > > storage server is connected to both number crunchers by dedicated,
> > > > unswitched 10Gbps fiber Ethernet.
> > >
> > > You mean with direct connections?  In that case, the other number
> > > cruncher's connection could be fine while the affected system is
> > > unable to reach the NFS server (for some as yet undetermined
> > > reason), which could result in the behavior you describe if the NFS
> > > mount is "hard".
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo

-- 
James P. Kinney III

Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain

http://heretothereideas.blogspot.com/
