[ale] lsof and a hung system

Lightner, Jeff JLightner at dsservices.com
Tue Oct 20 13:19:51 EDT 2015


I assume you're joking but just in case:

Systemd can start and stop services individually, without depending on the entire stack of services the way init does. Some services may still depend on SOME other services running, though. The beauty of this is that on a hung system you might actually be able to shut down most services even if some things, like NFS, are hung. Then when you power cycle, you're not pulling the legs out from under as many things as you would if an init-based shutdown hung on the first script it tried to stop.
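
To illustrate the idea of stopping whatever can still be stopped before a power cycle, here is a minimal sketch in Python. It is only an assumption about how one might script this, not something from the thread: the helper name stop_units, the 30-second timeout, and any unit names you pass in are hypothetical. It calls "systemctl stop" once per unit and moves on if a call fails or takes too long.

```python
import subprocess


def stop_units(units, timeout=30, run=subprocess.run):
    """Try to stop each unit on its own; return (stopped, failed) lists.

    `units` is a list of unit names chosen by the caller (hypothetical here).
    `run` is injectable so the logic can be exercised without systemd.
    """
    stopped, failed = [], []
    for unit in units:
        try:
            # One systemctl call per unit, so a hung unit only costs
            # `timeout` seconds instead of blocking everything after it.
            run(["systemctl", "stop", unit], check=True, timeout=timeout)
            stopped.append(unit)
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired,
                OSError):
            # We only stop waiting here; systemd may still have the
            # stop job queued for this unit.
            failed.append(unit)
    return stopped, failed
```

Whatever ends up in the failed list (a hung NFS mount, for example) is what the power cycle would still cut off abruptly; everything in the stopped list went down cleanly first.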

P.S. vim rules!

-----Original Message-----
From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of DJ-Pfulio
Sent: Tuesday, October 20, 2015 1:09 PM
To: ale at ale.org
Subject: Re: [ale] lsof and a hung system

But isn't systemd supposed to solve these issues?

BTW, I had to add a similar delay to the startup of a Raspberry Pi box that got systemd with the 4.1 kernel in a Debian install.

On 10/20/2015 12:25 PM, Jim Kinney wrote:
> Yep. The 10G card driver had oopsed all over itself and wouldn't keep 
> a connection up. I initially tried to stop network, unload the module, 
> load the module, start the network but even that failed to reset the 
> card completely. I needed to add a sleep 20 before loading the module 
> again. Once the connection was actually working the system was cleanly 
> rebooted to lop off the zombies and things were happily OK.
> On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
>> On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney <jim.kinney at gmail.com>
>> wrote:
>> ... 
>>> Other system with same nfs mounted storage is fine. Storage server 
>>> is connected to both number crunchers by dedicated, unswitched 
>>> 10Gbps fiber ethernet.
>>>>
>>>
>> You mean with direct connections?  In that case, the other number
>> cruncher's connection could be fine while the affected system is
>> unable to reach the NFS server (for some as-yet undetermined
>> reason), which could result in the behavior you describe if the NFS
>> mount is "hard".
>
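
The reset sequence Jim describes in the quoted thread (stop the network, unload the module, sleep 20, load the module, start the network) could be sketched roughly as below. This is a hedged illustration, not his actual script: the thread does not name the driver module, so the module name is a caller-supplied placeholder, and the use of "service network stop/start" and "modprobe" is an assumption about the system in question.

```python
import subprocess
import time


def reset_plan(module, settle_seconds=20):
    """Return the ordered reset steps as (action, argument) pairs.

    `module` is a placeholder for the NIC driver module name.
    The "service network ..." commands are an assumption about the host.
    """
    return [
        ("run", ["service", "network", "stop"]),
        ("run", ["modprobe", "-r", module]),   # unload the driver
        ("sleep", settle_seconds),             # give the card time to settle
        ("run", ["modprobe", module]),         # load the driver again
        ("run", ["service", "network", "start"]),
    ]


def execute(plan, run=subprocess.run, sleep=time.sleep):
    """Carry out a plan; `run` and `sleep` are injectable for dry runs."""
    for action, arg in plan:
        if action == "sleep":
            sleep(arg)
        else:
            run(arg, check=True)  # stop at the first failing command
```

Keeping the plan separate from its execution makes it easy to print the steps and review them before running anything as root on a machine that is already misbehaving.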


