[ale] lsof and a hung system

DJ-Pfulio DJPfulio at jdpfu.com
Tue Oct 20 13:09:12 EDT 2015


But isn't systemd supposed to solve these issues?

BTW, I had to add a similar delay in the startup of a Raspberry Pi box
that got systemd with the 4.1 kernel in a Debian install.
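
For reference, the reset sequence Jim describes in the quoted message below
(stop the network, unload the module, wait, reload, start the network) might
be sketched roughly like this. The module name and service name are
hypothetical placeholders, since the thread names neither, and using
systemctl is my assumption. It only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the NIC reset sequence from the quoted message.
# MODULE and NET_SERVICE are hypothetical placeholders; the thread does not
# name the driver or the service, and systemctl is an assumption.
MODULE="${MODULE:-example_10g_driver}"
NET_SERVICE="${NET_SERVICE:-network}"
SETTLE_SECS="${SETTLE_SECS:-20}"   # the "sleep 20" mentioned in the thread
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop networking, unload the driver, let the card settle, then bring
# everything back. Stops at the first step that fails.
reset_nic() {
    run systemctl stop "$NET_SERVICE" &&
    run modprobe -r "$MODULE" &&
    run sleep "$SETTLE_SECS" &&
    run modprobe "$MODULE" &&
    run systemctl start "$NET_SERVICE"
}

reset_nic
```

To actually run it, set MODULE to the real driver and use DRY_RUN=0 as root.
The delay between unload and reload is the part that mattered here; 20
seconds is just the value that happened to work.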

On 10/20/2015 12:25 PM, Jim Kinney wrote:
> Yep. The 10G card driver had oopsed all over itself and wouldn't keep a
> connection up. I initially tried stopping the network, unloading the
> module, reloading it, and starting the network again, but even that
> failed to reset the card completely. I needed to add a "sleep 20" before
> loading the module again. Once the connection was actually working, the
> system was cleanly rebooted to lop off the zombies and things were
> happily OK.
> On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
>> On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney <jim.kinney at gmail.com>
>> wrote:
>> ... 
>>> Other system with same nfs mounted storage is fine. Storage server
>>> is connected to both number crunchers by dedicated, unswitched
>>> 10Gbps fiber ethernet. 
>>>>
>>>
>> You mean with direct connections?  In that case, the other number
>> cruncher's connection could be fine while the affected system is
>> unable to reach the NFS server (for some as yet undetermined reason),
>> which could result in the behavior you describe if the NFS mount is
>> "hard".
>
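
On Ed's point about "hard" mounts: a quick way to see which NFS mounts on a
box are hard-mounted is to look at the options column in /proc/mounts. A
minimal sketch, assuming the usual Linux layout of that file (device,
mount point, fstype, options, ...); the function name is made up for
illustration.

```shell
#!/bin/sh
# Print the mount point of every NFS mount that carries the "hard" option.
# Reads /proc/mounts by default; pass another file in the same format to
# check it instead. list_hard_nfs is a hypothetical helper name.
list_hard_nfs() {
    awk '$3 ~ /^nfs4?$/ {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "hard") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# Only query the live table where /proc/mounts exists (i.e. on Linux).
if [ -r /proc/mounts ]; then
    list_hard_nfs
fi
```

Processes touching anything it lists will block when the server is
unreachable, which would fit the hang described in the thread.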

