[ale] lsof and a hung system

Ed Cashin ecashin at noserose.net
Tue Oct 20 12:40:16 EDT 2015


Glad you're back in business!

On Tue, Oct 20, 2015 at 12:25 PM, Jim Kinney <jkinney at jimkinney.us> wrote:

> Yep. The 10G card driver had oopsed all over itself and wouldn't keep a
> connection up. I initially tried stopping the network, unloading the
> module, reloading it, and starting the network again, but even that failed
> to reset the card completely. I needed to add a sleep 20 before loading the
> module again. Once the connection was actually working, the system was
> cleanly rebooted to lop off the zombies and things were happily OK.
>
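
The reset sequence described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the thread does not
name the driver module or the init system, so the module name
"example_10g" and the SysV-style "service network stop/start" commands are
placeholders, and the helper names are made up for this sketch. It defaults
to a dry run that only prints the steps.

```python
import subprocess
import time


def reset_steps(module, service="network", settle_seconds=20):
    """Build the reset sequence as a list of (kind, value) steps.

    `module` and `service` are placeholders; the thread names neither.
    """
    return [
        ("run", ["service", service, "stop"]),   # stop networking
        ("run", ["modprobe", "-r", module]),     # unload the NIC driver
        ("sleep", settle_seconds),               # let the card settle
        ("run", ["modprobe", module]),           # load the driver again
        ("run", ["service", service, "start"]),  # bring networking back up
    ]


def reset_nic(module, dry_run=True, **kwargs):
    """Print each step; execute it only when dry_run is False (needs root)."""
    for kind, value in reset_steps(module, **kwargs):
        if kind == "sleep":
            print(f"sleep {value}")
            if not dry_run:
                time.sleep(value)
        else:
            print(" ".join(value))
            if not dry_run:
                subprocess.run(value, check=True)


if __name__ == "__main__":
    # Dry run with a hypothetical module name; nothing is executed.
    reset_nic("example_10g")
```

The 20 second pause mirrors the "sleep 20" mentioned above; the right value
for any other card or driver would have to be found by experiment.
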
> On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
>
> On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney <jim.kinney at gmail.com> wrote:
> ...
>
> Other system with same nfs mounted storage is fine. Storage server is
> connected to both number crunchers by dedicated, unswitched 10Gbps fiber
> ethernet.
>
> You mean with direct connections?  In that case, the other number
> cruncher's connection could be fine while the affected system is unable to
> reach the NFS server (for some as yet undetermined reason). If the NFS
> mount is "hard", that could result in the behavior you describe.
>
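
One way to check whether a mount is "hard" is to look at the options column
of /proc/mounts. The following is a minimal sketch, assuming the usual
six-field /proc/mounts layout (device, mount point, type, options, dump,
pass); the function name and the sample lines are made up for illustration
and do not come from the systems in this thread.

```python
def nfs_mount_modes(mounts_text):
    """Return (mount_point, mode) for each NFS entry in /proc/mounts text.

    mode is "hard", "soft", or "unspecified" when neither option is listed.
    """
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS client mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        if "soft" in options:
            mode = "soft"
        elif "hard" in options:
            mode = "hard"
        else:
            mode = "unspecified"
        results.append((fields[1], mode))
    return results


if __name__ == "__main__":
    # Invented sample data in /proc/mounts format.
    sample = (
        "storage:/export/data /data nfs rw,relatime,vers=3,hard,proto=tcp 0 0\n"
        "storage:/export/tmp /scratch nfs4 rw,relatime,soft,timeo=100 0 0\n"
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
    )
    for mount_point, mode in nfs_mount_modes(sample):
        print(mount_point, mode)
```

On a live system the input would be the contents of /proc/mounts. With a
hard mount the client retries NFS requests indefinitely, so processes that
touch the mount block until the server is reachable again.
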
> --
>   Ed Cashin <ecashin at noserose.net>
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>
>


-- 
  Ed Cashin <ecashin at noserose.net>