[ale] checking for process in uninterruptable sleep state

Todor Fassl fassl.tod at gmail.com
Tue May 10 15:14:39 EDT 2016


That worked fine. Thanks. I actually wrote the script on a workstation 
whose USB sub-system was wedged at the time, so I could be sure the script 
worked as intended. I couldn't get it to reboot the machine, though. I don't 
know whether a reboot command can't be run from a bash script or whether it 
failed because of the original problem. Probably safer this way anyhow. I put 
it in cron on the 15 workstations: it prints a line of output only when 
a machine is hung, and cron itself emails me that output. For completeness, 
here is the full script:

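#!/bin/bash
# Usage: script [timeout-seconds]; set VERBOSE to also report healthy hosts.
test -n "$1" && TIMEOUT=$1
test -z "$TIMEOUT" && TIMEOUT=7
# Run lsusb in the background and remember its PID.
/usr/bin/lsusb > /dev/null 2>&1 & PID=$!
sleep "$TIMEOUT"
# If lsusb is still around after the timeout, treat the USB sub-system as hung.
if ps -p "$PID" > /dev/null 2>&1; then
	echo "$HOSTNAME is hung, process $PID" >&2
else
	test -n "$VERBOSE" && echo "$HOSTNAME is okay."
fi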
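
A PID that still exists after the timeout could also belong to an lsusb
that is merely slow. As a rough sketch, assuming a Linux /proc filesystem,
the state letter can be read from /proc/PID/status to confirm that the
process really is in uninterruptible sleep (state D). The function names
proc_state and is_uninterruptible are made up for illustration and are not
part of the script above:

```shell
#!/bin/bash
# Hypothetical helpers; they assume Linux, where /proc/PID/status
# contains a line such as "State:  D (disk sleep)".

# Print the one-letter state of the given PID (R, S, D, Z, ...).
# Returns non-zero if the process no longer exists.
proc_state() {
	local key value rest
	[ -r "/proc/$1/status" ] || return 1
	while read -r key value rest; do
		if [ "$key" = "State:" ]; then
			printf '%s\n' "$value"
			return 0
		fi
	done < "/proc/$1/status"
	return 1
}

# Succeed only if the PID exists and is in uninterruptible sleep.
is_uninterruptible() {
	[ "$(proc_state "$1")" = "D" ]
}
```

After the sleep, a test such as is_uninterruptible "$PID" could stand in
for, or run alongside, the ps check.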



On 05/10/2016 10:21 AM, Scott Plante wrote:
>
> How about if you do lsusb in the background then check the PID to see if it's still running/stuck, like:
>
>
> lsusb >/dev/null 2>&1 &
> usbpid=$!
> sleep 4 #or however long > max lsusb exec time
> if ps -p $usbpid >/dev/null 2>&1
> then
> #lsusb is hung--do your stuff here, reboot etc.
> : #no-op so the branch is valid shell until real commands are added
> fi
> ----- Original Message -----
>
> From: "Todor Fassl" <fassl.tod at gmail.com>
> To: "Atlanta Linux Enthusiasts" <ale at ale.org>
> Sent: Tuesday, May 10, 2016 10:39:26 AM
> Subject: [ale] checking for process in uninterruptable sleep state
>
> Okay, so my latest problem with these lab workstations is that accessing
> the USB sub-system puts the calling process into an uninterruptible
> sleep. I'd like to write a script to check for that so at least I'd know
> that I have to go over and reboot the machine.
>
> Details: I have 15 Dell workstations running Ubuntu 15.10 (2 are running
> 16.04 -- that did not help). Occasionally, the keyboard and mouse
> freeze. Logging in remotely and running lsusb hangs such that you can't
> even control-c out, and it cannot be killed even with a -9. The process
> goes into an uninterruptible sleep during a system call to open the file
> /sys/bus/usb/devices/usb1/descriptors. That file is part of the kernel's
> control files for the USB controller itself. So you can see why the
> keyboard and mouse are dead: the driver for the USB controller itself is
> hung.
>
> We've upgraded the kernel and installed Dell's latest BIOS upgrades. No
> joy. I am thinking the only remaining thing to do is to file a bug
> report. However, I could alleviate the problem a little if I could
> easily detect it and reboot.
>
> The problem is that I can't figure out how to write a script to detect a
> process in an uninterruptible sleep state. No matter what I do, it seems
> to hang. I've tried something like 'bash -c lsusb' and 'timeout 5
> lsusb'. They both hang. The only thing I've been able to do is to have
> 2 different scripts: one running lsusb and another checking for blocked
> lsusb procs. But that is way ugly.
>
> PS: I wouldn't mind ideas wrt the original problem either. Not that I
> hold out any hope for that.
>
>
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>

-- 
Todd

