[ale] Lab Workstation Mystery
    Pete Hardie 
    pete.hardie at gmail.com
       
    Mon Mar 28 14:14:52 EDT 2016
    
    
  
I once tracked a bug that was due to the building elevator motors stopping
and starting differently after hours.

On Mon, Mar 28, 2016 at 1:36 PM, Dustin Strickland <
dustin.h.strickland at gmail.com> wrote:
> The compressors in air conditioning units or refrigerators can also have
> an effect when they kick on.
>
> On Mon, Mar 28, 2016 at 1:30 PM, Jim Kinney <jkinney at jimkinney.us> wrote:
>
>> Microwave!!!
>>
>> The EM field from those can make screens wacky and wiggly while they
>> run. I moved my desk away from the opposite side of the wall from the home
>> microwave and still had to get 10' away to stop the interference.
>>
>> Bit flips happen.
>>
>> On March 28, 2016 1:20:45 PM EDT, Todor Fassl <fassl.tod at gmail.com>
>> wrote:
>>
>>>
>>> We've run every kind of hardware diagnostic we can think of. Besides,
>>> it's just these 14 machines in the 2 shared spaces. Identical machines
>>> in private offices don't seem to have any problem.
>>>
>>> But, you're right. Some kind of power problem is the best theory I've
>>> seen for a while. The 2 rooms are in different buildings and they never
>>> had a problem before. But maybe somebody is plugging something in. Come
>>> to think of it, we had a similar problem years ago when a student put a
>>> microwave oven in his office. The computers on the other side of the
>>> wall kept going down. I don't know enough about electricity to explain
>>> that but the microwave oven and the computer were plugged into outlets
>>> on opposite sides of the same wall.
>>>
>>> What kind of gizmo would a grad student be bringing into a lab that
>>> would make linux workstations freeze up?
>>>
>>> Another reason this theory makes sense is that I haven't gotten a single
>>> complaint about the machines going down. You'd think if they were going
>>> down while people were using them, I'd get complaints. People are always
>>> logged in when they go down but that doesn't mean anything since they
>>> tend to walk away w/o logging out. I've looked for patterns in the list
>>> of users who were logged in when a machine went down but didn't see any.
>>> I can't rule out that it's somebody doing something though.  There might
>>> be a pattern and I just didn't see it. But I am sure there isn't one guy
>>> who is always logged in when a machine goes down.
>>>
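The by-eye search for a pattern among logged-in users can be made mechanical. A minimal sketch, assuming each crash is recorded as the set of users logged in at the time; the function name and the sample user names are hypothetical, not taken from the thread:

```python
from collections import Counter


def crash_presence(crashes):
    """Return (user, fraction of crashes where that user was logged in),
    ordered from most to least frequently present."""
    # Count each user at most once per crash, even if listed twice.
    counts = Counter(user for users in crashes for user in set(users))
    total = len(crashes)
    return [(user, n / total) for user, n in counts.most_common()]


# Hypothetical usage:
# crash_presence([{"alice", "bob"}, {"bob"}, {"bob", "carol"}])
```

A high fraction is only a hint: someone who is logged in nearly all the time will also be present at nearly every crash, so the numbers need comparing against how often each user is logged in overall.
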
>>> On 03/28/2016 11:05 AM, James Taylor wrote:
>>>
>>>>  The most common, if not the only, reason I've seen partitions get marked read-only is when I've had power glitches that caused a very brief interruption in connectivity to the drives.
>>>>  Normally that is not an issue with locally attached drives on workstations, but stranger things have happened.
>>>>  Are the workstations on UPS, or is the power to the rooms conditioned properly?
>>>>  -jt
>>>>
>>>>
>>>>  James Taylor
>>>>  678-697-9420
>>>>  james.taylor at eastcobbgroup.com
>>>>
>>>>
>>>>
>>>>  On 3/28/2016 11:54 AM, Todor Fassl <fassl.tod at gmail.com> wrote:
>>>>
>>>>  I have a mysterious problem with workstations in a shared use
>>>>  environment. There are 2 labs in different buildings, one with 6
>>>>  workstations and one with 8. These workstations are used by a group of
>>>>  about 30 grad student TAs. All are running ubuntu 15.10. Authentication
>>>>  is via ldap and home directories are mounted  via nfs.  Every day, 2 or
>>>>  3 of the machines go down. The earliest symptom I can find is that the
>>>>  root filesystem is remounted read-only.  Soon they stop responding to
>>>>  ssh and snmp and they are essentially locked up. They still respond to
>>>>  pings though.
>>>>
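One way to pin down when the root filesystem flips to read-only is to poll the mount table. A minimal sketch, assuming the Linux /proc/mounts layout (device, mount point, type, options, ...); the function names are hypothetical and not from the thread:

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the option set of the last entry for mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows earlier ones.
            options = set(fields[3].split(","))
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if mount_point is currently mounted with the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# On a live machine:
#   with open("/proc/mounts") as f:
#       print(is_read_only(f.read()))
```

Since local logging stops on the affected machines, the result would have to be reported somewhere off the machine to be useful. If the root is ext4 mounted with errors=remount-ro, a flip to read-only means the kernel detected a filesystem or I/O error first.
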
>>>>  I've caught the machines in the period where the root system is
>>>>  read-only but I can still ssh to them. I've found that I cannot nfs
>>>>  mount home directories on our file server.  I can mount nfs shares on
>>>>  other servers. And I can mount the same home directories if I go to
>>>>  another workstation. Restarting nfs on the file server has no effect.
>>>>
>>>>  When I try to mount a home directory on an affected machine, the mount
>>>>  just hangs.  I ran it with strace and it just showed it was waiting --
>>>>  for what, I'm not sure and I don't have a screen cap available at the
>>>>  moment. I put a packet sniffer on the server and it showed it received a
>>>>  single packet from the client and that's it.
>>>>
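A hanging mount can be turned into a yes/no probe by running it under a timeout. A minimal sketch using Python's subprocess module; the helper name, the server name, and the mount path in the usage comment are hypothetical:

```python
import subprocess


def finishes_within(cmd, seconds):
    """Run cmd and return its exit code, or None if it was still
    running after `seconds` (the child is killed in that case)."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
        return done.returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical usage (run as root):
# finishes_within(["mount", "-t", "nfs", "fileserver:/home", "/mnt/test"], 30)
```

One caveat: a process blocked in uninterruptible sleep may not exit when killed, in which case this call can block as well.
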
>>>>  There is nothing in the logs on the client. In fact, they simply stop at
>>>>  some point in the process. At first I attributed this to the root
>>>>  filesystem being read-only but it continued after I moved /var to a
>>>>  separate file system. At some point it just stops writing records to the
>>>>  syslog but I don't know if it's before or after the root filesystem is
>>>>  remounted read-only.
>>>>
>>>>  Many of the TAs also have identical workstations in their offices. None
>>>>  of those machines seem to have this problem.  The TAs do tend to walk
>>>>  away from the workstations w/o logging out. But I wrote a script to kill
>>>>  off their sessions and it didn't help. I had it send me an email
>>>>  whenever it killed somebody's session and it doesn't seem to be
>>>>  correlated with that. In other words, sometimes machines go down even if
>>>>  everyone who has used them has remembered to log out.
>>>>
>>>>  I'm pretty desperate. Any ideas?
>>>>
>>>> ------------------------------
>>>>
>>>>  Ale mailing list
>>>>  Ale at ale.org
>>>>  http://mail.ale.org/mailman/listinfo/ale
>>>>  See JOBS, ANNOUNCE and SCHOOLS lists at
>>>>  http://mail.ale.org/mailman/listinfo
>>>>
>>>
>>>
>>>
>>
>
>
-- 
Pete Hardie
--------
Better Living Through Bitmaps