[ale] Lab Workstation Mystery

Todor Fassl fassl.tod at gmail.com
Mon Mar 28 17:42:11 EDT 2016


Well, all the workstations in the same room would be on the same switch,
but not the ones in different rooms.
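Because each room has its own switch, one way to check whether the failures follow the room (and so the switch, or the room's power) rather than the individual machine is to tally the recorded crash times by room and by hour of day. This is a minimal Python sketch; the hostnames, the `ROOM_OF` map, and the sample records are hypothetical placeholders, not data from this thread.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping of workstation hostname -> lab room (one switch per room).
ROOM_OF = {
    "lab1-ws01": "lab1",
    "lab1-ws02": "lab1",
    "lab2-ws01": "lab2",
}


def tally_crashes(crashes):
    """Count crashes per room and per hour of day.

    `crashes` is an iterable of (hostname, ISO-8601 timestamp) pairs, e.g. the
    times each machine was found with a read-only root filesystem.
    Hosts missing from ROOM_OF are counted under "unknown".
    """
    by_room = Counter()
    by_hour = Counter()
    for host, stamp in crashes:
        by_room[ROOM_OF.get(host, "unknown")] += 1
        by_hour[datetime.fromisoformat(stamp).hour] += 1
    return by_room, by_hour


if __name__ == "__main__":
    # Illustrative placeholder records only.
    sample = [
        ("lab1-ws01", "2016-03-27T02:15:00"),
        ("lab2-ws01", "2016-03-27T14:40:00"),
        ("lab1-ws02", "2016-03-28T02:50:00"),
    ]
    rooms, hours = tally_crashes(sample)
    print("crashes per room:", dict(rooms))
    print("crashes per hour:", dict(sorted(hours.items())))
```

With real records, a room or hour that clearly dominates would be worth a closer look; an even spread would be consistent with there being no obvious time-of-day pattern.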



On 03/28/2016 03:55 PM, Jim Kinney wrote:
> Are they on the same switch?
>
> On March 28, 2016 3:00:58 PM EDT, Todor Fassl <fassl.tod at gmail.com> wrote:
>> This particular problem, if it is a power problem, has to be caused by
>> something a person could lug into the labs. I've been saying they are
>> in different buildings but, technically, they are on different wings of
>> the same building. I tend to think of them as different buildings
>> because I usually go outdoors to get from one to another. The point is
>> that they are widely separated.
>>
>> I haven't tried to find a pattern in the time of day. I only paid enough
>> attention to the time of the crashes to be certain that there is no
>> obvious pattern. The crashes occur at different times of day and night.
>>
>>
>>
>> On 03/28/2016 01:14 PM, Pete Hardie wrote:
>>> I once tracked a bug that was due to the building elevator motors
>>> stopping and starting differently after-hours.
>>>
>>>
>>> On Mon, Mar 28, 2016 at 1:36 PM, Dustin Strickland <
>>> dustin.h.strickland at gmail.com> wrote:
>>>
>>>> The compressors in air conditioning units or refrigerators can also have
>>>> an effect when they kick on.
>>>>
>>>> On Mon, Mar 28, 2016 at 1:30 PM, Jim Kinney <jkinney at jimkinney.us>
>>>> wrote:
>>>>
>>>>> Microwave!!!
>>>>>
>>>>> The EM field from those can make screens go wacky and wiggly while they
>>>>> run. I moved my desk from the opposite side of the wall from the home
>>>>> microwave and still had to get 10' away to stop the interference.
>>>>>
>>>>> Bit flips happen.
>>>>>
>>>>> On March 28, 2016 1:20:45 PM EDT, Todor Fassl <fassl.tod at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> We've run every kind of hardware diagnostic we can think of. Besides,
>>>>>> it's just these 14 machines in the 2 shared spaces. Identical machines
>>>>>> in private offices don't seem to have any problem.
>>>>>>
>>>>>> But you're right. Some kind of power problem is the best theory I've
>>>>>> seen for a while. The 2 rooms are in different buildings and they never
>>>>>> had a problem before. But maybe somebody is plugging something in. Come
>>>>>> to think of it, we had a similar problem years ago when a student put a
>>>>>> microwave oven in his office. The computers on the other side of the
>>>>>> wall kept going down. I don't know enough about electricity to explain
>>>>>> that, but the microwave oven and the computer were plugged into outlets
>>>>>> on opposite sides of the same wall.
>>>>>>
>>>>>> What kind of gizmo would a grad student be bringing into a lab that
>>>>>> would make Linux workstations freeze up?
>>>>>>
>>>>>> Another reason this theory makes sense is that I haven't gotten a single
>>>>>> complaint about the machines going down. You'd think if they were going
>>>>>> down while people were using them, I'd get complaints. People are always
>>>>>> logged in when they go down, but that doesn't mean anything since they
>>>>>> tend to walk away w/o logging out. I've looked for patterns in the list
>>>>>> of users who were logged in when a machine went down but didn't see any.
>>>>>> I can't rule out that it's somebody doing something, though. There might
>>>>>> be a pattern and I just didn't see it. But I am sure there isn't one guy
>>>>>> who is always logged in when a machine goes down.
>>>>>>
>>>>>> On 03/28/2016 11:05 AM, James Taylor wrote:
>>>>>>
>>>>>>>    The most common, if not the only, reason I've seen partitions get
>>>>>>>    marked read-only is when I've had power glitches that caused a very
>>>>>>>    brief interruption in connectivity to the drives.
>>>>>>>    Normally that is not an issue with locally attached drives on
>>>>>>>    workstations, but stranger things have happened.
>>>>>>>    Are the workstations on UPS, or is the power to the rooms
>>>>>>>    conditioned properly?
>>>>>>>    -jt
>>>>>>>
>>>>>>>
>>>>>>>    James Taylor
>>>>>>>    678-697-9420
>>>>>>>    james.taylor at eastcobbgroup.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    Todor Fassl <fassl.tod at gmail.com> 3/28/2016 11:54 AM >>>
>>>>>>>    I have a mysterious problem with workstations in a shared-use
>>>>>>>    environment. There are 2 labs in different buildings, one with 6
>>>>>>>    workstations and one with 8. These workstations are used by a group of
>>>>>>>    about 30 grad student TAs. All are running Ubuntu 15.10. Authentication
>>>>>>>    is via LDAP and home directories are mounted via NFS. Every day, 2 or
>>>>>>>    3 of the machines go down. The earliest symptom I can find is that the
>>>>>>>    root filesystem is remounted read-only. Soon they stop responding to
>>>>>>>    ssh and snmp and they are essentially locked up. They still respond to
>>>>>>>    pings, though.
>>>>>>>
>>>>>>>    I've caught the machines in the period where the root filesystem is
>>>>>>>    read-only but I can still ssh to them. I've found that I cannot NFS
>>>>>>>    mount home directories from our file server. I can mount NFS shares
>>>>>>>    from other servers. And I can mount the same home directories if I go
>>>>>>>    to another workstation. Restarting NFS on the file server has no effect.
>>>>>>>
>>>>>>>    When I try to mount a home directory on an affected machine, the mount
>>>>>>>    just hangs. I ran it with strace and it just showed it was waiting --
>>>>>>>    for what, I'm not sure, and I don't have a screen cap available at the
>>>>>>>    moment. I put a packet sniffer on the server and it showed it received
>>>>>>>    a single packet from the client and that's it.
>>>>>>>
>>>>>>>    There is nothing in the logs on the client. In fact, the logs simply
>>>>>>>    stop at some point in the process. At first I attributed this to the
>>>>>>>    root filesystem being read-only, but it still happens after I moved
>>>>>>>    /var to a separate filesystem. At some point it just stops writing
>>>>>>>    records to the syslog, but I don't know if that's before or after the
>>>>>>>    root filesystem is remounted read-only.
>>>>>>>
>>>>>>>    Many of the TAs also have identical workstations in their offices. None
>>>>>>>    of those machines seem to have this problem. The TAs do tend to walk
>>>>>>>    away from the workstations w/o logging out. But I wrote a script to kill
>>>>>>>    off their sessions and it didn't help. I had it send me an email
>>>>>>>    whenever it killed somebody's session, and the crashes don't seem to be
>>>>>>>    correlated with that. In other words, sometimes machines go down even if
>>>>>>>    everyone who has used them has remembered to log out.
>>>>>>>
>>>>>>>    I'm pretty desperate. Any ideas?
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>
>>>>>>>    Ale mailing list
>>>>>>>    Ale at ale.org
>>>>>>>    http://mail.ale.org/mailman/listinfo/ale
>>>>>>>    See JOBS, ANNOUNCE and SCHOOLS lists at
>>>>>>>    http://mail.ale.org/mailman/listinfo
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>>>
>>>
>>
>> --
>> Todd
>

-- 
Todd
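Since the earliest symptom reported is the root filesystem being remounted read-only, and the local syslog stops at some unknown point, a small watcher that polls the mount table could at least timestamp that transition. This is a minimal Python sketch, assuming a Linux `/proc/mounts` with the usual whitespace-separated fields (device, mount point, type, options, ...); `root_is_readonly` and `watch` are hypothetical helper names, and printing to stdout is an assumption about where the report should go.

```python
import time


def root_is_readonly(mounts_text):
    """Return True if the '/' entry in /proc/mounts-style text has the 'ro' option.

    Each line is: device mountpoint fstype options dump pass.
    If '/' appears more than once (e.g. 'rootfs' plus the real root device),
    the last entry wins, because later mounts are stacked on top of earlier ones.
    """
    readonly = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            # Split the options so 'errors=remount-ro' is not mistaken for 'ro'.
            readonly = "ro" in fields[3].split(",")
    return readonly


def watch(path="/proc/mounts", interval=10.0):
    """Poll the mount table; print a timestamp the first time '/' is read-only, then return."""
    while True:
        with open(path) as f:
            if root_is_readonly(f.read()):
                print(time.strftime("%Y-%m-%d %H:%M:%S"),
                      "root filesystem is read-only", flush=True)
                return
        time.sleep(interval)
```

One option is to call `watch()` from a session on another host (for example, over ssh) so the timestamp does not depend on the affected disk; that is a suggestion, not something tried in this thread. Comparing that time with the last syslog record would answer whether logging stops before or after the remount.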

