[ale] diagnosis
David Corbin
dcorbin at machturtle.com
Sun Apr 25 18:33:10 EDT 2004

On Saturday 24 April 2004 11:13, James P. Kinney III wrote:
> > The "investigation" I ran yesterday *was* in single-user mode.  And to
> > keep things fresh in your memory, as soon as the /var/run/utmp file
> > exists (even in single-user mode), memory starts disappearing from free
> > to be used by buffers.  If that file is not there (when I mount /var) I
> > do not see evidence of the memory leak.  I've never let it exhaust memory
> > while in single-user mode, but at run-level 2 (normal) it eventually runs
> > out of memory to allocate.  I wouldn't really say the system crashes,
> > but none of the applications can operate because no RAM is available for them.
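A minimal sketch for logging the free-to-buffers drift described above, assuming a Linux /proc/meminfo that has MemFree and Buffers lines (values in kB). The function names `meminfo_sample` and `watch_meminfo` are hypothetical, and the interval and sample count are arbitrary choices.

```shell
#!/bin/sh
# Sketch (assumption: Linux /proc/meminfo with MemFree and Buffers in kB).
# Prints one line per sample: "<epoch seconds> <MemFree kB> <Buffers kB>".
meminfo_sample() {
    awk -v now="$(date +%s)" '
        /^MemFree:/ { free = $2 }
        /^Buffers:/ { buf = $2 }
        END { print now, free, buf }
    ' /proc/meminfo
}

# Hypothetical helper: take $1 samples, $2 seconds apart.
watch_meminfo() {
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        meminfo_sample
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_meminfo 60 60 >> /tmp/meminfo.txt` records an hour of samples; a second column that keeps falling while the third keeps rising would match what vmstat showed.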
>
> Well, utmp is a storage area for logins and usage info. If that file is
> growing in single-user mode with nothing else running, you have a
> problem. The kernel should be what is generating the data for the utmp
> file. Since the presence of utmp triggers the memory loss, I would
> suspect that the kernel is corrupted and is not flushing the writes to utmp,
> but is instead buffering the write process and/or data. This may
> indicate a bad hard drive, a trojaned kernel or failing RAM.
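To check whether utmp itself is actually growing, a minimal sketch can compare the file's size before and after a delay. The path defaults to /var/run/utmp as in the thread; `file_size` and `utmp_growth` are hypothetical helper names, and the 60-second default delay is an assumption.

```shell
#!/bin/sh
# Sketch: report the size of a file in bytes, or 0 if it does not exist.
file_size() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Hypothetical helper: $1 = file (default /var/run/utmp), $2 = seconds to wait.
utmp_growth() {
    f=${1:-/var/run/utmp}
    delay=${2:-60}
    before=$(file_size "$f")
    sleep "$delay"
    after=$(file_size "$f")
    echo "$f: $before -> $after bytes (delta $((after - before)))"
}
```

A nonzero delta in single-user mode with nobody logging in would support the growth theory; a zero delta would suggest the problem is triggered by the file's presence rather than its size.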
>
I'm reasonably sure it's not a trojaned kernel: building a new kernel on
another machine was one of the first tests (though, I admit, I installed it
on the hard drive rather than putting it on a CD...)
> Run memtest and rule that out. Then copy a kernel from a CD distribution
> and set lilo/grub to use that kernel. Then boot to single-user mode, touch
> utmp, reboot back to single-user mode with the same CD kernel and watch
> the output of top. If the problem is still there, drop in another hard
> drive, make it the /var partition, and try again.
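The "touch utmp and watch" step above can be scripted roughly as follows. This is a sketch under assumptions: it reads the Buffers line from Linux /proc/meminfo, the UTMP and WAIT defaults are illustrative, and `utmp_test` is a hypothetical name. It does not replace memtest or booting the CD kernel; it only records before/after numbers once you are in single-user mode.

```shell
#!/bin/sh
# Sketch of the single-user utmp test. UTMP and WAIT are assumed defaults;
# the script only reads /proc/meminfo and creates the file.
UTMP=${UTMP:-/var/run/utmp}
WAIT=${WAIT:-300}

# Current Buffers figure in kB (Linux /proc/meminfo).
buffers_kb() {
    awk '/^Buffers:/ { print $2 }' /proc/meminfo
}

utmp_test() {
    start=$(buffers_kb)
    touch "$UTMP"            # create utmp as suggested in the thread
    sleep "$WAIT"
    end=$(buffers_kb)
    echo "Buffers: ${start} kB -> ${end} kB (change $((end - start)) kB in ${WAIT}s)"
}
```

Running `utmp_test` once with the original kernel and once with the CD kernel gives two directly comparable numbers.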
>
When you say "memtest", are you referring to the shell script that does lots
of tarring/untarring?
> If all that fails, get a Geiger counter and start looking for a
> radiation source that can cause bit flips :)
>
> > > On Fri, 2004-04-23 at 17:37, David Corbin wrote:
> > > > I tried it with the "safe" version of top.  It shows nothing that
> > > > isn't in my regular top.  However, I did try "vmstat", which was
> > > > there.  It shows that free memory is disappearing as "buffers"
> > > > grows.
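To keep just the two vmstat columns being discussed, a small filter can help. This sketch assumes procps-style vmstat output whose column header row starts with `r` and includes `free` and `buff`; it looks the columns up by name rather than by fixed position. `free_buff` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print only the "free" and "buff" columns from vmstat output.
# Assumption: the header row starts with "r" (procps layout); data rows
# start with a number. Columns are found by header name.
free_buff() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("free" in col) && ("buff" in col) && $1 ~ /^[0-9]+$/ {
            print "free=" $(col["free"]), "buff=" $(col["buff"])
        }
    '
}
```

Usage: `vmstat 60 10 | free_buff` prints ten free/buff pairs, one minute apart.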
> > > >
> > > > Does that help any?
> > > >
> > > > On Monday 19 April 2004 20:35, James P. Kinney III wrote:
> > > > > I put up a page with the binaries and source on it :
> > > > >
> > > > > http://www.localnetsolutions.com/tools/
> > > > >
> > > > > Note: the procps page on sourceforge did not have an md5 checksum.
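Without a published checksum, one option is to compare a download against a checksum obtained another way (for example from a second mirror). A minimal sketch, assuming `md5sum` from GNU coreutils is available (on BSD, `md5 -q` prints just the hash); `check_md5` is a hypothetical name, and the expected value has to come from a source you trust.

```shell
#!/bin/sh
# Sketch: compare a file's MD5 checksum with an expected value.
# $1 = file, $2 = expected checksum (32 hex digits) from a trusted source.
check_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
        return 0
    fi
    echo "MISMATCH: $1 (got $actual)"
    return 1
}
```

For example, `check_md5 procps.tar.gz "$EXPECTED"` (the file name and variable are hypothetical) exits non-zero on a mismatch, so it can gate a build script.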
> > > > >
> > > > > On Mon, 2004-04-19 at 20:02, David Corbin wrote:
> > > > > > On Monday 19 April 2004 15:01, James P. Kinney III wrote:
> > > > > > > If it is a cracked machine, running a statically linked top
> > > > > > > from a CD will give access to the real top data. Top is a
> > > > > > > binary that rootkits commonly tamper with.
> > > > > >
> > > > > > Sounds reasonable.  Can you point me at one?  If not, does
> > > > > > anybody know where the source to top is?  I'll build my
> > > > > > own.
> > > > > >
> > > > > > > It is certainly possible to _add_ or _remove_ a module, but
> > > > > > > not to change out the kernel without a reboot (unless
> > > > > > > 2-kernel-monte is available; I have not been able to find this
> > > > > > > :(  ). So the actual data stream for top cannot easily be
> > > > > > > tampered with. Thus a known-good, statically linked top would
> > > > > > > give access to the running system and show the _real_ processes
> > > > > > > that are running.
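Before trusting a replacement top, it is worth confirming that it really is statically linked. A minimal sketch that classifies a binary from file(1) output, assuming `file` is installed and prints "statically linked" or "dynamically linked" as common versions do; `linkage` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: classify a binary as static or dynamic from file(1) output.
# -L makes file follow symlinks to the real binary.
linkage() {
    case "$(file -L "$1")" in
        *"statically linked"*)  echo static ;;
        *"dynamically linked"*) echo dynamic ;;
        *)                      echo unknown ;;
    esac
}
```

For example, `linkage /mnt/cdrom/top` (path hypothetical) should print `static` for a binary that does not depend on the system's possibly tampered shared libraries.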
> > > > > > >
> > > > > > > If top shows no malicious processes, it's time to take some
> > > > > > > snapshots over time and plot them to see which app is failing.
> > > > > > >
> > > > > > > #!/bin/sh
> > > > > > > # Append a timestamped top snapshot to /tmp/top.txt
> > > > > > > date >> /tmp/top.txt
> > > > > > > top -b -n 1 -c >> /tmp/top.txt
> > > > > > > echo "###############" >> /tmp/top.txt
> > > > > > > echo >> /tmp/top.txt
> > > > > > > echo >> /tmp/top.txt
> > > > > > >
> > > > > > > Run it as a cron job every minute for an hour.
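Cron is normally not running in single-user mode, so for testing there a plain loop can stand in for the cron job. A minimal sketch of the same snapshot as a function, repeated in a loop; LOG, `snapshot` and `run_snapshots` are hypothetical names, and 60 samples at 60-second intervals matches the hour suggested above.

```shell
#!/bin/sh
# Sketch: the top snapshot as a function, plus a loop that replaces cron.
LOG=${LOG:-/tmp/top.txt}

snapshot() {
    date >> "$LOG"                 # timestamp for this sample
    top -b -n 1 -c >> "$LOG"       # one batch-mode iteration of top
    echo "###############" >> "$LOG"
    echo >> "$LOG"
}

# Take $1 snapshots, $2 seconds apart.
run_snapshots() {
    n=$1
    interval=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        snapshot
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then
            sleep "$interval"
        fi
    done
}

# One hour at one-minute intervals:
#   run_snapshots 60 60
# Equivalent crontab line at a normal run level (script path is hypothetical):
#   * * * * * /usr/local/sbin/topsnap.sh
```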
> > > > > > >
> > > > > > > If you want, you can now mash/mangle the data into a nice plot
> > > > > > > using some perl and gnuplot (or a spreadsheet).
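As one way to do the mashing, a minimal awk sketch (in place of perl) can turn the log into columns gnuplot reads directly. It assumes each snapshot contains a top summary line starting with "Mem" (or "KiB Mem"/"MiB Mem" in newer versions) in which each number precedes its label; that layout varies between procps versions, so check your log first. `top_mem_series` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: turn the top log into "sample free buffers" columns.
# Assumed line shape, e.g.:
#   Mem: 255900k total, 250000k used, 5900k free, 12000k buffers
top_mem_series() {
    awk '
        /^(Mem|KiB Mem|MiB Mem)/ {
            free = ""
            buff = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^free/) free = $(i - 1)   # number before "free,"
                if ($i ~ /^buff/) buff = $(i - 1)   # number before "buffers"/"buff"
            }
            gsub(/[^0-9.]/, "", free)               # strip units and commas
            gsub(/[^0-9.]/, "", buff)
            n++
            print n, free, buff
        }
    ' "${1:-/tmp/top.txt}"
}
```

After `top_mem_series /tmp/top.txt > /tmp/mem.dat`, the gnuplot command `plot '/tmp/mem.dat' using 1:2 with lines title 'free', '' using 1:3 with lines title 'buffers'` draws both series against the sample number.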
> > > > > > >
> > > > > > > On Mon, 2004-04-19 at 11:56, Geoffrey wrote:
> > > > > > > > Dow Hurst wrote:
> > > > > > > > > How can we find the process that is soaking up the memory?
> > > > > > > > > How do you use /proc to find out which process owns the
> > > > > > > > > memory being used?  I know IRIX had tools to look at memory
> > > > > > > > > and see which processes owned what part of memory.  Does
> > > > > > > > > Linux?
> > > > > > > > >
> > > > > > > > > It seems that if you knew what was leaking, you would have a
> > > > > > > > > major part of the battle won.
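On Linux, each process's resident memory is reported in /proc/<pid>/status (the VmRSS line, in kB), so a rough per-process ranking can be built from /proc alone. A minimal sketch; `rss_by_process` is a hypothetical name, only the first word of each process name is kept, and kernel threads (which have no VmRSS line) are skipped. Memory the kernel counts as "buffers" is not charged to any process's VmRSS, so this listing would not reveal that particular growth.

```shell
#!/bin/sh
# Sketch: rank processes by resident set size using /proc/<pid>/status.
# Prints "<VmRSS kB> <pid> <name>" for the top $1 processes (default 10).
rss_by_process() {
    for status in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read; ignore errors.
        awk '
            /^Name:/  { name = $2 }
            /^Pid:/   { pid = $2 }
            /^VmRSS:/ { rss = $2 }
            END { if (rss != "") print rss, pid, name }
        ' "$status" 2>/dev/null
    done | sort -rn | head -n "${1:-10}"
}
```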
> > > > > > > >
> > > > > > > > I believe we mentioned top, but he noted that it doesn't show
> > > > > > > > him anything.  That's what concerns me.  If the culprit doesn't
> > > > > > > > show up, is it being hidden for a reason???
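One common way to test whether a tampered ps or top is hiding something is to compare the PIDs visible under /proc with those ps reports. A minimal sketch, assuming a procps-style `ps -e -o pid=`; `hidden_pids` is a hypothetical name. Processes that exit between the two listings would otherwise show up as noise, so each candidate is re-checked before being printed. A kernel-level rootkit that hides entries from /proc itself would defeat this check.

```shell
#!/bin/sh
# Sketch: report PIDs present in /proc but missing from ps output.
hidden_pids() {
    proc_list=$(mktemp)
    ps_list=$(mktemp)
    # PIDs as seen directly in /proc (numeric directory names).
    for d in /proc/[0-9]*; do
        echo "${d#/proc/}"
    done | LC_ALL=C sort > "$proc_list"
    # PIDs as reported by ps.
    ps -e -o pid= | tr -d ' ' | LC_ALL=C sort > "$ps_list"
    # Lines only in the /proc list, kept only if the process still exists.
    LC_ALL=C comm -23 "$proc_list" "$ps_list" | while read -r pid; do
        if [ -d "/proc/$pid" ]; then
            echo "$pid"
        fi
    done
    rm -f "$proc_list" "$ps_list"
}
```

Any PID this prints is worth a second run and a look at /proc/<pid>/cmdline before drawing conclusions.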
> > > >
> > > > _______________________________________________
> > > > Ale mailing list
> > > > Ale at ale.org
> > > > http://www.ale.org/mailman/listinfo/ale