[ale] How to debug a program that just goes away

David Ritchie deritchie at gmail.com
Sun Feb 28 17:40:56 EST 2010


>> >> I have a multi-threaded c++ program that occasionally just stops
>> >> running.  At the time it stops it is usually not doing anything.  Every
>> >> thread is either waiting on a semaphore or sleeping (Thread::sleep).
>> >> It's event driven and no events have arrived for some time.  I have lots
>> >> of prints to be able to tell where it is and what it's doing.  No core
>> >> file generated.  No strange messages in any log file, either system or
>> >> application.  No rogue processes killing it off.
>> >>

Have you thought about running the sar data collector on a 5-minute
interval, and running 'date >>log; ps -ef | grep <process name> >>log'
every minute or so? If you do that, you would have some idea of
when the process is dying, its size (depending on the ps options you
pass), and overall system memory usage. This might tell you whether
the kernel's OOM killer is causing the problem. Does it get
better if the machine has more memory? Also, are you catching all
signals in the application so that you can log them as they occur?
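On that last point, here is a minimal sketch of what catching and logging signals might look like. It assumes a POSIX/Linux C++ program, and the names install_signal_logging, format_signal_message and log_signal_handler are hypothetical. The handler sticks to write(), signal() and raise(), because printf, iostreams and most application logging code are not async-signal-safe. After logging, it restores the default action and re-raises, so the process still dies (and dumps core where that is the default) as it would have. One caveat: SIGKILL and SIGSTOP cannot be caught. If the OOM killer is responsible, nothing will show up here, and the kernel log (dmesg) is the place to look.

```cpp
#include <csignal>
#include <cstddef>
#include <cstring>
#include <signal.h>
#include <unistd.h>

// Hypothetical names throughout; adapt to the application's own logging.

// Descriptor the handler writes to (stderr unless changed).
static volatile sig_atomic_t g_log_fd = STDERR_FILENO;

// Names for the signals handled below. strsignal() is avoided because it
// is not async-signal-safe.
static const char* signal_name(int sig) {
    switch (sig) {
        case SIGHUP:  return "SIGHUP";
        case SIGINT:  return "SIGINT";
        case SIGQUIT: return "SIGQUIT";
        case SIGILL:  return "SIGILL";
        case SIGABRT: return "SIGABRT";
        case SIGFPE:  return "SIGFPE";
        case SIGSEGV: return "SIGSEGV";
        case SIGBUS:  return "SIGBUS";
        case SIGPIPE: return "SIGPIPE";
        case SIGALRM: return "SIGALRM";
        case SIGTERM: return "SIGTERM";
        case SIGUSR1: return "SIGUSR1";
        case SIGUSR2: return "SIGUSR2";
        default:      return "unknown";
    }
}

// Writes "caught signal <n> (<name>)\n" into buf without any library
// calls, so it is usable inside a signal handler. Returns the length.
std::size_t format_signal_message(int sig, char* buf, std::size_t cap) {
    if (cap == 0) return 0;
    std::size_t n = 0;
    auto put = [&](const char* s) {
        while (*s != '\0' && n + 1 < cap) buf[n++] = *s++;
    };
    put("caught signal ");
    char digits[12];
    int d = 0;
    unsigned v = sig < 0 ? 0u : static_cast<unsigned>(sig);
    do {  // emit decimal digits in reverse order
        digits[d++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (d > 0 && n + 1 < cap) buf[n++] = digits[--d];
    put(" (");
    put(signal_name(sig));
    put(")\n");
    buf[n] = '\0';
    return n;
}

extern "C" void log_signal_handler(int sig) {
    char buf[64];
    std::size_t n = format_signal_message(sig, buf, sizeof buf);
    ssize_t written = write(g_log_fd, buf, n);  // async-signal-safe
    (void)written;  // nothing useful to do on failure inside a handler
    // Restore the default action and re-raise so the process terminates
    // exactly as it would have without this handler.
    signal(sig, SIG_DFL);
    raise(sig);
}

// Installs the handler for catchable signals whose default action kills
// the process. Returns false if any sigaction() call fails.
bool install_signal_logging(int log_fd) {
    g_log_fd = log_fd;
    const int sigs[] = {SIGHUP,  SIGINT,  SIGQUIT, SIGILL,  SIGABRT,
                        SIGFPE,  SIGSEGV, SIGBUS,  SIGPIPE, SIGALRM,
                        SIGTERM, SIGUSR1, SIGUSR2};
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = log_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    for (int s : sigs) {
        if (sigaction(s, &sa, nullptr) != 0) return false;
    }
    return true;
}
```

Call install_signal_logging() early in main(), before any threads are started, and pass it the application log's file descriptor. Signal dispositions are process-wide, so a single call covers every thread.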

Just a few thoughts...

-- Dave
