[ale] Collectl and other performance tools

Thu Oct 7 13:16:19 EDT 2010

"Lightner, Jeff" <jlightner at water.com> wrote:

> We use Nagios for monitoring here.  I'm not so much looking for a
> monitoring solution as something that can let you see system details on
> the fly much the way top, vmstat and the others do.  The difference
> being this appears to be something one can use to see disk, cpu, memory
> and network all for the same interval.

You might also have a look at OpenNMS (http://www.opennms.org/), which offers both performance data collection / graphing / thresholding and Nagios-style service assurance (plus tons more) in a single package.  It's 100% free software, designed to use standards-track protocols (e.g. SNMP), and scales gracefully from very small to very large numbers of managed nodes.  As the name implies, it's not just for systems, but it's a very good platform for systems management.  I gave a talk on the subject at this year's Texas Linux Fest.  The video seems not to have materialized, but here's the Ogg audio; the talk itself doesn't really start until the 1'30" mark:

http://www.archive.org/details/Tlf-Large-scaleLinuxSystemsMonitoringWithOpennms

And a PDF of my slides:

http://www.opennms.org/~jeffg/slides/TLF2010_Gehlbach-short.pdf

Finally, if you want a peek at a real-world report that includes performance data on disk, CPU, memory, and network for a couple of production CentOS 5.5 servers, log in as user "demo" with password "demo" here:

http://demo.opennms.org/opennms/KSC/customView.htm?type=custom&report=213

(Note that I'm biased since my livelihood comes from selling services for OpenNMS.)

-jeff