[ale] Log parsing/alerting tool recommendations

JD jdp at algoloma.com
Thu May 30 08:39:14 EDT 2013


I've worked on both sides of this issue and like to think the places where I
worked did logging better than anyone else.

The exact format of the logs isn't as important as having the necessary data to
exactly determine where any problem lies.
1. Log files should be all that any support person needs to determine an issue
with the program. Developers **should NEVER** need to log in to any production
server.
2. Logs should have multiple levels which can dynamically be turned up or down
WHILE the program is running. No restart needed.
3. Timestamp is mandatory.
4. ERROR|WARNING|INFO to help with grepping later by knowledgeable people.
5. Program name (including a unique "build" identifier), module, function, method,
and line number are highly helpful.
6. A program/script provided by the vendor to gather any necessary
troubleshooting information is MANDATORY. Telling a low-skilled person to gather
log-A and config-A through Z files and syslog-B and `uname -a` and .... is not good.
Have the vendor create AND provide a script that captures whatever information
they will need to troubleshoot.
7. Startup and shutdown would be nice "INFO" entries for the program.
8. Use syslog when possible so the local admin can have control over the final
target of the logs. It also means that using a tool like cacti and/or splunk is
easier.
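
To make items 2 through 5 and 7 concrete, here is a minimal sketch in Python
using only the standard `logging` and `signal` modules. It is an illustration,
not a reference implementation: the logger name `myapp`, the `BUILD_ID` string,
and the choice of SIGUSR1/SIGUSR2 to step the level up and down are all
assumptions made for the example, and the signal part only applies on
POSIX systems.

```python
import logging
import signal
import sys

# Hypothetical program name plus build identifier (item 5).
BUILD_ID = "myapp-1.4.2-b317"

# Timestamp, severity, build, module.function:line, then the message.
FORMAT = ("%(asctime)s %(levelname)s " + BUILD_ID +
          " %(module)s.%(funcName)s:%(lineno)d %(message)s")


def make_logger(stream=sys.stderr):
    """Create (or reset) the 'myapp' logger writing to the given stream."""
    logger = logging.getLogger("myapp")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def more_verbose(signum=None, frame=None):
    """Lower the threshold one step toward DEBUG, no restart needed."""
    logger = logging.getLogger("myapp")
    logger.setLevel(max(logging.DEBUG, logger.level - 10))


def less_verbose(signum=None, frame=None):
    """Raise the threshold one step, stopping at ERROR."""
    logger = logging.getLogger("myapp")
    logger.setLevel(min(logging.ERROR, logger.level + 10))


if __name__ == "__main__":
    log = make_logger()
    # Assumed convention: `kill -USR1 <pid>` turns logging up, USR2 turns it down.
    signal.signal(signal.SIGUSR1, more_verbose)
    signal.signal(signal.SIGUSR2, less_verbose)
    log.info("startup")    # item 7: startup entry
    log.warning("disk usage above threshold")
    log.info("shutdown")   # item 7: shutdown entry
```

For item 8, swapping the `StreamHandler` for
`logging.handlers.SysLogHandler(address="/dev/log")` should hand the routing
decision to the local syslog daemon, assuming the host exposes that socket.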

For fat-clients, access to a stack trace that an end-user can copy/paste into an
email is extremely helpful too. It is amazing what the __MODULE__ and __LINE__
macros can do for pinpointing issues.  Being able to tell a client exactly what
issue they've hit in about 3 minutes on the phone saves you AND the customer
time. Until a fix is available, the client can avoid that area of the code and
you know exactly which check is failing (or have a 95%+ likely location).
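
As a rough illustration of the copy/paste idea, the sketch below builds a short
report whose first line names the file, line, and function where the error was
raised, followed by the full stack trace. It uses Python's standard `traceback`
module in place of the macros mentioned above; `crash_report` and `parse_port`
are hypothetical names invented for the example.

```python
import traceback


def crash_report(exc):
    """Return a paste-able report: one summary line, then the full trace."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1]  # innermost Python frame, where the exception was raised
    header = (f"{type(exc).__name__} at {last.filename}:{last.lineno} "
              f"in {last.name}(): {exc}")
    body = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    return header + "\n" + body


def parse_port(text):
    """Hypothetical check that fails on bad input."""
    return int(text)


if __name__ == "__main__":
    try:
        parse_port("eighty")
    except ValueError as err:
        # In a fat client this text would go into a dialog the user can copy.
        print(crash_report(err))
```

With a summary line like that in the email, support can go straight to the
failing check instead of asking the user to describe what they clicked.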


On 05/29/2013 02:49 PM, Jim Kinney wrote:
> check this site for an idea:
>
> http://www.w3.org/Daemon/User/Config/Logging.html
>
> I find apache logs very easy to parse with many tools.
>
> Another way to look at this is from the viewpoint of someone having to
> understand why the apps are not working as they think they should. What type
> of data would help?  At that point, the data type will usually dictate an
> output format. And sysadmins _LOVE_ error messages like:
>
> application foo received 0x120BAF02 at 0x33BD000001A1. Is this OK?
>
> That is useless! Saw that (different addresses) during a Debian install once.
> I thought my head would explode.
>
> I've seen many java applications (java is a great drink and a country I've
> never visited. It's a crappy language that should not be taught - grr) that
> split up logs into sort of a user/admin general process, admin error, and
> application error tracking. Each had a deeper level of details with very long
> time stamps.
>
>
> On Tue, May 28, 2013 at 6:34 PM, Robert L. Harris <robert.l.harris at gmail.com
> <mailto:robert.l.harris at gmail.com>> wrote:
>
>
>       I'm working with a number of developers trying to create a logging
>     standard for some apps and devices my company is developing.  Most of them
>     are linux based and running syslog-ng so we have some flexibility and can
>     standardize.  The big concern though is coming up with a format for the
>     logs for the tools we will (may) be using to parse the data.  Personally I
>     like the idea of using cmd line and piping unix utils.
>
>       A recommendation was thrown out though to ask about how others are
>     parsing system and application logs to look for issues, tracking, etc and
>     what kinds of input they take (json, xml, .log, etc).  Anyone have any
>     tools you're using that are just incredible and what kinds of input they
>     can work with?
>
>     Robert
>
>
>     -- 
>     :wq!
>
