[ale] Dealing with really big log files....

robert mccurdy freeload101 at yahoo.com
Sun Mar 22 13:37:02 EDT 2009


I would check out splunk, it's uber 1337

    -----==-=====--==-=====--==-=====--==- Tomorrow’s security today!
http://rmccurdy.com -----==-=====--==-=====--==-=====--==-

--- On Sun, 3/22/09, scott mcbrien <smcbrien at gmail.com> wrote:
From: scott mcbrien <smcbrien at gmail.com>
Subject: Re: [ale] Dealing with really big log files....
To: ale at ale.org
Date: Sunday, March 22, 2009, 12:35 PM

You could write a perl script to break it apart for you.  The pseudo code would look something like:

open original log file
while input from file
  read first line
  pattern match for the thing that looks like a date
  open a different file (probably with date as part of the name)
  while read line contains date
    write out the line
    read the next line
  close the file
close the original log file

Variations would include adding some directory structure around where to place the logs when they're broken apart, or, instead of separating by day, separating by month or year.
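The pseudo code above can be sketched in a few lines of awk rather than perl. Note this is just a sketch: the line format (a 6-digit date like "090322" opening each heartbeat line) is an assumption about what the mysql log looks like, not something confirmed in this thread, and the sample file stands in for the real one.

```shell
# A tiny stand-in for the real 114 GB file; the 6-digit date opening
# each stamped line is an assumed format -- adjust for the real log.
printf '%s\n' \
  '090322 10:15:01     1 Query  SELECT * FROM t' \
  'continuation of that query' \
  '090323 09:00:00     2 Query  UPDATE t SET x = 1' > original.log

# Remember the last date stamp seen, and route every line (stamped or
# not) to a per-day file named after it.
awk '$1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ { date = $1 }
     date != "" { print > ("mysql-" date ".log") }' original.log
```

Unstamped continuation lines fall into the file of the most recent stamp, which matches the "while read line contains date" loop in the pseudo code.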

-Scott
On Sun, Mar 22, 2009 at 10:54 AM, Kenneth Ratliff <lists at noctum.net> wrote:


On Mar 22, 2009, at 10:15 AM, Greg Freemyer wrote:

> If you have the disk space and a few hours to let it run, I would just
> "split" that file into big chunks.  Maybe a million lines each.

Well, I could just sed the range of lines I want out in the same time
frame, and keep the result in one log file as well, which is my
preference. I've got about 400 gigs of space left on the disk, so I've
got some room. I mean, I don't really care about the data that comes
before; that should have been vaporized to the ether long ago. I just
need to isolate the section of the log I do want so I can parse it and
give an answer to a customer.
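The sed approach would look something like the following. The line numbers here are made up for illustration, and the stand-in file replaces the real log; the useful trick is the trailing "q", which makes sed quit at the end of the range instead of scanning the rest of a 114 gig file.

```shell
# A stand-in log with one numbered line per record:
seq 1 10000 > mysql.log

# Print only lines 2000-5000 (hypothetical numbers); "5000q" stops sed
# as soon as the range ends, so the remaining lines are never read.
sed -n '2000,5000p; 5000q' mysql.log > range_of_interest
```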



> I'd recommend the source and destination of your split command be on
> different physical drives if you can manage it.  Even if that means
> connecting up an external usb drive to hold the split files.

Not a machine I have physical access to, sadly. I'd love to have a
local copy to play with and leave the original intact on the server,
but pulling 114 gigs across a transatlantic link is not really an
option at the moment.



> If you don't have the disk space, you could try something like:
>
> head -2000000 my_log_file | tail -50000 > /tmp/my_chunk_of_interest
>
> Or grep has an option to grab lines before and after a line that has
> the pattern in it.
>
> Hopefully one of those 3 will work for you.
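Both of the quoted suggestions can be tried on a small stand-in file first (the real command would use the million-scale line counts from the mail). The grep options in question are -B (lines of context before each match) and -A (lines after).

```shell
# A stand-in file with one numbered line per record:
seq 1 1000 > my_log_file

# head/tail, scaled down: the last 50 of the first 200 lines,
# i.e. lines 151-200 of the file.
head -200 my_log_file | tail -50 > chunk_of_interest

# grep context: 2 lines before and 3 lines after each matching line.
grep -B 2 -A 3 '^500$' my_log_file > context_around_match
```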



mysql's log file is very annoying in that it doesn't lend itself to
easy grepping by line count. It doesn't time stamp every entry; it's
more of a heartbeat thing (like once a second or every couple of
seconds, it injects the date and time in front of the process it's
currently running). There's no set number of lines between heartbeats,
so one heartbeat might cover a 3-line select query, while the next
might be processing 20 different queries including a 20-line update.
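Because the timestamps only appear on heartbeat lines, a time-window filter has to key off the last stamp seen rather than off every line. A sketch of that, with the stamp format ("090322 10:00:01" on heartbeat lines only) assumed rather than taken from the thread:

```shell
# A stand-in log; only some lines carry the (assumed) date stamp.
printf '%s\n' \
  '090322 09:59:58     1 Query  SELECT a' \
  '090322 10:00:01     2 Query  SELECT b' \
  'multi-line continuation of that query' \
  '090322 11:00:00     3 Query  SELECT c' > mysql.log

# Keep everything from the first 10:xx heartbeat up to (not including)
# the first 11:xx heartbeat; the flag carries across unstamped lines.
awk '/^090322 10:/ { keep = 1 } /^090322 11:/ { keep = 0 } keep' \
    mysql.log > window.log
```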



I do have a script that will step through the log file and parse out
what updates were made to what database and what table at what time,
but it craps out when run against the entire log file, so I'm mostly
just trying to pare the log file down to a size where it'll work with
my other tools :)



> FYI: I work with large binary data sets all the time, and we use split
> to keep each chunk to 2 GB.  Not specifically needed anymore, but if
> you have a read error etc. it is just the one 2 GB chunk you have to
> retrieve from backup.  It also affords you the ability to copy the
> data to a FAT32 filesystem for portability.

Normally, we rotate logs nightly and keep about a week's worth, so the
space or individual size comparisons are usually not an issue. In this
case, logrotate busted for mysql sometime back in November and the
beast just kept eating.




_______________________________________________
Ale mailing list
Ale at ale.org
http://mail.ale.org/mailman/listinfo/ale


More information about the Ale mailing list