[ale] Any MySQL Experts around?

Sat Feb 14 22:28:23 EST 2004

Just to add my $0.02 to this argument, I know it's possible to do 
something like this b/c it's done at my office.  Our two production 
iPlanet 4 webservices running on Sun 280Rs generate nearly 4 gigabytes 
of logs per day that we export the next day into an oracle db running 
on a quad proc v880 (soon to be a 12 proc v1280--but that's another 
story).  Not including the time it takes to move the files to the 
staging env to gunzip and untar them it takes about 3 hours to import 
the raw data.  Those numbers are pretty rough though and I don't know 
how much data is dropped.  I can give more precise figures on monday 
after I check some logs--if anybody's interested.  This process runs 
concurrently with reports and other data imports so your mileage may 
vary.  I'll see if I can get some eyeballs on the scripts that control 
the import, but it may be wholly useless in this case since you want to 
write directly to a db and not update after the fact.

hope that provides some comparison.

wood
On Feb 14, 2004, at 10:11 PM, Jason Etheridge wrote:

>> I'm thinking of having all my server logs (including NT/2K/2K+3) 
>> record
>> into a MySQL db for better reporting.  Does anyone know if MySQL can
>> handle 100,000+ inserts per day?  Also, if it handling this many 
>> inserts,
>> how good is the SELECT performance for the automated reporting?  ANy
>> thoughts, pointers?
>
> Disclaimer:  I'm not a real DBA and the tuple calculus I learned in 
> school actually tripped me up in the real world :D
>
> The largest table I work with in mysql has roughly 8 million records 
> using full-text indexing.  Each row contains a little over 260 bytes 
> of data.  Every morning I turn indexing off and do a LOAD DATA REPLACE 
> touching every record, changing a good bit of them, and adding a 
> thousand or so new records.  It can take around 3 hours to do that and 
> reindex on an old Sun E10K (I forget how many processors; maybe a 
> dozen for that domain).  Queries are fast after that, but I use READ 
> LOCK's and there are no updates during the day.  I'm not reporting on 
> live data.
>
> If you don't use full-text indexing, and instead use LIKE statements 
> on large fields, searching is going to be pretty slow with that many 
> records, especially if you can't lock the tables because you're doing 
> updates.  If you do use full-text indexing, I'd worry about 100,000+ 
> inserts a day and concurrent searching.  Full-text updates work best 
> with indexing disabled, and even then, when you go to renable 
> indexing, it'll do it via REPAIR BY SORTING, which could take a long 
> time if you don't have enough RAM to do it without thrashing the 
> disks.
>
> What sort of reporting do you want to do and sort of logs are they?  
> If you just want to store key statistical information rather than 
> entire log entries, that would be better.  But for storing actual 
> logs, I'm not sure relational databases are the best tool.  For 
> log-based reporting, I currently use a kludge: the system loads into 
> temporary tables specific data parsed from log entries from just the 
> months we're interested in for the current running report.  This only 
> works well for one or two months in our case because of the arbitrary 
> timeout value I set.  For anything really large, like a year's worth 
> of transactions, I just run the logs through a perl script and forget 
> mysql.
>
> I have another table without full-text indexing that gets searched and 
> updated throughout the day, with about a million records.  But nothing 
> is added except in the mornings, just updated.  I'll try to find some 
> stats for that if you want.
>
> Best thing to do is get a test machine and play around with some 
> sample loads, and make sure to allocate a lot of filehandles for 
> mysql's process space.
>
> -- Jason
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale
>