[ale] Indexing and searching very, very large Maildir

Michael B. Trausch mike at trausch.us
Fri Jul 18 13:47:05 EDT 2008


On Fri, 2008-07-18 at 11:06 -0400, John Wells wrote:
> We have a very large directory in Maildir format that we need to
> index, search and then view messages within. Large as in 672 Gigs and
> growing. We used to use mairix, but it started choking at around the
> 200G mark.
> 
> Does anyone know of a tool, either open or commercial, that can do
> this effectively? We have a budget for it..

Wow, that's a lot of mail.

I'd agree with what Jim said---a database is going to be your best bet.
If your messages are currently in Maildir, you should be able to write
a program that walks the directory tree, parses the message headers,
and stores them in a database.  Expect that to take a long time, of course.
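As a rough illustration of that import step, here is a minimal Python
sketch. It assumes SQLite (through Python's standard sqlite3 module) as
the database, a hypothetical `messages` table, and that From, To,
Subject, Date and Message-ID are the only headers worth keeping; the
function names are made up for this example. Each file is read only up
to the blank line that ends the headers, so bodies are never loaded, and
the file path is stored so the full message can be opened for viewing
later. No indexes are created here.

```python
import os
import sqlite3
from email.parser import BytesParser

# Hypothetical schema: one row per message, headers only. No PRIMARY KEY or
# UNIQUE constraint, because those create an index that would slow the bulk load.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    path       TEXT,  -- file on disk, so the full message can be opened later
    message_id TEXT,
    sender     TEXT,
    recipients TEXT,
    subject    TEXT,
    date       TEXT   -- raw Date header, not normalized
)
"""
HEADERS = ("Message-ID", "From", "To", "Subject", "Date")


def read_header_block(path):
    """Return a message's raw header bytes, stopping at the first blank line
    so the (possibly large) body is never read."""
    lines = []
    with open(path, "rb") as fp:
        for line in fp:
            if line in (b"\n", b"\r\n"):
                break
            lines.append(line)
    return b"".join(lines)


def iter_maildir_files(root):
    """Yield every message file in the cur/ and new/ directories below root
    (tmp/ holds messages still being delivered, so it is skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            for name in filenames:
                yield os.path.join(dirpath, name)


def import_maildir(root, db_path, batch_size=1000):
    """Load one header row per message into SQLite, committing in batches.
    Returns the number of messages imported."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    parser = BytesParser()
    insert = "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)"
    batch, count = [], 0

    def flush():
        nonlocal count
        conn.executemany(insert, batch)
        conn.commit()
        count += len(batch)
        batch.clear()

    for path in iter_maildir_files(root):
        headers = parser.parsebytes(read_header_block(path), headersonly=True)
        # str() turns the Header objects that non-ASCII headers produce into text.
        batch.append([path] + [str(headers.get(name, "")) for name in HEADERS])
        if len(batch) >= batch_size:
            flush()
    if batch:
        flush()
    conn.close()
    return count
```

Called as `import_maildir("/path/to/Maildir", "mail.db")` (a placeholder
path), it walks every cur/ and new/ directory below that root. Committing
in batches keeps a crash from losing the whole run. SQLite is only a
stand-in here; the same loop works with any database that has a Python
driver.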

After the import is done, have the database index the columns you'll
need to search on (you won't want the indexes in place while loading
that much data, since every insert would also have to update them).
Building the indexes will take a while too, of course, but afterwards
you'll get very quick searches, given enough memory to keep the indexes
cached in system RAM.
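Continuing with the same hypothetical SQLite `messages` table, here is a
sketch of the second step: build the indexes only after the load has
finished, then search through them. The index names, the choice of
columns, and the 256 MiB cache size are assumptions to adjust for real
queries and hardware. In SQLite, a negative `PRAGMA cache_size` is
measured in KiB and applies per connection, so the search connection is
kept open and reused rather than reopened for every query.

```python
import sqlite3

# Hypothetical indexes for the queries expected to run most often;
# index name -> column of the messages table built during the import.
INDEXES = {
    "idx_messages_sender": "sender",
    "idx_messages_message_id": "message_id",
}


def build_indexes(db_path):
    """Create the indexes once, after the bulk load has finished."""
    conn = sqlite3.connect(db_path)
    for name, column in INDEXES.items():
        conn.execute(f"CREATE INDEX IF NOT EXISTS {name} ON messages ({column})")
    # Gather table/index statistics so the query planner can choose well.
    conn.execute("ANALYZE")
    conn.commit()
    conn.close()


def open_for_search(db_path, cache_kib=256 * 1024):
    """Open a long-lived search connection with a larger page cache, so
    index pages stay in RAM across queries on this connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(f"PRAGMA cache_size = -{int(cache_kib)}")
    return conn


def find_by_sender(conn, sender):
    """Return (path, subject) rows whose From header equals sender exactly;
    an equality lookup like this is what the sender index is meant to serve."""
    return conn.execute(
        "SELECT path, subject FROM messages WHERE sender = ?", (sender,)
    ).fetchall()
```

Note that a plain B-tree index only helps exact or prefix matches; a
search such as `LIKE '%word%'` with a leading wildcard still scans the
whole table, which is where a full-text index would come in.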

	--- Mike

-- 
My sigfile ran away and is on hiatus.