[ale] rehashed - Linux laptops
Jim Lynch
ale_nospam at fayettedigital.com
Mon Jan 12 20:01:18 EST 2009
Ed Cashin wrote:
> On Mon, Jan 12, 2009 at 3:32 PM, Jim Lynch
> <ale_nospam at fayettedigital.com> wrote:
>
>> Ed Cashin wrote:
>>
>>> It looks like gmane has the archives at least since 2003.
>>>
> ...
>
>>> And there's search available---not super great search, but
>>> search nonetheless.
>>>
> ...
>
>> I'm just curious, what do you find lacking in the search? The reason I
>> asked is that it is using the Xapian search engine, which I find quite
>> good and am interested in why people don't like it.
>>
>
> To be honest, this is a very old opinion, and I would have trouble
> justifying it. Now that you mention it, I see there are "AND" and "OR"
> options presented if I click "Searching" on the left instead of just
> going to the group page.
>
> I suppose perl or egrep-style regular expression searching would
> be "super great". Maybe that's impractical.
>
OK, thanks. Yes, this type of search engine is not a full text search.
A full text search engine that allows regex and unlimited wildcards,
like an editor, is impractical once the documents get over a certain
size/number. Once we have large storage devices made from quantum
transistors it will be easier, but right now you don't have the time to
wait for a full text scan over large quantities of documents. So
shortcuts have to be taken, and that's what the "probabilistic" search
engines try to do: they index the terms ahead of time and look queries
up in that index. Xapian does allow a trailing wildcard that gets
expanded against the indexed terms (refer*), but it can get rather slow
if it finds lots of terms that fit the pattern.
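
To make the wildcard point concrete, here is a rough sketch in plain
Python. It is not how Xapian is implemented; the names build_index,
expand_prefix and wildcard_search are made up for illustration, and the
four sample documents are invented. It only shows the general idea: the
pattern is expanded into every indexed term that starts with the prefix,
and the document lists of all those terms are then merged, so the cost
grows with the number of matching terms.

```python
import bisect
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def expand_prefix(sorted_terms, prefix):
    """Return every indexed term that starts with prefix (the refer* case)."""
    matches = []
    # Binary search to the first term >= prefix, then walk forward
    # until the terms stop sharing the prefix.
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


def wildcard_search(index, sorted_terms, prefix):
    """Expand the prefix, then OR together the postings of every match."""
    hits = set()
    for term in expand_prefix(sorted_terms, prefix):
        hits |= index[term]  # one merge per expanded term
    return hits


# Invented sample data, purely for illustration.
docs = {
    1: "please refer to the manual",
    2: "the referee made a call",
    3: "see the reference section",
    4: "nothing relevant here",
}
index = build_index(docs)
sorted_terms = sorted(index)

print(expand_prefix(sorted_terms, "refer"))
print(sorted(wildcard_search(index, sorted_terms, "refer")))
```

With a short prefix on a big index the expansion can easily produce
thousands of terms, each needing its own merge, which is where the
slowdown comes from.
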
They probably don't go into it in the docs on Gmane, but phrase
searches ("Search for me"), boolean searches (AND, OR, NOT), proximity
searches (walnut NEAR fruit, dog NEAR/3 cat) and a few more are
available. For a free (GPL) product it is remarkably fast. They are
still actively developing it and making it better all the time. IR
(information retrieval) is a fairly complex topic, one that I've been
working with for a number of years and still don't have my head
completely around. That may be 'cause my head might not be big enough. ;)
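
For anyone wondering what a proximity search like dog NEAR/3 cat has to
do under the hood, here is a small sketch in plain Python. It is only an
illustration under my own assumptions, not Xapian's code: the function
names and sample documents are invented, the default window of 10 is
just a value picked for the example, and "within N" is taken to mean the
word positions differ by at most N, which may not match exactly how a
real engine counts its window.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> doc id -> list of word positions where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def near(index, a, b, window=10):
    """Return ids of documents where a and b occur within `window` positions."""
    postings_a = index.get(a, {})
    postings_b = index.get(b, {})
    hits = set()
    # Only documents containing both terms can possibly match.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= window
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.add(doc_id)
    return hits


# Invented sample data, purely for illustration.
docs = {
    1: "the dog chased the cat",
    2: "the dog ran far away across the field from the cat",
    3: "walnut is a fruit",
}
pos_index = build_positional_index(docs)

print(sorted(near(pos_index, "dog", "cat", window=3)))
print(sorted(near(pos_index, "walnut", "fruit")))
```

The point is that the index has to store word positions, not just which
documents contain a term, and that is part of why these operators cost
more than a plain AND.
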
Jim.