> From: Leonid Leibman <http://www.gmail.com/~lleibman>
> Date: Thu, 31 Mar 2005 17:13:56 -0500
>
> I actually installed glimpse and tried it. It is amazingly fast. It
> doesn't support real regular expressions and is probably based on Wu
> Manber...
> My God, it is DEVELOPED by Wu and Manber.

Mmm. It does support regular expressions. And approximate matching.
Look up the manual.

Yes, Uri Manber, the designer of the uber-approximate matching
algorithm that combines regular expressions and approximate matching
in a single DFA. It's mind-bogglingly complicated (I tried to read the
paper when I was working on that FAQ parsing project).

> Of course just querying an index isn't good enough for an
> "intelligent" software since it will still be too slow. There must be
> an interface to the index itself. I'll look it up.

There's WebGlimpse.

> Another thing is that such an indexing mechanism is most likely very
> lossy as far as "clustering" goes. Maybe not.

glimpse is not lossy at all. That's why it's "slow". (Perhaps it's
faster for you 'cause you're only doing a few megabytes. Trying to
index several gigabytes gets problematic.) But, it doesn't cluster at
all. ifile does classification based on words via naive bayes.

> If you find that free source search engine info, please send it to me.

OK.

> Or anything that attempts to do disambiguation...

The open source search engine does not do disambiguation. I don't know
of any open source code that does this... Actually, I don't know of any
proprietary, either, now that I think about it.

> Leonid