> From: Leonid Leibman <http://www.gmail.com/~lleibman>
> Date: Thu, 31 Mar 2005 20:07:05 -0500
>
> Conceptually clustering and disambiguation is the same. Basically one
> can think of a concept as represented by a group of items (pages for
> example) pertaining to this concept. Such groups can of course
> overlap (same thing can be about cars and about money). Finding good
> representative clusters and properly "projecting" to the relevant ones
> is disambiguation (or one mechanism of it). This is what I mean.

Oh, OK. Links_2_Links's clustering did not disambiguate at all -- there was lots of ambiguity in concepts.

> glimpse was fast searching but slow indexing -- that took a while (for .5 gig).

Actually, search is pretty slow, too.

> Wu Manber doesn't do DFA at all. It's actually a much simpler concept
> but it doesn't work well with wildcards. I did look up the manual and
> it indicates that it works with regular expressions to some extent
> (the extent I'm sure being things like |-ing or small char classes).
> Well, maybe I know a different algorithm...??? I saw the original
> Wu-Manber article. Are we talking about the same thing?

Maybe something different. The version here says that "a regular expression must match words that appear in the index for glimpse to find it". But it does support full-fledged regular expressions. There are also wildcards ('#'), which are like Google's '*' (star) operator. There is an option that forces glimpse to do a full search without using the index (which is slower), so that you can do agrep-type searches on your corpus. In that case, regular expressions are most useful.

> Thanks for WebGlimpse.

Sure.

> Leonid
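
P.S. To make the index-versus-full-search distinction concrete, here is a rough toy sketch in Python. It is not how glimpse is implemented; the corpus, the file names, and the function names (build_index, indexed_search, full_search) are all made up for illustration. It assumes that "must match words that appear in the index" means the pattern has to match a whole indexed word, which is my reading of the manual, not something I have checked in the source.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; file names and contents are invented.
DOCS = {
    "a.txt": "fast searching but slow indexing",
    "b.txt": "clustering and disambiguation of concepts",
    "c.txt": "slow search over the whole corpus",
}

def build_index(docs):
    """Map each lowercase word to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index

def indexed_search(index, docs, pattern):
    """Use the word index as a filter: only documents containing a word
    that the whole pattern matches are scanned (assumed behaviour)."""
    rx = re.compile(pattern)
    candidates = set()
    for word, names in index.items():
        if rx.fullmatch(word):
            candidates |= names
    return sorted(n for n in candidates if rx.search(docs[n].lower()))

def full_search(docs, pattern):
    """Ignore the index and scan every document, agrep-style (slower)."""
    rx = re.compile(pattern)
    return sorted(n for n, text in docs.items() if rx.search(text.lower()))

index = build_index(DOCS)
print(indexed_search(index, DOCS, r"search\w*"))     # pattern matches single words
print(indexed_search(index, DOCS, r"slow\s+index"))  # spans two words: index cannot help
print(full_search(DOCS, r"slow\s+index"))            # full scan finds it
```

The second call comes back empty because no single indexed word matches a pattern that spans two words; only the full scan sees the phrase. That is the sense in which regular expressions are most useful when the index is bypassed.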