Re: clustering

To: Leonid Leibman <http://www.gmail.com/~lleibman>
Subject: Re: clustering
From: http://dummy.us.eu.org/robert (Robert)
Date: Thu, 31 Mar 2005 17:22:48 -0800
Keywords: http://www.gmail.com/~lleibman

 > From: Leonid Leibman <http://www.gmail.com/~lleibman>
 > Date: Thu, 31 Mar 2005 20:07:05 -0500
 >
 > Conceptually, clustering and disambiguation are the same. Basically one
 > can think of a concept as represented by a group of items (pages, for
 > example) pertaining to this concept.  Such groups can of course
 > overlap (the same thing can be about cars and about money). Finding good
 > representative clusters and properly "projecting" to the relevant ones
 > is disambiguation (or one mechanism of it). This is what I mean.

Oh, OK.  Links_2_Links's clustering did not disambiguate at all -- there was
a lot of ambiguity in its concepts.
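
For what it's worth, the "projection" idea in the quoted paragraph can be
sketched in a few lines of Python.  This is only an illustration under
assumptions: the cluster names, the page ids, the `project` function, and the
choice of Jaccard overlap as the score are all made up here, and none of it is
taken from Links_2_Links.

```python
def project(context_items, clusters):
    """Rank clusters by Jaccard overlap with the items seen in context.

    `clusters` maps a concept name to a set of item ids; clusters may overlap.
    """
    context = set(context_items)
    scores = {}
    for name, members in clusters.items():
        union = context | members
        scores[name] = len(context & members) / len(union) if union else 0.0
    # Highest-scoring cluster first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: page3 belongs to both concepts (cars and money).
clusters = {
    "cars": {"page1", "page2", "page3"},
    "money": {"page3", "page4", "page5"},
}

# The ambiguous page3 shows up together with page4, so "money" wins.
ranking = project({"page3", "page4"}, clusters)
best_cluster = ranking[0][0]
```

Disambiguating then amounts to keeping only the top-ranked cluster (or the
top few) for whatever items surround the ambiguous one.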

 > glimpse was fast searching but slow indexing -- that took a while (for .5 gig).

Actually, search is pretty slow, too.

 > Wu-Manber doesn't do DFA at all. It's actually a much simpler concept
 > but it doesn't work well with wildcards. I did look up the manual and
 > it indicates that it works with regular expressions to some extent
 > (the extent I'm sure being things like |-ing or small char classes).
 > Well, maybe I know a different algorithm...??? I saw the original
 > Wu-Manber article. Are we talking about the same thing?

Maybe something different.  The version here says that "a regular
expression must match words that appear in the index for glimpse to find
it".  But it does support full-fledged regular expressions.  There are also
wildcards ('#'), which are like Google's '*' (star) operator.  There is some
option to force glimpse to do a full search and not use the index (which
is slower) so that you can do agrep-type searches on your corpus.  In that
case, regular expressions are most useful.
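
To show why the index restriction matters, here is a toy model in Python of an
index-first search next to a full scan.  It is a sketch under assumptions, not
glimpse's actual implementation: the function names, the word-to-file index,
and the sample files are invented for illustration.

```python
import re


def build_index(docs):
    """Map each lowercase word to the set of file names that contain it."""
    index = {}
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def indexed_search(pattern, docs, index):
    """Pick candidate files via the word index, then scan only those files."""
    rx = re.compile(pattern)
    candidates = set()
    for word, names in index.items():
        # Assumption: the pattern has to match a whole indexed word.
        if rx.fullmatch(word):
            candidates |= names
    return sorted(n for n in candidates if rx.search(docs[n].lower()))


def full_search(pattern, docs):
    """Ignore the index and scan every file (slower, agrep-style)."""
    rx = re.compile(pattern)
    return sorted(n for n, text in docs.items() if rx.search(text.lower()))


# Hypothetical two-file corpus.
docs = {
    "a.txt": "clustering web pages",
    "b.txt": "fast searching, slow indexing",
}
index = build_index(docs)

word_hits = indexed_search(r"search\w*", docs, index)       # matches a word
phrase_hits_indexed = indexed_search(r"slow index", docs, index)
phrase_hits_full = full_search(r"slow index", docs)
```

In this model a pattern that spans a word boundary, such as `slow index`,
never matches any single indexed word, so the indexed search misses it while
the full scan finds it.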

 > Thanks for WebGlimpse.

Sure.

 > Leonid
