> From: leonid leibman <http://profiles.yahoo.com/lleibman>
> Date: Thu, 31 Mar 2005 07:43:52 -0800 (PST)
>
> Hi, Robert -- I was reading recently about clustering
> search engines (like clusty). Do you know what
> technology is behind it?

No, not specifically. I think it clusters by words and may even use an encyclopedia. (Clusty used to be Vivisimo and, at the time, I was very impressed. It's still pretty neat, but I don't know exactly how it works.)

> In general what's your opinion about the state of the
> art in clustering (if any :)? Google says that it's
> not using it (yet) since it is of limited use.

In the Clusty sense, I think it is of limited use. But I think combining recommendation-type systems (where users have certain interests) with search engines could be really great. http://www.directhit.com was working in this direction before Ask Jeeves acquired it and quashed that part completely.

> links_2_links clustering (disambiguation) was kind of weak
> and I was trying to work on some ideas to improve it
> but I'm realizing that the state of the art may have
> improved quite a bit since then.

Links_2_Links had almost no disambiguation. It's an extremely hard problem, and there may be published research on it, but I don't know of any specific technology that addresses disambiguation.

> Also, do you know of any simple data
> extraction/classification freeware/shareware?

I don't know what you mean by that. My spam filter uses ifile, which uses Naive Bayes for its classification. It is word-based (although I augment that by combining word pairs through a separate program). There are a number of open source machine learning libraries. If I remember correctly, I was most impressed with Torch.

> On a different topic, as far as interfaces to search
> go I'm imagining that one can have drag and drop
> boxes. Say 3 ("Good match", "Irrelevant match" and
> "Undesirable match"). As a first step the user enters
> keywords. Then he can place the results in the boxes
> and the priorities of the results (and thus the
> results you'll see) will change accordingly. This is
> possible if there is a good clustering software behind
> the scenes. Does this sound like a good idea to you?

Sure. Although Amazon, Netflix, and MovieLens use a simple star-based rating system, which may be faster for most users than dragging items. I don't know.

> Leonid
>
> Leonid Leibman
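To make the ifile description above a little more concrete, here is a minimal sketch of a word-based Naive Bayes classifier whose features are augmented with adjacent word pairs. It is an illustration under stated assumptions, not ifile's actual code or my separate pairing program: the tokenizer regex, the adjacent-pair scheme, add-one (Laplace) smoothing, and the names `features` and `NaiveBayes` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Lowercased words plus adjacent word pairs.

    Hypothetical feature scheme: ifile's real tokenizer and the way
    word pairs are combined in practice may differ.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + pairs


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # training documents per class
        self.token_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()                        # every feature seen in training

    def train(self, text, label):
        feats = features(text)
        self.doc_counts[label] += 1
        self.token_counts[label].update(feats)
        self.vocab.update(feats)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        feats = features(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over features of smoothed log P(feature | label)
            score = math.log(n_docs / total_docs)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("cheap pills buy now", "spam")
    nb.train("limited offer buy cheap watches", "spam")
    nb.train("meeting notes for the search project", "ham")
    nb.train("lunch on thursday to discuss clustering", "ham")
    print(nb.classify("buy cheap pills"))  # prints "spam"
```

Adding pairs lets the classifier treat a phrase such as "buy now" as evidence separate from "buy" and "now" on their own, at the cost of a much larger vocabulary to count and smooth over.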