
Re: tag/feature/attribute/dimension correlation



 > From: Alex  <http://www.gmail.com/~alex.>
 > Date: Sat, 26 Aug 2017 21:07:27 -0400
 >
 > Ah, I get it now. I looked up TLSH, that helped. Are you just thinking of
 > using it for finding if an email has an approximate match with a known
 > spam/ham (in a new way)?

No.  I do use TLSH and Nilsimsa for deduplicating the data during
training.  It has made a dramatic improvement in the quality of the
Bayesian model.
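
For illustration only, here is a minimal sketch of the general idea of dropping near-duplicate messages before training. It is not the actual pipeline: it swaps in a plain character-shingle Jaccard similarity as a stand-in for TLSH/Nilsimsa distances, and the names `shingles`, `jaccard`, `deduplicate`, and the 0.8 threshold are all assumptions made up for this example.

```python
def shingles(text, k=5):
    """Return the set of overlapping k-character substrings of text."""
    # Normalize case and whitespace so trivial reformatting does not matter.
    text = " ".join(text.lower().split())
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(messages, threshold=0.8):
    """Keep the first message of each group of near-duplicates.

    A message is dropped when its similarity to an already kept
    message is at or above the (hypothetical) threshold.
    """
    kept = []
    kept_shingles = []
    for msg in messages:
        s = shingles(msg)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue
        kept.append(msg)
        kept_shingles.append(s)
    return kept
```

This compares every message against every kept message, so it is quadratic; a real locality-sensitive hash such as TLSH is what makes the same idea practical on a large corpus.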

I'm thinking I should try to get Jing to remove co-occurrent (is that a
word??) features since SVM has to have such a limited set of features due
to memory constraints during training.  (There are only a few thousand
features -- even your existing Python code would work for this purpose.)
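
As a rough sketch of what removing co-occurring features could look like (an assumption about the approach, not something Jing or the existing Python code is known to do): treat two features as redundant when they appear in almost exactly the same set of messages, and keep only one of them. The function name `drop_cooccurring_features`, the input layout (one set of feature names per message), and the 0.95 threshold are hypothetical.

```python
def drop_cooccurring_features(rows, threshold=0.95):
    """Return feature names with near-perfectly co-occurring ones removed.

    rows: list of sets of feature names, one set per message.
    Two features count as co-occurring when the Jaccard similarity of
    the message sets they appear in is at or above the threshold.
    """
    # Map each feature to the indices of the messages containing it.
    occurrences = {}
    for i, features in enumerate(rows):
        for f in features:
            occurrences.setdefault(f, set()).add(i)

    # Visit the most frequent features first (ties broken by name) so the
    # result is deterministic and the more common feature of a pair is kept.
    ordered = sorted(occurrences, key=lambda f: (-len(occurrences[f]), f))

    kept = []
    for f in ordered:
        docs = occurrences[f]
        redundant = any(
            len(docs & occurrences[g]) / len(docs | occurrences[g]) >= threshold
            for g in kept
        )
        if not redundant:
            kept.append(f)
    return kept
```

The pairwise comparison is quadratic in the number of features, which should be tolerable for a few thousand features but would not scale to the full vocabulary.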

 > What was the solution to the to do list problem? Something with Markov
 > chains, I think?

Yes.  That was your suggestion and I began doing some coding using that.
Of course, I don't have time to finish it.

 > On Aug 26, 2017 20:55, "Robert" <http://dummy.us.eu.org/robert> wrote:
 > > BTW, I only thought to look at your code after I was growing frustrated at
 > > having to rearrange my big todo list and wishing that I had a program
 > > which rearranged my todo list automatically and wishing that you would
 > > write it for me :-).



