> From: Alex <http://www.gmail.com/~alex.>
> Date: Sat, 26 Aug 2017 21:07:27 -0400
>
> Ah, I get it now. I looked up TLSH, that helped. Are you just thinking of
> using it for finding if an email has an approximate match with a known
> spam/ham (in a new way)?

No. I do use TLSH and Nilsimsa for deduplicating the data during training.
It has made a dramatic improvement in the quality of the Bayesian model.
I'm thinking I should try to get Jing to remove co-occurrent (is that a
word??) features, since SVM has to have such a limited set of features due
to memory constraints during training. (There are only a few thousand
features -- even your existing Python code would work for this purpose.)

> What was the solution to the to do list problem? Something with Markov
> chains, I think?

Yes. That was your suggestion, and I began doing some coding using that.
Of course, I don't have time to finish it.

> On Aug 26, 2017 20:55, "Robert" <http://dummy.us.eu.org/robert> wrote:
> > BTW, I only thought to look at your code after I was growing frustrated at
> > having to rearrange my big todo list and wishing that I had a program
> > which rearranged my todo list automatically and wishing that you would
> > write it for me :-).
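A minimal sketch of the co-occurrent feature pruning mentioned in the reply, assuming "co-occurrent" means features that appear in exactly the same set of training messages. That criterion is a guess; the thread does not say what rule Jing's code would actually use. The function name `prune_cooccurrent` and the sample tokens are hypothetical.

```python
from collections import defaultdict


def prune_cooccurrent(docs):
    """Keep one representative from each group of features that appear in
    exactly the same set of training messages.

    Assumption (hypothetical): `docs` is a list of sets of feature strings,
    one set per (already deduplicated) training message.
    """
    # Map each feature to the indices of the messages that contain it.
    occurrences = defaultdict(set)
    for i, features in enumerate(docs):
        for f in features:
            occurrences[f].add(i)

    # Group features by occurrence pattern. Features with identical patterns
    # form identical binary columns, so all but one add no information.
    groups = defaultdict(list)
    for f, idx in occurrences.items():
        groups[frozenset(idx)].append(f)

    # Keep the lexicographically smallest feature of each group
    # (an arbitrary but deterministic choice).
    kept = {min(fs) for fs in groups.values()}
    pruned = [set(features) & kept for features in docs]
    return pruned, kept


if __name__ == "__main__":
    docs = [{"viagra", "cheap", "pills"}, {"cheap", "pills"}, {"meeting", "agenda"}]
    pruned, kept = prune_cooccurrent(docs)
    print(sorted(kept))  # ['agenda', 'cheap', 'viagra']
```

This only collapses exact co-occurrence; merging features that are merely highly correlated would need a similarity threshold, which is a separate design decision.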