Below is the pseudocode. Although hashfunc() could be a "strong" (e.g., cryptographic) hash, it could just as easily be a fuzzy hash (i.e., a locality-sensitive hash, such as TLSH), which would allow inexact dimension correlations. In that case:

1) You could do a first sweep of the "search for matches" loop below and then compare the bitvectors to see whether most of the points match (a raw count might be sufficient).

2) A second sweep would be needed in which bits and pieces of the fuzzy hash are matched; that could get very hairy. I would probably break the hash values into pieces, build a tree from them, and traverse it to determine whether most of each hash value matches. This is left as an exercise for the reader (i.e., you!).

    // initialize
    for d in D
        hashval[d] = 0

    for n in rows
        for d in D
            // accumulate the new value into the running hash value
            hashval[d] = hashfunc(hashval[d], val[n, d])
            // bitvector will obviously get large; using sparse bit
            // vector/array compression would likely make sense
            bitvector[d].append(val[n, d])

    // search for matches; d2 > d1 avoids self-comparisons and
    // reporting each pair twice
    for d1 in D
        for d2 in D where d2 > d1
            if hashval[d1] == hashval[d2] then
                if bitvector[d1] == bitvector[d2] then
                    print "{} and {} correlate" % (d1, d2)
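For the "strong" hash case, here is a minimal runnable sketch in Python. It stands in SHA-256 for hashfunc() (chained via update()) and uses the raw column value lists as the bitvectors; the names correlate_columns and mostly_match are my own, not from the pseudocode. mostly_match is a rough take on the raw-count comparison mentioned for the fuzzy first sweep; the tree-based second sweep is not attempted here.

```python
import hashlib

def correlate_columns(rows):
    """Return (d1, d2) pairs of columns whose values match row-for-row.

    rows: a list of equal-length tuples/lists. A chained SHA-256 digest
    per column plays the role of hashval[d]; the stored column values
    play the role of bitvector[d] and confirm each candidate match.
    """
    if not rows:
        return []
    ncols = len(rows[0])
    hashval = [hashlib.sha256() for _ in range(ncols)]
    columns = [[] for _ in range(ncols)]  # the "bitvector" of raw values
    for row in rows:
        for d in range(ncols):
            # accumulate the new value into the running hash
            hashval[d].update(repr(row[d]).encode())
            columns[d].append(row[d])
    digests = [h.hexdigest() for h in hashval]
    matches = []
    for d1 in range(ncols):
        for d2 in range(d1 + 1, ncols):  # skip self/duplicate pairs
            if digests[d1] == digests[d2] and columns[d1] == columns[d2]:
                matches.append((d1, d2))
    return matches

def mostly_match(a, b, threshold=0.9):
    """Raw-count check for the fuzzy-hash first sweep: do most points agree?"""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same >= threshold * max(len(a), len(b))

rows = [(1, 1, 2), (3, 3, 4), (5, 5, 6)]
print(correlate_columns(rows))  # → [(0, 1)]
```

With a fuzzy hash, the equality test on digests would be replaced by a similarity score, and mostly_match (or something like it) would decide whether the bitvectors agree closely enough.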