Sunday, November 17, 2013


For the overlapping parts, the regular correlation coefficient is .349 and Spearman's rho is .357. But all that changes a lot if you correct for that one place in the middle where the top graph goes up and the bottom one goes down. I think this happens because for most of the overlapping section we are comparing apples with apples, but just at that point, an orange or two got dropped in.
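For anyone who wants to try this kind of comparison, here is a minimal sketch in plain Python. It assumes that "regular" means the Pearson coefficient, and that the two graphs are available as equal-length lists of numbers. The function names, including drop_points, are made up for illustration and are not the code behind the figures above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def drop_points(xs, ys, skip):
    """Hypothetical helper: both series with the positions in `skip` removed."""
    keep = [i for i in range(len(xs)) if i not in skip]
    return [xs[i] for i in keep], [ys[i] for i in keep]
```

Running pearson and spearman on the full series and then again on the output of drop_points shows how much a single stretch moves the numbers. Whether dropping that stretch is justified is exactly the judgment call described here; the code cannot make it for you.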

The challenge is to decide when you have reached that magical point where you are manipulating your data rather than evaluating it.

And that's what I'll be spending tomorrow morning doing.

Friday, November 08, 2013

A Little Formula

Turns out that you can find out stuff about Old English texts with just a simple formula:

For any text of length n
with a sub-segment of length w < n

where k is the first term in w;
þ is the total number of thorns in the segment;
ð is the total number of eths in the segment;
and w+k ≤ n.
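The quantities defined above can be gathered with a few lines of Python. This is only a sketch of the counting step: it assumes the text is a string measured in characters, that k is a 0-based start index, and that capital Þ and Ð count along with the lowercase letters. None of that is stated in the post, and the function names are made up. It returns the raw counts and does not attempt the formula itself.

```python
def thorn_eth_counts(text, k, w):
    """Count thorns and eths in the segment of length w starting at index k.

    Assumptions (not from the post): 0-based k, lengths in characters,
    and upper-case letters folded into the counts.
    """
    n = len(text)
    if not (0 < w < n):
        raise ValueError("need 0 < w < n")
    if k < 0 or k + w > n:
        raise ValueError("need 0 <= k and k + w <= n")
    segment = text[k:k + w].lower()
    return segment.count("þ"), segment.count("ð")


def sliding_counts(text, w):
    """(thorns, eths) for every start k that satisfies k + w <= n."""
    return [thorn_eth_counts(text, k, w) for k in range(len(text) - w + 1)]
```

Calling sliding_counts(text, w) gives one (þ, ð) pair per window position, which is the raw material for any per-segment measure built from those two totals.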

The real tricks are figuring out if what you're detecting is significant or just a product of stochastic variation and, if it is statistically significant, whether or not it is just an epiphenomenon of a less interesting process.
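One generic way to probe the "stochastic variation" half of that question is a permutation test: shuffle the order of the thorns and eths many times and see how often chance alone produces a pattern as extreme as the real one. The sketch below is an illustration of that idea, not the analysis used for this project. The input format (a list of "þ" and "ð" characters in text order), the spread statistic, and the function names are all assumptions.

```python
import random


def window_spread(symbols, w):
    """Max minus min of the thorn count over all windows of w symbols."""
    counts = [symbols[k:k + w].count("þ") for k in range(len(symbols) - w + 1)]
    return max(counts) - min(counts)


def permutation_p_value(symbols, w, trials=1000, seed=0):
    """Share of random shuffles whose spread is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    observed = window_spread(symbols, w)
    shuffled = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if window_spread(shuffled, w) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

A small value suggests the clustering is unlikely under random ordering. It says nothing about the second question, whether the pattern is merely an epiphenomenon of something less interesting; that still takes thinking about the texts.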

As Richard Feynman, one of my intellectual heroes, once said, “The first principle is that you must not fool yourself and you are the easiest person to fool.”

Which is why I've been learning to debug programs in Python and re-learning Stats II from 20+ years ago.

Unfortunately, at least one of the more striking findings is looking like it's just an epiphenomenon. But the good news is that the other discoveries seem pretty robust.