Thursday, January 5, 2012

Frank is a scoundrel, probably

My friend Ted Bunn wrote an article, "Who knows what evil lurks in the hearts of men? The Bayesian doesn’t care," inspired in part by one of my posts, "Repeated tests: how bad can it be?"

He presents this scenario:

Frank and Betsy are wondering whether a particular coin is a fair coin (i.e., comes up heads and tails equally often when flipped).  Frank, being a go-getter type, offers to do some tests to find out. He takes the coin away, flips it a bunch of times, and eventually comes back to Betsy to report his results. 
“I flipped the coin 3022 times,” he says, “and it came up heads 1583 times. That’s 72 more heads than you’d expect with a fair coin. I worked out the p-value — that is, the probability of this large an excess occurring if the coin is fair — and it’s under 1%. So we can conclude that the coin is unfair at a significance level of  1% (or ’99% confidence’ as physicists often say).”
Ted then suggests that there are two ways Betsy can interpret this report (again, quoting him):

  1. Frank is an honest man, who has followed completely orthodox (frequentist) statistical procedure. To be specific, he decided on the exact protocol for his test (including, for some reason, the decision to do 3022 trials) in advance.
  2. Frank is a scoundrel who, for some reason, wants to reach the conclusion that the coin is unfair. He comes up with a nefarious plan: he keeps flipping the coin for as long as it takes to reach that 1% significance threshold, and then he stops and reports his results.

Finally, Ted asks, "What should Betsy conclude on the basis of the information Frank has given her?"  If you want to know Ted's answer, you have to read his article (and you should -- it is very interesting).

But I'm going to give you my answer: Frank is a scoundrel.  Well, probably.

I'll follow Ted by making one more assumption: "Suppose that Betsy’s initial belief is that 95% of coins are fair — that is, the probability P that they come up heads is exactly 0.5."  Now we can evaluate the evidence that Frank is a scoundrel.

If Frank is a scoundrel, the probability that he reports a positive result is 1, provided that he is willing to keep flipping long enough, or 1-x, for small x, if we put a bound on the number of flips he is willing to do.  So

P(positive test | Frank is a scoundrel) = 1-x

If Frank is honest, then the probability of a positive result is

P(fair coin) * P(false positive | fair coin) + P(biased coin) * P(true positive | biased coin)

Betsy believes that P(fair coin) is 95% and P(biased coin) is 5%.  Since Frank's significance level is 1%, P(false positive | fair coin) is 1%.
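As a sanity check on the 1% figure, here is a minimal sketch that recomputes the p-value from the counts Frank reports (3022 flips, 1583 heads). The function names are my own, and I am assuming a two-sided test, which Ted's scenario does not spell out. Note that this is the observed p-value; the 1% used in the calculation is the significance level an honest Frank would have fixed in advance.

```python
from math import comb, erfc, sqrt

N_FLIPS = 3022   # flips reported by Frank
N_HEADS = 1583   # heads reported by Frank


def normal_p_value(n, heads):
    """Two-sided p-value under a fair coin, using the normal approximation."""
    mean = n / 2
    std = sqrt(n) / 2            # standard deviation of Binomial(n, 0.5)
    z = abs(heads - mean) / std
    return erfc(z / sqrt(2))     # equals 2 * (1 - Phi(z))


def exact_p_value(n, heads):
    """Two-sided p-value from the exact binomial tail, doubled by symmetry.

    Assumes heads is above n / 2, as it is in Frank's report.
    """
    upper_tail = sum(comb(n, k) for k in range(heads, n + 1))
    return 2 * upper_tail / 2 ** n


excess = N_HEADS - N_FLIPS // 2
print("excess heads:", excess)
print("normal approximation:", normal_p_value(N_FLIPS, N_HEADS))
print("exact binomial:", exact_p_value(N_FLIPS, N_HEADS))
```

Both versions come out under 1%, so Frank's arithmetic holds up whichever hypothesis about his character is true.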

The probability of a true positive is the power of the test, which depends on how biased the coin actually is.  But I will make the approximation that 3022 flips is enough to detect any substantial bias, so I'll take P(true positive | biased coin) = 1-y, for small y.  So,

P(positive test | Frank is honest) = (0.95)(0.01) + (0.05)(1-y)

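As x and y get small, P(positive test | Frank is honest) approaches 0.0095 + 0.05 = 0.0595, so the likelihood ratio in favor of "scoundrel" approaches 1 / 0.0595, which is about 17.  That is fairly strong evidence that Frank is a scoundrel.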
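To see how sensitive that ratio is to x and y, here is a small sketch of the calculation. The function name and its keyword arguments are mine; the default numbers (95% fair coins, 1% significance level) come from the scenario above.

```python
def likelihood_ratio(x=0.0, y=0.0, p_fair=0.95, alpha=0.01):
    """Likelihood ratio of a positive report: scoundrel versus honest.

    x: probability that a scoundrel gives up before reaching significance
    y: probability that an honest test misses a biased coin (1 - power)
    """
    p_pos_scoundrel = 1 - x
    p_pos_honest = p_fair * alpha + (1 - p_fair) * (1 - y)
    return p_pos_scoundrel / p_pos_honest


# A few illustrative (made-up) values of x and y.
for x, y in [(0.0, 0.0), (0.05, 0.05), (0.2, 0.0)]:
    print(x, y, likelihood_ratio(x, y))
```

With x = y = 0 this reproduces 1 / 0.0595. Raising x lowers the ratio, because a scoundrel who sometimes gives up is less certain to report a positive result; raising y increases it, because an honest test that sometimes misses a biased coin makes a positive report from an honest Frank less likely.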

I don't know Betsy's prior beliefs about Frank, so you will have to fill in your own punchline about her posterior.
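If you want to fill in the punchline numerically, Bayes's theorem in odds form says that posterior odds equal prior odds times the likelihood ratio. The sketch below applies that with the ratio 1 / 0.0595 from this post; the function name and the priors in the loop are hypothetical, chosen only for illustration.

```python
LIKELIHOOD_RATIO = 1 / 0.0595   # from the calculation in this post, about 17


def posterior_scoundrel(prior, likelihood_ratio=LIKELIHOOD_RATIO):
    """Posterior probability that Frank is a scoundrel, given a positive report.

    prior: Betsy's prior probability that Frank is a scoundrel (0 < prior < 1)
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical priors; Betsy's actual opinion of Frank is not given.
for prior in [0.01, 0.1, 0.5]:
    print(prior, posterior_scoundrel(prior))
```

Even a Betsy who starts out fairly trusting, with a prior of 10%, ends up thinking Frank is more likely a scoundrel than not.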


  1. Second to last paragraph: 1 / 0.595 should be 1 / 0.0595

  2. Just a reminder...the probability of 0.01 in your calculation would only be valid if the significance level of 0.01 had been announced by Frank in advance. You cannot promote a p-value to an alpha level after it has been observed.

    Berger and Delampady have a paper on this. See also Jim Berger's website:

    1. @Bill: that 0.01 is conditioned on "Frank is honest" which is defined as "followed completely orthodox (frequentist) statistical procedure". So yes, Frank announced it in advance.

      One of the points Ted makes in his related argument is that, in the Bayesian analysis of this experiment, the data is all that matters and the state of mind of the experimenter is irrelevant. So you can avoid all of that nonsense about announcing the significance level and the sample size ahead of time.

    2. Thanks, Allen, that helps a lot.

      Mostly I wanted to point out that observed p-values cannot be interpreted as Type I error rates. As you know, there is a huge amount of confusion about this.

      As you note, Bayesians don't care about the state of mind of the experimenter. The only thing that counts is the data actually observed. That is one of the strengths of the Bayesian approach.

  3. For a similar bit of statistical skullduggery, see "How to Lie with a Simulation":