Thursday, June 25, 2015

Bayesian Billiards

I recently watched a video of Jake VanderPlas explaining Bayesian statistics at SciPy 2014.  It's an excellent talk that includes some nice examples.  In fact, it looks like he had even more examples he didn't have time to present.

Although his presentation is very good, he includes one statement that is a little misleading, in my opinion.  Specifically, he shows an example where frequentist and Bayesian methods yield similar results, and concludes, "for very simple problems, frequentist and Bayesian results are often practically indistinguishable."

But as I argued in my recent talk, "Learning to Love Bayesian Statistics," frequentist methods generate point estimates and confidence intervals, whereas Bayesian methods produce posterior distributions.  That's two different kinds of things (different types, in programmer-speak) so they can't be equivalent.

If you reduce the Bayesian posterior to a point estimate and an interval, you can compare Bayesian and frequentist results, but in doing so you discard useful information and lose what I think is the most important advantage of Bayesian methods: the ability to use posterior distributions as inputs to other analyses and decision-making processes.

The next section of Jake's talk demonstrates this point nicely.  He presents the "Bayesian Billiards Problem", which Bayes wrote about in 1763.  Jake presents a version of the problem from this paper by Sean Eddy, which I'll quote:
"Alice and Bob are playing a game in which the first person to get 6 points wins. The way each point is decided is a little strange. The Casino has a pool table that Alice and Bob can't see. Before the game begins, the Casino rolls an initial ball onto the table, which comes to rest at a completely random position, which the Casino marks. Then, each point is decided by the Casino rolling another ball onto the table randomly. If it comes to rest to the left of the initial mark, Alice wins the point; to the right of the mark, Bob wins the point. The Casino reveals nothing to Alice and Bob except who won each point. 
"Clearly, the probability that Alice wins a point is the fraction of the table to the left of the mark—call this probability p; and Bob's probability of winning a point is 1 - p. Because the Casino rolled the initial ball to a random position, before any points were decided every value of p was equally probable. The mark is only set once per game, so p is the same for every point. 
"Imagine Alice is already winning 5 points to 3, and now she bets Bob that she's going to win. What are fair betting odds for Alice to offer Bob? That is, what is the expected probability that Alice will win?"
Eddy solves the problem using a Beta distribution to estimate p, then integrates over the posterior distribution of p to get the probability that Bob wins.  If you like that approach, you can read Eddy's paper.  If you prefer a computational approach, read on!  My solution is in this Python file.

The problem statement indicates that the prior distribution of p is uniform from 0 to 1.  Given a hypothetical value of p and the observed number of wins and losses, we can compute the likelihood of the data under each hypothesis:

    import thinkbayes

    class Billiards(thinkbayes.Suite):

        def Likelihood(self, data, hypo):
            """Likelihood of the data under the hypothesis.

            data: tuple (#wins, #losses)
            hypo: float probability of win
            """
            p = hypo
            win, lose = data
            # Binomial likelihood without the constant coefficient.
            like = p**win * (1-p)**lose
            return like

Billiards inherits the Update function from Suite (which is defined and explained in Think Bayes) and provides Likelihood, which uses the binomial formula:

\binom{n}{k}\, p^k (1-p)^{n-k}

I left out the first term, the binomial coefficient, because it doesn't depend on p, so it would just get normalized away.
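To spell out why the coefficient drops out, here is the normalization step written as a short derivation. The symbol \pi(p) for the prior density is notation introduced here for illustration; it does not appear in the original code.

```latex
P(p \mid k, n)
  = \frac{\binom{n}{k}\, p^{k}(1-p)^{n-k}\, \pi(p)}
         {\int_0^1 \binom{n}{k}\, q^{k}(1-q)^{n-k}\, \pi(q)\, dq}
  = \frac{p^{k}(1-p)^{n-k}\, \pi(p)}
         {\int_0^1 q^{k}(1-q)^{n-k}\, \pi(q)\, dq}
```

The binomial coefficient appears in both the numerator and the denominator and does not depend on p, so it cancels.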

Now I just have to create the prior and update it:

    import numpy

    ps = numpy.linspace(0, 1, 101)
    bill = Billiards(ps)
    bill.Update((5, 3))

The following figure shows the resulting posterior:
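For readers who don't have thinkbayes handy, here is a minimal NumPy-only sketch of the same grid update. It is not the code from Think Bayes; the variable names (prior, likelihood, posterior) are chosen here for illustration, and it assumes the same 101-point grid and the 5-to-3 score.

```python
import numpy as np

# Grid of hypotheses for p, matching the 101-point grid used in the post.
ps = np.linspace(0, 1, 101)

# Uniform prior: every value of p is equally probable before any points.
prior = np.ones_like(ps) / len(ps)

# Likelihood of 5 wins and 3 losses for Alice under each hypothesis.
# The binomial coefficient is omitted because it is constant in p.
wins, losses = 5, 3
likelihood = ps**wins * (1 - ps)**losses

# Bayes's theorem on the grid: multiply prior by likelihood, then normalize.
posterior = prior * likelihood
posterior /= posterior.sum()

# Two summaries of the posterior: its mean and its most probable grid point.
posterior_mean = float(np.sum(ps * posterior))
posterior_mode = float(ps[np.argmax(posterior)])
print(posterior_mean, posterior_mode)
```

The mean should land close to 0.6 and the mode close to 5/8, which is what a Beta(6, 4) posterior would give.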

Now to compute the probability that Bob wins the match.  Since Alice is ahead 5 points to 3, Bob needs to win the next three points.  His chance of winning each point is (1-p), so the chance of winning the next three is (1-p)³.

We don't know the value of p, but we have its posterior distribution, which we can "integrate over" like this:

def ProbWinMatch(pmf):
    """Probability that Bob wins the next three points, averaged over the posterior of p."""
    total = 0
    for p, prob in pmf.Items():
        total += prob * (1-p)**3
    return total

The result is 0.091, which corresponds to 10:1 odds against Bob.
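As a cross-check, here is a self-contained sketch that repeats the calculation with plain NumPy and converts the probability to odds. The helper names posterior_on_grid and prob_bob_wins_match are hypothetical, introduced only for this example. With a uniform prior the posterior is Beta(6, 4), and the exact expectation of (1-p)³ under it is 1/11, so the grid result should be very close to that.

```python
import numpy as np

def posterior_on_grid(wins, losses, n=101):
    """Grid posterior for p under a uniform prior (hypothetical helper)."""
    ps = np.linspace(0, 1, n)
    post = ps**wins * (1 - ps)**losses
    return ps, post / post.sum()

def prob_bob_wins_match(ps, post, points_needed=3):
    """Average (1-p)**points_needed over the posterior of p."""
    return float(np.sum(post * (1 - ps)**points_needed))

ps, post = posterior_on_grid(5, 3)
prob = prob_bob_wins_match(ps, post)

# Odds against Bob: probability he loses divided by probability he wins.
odds_against = (1 - prob) / prob
print(prob, odds_against)
```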

Using a frequentist approach, we get a substantially different answer.  Instead of a posterior distribution, we get a single point estimate.  Assuming we use the MLE, we estimate p = 5/8, and (1-p)³ = (3/8)³ ≈ 0.053, which corresponds to 18:1 odds against.

Needless to say, the Bayesian result is right and the frequentist result is wrong.
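One way to check that claim without any Bayesian machinery is to simulate the game directly. The sketch below is an illustration, not code from the original solution: it draws p uniformly, plays eight points, keeps only the games where Alice leads 5 to 3, and counts how often Bob then wins three points in a row. The seed and the number of games are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_games = 1_000_000

# The Casino's mark: p is uniform on [0, 1) in each simulated game.
p = rng.random(n_games)

# Number of points Alice wins among the first eight.
alice_first8 = rng.binomial(8, p)

# Keep only the games that reach the score Alice 5, Bob 3.
# Nobody can have 6 points yet, so these games are all still in progress.
p_kept = p[alice_first8 == 5]

# Bob wins the match only if he takes all of the next three points.
bob_next3 = rng.binomial(3, 1 - p_kept)
frac_bob_wins = float(np.mean(bob_next3 == 3))
print(len(p_kept), frac_bob_wins)
```

If the Bayesian answer is right, the fraction should come out near 0.091 rather than near the plug-in estimate.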

But let's consider why the frequentist result is wrong.  The problem is not the estimate itself.  In fact, in this example, the Bayesian maximum a posteriori (MAP) estimate is the same as the frequentist MLE.  The difference is that the Bayesian posterior contains all of the information we have about p, whereas the frequentist result discards a large part of that information.

The result we are interested in, the probability of winning the match, is a non-linear transform of p, and in general for a non-linear transform f, the expectation E[f(p)] does not equal f(E[p]).  The Bayesian method computes the first, which is right; the frequentist method approximates the second, which is wrong.
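Here is a small sketch that puts numbers on that gap, assuming the same grid posterior for the 5-to-3 score. The function name f and the variable names are chosen here for illustration. Because (1-p)³ is convex on [0, 1], averaging it over the posterior gives a larger value than evaluating it at any single central estimate of p.

```python
import numpy as np

# Grid posterior for p after Alice leads 5 to 3 (uniform prior).
ps = np.linspace(0, 1, 101)
post = ps**5 * (1 - ps)**3
post /= post.sum()

def f(p):
    """Probability that Bob wins three points in a row, given p."""
    return (1 - p)**3

expect_of_f = float(np.sum(post * f(ps)))   # E[f(p)], the Bayesian answer
f_of_expect = float(f(np.sum(post * ps)))   # f(E[p]), plugging in the posterior mean
f_of_mle = f(5 / 8)                         # f(MLE), plugging in the point estimate

print(expect_of_f, f_of_expect, f_of_mle)
```

The three values should be roughly 0.091, 0.064, and 0.053: plugging in a point estimate, whether the posterior mean or the MLE, understates Bob's chances.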

To summarize, Bayesian methods are better not just because the results are correct, but more importantly because the results are in a form, the posterior distribution, that lends itself to answering questions and guiding decision-making under uncertainty.

[UPDATE June 26, 2015]  Some readers have objected that this article is unfair, because any competent statistician would know that the frequentist method I presented here would behave badly, and even a classical statistician could happily use Bayes's theorem to solve this problem, because the prior is provided by the problem statement.

A few thoughts in response:

1) This article compares a Bayesian method and frequentist method.  I never said anything about Bayesians and frequentists as people, or about what methods a hypothetical statistician might choose.

2) The problem with the frequentist approach to this problem is that it produces a point estimate rather than a posterior distribution.  Any method that produces a point estimate is going to have the same problem.

3) Maybe it's true that a competent statistician would expect the frequentist approach to perform badly for this problem, but it still seems like that is a disadvantage of the frequentist approach.  I would rather have a method that is always right than a method that sometimes works and sometimes fails, and requires substantial expertise to know when to expect which.

Friday, June 12, 2015

The Sleeping Beauty Problem

How did I get to my ripe old age without hearing about The Sleeping Beauty Problem?  I've taught and written about The Monty Hall Problem and The Girl Named Florida Problem.  But I didn't hear about Sleeping Beauty until this reddit post, pointing to this video:

Of course, there's a Wikipedia page about it, which I'll borrow to provide the background:
"Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice during the experiment, Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. 
A fair coin will be tossed to determine which experimental procedure to undertake: if the coin comes up heads, Beauty will be awakened and interviewed on Monday only. If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday. In either case, she will be awakened on Wednesday without interview and the experiment ends. 
Any time Sleeping Beauty is awakened and interviewed, she is asked, 'What is your belief now for the proposition that the coin landed heads?'"
The problem is discussed at length on this CrossValidated thread.  As the person who posted the question explains, there are two common reactions to this problem:
The Halfer position. Simple! The coin is fair--and SB knows it--so she should believe there's a one-half chance of heads. 
The Thirder position. Were this experiment to be repeated many times, then the coin will be heads only one third of the time SB is awakened. Her probability for heads will be one third.
The thirder position is correct, and I think the argument based on long-run averages is the most persuasive.  From Wikipedia:
Suppose this experiment were repeated 1,000 times. It is expected that there would be 500 heads and 500 tails. So Beauty would be awoken 500 times after heads on Monday, 500 times after tails on Monday, and 500 times after tails on Tuesday. In other words, only in one-third of the cases would heads precede her awakening. This long-run expectation should give the same expectations for the one trial, so P(Heads) = 1/3.
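The long-run argument is easy to check by simulation. The following is a minimal sketch; the function name simulate_awakenings, the seed, and the number of repetitions are assumptions made for this example.

```python
import random

def simulate_awakenings(n_experiments, seed=0):
    """Fraction of interview awakenings that follow a heads toss."""
    rng = random.Random(seed)
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5
        if heads:
            # Heads: Beauty is interviewed on Monday only.
            heads_awakenings += 1
            total_awakenings += 1
        else:
            # Tails: Beauty is interviewed on Monday and Tuesday.
            total_awakenings += 2
    return heads_awakenings / total_awakenings

frac_heads = simulate_awakenings(100_000)
print(frac_heads)
```

The coin lands heads in about half of the experiments, but only about one third of the awakenings follow heads.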
But here's the difficulty (from CrossValidated):

Most, but not all, people who have written about this are thirders. But:
  • On Sunday evening, just before SB falls asleep, she must believe the chance of heads is one-half: that’s what it means to be a fair coin.
  • Whenever SB awakens, she has learned absolutely nothing she did not know Sunday night.
What rational argument can she give, then, for stating that her belief in heads is now one-third and not one-half?
As a stark raving Bayesian, I find this mildly disturbing.  Is this an example where frequentism gets it right and Bayesianism gets it wrong?  One of the responses on reddit pursues the same thought:

I wonder where exactly in Bayes' rule does the formula "fail". It seems like P(wake|H) = P(wake|T) = 1, and P(H) = P(T) = 1/2, leading to the P(H|wake) = 1/2 conclusion.
Is it possible to get 1/3 using Bayes' rule?
I have come to a resolution of this problem that works, I think, but it made me realize the following subtle point: even if two things are inevitable, that doesn't make them equally likely.

In the previous calculation, the priors are correct: P(H) = P(T) = 1/2.

It's the likelihoods that are wrong.  The datum is "SB wakes up".  This event happens once if the coin is heads and twice if it is tails, so the likelihood ratio is P(wake|H) / P(wake|T) = 1/2.

If you plug that into Bayes's theorem, you get the correct answer, 1/3.

This is an example where the odds form of Bayes's theorem is less error prone: the prior odds are 1:1.  The likelihood ratio is 1:2, so the posterior odds are 1:2.  By thinking in terms of likelihood ratio, rather than conditional probability, we avoid the pitfall.
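The odds-form update is short enough to write out exactly. This sketch uses Python's fractions module so the arithmetic stays exact; the function name posterior_probability is hypothetical.

```python
from fractions import Fraction

def posterior_probability(prior_odds, likelihood_ratio):
    """Odds form of Bayes's theorem, converted back to a probability."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior_odds = Fraction(1, 1)        # heads : tails before waking, 1:1
likelihood_ratio = Fraction(1, 2)  # awakenings under heads : under tails, 1:2

p_heads = posterior_probability(prior_odds, likelihood_ratio)
print(p_heads)  # 1/3
```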

If this example is still making your head hurt, here's an analogy that might help: suppose you live near a train station, and every morning you hear one express train and two local trains go past.  The probability of hearing an express train is 1, and the probability of hearing a local train is 1.  Nevertheless, the likelihood ratio is 1:2, and if you hear a train, the probability is only 1/3 that it is the express.

[UPDATE 11 November 2015] Peter Norvig writes about the Sleeping Beauty problem in this IPython notebook.  He agrees that the correct answer is 1/3:
The "halfers" argue that before Sleeping Beauty goes to sleep, her unconditional probability for heads should be 1/2. When she is interviewed, she doesn't know anything more than before she went to sleep, so nothing has changed, so the probability of heads should still be 1/2. I find two flaws with this argument. First, if you want to convince me, show me a sample space; don't just make philosophical arguments. (Although a philosophical argument can be employed to help you define the right sample space.) Second, while I agree that before she goes to sleep, Beauty's unconditional probability for heads should be 1/2, I would say that both before she goes to sleep and when she is awakened, her conditional probability of heads given that she is being interviewed should be 1/3, as shown by the sample space.

[UPDATE June 15, 2015]  In the comments below, you’ll see an exchange between me and a reader named James.  It took me a few tries to understand his question, so I’ll take the liberty of editing the conversation to make it clearer (and to make me seem a little quicker on the uptake):

James: I'd be interested in your reaction to the following extension. Before going to sleep on Sunday, Sleeping Beauty makes a bet at odds of 3:2 that the coin will come down heads. (This is favourable for her when the probability of heads is 1/2, and unfavourable when the probability of heads is 1/3). She is told that whenever she is woken up, she will be offered the opportunity to cancel any outstanding bets. Later she finds herself woken up, and asked whether she wants to cancel any outstanding bets. Should she say yes or no? (Let's say she doesn't have access to any external randomness to help her choose). Is her best answer compatible with a "belief of 1/3 that the coin is showing heads"?

Allen: If the bet is only resolved once (on Wednesday), then SB should accept the bet (and not cancel it) because she is effectively betting on a coin toss with favorable odds, and the whole sleeping-waking scenario is irrelevant.

James: Right, the bet is only resolved once.  So, we agree that she should not cancel. But isn't there something odd? Put yourself in SB's position when you are woken up. You say that you have a "belief of 1/3 in the proposition that the coin is heads". The bet is unfavourable to you if the probability of heads is 1/3. And yet you don't cancel it. That suggests one sense in which you do NOT have a belief of 1/3 after all.

Allen: Ah, now I see why this is such an interesting problem.  You are right that I seem to have SB keeping a bet that is inconsistent with her beliefs.  But SB is not obligated to bet based on her current beliefs. If she knows that more information is coming in the future, she can compute a posterior based on that future information and bet accordingly.

Each time she wakes up, she should believe that she is more likely to be in the Tails scenario -- that is, that P(H) = 1/3 -- but she also knows that more information is coming her way.

Specifically, she knows that when she wakes up on Wednesday, and is told that it is Wednesday and the experiment is over, she will update her beliefs and conclude that the probability of Heads is 50% and the bet is favorable.

So when she wakes up on Monday or Tuesday and has the option to cancel the bet, she could think: "Based on my current beliefs, this bet is unfavorable, but I know that before the bet is resolved I will get more information that makes the bet favorable. So I will take that future information into account now and keep the bet (decline to cancel)."

I think the weirdness here is not in her beliefs but in the unusual scenario where she knows that she will get more information in the future. The Bayesian formulation of the problem tells you what she should believe after performing each update, but [the rest of the sentence deleted because I don’t think it’s quite right any more].


Upon further reflection, I think there is a general rule here:

When you evaluate a bet, you should evaluate it relative to what you will believe when the bet is resolved, which is not necessarily what you believe now.  I’m going to call this the Fundamental Theorem of Betting, because it reminds me of Sklansky’s Fundamental Theorem of Poker, which says that the correct decision in a poker game is the decision you would make if all players’ cards were visible.

Under normal circumstances, we don’t know what we will believe in the future, so we almost always use our current beliefs as a heuristic for, or maybe estimate of, our future beliefs.  Sleeping Beauty’s situation is unusual because she knows that more information is coming in the future, and she knows what the information will be!
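To make the two perspectives concrete, here is a hedged simulation sketch of the 3:2 bet on heads with a $100 stake. It tallies the average profit when the bet is settled once per experiment, as in the scenario above, and compares it with a hypothetical variant (not part of the original problem) in which the same bet would be settled at every awakening. The function name simulate_bet and its parameters are assumptions made for this example.

```python
import random

def simulate_bet(n_experiments, stake=100, payout=150, seed=0):
    """Average profit per experiment and per awakening for a bet on heads."""
    rng = random.Random(seed)
    profit_settled_once = 0
    profit_settled_each_awakening = 0
    awakenings = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5
        outcome = payout if heads else -stake
        n_awake = 1 if heads else 2  # one interview for heads, two for tails
        profit_settled_once += outcome
        # Hypothetical variant: the bet pays out again at every awakening.
        profit_settled_each_awakening += outcome * n_awake
        awakenings += n_awake
    per_experiment = profit_settled_once / n_experiments
    per_awakening = profit_settled_each_awakening / awakenings
    return per_experiment, per_awakening

per_experiment, per_awakening = simulate_bet(200_000)
print(per_experiment, per_awakening)
```

Settled once, the bet should average about +$25 per experiment, which matches a probability of ½ at settlement time. Counted per awakening, the average should be about -$16.67, which matches a belief of ⅓ at the moment of waking.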

To see how this theorem holds up, let me run the SB scenario and see if we can make sense of Sleeping Beauty’s beliefs and betting strategy:

Experimenter: Ok, SB, it’s Sunday night.  After you go to sleep, we’re going to flip this fair coin.  What do you believe is the probability that it will come up heads, P(H)?

Sleeping Beauty:  I think P(H) is ½.

Ex: Ok.  In that case, I wonder if you would be interested in a wager.  If you bet on heads and win, I’ll pay 3:2, so if you bet $100, you will either win $150 or lose $100.  Since you think P(H) is ½, this bet is in your favor.  Do you want to accept it?

SB: Sure, why not?

Ex: Ok, on Wednesday I’ll tell you the outcome of the flip and we’ll settle the bet.  Good night.

SB:  Zzzzz.

Ex: Good morning!

SB: Hello.  Is it Wednesday yet?

Ex: No, it’s not Wednesday, but that’s all I can tell you.  At this point, what do you think is the probability that I flipped heads?

SB: Well, my prior was P(H) = ½.  I’ve just observed an event (D = waking up before Wednesday) that is twice as likely under the tails scenario, so I’ll update my beliefs and conclude that P(H|D) = ⅓.

Ex: Interesting.  Well, if the probability of heads is only ⅓, the bet we made Sunday night is no longer in your favor.  Would you like to call it off?

SB: No, thanks.

Ex: But wait, doesn’t that mean that you are being inconsistent?  You believe that the probability of heads is ⅓, but you are betting as if it were ½.

SB: On the contrary, my betting is consistent with my beliefs.  The bet won’t be settled until Wednesday, so my current beliefs are not important.  What matters is what I will believe when the bet is settled.

Ex: I suppose that makes sense.  But do you mean to say that you know what you will believe on Wednesday?

SB: Normally I wouldn’t, but this scenario seems to be an unusual case.  Not only do I know that I will get more information tomorrow; I even know what it will be.

Ex: How’s that?

SB: When you give me the amnesia drug, I will forget about the update I just made and revert to my prior.  Then when I wake up on Wednesday, I will observe an event (E = waking up on Wednesday) that is equally likely under the heads and tails scenarios, so my posterior will equal my prior, I will believe that P(H|E) is ½, and I will conclude that the bet is in my favor.

Ex: So just before I tell you the outcome of the bet, you will believe that the probability of heads is ½?

SB: Right.

Ex: Well, if you know what information is coming in the future, why don’t you do the update now, and start believing that the probability of heads is ½?

SB: Well, I can compute P(H|E) now if you want.  It’s ½ -- always has been and always will be.  But that’s not what I should believe now, because I have only seen D, and not E yet.

Ex: So right now, do you think you are going to win the bet?

SB: Probably not.  If I’m losing, you’ll ask me that question twice.  But if I’m winning, you’ll only ask once.  So ⅔ of the time you ask that question, I’m losing.

Ex: So you think you are probably losing, but you still want to keep the bet?  That seems crazy.

SB: Maybe, but even so, my beliefs are based on the correct analysis of my situation, and my decision is consistent with my beliefs.

Ex: I’ll need to think about that.  Well, good night.

SB: Zzzzz.