## Friday, September 2, 2016

### Sleeping Beauty and the Red Dice

In response to my previous article on the Sleeping Beauty Problem, I got this comment from a reader:
The late great philosopher David Lewis was a halfer. I'd be interested in any reactions to his paper on it: http://fitelson.org/probability/lewis_sb.pdf
The context of the paper is a disagreement between Lewis and Adam Elga; specifically, Lewis's paper is a response to Elga's paper "Self-locating belief and the Sleeping Beauty Problem".

Elga presents the Sleeping Beauty problem like this:
Some researchers are going to put you to sleep. During the two days that your sleep will last, they will briefly wake you up either once or twice, depending on the toss of a fair coin (Heads: once; Tails: twice). After each waking, they will put you to back to sleep with a drug that makes you forget that waking. [Just after you are] awakened, to what degree ought you believe that the outcome of the coin toss is Heads?
And then he states the two most common responses to the problem:
First answer: 1/2, of course! Initially you were certain that the coin was fair, and so initially your credence in the coin’s landing Heads was 1/2. Upon being awakened, you receive no new information (you knew all along that you would be awakened). So your credence in the coin’s landing Heads ought to remain 1/2.
Second answer: 1/3, of course! Imagine the experiment repeated many times. Then in the long run, about 1/3 of the wakings would be Heads-wakings — wakings that happen on trials in which the coin lands Heads. So on any particular waking, you should have credence 1/3 that that waking is a Heads-waking, and hence have credence 1/3 in the coin’s landing Heads on that trial. This consideration remains in force in the present circumstance, in which the experiment is performed just once.
In his Section 2, Elga then proves that the correct answer is 1/3.  His proof is correct (although there are a few spots where it would be helpful to fill in some intermediate steps).  So Lewis is wrong to reject this proof.

But Elga's Section 3 introduces some confusion around the meaning of "information".  Elga says:
Let H be the proposition that the outcome of the coin toss is Heads. Before being put
to sleep, your credence in H was 1/2. I’ve just argued that when you are awakened
on Monday, that credence ought to change to 1/3. This belief change is unusual. It is
not the result of your receiving new information — you were already certain that you
would be awakened on Monday.
And then in a footnote:
To say that an agent receives new information (as I shall use that expression) is to say that the agent receives evidence that rules out possible worlds not already ruled out by her previous evidence.
This is where Elga and I disagree.  I would say that an agent receives information if they receive evidence that is not equally likely in all possible worlds.  In that case, the evidence should cause the agent to change their credences (subjective beliefs) about at least some possible worlds.

In particular (as I explained in my previous article), when Sleeping Beauty is awakened, she observes an event, awakening, that is twice as likely under T (the proposition that the outcome of the coin toss is Tails) as under H, and she should change her credences accordingly.

So in my solution, her belief change is not unusual; it is an application of Bayes's Theorem that is only remarkable because it is not immediately obvious what the evidence is and what its likelihood is under the two hypotheses.  In that sense, it is similar to the Elvis Problem.
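Written out, the update is short. The following is a sketch that takes the 2:1 likelihood ratio from my previous article as given; the symbols W (for "this awakening is observed") and k (the unknown common scale of the two likelihoods) are notation introduced here for illustration, not Elga's or Lewis's.

$$
P(H \mid W) = \frac{P(H)\,P(W \mid H)}{P(H)\,P(W \mid H) + P(T)\,P(W \mid T)} = \frac{\tfrac12 \cdot k}{\tfrac12 \cdot k + \tfrac12 \cdot 2k} = \frac13
$$

The scale k cancels, so only the ratio of the likelihoods matters.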

In the rest of Section 3, Elga tries to reconcile the seemingly contradictory conclusions that Beauty receives no new information and Beauty should change her credences.  I think this argument addresses a non-problem, because Beauty does receive information that justifies her change in credences.  So I agree with Lewis that Elga is wrong to conclude that the Sleeping Beauty problem raises "a new question about how a rational agent ought to update her beliefs over time".

In summary:

1) Lewis is wrong about the answer to the problem and wrong to reject Elga's proof.

2) Also, his claim that Beauty does not receive information is wrong.

3) However, he is right to reject the argument in Elga's Section 3.

#### The Red Dice

At this point, we have three arguments to support the "thirder" position:

1) The argument based on long-run frequencies (I quoted Elga's version above).

2) The argument based on the principle of indifference (Elga's section 2).

3) The argument based on Bayes's theorem (in my previous article).
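The first of these arguments is easy to check numerically. Here is a minimal sketch, not code from Elga's paper or from my previous article; the function name and its parameters are made up for illustration. It repeats the experiment many times and counts what fraction of all wakings are Heads-wakings.

```python
import random

def heads_waking_fraction(experiments=100_000, seed=0):
    """Estimate the long-run fraction of wakings that are Heads-wakings."""
    rng = random.Random(seed)
    heads_wakings = 0
    total_wakings = 0
    for _ in range(experiments):
        if rng.random() < 0.5:   # Heads: Beauty is awakened once
            heads_wakings += 1
            total_wakings += 1
        else:                    # Tails: Beauty is awakened twice
            total_wakings += 2
    return heads_wakings / total_wakings

print(heads_waking_fraction())
```

With a large number of experiments, the printed value should be close to 1/3.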

But if you still find it hard to believe that Beauty gets information when she wakes up, the Red Dice problem might help.  I wrote about several versions of it in this previous article:
Suppose I have a six-sided die that is mostly red -- that is, red on 4 sides and blue on 2 -- and another that is mostly blue -- that is, blue on 4 sides and red on 2.
I choose a die at random (with equal probability) and roll it.  If it comes up red, I tell you "it came up red".  Otherwise, I put the die back, choose again, and roll again.  I repeat until the outcome is red.
If I follow this procedure and eventually report that the die came up red, what is the probability that the last die I rolled is mostly red?
A halfer might claim (incorrectly) that you have received no relevant information about the die because the outcome was inevitable, eventually.  The evidence you receive when I tell you the outcome is red is identical regardless of which die it was, so it should not change your credences.

A thirder would respond (correctly) that the outcome you observed is twice as likely if the die is mostly red, and therefore it provides evidence in favor of the hypothesis that it is mostly red.  Specifically, the posterior probability is 2/3.

If you don't believe this answer, you can see a more careful explanation and a demonstration by simulation in this Jupyter notebook (see Scenario C).
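For readers who want something shorter than the notebook, here is a minimal sketch of the procedure described above. It is not the notebook's code, and the function names are hypothetical. Each trial keeps choosing a die and rolling until the outcome is red, then records whether the die that produced red was the mostly-red one.

```python
import random

def red_dice_trial(rng):
    """Choose a die and roll it; repeat until red. Return True if the
    die that finally came up red is the mostly-red die."""
    while True:
        mostly_red = rng.random() < 0.5          # choose a die with equal probability
        p_red = 4 / 6 if mostly_red else 2 / 6   # chance this die shows red
        if rng.random() < p_red:
            return mostly_red

def estimate_posterior(trials=100_000, seed=1):
    """Fraction of reported reds that came from the mostly-red die."""
    rng = random.Random(seed)
    hits = sum(red_dice_trial(rng) for _ in range(trials))
    return hits / trials

print(estimate_posterior())
```

The estimate should land near 2/3, even though a report of "red" is certain to happen eventually in every trial.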

The Red Dice problem suggests that we should be skeptical of an argument with the form "The observation was inevitable under all hypotheses, and therefore we received no information."  If an event happens once under H and twice under T, it is inevitable under both; nevertheless, a random observation of the event is twice as likely under T, and therefore provides evidence in favor of T.

1. But by the very definition of the problem, P(waking) = 1. There doesn't exist a scenario in which she will not wake up. Her being woken is not affected by any other factor in the problem: P(waking|?) = 1.

Thus, P(A|waking) = P(waking|A) × P(A) / P(waking) = P(A). That's what Bayes's theorem tells us.

1. Hi Brian. We have to be careful when we talk about things with probability 1 -- that's the lesson of the Red Dice example. The probability is 1 that I will eventually report "red", but when I report red, that event is more likely if I rolled the mostly-red die.

Similarly, the probability is 1 that SB will wake up; nevertheless, a given waking event is more likely under T than under H.

If you use the odds form of Bayes's theorem, the prior odds are 1. The likelihood ratio is 2:1 in favor of T. So the posterior odds are 2:1 in favor of T.

2. Thanks Allen. Just for the record, would you want to highlight specifically where Lewis goes wrong in *his* paper?

1. On the third page:

"(6) Beauty gains no new uncentred evidence, relevant to HEADS versus TAILS, between the time when she has credence function P- and the time when she has credence function P. The only evidence she gains is the centred evidence that she is presently undergoing either the Monday awakening or the Tuesday awakening: that is, (H1 or T1 or T2)."

That's the point I disagree with.

3. The funny thing about the Sleeping Beauty Problem is that Halfers and Thirders alike will always come to the same answer in decision problems, e.g. to win a bet or survive an experiment with poisoned jellybeans.

This suggests that an argument over the numerical value of "credence" might be little more than one of presentation: how does one assign a number to your belief, in the broadest sense, that the coin came up Heads? Which long-run-frequency of Heads -- w.r.t. experiments or w.r.t. wakeups -- do you feel like quoting?

But a bare number, like a probability, may be insufficient to summarise the knowledge required to make effective decisions. There are games where the optimal strategy depends on the coin outcome and this strategy can be computed from one (Halfer or Thirder) but not the other credence when, having written down her credence, Beauty's knowledge of the game's parameters is deleted.

When you ask a Halfer or Thirder for their credence, they're committing to something not much more substantial than a linguistic choice. But when the issue is monetised as a decision problem, they'll bet the same way.

1. There are some decisions Halfers and Thirders would agree on, but not all. For example, suppose the coin is tossed Sunday night, and upon every awakening, Beauty is offered a choice of \$3 if Heads (0 otherwise) or \$2 if Tails (0 otherwise), with the payoff to be made immediately. A Halfer should prefer the first, with expected value 3/2, over the second, with expected value 2/2. A Thirder would choose the second, with expected value 4/3, over the first, with expected value 3/3. And because the Thirder's decision is based on the correct probability, it would be the correct decision, in the sense of maximizing the expected payoff.

If a Halfer bets otherwise, they are just compounding an incorrect calculation with an inconsistent decision.
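The arithmetic in this reply can be laid out as follows. This is a sketch with made-up function names. It computes the expected value of each option at a single awakening under a given credence in Heads and, as a separate check, the expected total payout per run of the experiment, assuming Beauty picks the same option at every awakening (one awakening on Heads, two on Tails).

```python
from fractions import Fraction

def per_awakening_values(credence_heads):
    """Expected payoff of each option at a single awakening."""
    heads_option = 3 * credence_heads          # $3 if Heads, 0 otherwise
    tails_option = 2 * (1 - credence_heads)    # $2 if Tails, 0 otherwise
    return heads_option, tails_option

def per_experiment_values():
    """Expected total payoff over one run of the experiment."""
    p = Fraction(1, 2)          # fair coin
    always_heads = p * 3 * 1    # Heads: one awakening, $3 paid once
    always_tails = p * 2 * 2    # Tails: two awakenings, $2 paid twice
    return always_heads, always_tails

print(per_awakening_values(Fraction(1, 2)))  # Halfer credence
print(per_awakening_values(Fraction(1, 3)))  # Thirder credence
print(per_experiment_values())
```

Under these assumptions the per-experiment totals are 3/2 and 2, which is the same 3:4 ratio as the Thirder's per-awakening values of 1 and 4/3.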

2. Allen, the Halfer would reason that it's as likely to be an experiment paying \$3 once as \$2 twice, so will go for the latter. There is nothing incorrect about saying the long-run frequency of Heads w.r.t. experiments is one half, nor inconsistency in deciding to vote Tails in the example you just gave.

I'll repeat: you can't separate Halfers and Thirders in practical decision problems.

3. I think you and I have a different understanding of the Halfer position. I think it's an ordinary proposition about an ordinary probability, and because it's wrong, someone who uses it to make decisions will make bad ones.

It sounds like you interpret the Halfer position as a belief that has no consequences. If so, it's not very interesting.

I don't think that's what Halfers mean, but maybe it's not worth debating what Halfers mean.

4. Allen, the Thirder position can be equally inconsequential: just ask a Thirder how her credence for Heads leads Beauty to pick the correct jar of beans (http://allendowney.blogspot.co.uk/2015/06/the-sleeping-beauty-problem.html?showComment=1446826222182#c466456992572929693).

Imagine we used slightly different language for the problem. The interviewer asks Beauty, "what do you believe about the state of the coin?". Halfer-Beauty answers, "with respect to experiments, a half-chance of Heads". Thirder-Beauty answers, "with respect to wake-ups, a third-chance of Heads". Both are correct, and both lead to exactly the same decisions.

That's what I reckon is really happening here: a mostly-unconscious tendency to express the coin flip w.r.t. experiments or w.r.t. wakeups. And the fact that contributors to threads like this cannot be separated in decision problems is some evidence in favour of that interpretation.

5. I agree with Creosote. Halfers and thirders cannot be separated by any decision problem - if they could, the puzzle would indeed be trivial.

Both halfers and thirders would bet on 1/3 if asked to make one bet per wake-up, and on 1/2 if asked to make one bet per experiment (although their argumentation might be slightly different.)

Further, if the experiment is modified so that we have different people (or Sleeping Beauty-clones) waking up at the possible wake-up events, everyone agrees on the 1/3 answer.

The halfer position is an ordinary proposition about an ordinary probability. It is consistent with the facts and leads to correct decisions.

4. What creates controversy in this problem is how to treat the fact that SB may be awake on a different day. Specifically, is that a different event (Elga) or the same event (Lewis)? But there is a simple way to eliminate that issue by addressing a different, but equivalent, question.

Four women volunteer to participate in an experiment that uses all but one of the procedures in the original Sleeping Beauty Problem. One coin is flipped on Sunday night after all four are put to sleep. All four will be wakened (in separate rooms) once, or twice, depending on the result of that coin flip:

SB1 will be wakened on Monday, and also on Tuesday if the result is Tails.

SB2 will be wakened on Monday, and also on Tuesday if the result is Heads.

SB3 will be wakened on Tuesday, and also on Monday if the result is Tails.

SB4 will be wakened on Tuesday, and also on Monday if the result is Heads.

This way, exactly three of the women will be awakened on each day of the experiment. Each will be asked the question "What is your credence, now, for the proposition that today is the only day you will be wakened?"

It is trivial to see that SB1's experiment is identical to the original Sleeping Beauty Problem. It is also trivial to see that SB2, SB3, and SB4 are in an experiment that is functionally equivalent, varying only in the specifics of the schedule.

When SB1 finds herself awake, she knows that exactly two other volunteers are awake. She also knows that the proposition "today is the only day you will be wakened" is true for exactly one of these three, and that the Principle of Indifference applies to which it is.

The only possible answer to this version of the problem is 1/3. The answer to the original Sleeping Beauty Problem has to be 1/3, because SB1's problem is the same problem.
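The counting claim can be checked by enumerating the four combinations of coin result and day. This is only a sketch of the schedule as listed above; the table layout and function names are invented for illustration.

```python
from itertools import product

# For each volunteer: (day she is always wakened, coin result that adds the other day)
SCHEDULE = {
    "SB1": ("Monday", "Tails"),
    "SB2": ("Monday", "Heads"),
    "SB3": ("Tuesday", "Tails"),
    "SB4": ("Tuesday", "Heads"),
}

def is_awake(name, coin, day):
    fixed_day, extra_coin = SCHEDULE[name]
    return day == fixed_day or coin == extra_coin

def wakened_only_once(name, coin):
    _, extra_coin = SCHEDULE[name]
    return coin != extra_coin

def summarize():
    """List, for every (coin, day), who is awake and who is wakened only once."""
    rows = []
    for coin, day in product(["Heads", "Tails"], ["Monday", "Tuesday"]):
        awake = [n for n in SCHEDULE if is_awake(n, coin, day)]
        single = [n for n in awake if wakened_only_once(n, coin)]
        rows.append((coin, day, awake, single))
    return rows

for row in summarize():
    print(row)
```

In every one of the four cases, three volunteers are awake and exactly one of them is on her only awakening.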

1. Thanks for the positive response.

IMO, this controversy stresses how the term "new information" gets tossed about without a definition. Specifically, halfers use it as the cornerstone of their solution, without defining it.

Here's what I think it means: First, define a non-trivial partition X to be a set of (1) disjoint events that (2) span the sample space and (3) all have non-zero probability. If an information event I occurs such that the intersections of I with the members of X no longer form a non-trivial partition, then I is "new information" that requires probabilities to be updated by dividing by the new sum.

Usually, this situation comes about when the information tells us that certain events are no longer possible. If I tell you that a die roll was even, then Pr(1∩I)=Pr(3∩I)=Pr(5∩I)=0, and these events can't be in a non-trivial partition. The updated probabilities are found by dividing Pr(2∩I)=Pr(4∩I)=Pr(6∩I)=1/6 by their sum, 1/2.

In the Sleeping Beauty Problem on Sunday, {Coin=H∩Wake=Mon, Coin=T∩Wake=Mon} suffices to make a non-trivial partition. Each event has probability 1/2, and they are disjoint events. Coin=T∩Wake=Mon and Coin=T∩Wake=Tue are not disjoint because they represent the same future/world/whatever. But when SB is wakened, Coin=T∩Wake=Mon and Coin=T∩Wake=Tue become disjoint, so an update is required.

5. Regarding the red dice problem, a halfer would claim (correctly I think) that once the experimental procedure has been explained, one should believe there is a 2/3 chance the last die is mostly red.

When you then say that the actual outcome is red (an event which happens with probability one) that provides no new information, and the belief remains the same.

6. The issue with Lewis' argument, and Elga's section 3, is that "new information" is not defined. Or more accurately, what allows a Bayesian update, usually called "new information" without definition, is never described in a robust manner.

IMO, "new" is the wrong concept for this; "changed" is better. It's just that in the vast majority of examples, "changed" means "has something added," which is "new."

A partition of a discrete sample space is a set of disjoint events whose probabilities sum to 1. So if you roll a die, {1,2,3,4,5,6}, {even,odd}, and {prime, non-prime} are all partitions. But {1,2,3} is not because the probabilities do not sum to 1, and {4,5,6,odd} is not because the events are not disjoint.

Events that have zero probability can be included in a sample space, so {1,2,3,4,5,6,7} is also a partition. To express the idea I want, I need to introduce a "minimal partition." That's one that includes no zero-probability events.

What allows an update is any change in information state that alters what is, or is not, a minimal partition. On Sunday, {Heads&Monday, Tails&Monday, Heads&Tuesday, Tails&Tuesday} is a minimal partition for potential experiment states in the future. Each - including Heads&Tuesday - has a 1/4 probability to represent the experiment's state at any moment in the next two days. But when SB is wakened, Heads&Tuesday is disqualified as a possible game state. This creates a change in the minimal partition, and so allows for an update.

He said it differently, and I don't agree with how he said it, but I think this is what Elga meant in his Section 3. The idea is correct.

To update a probability based on changed information, you divide the previous probability of the event in question by the sum of previous probabilities of the new minimal partition:
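The formula itself is not preserved in the comment as it appears here. One reading that is consistent with the description, assuming the four Sunday states of probability 1/4 each and dropping Heads&Tuesday once SB is awake, would be:

$$
\Pr(\text{Heads and Monday} \mid \text{awake}) = \frac{\tfrac14}{\tfrac14 + \tfrac14 + \tfrac14} = \frac13
$$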

7. P(heads|awake) has several possible values. It is not unique unless assumptions are made that are not explicitly stated in the problem.

A rational Sleeping Beauty can compute the fair price X for the bet "Bet owner gets \$1 if the coin landed heads and must pay the purchase price twice if the coin landed tails". (Sleeping Beauty must set one price X for betting on heads that applies on both Monday and Tuesday, because she cannot distinguish one day from the other.) The answer for the "fair price" is X = 1/3, and this is calculated independently of the controversial P(heads | awake). It only requires using the unconditional probability P(heads) = 1/2.
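One way to reproduce X = 1/3 is to set the owner's expected net gain to zero. This sketch assumes a particular reading of the bet: on heads the owner pays X once and collects \$1, and on tails the owner pays X twice and collects nothing.

$$
\tfrac12\,(1 - X) + \tfrac12\,(-2X) = 0 \quad\Longrightarrow\quad \tfrac12 - \tfrac32 X = 0 \quad\Longrightarrow\quad X = \tfrac13
$$

Only P(heads) = 1/2 appears in this calculation.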

The answer from the betting strategy computation does not have the interpretation of being Sleeping Beauty's estimate of P(heads|awake). She doesn't use any estimate of P(heads|awake) to calculate X = 1/3. And she is not making the "pure" bet "Bet owner gets \$1 if the coin landed heads". The actual bet she is offered has more consequences.

There are various plausible assumptions that lead to the "thirder" answer. They involve somehow connecting the expected frequencies for the events (heads, Monday, awake), (tails, Monday, awake), (tails, Tuesday, awake) as generated by the experiment to the probabilities that those situations are the single situation that happens "when Sleeping Beauty awakes". However, there is no information in the problem that explicitly makes such a connection.

The "halfer" answer for P(Heads | awake) is not unique, but it satisfies the given information. It does not contradict that the best betting strategy is X = 1/3 because a rational "halfer" would calculate X in the same manner as indicated above.

A specific model for the probability distribution of the situation "when Sleeping Beauty is awakened" that is consistent with the "halfer" answer is:
1) Toss the coin and run the experiment. 2) From the situation(s) that arise in the experiment, pick a situation, giving each situation the same probability of being selected if we must pick from among two.

That model produces conditional probabilities that offend some people's intuition, but it does not mathematically contradict any information given in the problem. The "halfer" answer is not the unique answer for P(heads | awake) because there is also a "thirder" probability model that is consistent with the information given in the problem.
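The difference between the two models can be made concrete with exact fractions. The following is a sketch under stated assumptions: the "halfer" function follows the two-step model described above, the "thirder" function is one possible formalization that weights each awakening by its expected frequency, and all names are invented for illustration.

```python
from fractions import Fraction

# Awakenings that occur under each coin outcome.
SITUATIONS = {"heads": ["Monday"], "tails": ["Monday", "Tuesday"]}
P_COIN = Fraction(1, 2)  # fair coin

def halfer_model():
    """Toss the coin, then pick one awakening uniformly from that run."""
    return {
        (coin, day): P_COIN * Fraction(1, len(days))
        for coin, days in SITUATIONS.items()
        for day in days
    }

def thirder_model():
    """Weight every awakening by its expected frequency, then normalize."""
    weights = {
        (coin, day): P_COIN
        for coin, days in SITUATIONS.items()
        for day in days
    }
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

def p_heads(model):
    return sum(p for (coin, _), p in model.items() if coin == "heads")

print(p_heads(halfer_model()), p_heads(thirder_model()))
```

Both distributions sum to 1 over the three situations; they differ only in how the selection of "the" observed awakening is modeled.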

It's tempting to think that the Sleeping Beauty problem is equivalent to a typical balls-in-urns problem. For example, Urn H contains one amber-colored ball and one sienna-colored ball. Urn T contains two amber-colored balls. A fair coin is flipped. Urn H is chosen if the coin lands heads, otherwise Urn T is chosen. A ball is drawn at random from the chosen urn.

Question 1) Given the ball that is drawn is amber-colored, what is the probability that Urn H was chosen?
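For reference, Question 1 has a standard answer by Bayes's theorem. This is a worked sketch using only the urn contents and the fair coin stated above:

$$
P(\text{Urn H} \mid \text{amber}) = \frac{\tfrac12 \cdot \tfrac12}{\tfrac12 \cdot \tfrac12 + \tfrac12 \cdot 1} = \frac{\tfrac14}{\tfrac34} = \frac13
$$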

However, that does not exemplify the scenario in the Sleeping Beauty problem. By analogy to the Sleeping Beauty problem, all the balls are drawn from the selected urn. Then the ill-posed question is asked:

Question 2) Upon observing a draw where the ball is amber-colored, what is the probability that Urn H was selected?

Question 2 does not say that each of the two balls that are drawn has the same probability of being the one observed. It doesn't specify that the draw observed is the first draw or the second draw. It doesn't rule out that the observer has some bias, such as always choosing to observe only the first draw from Urn T.

Since P(heads | awake) has no objectively calculable value, one may resort to making assumptions by invoking the Principle of Indifference. If we believe that the Principle of Indifference cannot be paradoxical then all solutions computed that way should be the same.