## Tuesday, October 25, 2016

### Socks, skeets, space aliens

In my Bayesian statistics class this semester, I asked students to invent new Bayes's theorem problems, with the following criteria:

1) A good Bayes's theorem problem should pose an interesting question that seems hard to solve directly, but

2) It should be easier to solve with Bayes's theorem than without it, and

3) It should have some element of surprise, or at least a non-obvious outcome.

Several years ago I posted some of my favorites in this article.  Last week I posted a problem one of my students posed (Why is My Cat Orange?).  This week I have another student-written problem and two related problems that I wrote.  I'll post solutions later in the week.

### The sock drawer problem

Posed by Yuzhong Huang:

There are two drawers of socks. The first drawer has 40 white socks and 10 black socks; the second drawer has 20 white socks and 30 black socks.  We randomly draw 2 socks from one drawer, and they turn out to be a pair (same color), but we don't know what color they are. What is the chance that we picked the first drawer?

[For this one, you can compute an approximate solution assuming socks are selected with replacement, or an exact solution assuming, more realistically, that they are selected without replacement.]

### The Alien Blaster problem

In preparation for an alien invasion, the Earth Defense League has been working on new missiles to shoot down space invaders.  Of course, some missile designs are better than others; let's assume that each design has some probability of hitting an alien ship, x.

Based on previous tests, the distribution of x in the population of designs is roughly uniform between 10% and 40%.  To approximate this distribution, we'll assume that x is either 10%, 20%, 30%, or 40% with equal probability.

Now suppose the new ultra-secret Alien Blaster 10K is being tested.  In a press conference, an EDL general reports that the new design has been tested twice, taking two shots during each test.  The results of the tests are confidential, so the general won't say how many targets were hit, but they report: "The same number of targets were hit in the two tests, so we have reason to think this new design is consistent."

Is this data good or bad; that is, does it increase or decrease your estimate of x for the Alien Blaster 10K?
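
One way to get started, sketched here under the assumption that each shot hits independently with probability x and that the general's statement is taken at face value, is to compute the probability that two tests of two shots produce the same hit count. The function name `prob_same_hits` is hypothetical, and this is only one reading of the problem, not a full solution.

```python
from math import comb

def prob_same_hits(x, shots=2):
    """Probability that two independent tests hit the same number of targets.

    Assumes each shot hits independently with probability x.
    """
    # Binomial probability of k hits out of `shots` in a single test
    probs = [comb(shots, k) * x**k * (1 - x)**(shots - k)
             for k in range(shots + 1)]
    # Both tests give the same count: sum over k of P(k)^2
    return sum(p * p for p in probs)

# The four-point prior from the problem statement
for x in [0.1, 0.2, 0.3, 0.4]:
    print(x, prob_same_hits(x))
```

Multiplying these likelihoods by the uniform prior and normalizing gives a posterior over the four values of x.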

### The Skeet Shooting problem

At the 2016 Summer Olympics in the Women's Skeet event, Kim Rhode faced Wei Meng in the bronze medal match.  After 25 shots, they were tied, sending the match into sudden death.  In each round of sudden death, each competitor shoots at two targets.  In the first three rounds, Rhode and Wei hit the same number of targets.  Finally in the fourth round, Rhode hit more targets, so she won the bronze medal, making her the first Summer Olympian to win an individual medal at six consecutive summer games.  Based on this information, should we infer that Rhode and Wei had an unusually good or bad day?

As background information, you can assume that anyone in the Olympic final has about the same probability of hitting 13, 14, 15, or 16 out of 25 targets.

### Solutions

For the sock problem, we have to compute the likelihood of the data (getting a pair) under each hypothesis.  If it's Drawer 1, with 40 white socks and 10 black, the probability of getting a pair is approximately:

(4/5)² + (1/5)²

If it's Drawer 2, with 20 white and 30 black socks, the probability of a pair is approximately:

(2/5)² + (3/5)²

In both cases I am pretending that we replace the first sock (and stir) before choosing the second, so the result is only approximate, but it is pretty close.  I'll leave the exact solution as an exercise :)
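
For readers who want to check their answer to the exercise, here is a minimal sketch of the without-replacement calculation using exact fractions. The helper name `prob_pair_exact` is hypothetical; it counts ordered same-color draws over all ordered draws of two socks.

```python
from fractions import Fraction

def prob_pair_exact(white, black):
    """Probability of drawing two socks of the same color without replacement."""
    total = white + black
    # Ordered ways to draw two white socks plus ordered ways to draw two black
    same = white * (white - 1) + black * (black - 1)
    return Fraction(same, total * (total - 1))

like1 = prob_pair_exact(40, 10)   # Drawer 1
like2 = prob_pair_exact(20, 30)   # Drawer 2

# The priors are equal, so they cancel and the posterior is the
# normalized likelihood
post1 = like1 / (like1 + like2)
print(like1, like2, post1, float(post1))
```

The exact likelihoods come out to 33/49 and 25/49, so the exact posterior for Drawer 1 is 33/58, or about 0.569, close to the approximate answer.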

Now we can fill in the Bayesian update worksheet:
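
A minimal sketch in plain Python that tabulates the worksheet's columns (prior, likelihood, their product, and the normalized posterior), using the with-replacement likelihoods above. The variable names are illustrative only.

```python
# Bayesian update worksheet for the sock drawer problem
priors = {'Drawer 1': 0.5, 'Drawer 2': 0.5}
likelihoods = {
    'Drawer 1': (4/5)**2 + (1/5)**2,   # probability of a pair, Drawer 1
    'Drawer 2': (2/5)**2 + (3/5)**2,   # probability of a pair, Drawer 2
}

# Unnormalized posterior: prior times likelihood
unnorm = {h: priors[h] * likelihoods[h] for h in priors}
# Total probability of the data (the normalizing constant)
total = sum(unnorm.values())
posteriors = {h: unnorm[h] / total for h in priors}

print(f"{'Hypothesis':<10} {'Prior':>6} {'Likelihood':>10} {'Product':>8} {'Posterior':>9}")
for h in priors:
    print(f"{h:<10} {priors[h]:>6.2f} {likelihoods[h]:>10.2f} "
          f"{unnorm[h]:>8.2f} {posteriors[h]:>9.3f}")
```

The likelihoods are 0.68 and 0.52, the products sum to 0.60, and the posterior probabilities are about 0.567 for Drawer 1 and 0.433 for Drawer 2.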

The likelihood of getting a pair is higher in Drawer 1, which is 40:10 (white to black), than in Drawer 2, which is 20:30.

In general, the probability of getting a pair is highest if the drawer contains socks of only one color, and lowest if the proportion is 50:50.  So getting a pair is evidence that the drawer is more likely to have a high (or low) proportion of one color, and less likely to be balanced.
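
To see why, note that with a fraction p of white socks and sampling with replacement, the probability of a pair is p² + (1-p)², which equals 1 - 2p(1-p). A quick sketch that evaluates this on a grid (the function name `prob_pair` is hypothetical):

```python
def prob_pair(p):
    """Probability of a matching pair when a fraction p of the socks is white.

    Assumes sampling with replacement.
    """
    return p**2 + (1 - p)**2

# Evaluate on a grid of proportions from 0 to 1
grid = [i / 10 for i in range(11)]
values = [prob_pair(p) for p in grid]

# Smallest pair probability on the grid, with the proportion where it occurs
lowest = min(zip(values, grid))
print(lowest)
print(prob_pair(0.0), prob_pair(1.0))
```

The minimum on the grid falls at p = 0.5, where the probability of a pair is 0.5, and the maximum of 1 occurs at p = 0 and p = 1.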

We can write a more general solution using a Jupyter notebook.
I'll represent the sock drawers with `Hist` objects, defined in the `thinkbayes2` library:
```python
from thinkbayes2 import Hist, Pmf

drawer1 = Hist(dict(W=40, B=10), label='Drawer 1')
drawer2 = Hist(dict(W=20, B=30), label='Drawer 2')
drawer1.Print()
```
```
B 10
W 40
```
Now I can make a `Pmf` that represents the two hypotheses:
```python
pmf = Pmf([drawer1, drawer2])
pmf.Print()
```
```
Drawer 2 0.5
Drawer 1 0.5
```
This function computes the likelihood of the data for a given hypothesis:
```python
def Likelihood(data, hypo):
    """Likelihood of the data under the hypothesis.

    data: string 'same' or 'different'
    hypo: Hist object with the number of each color

    returns: float likelihood
    """
    # Convert sock counts to proportions
    probs = Pmf(hypo)
    # Probability of a matching pair, sampling with replacement
    prob_same = probs['W']**2 + probs['B']**2
    if data == 'same':
        return prob_same
    else:
        return 1 - prob_same
```
Now we can update `pmf` with these likelihoods:
```python
data = 'same'

pmf[drawer1] *= Likelihood(data, drawer1)
pmf[drawer2] *= Likelihood(data, drawer2)
pmf.Normalize()
```
```
0.6000000000000001
```
The return value from Normalize is the total probability of the data, the denominator of Bayes's theorem, also known as the normalizing constant.
And here's the posterior distribution:
```python
pmf.Print()
```
```
Drawer 2 0.433333333333
Drawer 1 0.566666666667
```
The result is the same as what we got by hand.
Solutions to the other two problems coming soon.