Thursday, October 20, 2011

My favorite Bayes's Theorem problems

This week: some of my favorite problems involving Bayes's Theorem.  Next week: solutions.

1) The first one is a warm-up problem.  I got it from Wikipedia (but it's no longer there):
Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of Bowl #1?
This is a thinly disguised urn problem.  It is simple enough to solve without Bayes's Theorem, but good for practice.

2) This one is also an urn problem, but a little trickier.
The blue M&M was introduced in 1995.  Before then, the color mix in a bag of plain M&Ms was (30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan).  Afterward it was (24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown).
A friend of mine has two bags of M&Ms, and he tells me that one is from 1994 and one from 1996.  He won't tell me which is which, but he gives me one M&M from each bag.  One is yellow and one is green.  What is the probability that the yellow M&M came from the 1994 bag?

3) This one is from one of my favorite books, David MacKay's "Information Theory, Inference, and Learning Algorithms":
Elvis Presley had a twin brother who died at birth.  What is the probability that Elvis was an identical twin?
To answer this one, you need some background information.  According to the Wikipedia article on twins: "Twins are estimated to be approximately 1.9% of the world population, with monozygotic twins making up 0.2% of the total -- and 8% of all twins."

4) Also from MacKay's book:
Two people have left traces of their own blood at the scene of a crime.  A suspect, Oliver, is tested and found to have type O blood.  The blood groups of the two traces are found to be of type O (a common type in the local population, having frequency 60%) and of type AB (a rare type, with frequency 1%).  Do these data (the blood types found at the scene) give evidence in favour [sic] of the proposition that Oliver was one of the two people whose blood was found at the scene?

5) I like this problem because it doesn't provide all of the information.  You have to figure out what information is needed and go find it.
According to the CDC, "Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely."
If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?

6) And finally, a mandatory Monty Hall Problem.  First, here's the general description of the scenario, from Wikipedia:
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say Door A [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say Door B, which has a goat. He then says to you, "Do you want to pick Door C?" Is it to your advantage to switch your choice?
The answer depends on the behavior of the host if the car is behind Door A.  In this case the host can open either B or C.  Suppose he chooses B with probability p and C otherwise.  What is the probability that the car is behind Door A (as a function of p)?

If you like this problem, you might also like the Blinky Monty Problem.

Solutions next week!

Please let me know if you have suggestions for more problems. An ideal problem should meet at least some of these criteria:
1) It should be based on a context that is realistic or at least interesting, not too contrived. 
2) It should make good use of Bayes's Theorem -- that is, it should be easier to solve with BT than without.
3) It should involve some real data, which the solver might have to find.
4) It might involve a trick, but should not be artificially hard.

If you send me something that is not under copyright, or is usable under fair use, I will include it in the next edition of Think Stats and add you to the contributors list.


  1. Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks TWO cookies at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. BOTH cookies turn out to be plain. How probable is it that Fred picked them out of Bowl #1?
    I wanted to know how to solve the cookie problem above if two cookies are selected at a time: what is the probability that both cookies are from Bowl #1?
    Please tell me how to solve this.

    1. Hi Nikhil, Good question. Maybe I will post an update with a longer answer, but here's the short version. You can extend the analysis shown above to handle this data by computing the likelihood of the data under each hypothesis. For example, the likelihood of drawing two plain cookies (without replacement) from a bowl with 20 plain and 20 chocolate chip cookies is (20)(19) / ((40)(39)), which is a little less than 1/4.

    2. Hi Nikhil, compute the probability of drawing 2 plain cookies from each bowl, i.e. 30C2/40C2 and 20C2/40C2 for bowls 1 and 2 respectively, and then use these probabilities with Bayes's theorem.
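
The two replies above describe the same calculation. Here is a minimal sketch in Python, assuming equal priors for the two bowls and draws without replacement. The `posterior` helper and the variable names are illustrative and do not come from the post.

```python
from fractions import Fraction
from math import comb


def posterior(priors, likelihoods):
    """Multiply each prior by its likelihood, then normalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumption: Fred is equally likely to pick either bowl.
priors = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}

# Likelihood of drawing 2 plain cookies without replacement from each bowl.
likelihoods = {
    "Bowl 1": Fraction(comb(30, 2), comb(40, 2)),  # 30 plain out of 40
    "Bowl 2": Fraction(comb(20, 2), comb(40, 2)),  # 20 plain out of 40
}

post = posterior(priors, likelihoods)
for bowl, p in post.items():
    print(bowl, p, float(p))
```

Exact fractions avoid rounding. Under these assumptions the posterior probability of Bowl 1 works out to 87/125, or about 0.70.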

  2. Nice post! To answer the Elvis question, you actually still need some additional information: among fraternal twins, 1/3 are two girls, 1/3 are two boys, and 1/3 are a boy and a girl. Also (just to be perfectly thorough) identical twins are certain to have the same gender.

    1. I agree that we need more information, but I am pretty sure that answer is wrong. Among fraternal twins, 25% are GG, 25% are BB, and 50% are BG or GB.
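
The 25/25/50 split in the reply above can be checked by enumeration. This sketch assumes that each fraternal twin is independently a boy or a girl with equal probability, which is a simplification rather than real birth data. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: four equally likely ordered outcomes (B,B), (B,G), (G,B), (G,G).
outcomes = list(product("BG", repeat=2))

# Ignore birth order: (B,G) and (G,B) both count as "BG".
counts = Counter("".join(sorted(pair)) for pair in outcomes)
probs = {kind: Fraction(n, len(outcomes)) for kind, n in counts.items()}

for kind, p in sorted(probs.items()):
    print(kind, p)

# Probability that fraternal twins are the same sex under this assumption.
same_sex = probs["BB"] + probs["GG"]
print("same sex:", same_sex)
```

The mixed pair is twice as likely as either same-sex pair because it can occur in two orders.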

  3. A biometric security device using fingerprints erroneously refuses to admit 1 in 1,000 authorized persons from a facility containing classified information. The device will erroneously admit 1 in 1,000,000 unauthorized persons. Assume that 95 percent of those who seek access are authorized. If the alarm goes off and a person is refused admission, what is the probability that the person was really authorized?

    Please help me solve this problem as soon as possible.

    1. If you Google this problem, you'll find lots of online solutions. If you are taking a class and don't know how to solve this, ask questions until you do!

  4. Problem 2 should have 24% blue instead of 20%. The probabilities did not add up to 1 so I googled the problem to find out which one was off.

  5. I realize this is an older post, but it is great stuff; I was watching your video on YouTube that covers this material and stopped to work it out myself. On the M&M problem, there are actually two issues: one which you've outlined above, but the second could be stated as, "What are the chances that the dispenser of M&Ms is not a good friend, given the early date of those bags of candy?!"

    Thanks again, your work is much appreciated.


  6. Bayes's Theorem: if I know the value of p(symptoms/disease), let's say 0.3, is it justifiable to take p(~symptoms/disease) = (1-p(symptoms/disease)) and p(symptoms/~disease) = (1-p(symptoms/disease))?

    1. I don't understand the question. It doesn't look like you are using Bayes's theorem. Your first and third expressions are equivalent. The middle one is not (unless P(S|~D) = 1, which would be weird).
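
One way to see the distinction in the reply above is with a small table of counts. The numbers below are hypothetical, chosen only so that P(S|D) = 0.3 as in the question, and `conditional` is an illustrative helper rather than a library function.

```python
from fractions import Fraction

# Hypothetical counts of people, keyed by (disease status, symptom status).
counts = {
    ("D", "S"): 30, ("D", "~S"): 70,      # 100 people with the disease
    ("~D", "S"): 45, ("~D", "~S"): 855,   # 900 people without it
}


def conditional(symptom, disease):
    """P(symptom | disease), computed from the table of counts."""
    row_total = sum(n for (d, _), n in counts.items() if d == disease)
    return Fraction(counts[(disease, symptom)], row_total)


p_s_given_d = conditional("S", "D")
p_not_s_given_d = conditional("~S", "D")
p_s_given_not_d = conditional("S", "~D")

print(p_s_given_d, p_not_s_given_d, p_s_given_not_d)
```

Within the disease row the two probabilities must sum to 1, so P(~S|D) = 1 - P(S|D). P(S|~D) comes from a different row, so knowing P(S|D) does not determine it.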

  7. The following data is about a poll that occurred in 3 states. In state1, 50% of voters support Party1, in state2, 60% of the voters support Party1, and in state3, 35% of the voters support Party1. Of the total population of the three states, 40% live in state1, 25% live in state2, and 35% live in state3. Given that a voter supports Party1, what is the probability that he lives in state2?

    1. Fun problem, thanks! I posted a solution here:

    2. The simplest way to answer this question is to assume a number for the total population.
      Assume that 400 is the combined population of all three states. Then state1, state2, and state3 have populations of 160, 100, and 140 respectively. From the data provided, the numbers of people supporting Party1 in state1, state2, and state3 are 0.5*160, 0.6*100, and 0.35*140 (= 80, 60, 49).
      From this, the probability that a Party1 supporter is from state2 is 60/(80+60+49) = 0.317 (approx.).

    3. Yes, this "natural frequency" way of solving problems like this is excellent.
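
The natural-frequency calculation in this thread can be sketched in a few lines of Python. The total of 400 is the arbitrary population assumed in the comment above, and the dictionary names are illustrative.

```python
from fractions import Fraction

# Share of the combined population living in each state (from the problem).
population_share = {
    "state1": Fraction(40, 100),
    "state2": Fraction(25, 100),
    "state3": Fraction(35, 100),
}

# Fraction of each state's voters who support Party1 (from the problem).
support = {
    "state1": Fraction(50, 100),
    "state2": Fraction(60, 100),
    "state3": Fraction(35, 100),
}

# Assumption: any convenient total works; 400 gives whole numbers of people.
total_population = 400

supporters = {
    state: population_share[state] * total_population * support[state]
    for state in population_share
}

p_state2 = supporters["state2"] / sum(supporters.values())
print(supporters)
print(p_state2, float(p_state2))
```

The total population cancels in the final ratio, so this is the same computation as Bayes's theorem with the population shares as priors and the support rates as likelihoods.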