Probably Overthinking It: Probability is hard: part three

This is the third part of a series of posts about conditional probability and Bayesian statistics.

In the first article, I presented the Red Dice problem, which is a warm-up problem that might help us make sense of the other problems.
In the second article, I presented the problem of interpreting medical tests when there is uncertainty about parameters like sensitivity and specificity. I explained that there are (at least) four models of the scenario, and I hinted that some of them yield different answers. I presented my solution for one of the scenarios, using the computational framework from Think Bayes.

Now, on to Scenario B. For those of you coming in late, here are the Scenarios again:

Scenario A: The patients are drawn at random from the relevant population, and the reason we are uncertain about t, the false positive rate of the test, is that either (1) there are two versions of the test, with different false positive rates, and we don't know which test was used, or (2) there are two groups of people, the false positive rate is different for different groups, and we don't know which group the patient is in.

Scenario B: As in Scenario A, the patients are drawn at random from the relevant population, but the reason we are uncertain about t is that previous studies of the test have been contradictory. That is, there is only one version of the test, and we have reason to believe that t is the same for all groups, but we are not sure what the correct value of t is.

Scenario C: As in Scenario A, there are two versions of the test or two groups of people. But now the patients are being filtered so we only see the patients who tested positive and we don't know how many patients tested negative. For example, suppose you are a specialist and patients are only referred to you after they test positive.

Scenario D: As in Scenario B, we have reason to think that t is the same for all patients, and as in Scenario C, we only see patients who test positive and don't know how many tested negative.

And the questions we would like to answer are:

What is the probability that a patient who tests positive is actually sick?
Given two patients who have tested positive, what is the probability that both are sick?
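
To make the difference between Scenarios A and B concrete, here is a minimal sketch in plain Python. It is not the Think Bayes code from the notebook, and the numbers are assumptions chosen only for illustration: prevalence p = 0.1, sensitivity s = 0.9, and a false positive rate t that is either 0.2 or 0.4 with equal probability. The function names are hypothetical. In Scenario A each patient gets an independent draw of t; in Scenario B one unknown t is shared by all patients, so their results are no longer independent.

```python
from fractions import Fraction

# Illustrative values, assumed for this sketch (not stated in this post).
p = Fraction(1, 10)                      # prevalence: P(sick)
s = Fraction(9, 10)                      # sensitivity: P(positive | sick)
t_prior = {Fraction(2, 10): Fraction(1, 2),   # false positive rate t
           Fraction(4, 10): Fraction(1, 2)}   # mapped to its prior probability


def prob_positive(t):
    """P(positive) for one patient, given a known false positive rate t."""
    return p * s + (1 - p) * t


def scenario_a(n):
    """P(all n positive patients are sick) when t is drawn independently
    for each patient, so the patients are independent of each other."""
    pos = sum(w * prob_positive(t) for t, w in t_prior.items())
    return (p * s / pos) ** n


def scenario_b(n):
    """P(all n positive patients are sick) when a single unknown t applies
    to every patient: average the joint probabilities over t, then divide."""
    all_sick_and_positive = sum(w * (p * s) ** n for w in t_prior.values())
    all_positive = sum(w * prob_positive(t) ** n for t, w in t_prior.items())
    return all_sick_and_positive / all_positive


print(scenario_a(1), scenario_b(1))  # 1/4 1/4
print(scenario_a(2), scenario_b(2))  # 1/16 1/17
```

With these assumed numbers, the two scenarios agree for one patient but not for two. In Scenario B, seeing two positive tests makes the higher value of t more plausible, which pulls the probability that both patients are sick slightly below the independent answer.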

I have posted a new Jupyter notebook with a solution for Scenario B. It also includes the previous solution for Scenario A; if you read it before, you can skip to the new material.

You can read a static version of the notebook here.

OR

You can run the notebook on Binder.

If you click the Binder link, you should get a home page showing the notebooks and Python modules in the repository for this blog. Click on test_scenario_b.ipynb to load the notebook for this article.

I'll give you a chance to think about Scenarios C and D, and I'll post my solution in the next couple of days.

Enjoy!


Monday, May 9, 2016
