- In the first article, I presented the Red Dice problem, which is a warm-up problem that might help us make sense of the other problems.
- In the second article, I presented the problem of interpreting medical tests when there is uncertainty about parameters like sensitivity and specificity. I explained that there are (at least) four models of the scenario, and I hinted that some of them yield different answers. I presented my solution for one of the scenarios, using the computational framework from Think Bayes.
- In the third article, I presented my solution for Scenario B.
Scenario A: The patients are drawn at random from the relevant population, and the reason we are uncertain about t, the false positive rate, is that either (1) there are two versions of the test, with different false positive rates, and we don't know which test was used, or (2) there are two groups of people, the false positive rate differs between the groups, and we don't know which group the patient is in.
Scenario B: As in Scenario A, the patients are drawn at random from the relevant population, but the reason we are uncertain about t is that previous studies of the test have been contradictory. That is, there is only one version of the test, and we have reason to believe that t is the same for all groups, but we are not sure what the correct value of t is.
Scenario C: As in Scenario A, there are two versions of the test or two groups of people. But now the patients are being filtered so we only see the patients who tested positive and we don't know how many patients tested negative. For example, suppose you are a specialist and patients are only referred to you after they test positive.
Scenario D: As in Scenario B, we have reason to think that t is the same for all patients, and as in Scenario C, we only see patients who test positive and don't know how many tested negative.
And the questions we would like to answer are:
- What is the probability that a patient who tests positive is actually sick?
- Given two patients who have tested positive, what is the probability that both are sick?
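For reference, if t were known, the first question would be a direct application of Bayes's theorem. As a sketch, write p for the prevalence of the disease and s for the sensitivity of the test. These two symbols are notation introduced here for illustration; t is the false positive rate, as above.

```latex
P(\text{sick} \mid \text{positive}) = \frac{p \, s}{p \, s + (1 - p) \, t}
```

With t known, two patients who test positive would be independent, so the answer to the second question would be the square of this expression. The scenarios differ in how uncertainty about t, and the way patients are selected, enter this calculation.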
You can read a static version of the notebook here.
You can run the notebook on Binder.
If you click the Binder link, you should get a home page showing the notebooks in the repository for this blog. Click on test_scenario_cd.ipynb to load the notebook for this article.
As I said in the first article, this problem turned out to be harder than I expected. First, it took time, and a lot of untangling, to identify the different scenarios. Once we did that, figuring out the answers was easier, but still not easy.
For me, writing a simulation for each scenario turned out to be very helpful. It has the obvious benefit of providing an approximate answer, but it also forced me to specify the scenarios precisely, which is crucial for problems like these.
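To illustrate the kind of simulation I mean, here is a minimal sketch for Scenario D. It is not the code from the notebook: the function names are hypothetical, and the prevalence (10%), the sensitivity (90%), and the two equally likely values of t (20% and 40%) are assumptions made for this example. It also encodes one reading of the scenario, namely that t is fixed once and we then keep generating patients until two of them test positive.

```python
import random

# Assumed parameters for illustration only.
P_SICK = 0.1            # prevalence
SENSITIVITY = 0.9       # P(positive | sick)
T_VALUES = (0.2, 0.4)   # candidate false positive rates, equally likely


def draw_positive_patient(t, rng):
    """Generate patients until one tests positive; return True if that patient is sick."""
    while True:
        sick = rng.random() < P_SICK
        p_positive = SENSITIVITY if sick else t
        if rng.random() < p_positive:
            return sick


def simulate_scenario_d(trials=200_000, seed=17):
    """Estimate P(one patient sick) and P(both sick) among patients who tested positive."""
    rng = random.Random(seed)
    first_sick = 0
    both_sick = 0
    for _ in range(trials):
        # One unknown value of t, shared by both patients in this trial.
        t = rng.choice(T_VALUES)
        a = draw_positive_patient(t, rng)
        b = draw_positive_patient(t, rng)
        first_sick += a
        both_sick += a and b
    return first_sick / trials, both_sick / trials


p_one, p_both = simulate_scenario_d()
print(f"P(sick | positive)      ~ {p_one:.3f}")
print(f"P(both sick | positive) ~ {p_both:.3f}")
```

Under these assumptions, the estimates should land near 4/15 (about 0.267) and 17/225 (about 0.076). Writing the loop forces a decision about when t is chosen and when patients are discarded, which is exactly the kind of detail that distinguishes the scenarios.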
Now, having solved four versions of this problem, what guidance can we provide for interpreting medical tests in practice? And how realistic are these scenarios, anyway?
Most medical tests are more than 90% sensitive and yield false positives less often than 20-40% of the time. But many symptoms that indicate disease are neither sensitive nor specific. So this analysis might be more applicable to interpreting symptoms than to interpreting test results.
I think all four scenarios are relevant to practice:
- For some signs of disease, we have estimates of true and false positive rates, but when there are multiple inconsistent estimates, we might be unsure which estimate is right. For some other tests, we might have reason to believe that these parameters are different for different groups of patients; for example, many antibody tests are more likely to generate false positives in patients who have been exposed to other diseases.
- In some medical environments, we could reasonably treat patients as a random sample of the population, but more often patients have been filtered by a selection process that depends on their symptoms and test results. We considered two extremes: in Scenarios A and B, we have a random sample, and in C and D we only see patients who test positive. But in practice the selection process is probably somewhere in between.
If the selection process is unknown, and we don't know whether the parameters of the test are likely to be different for different groups, one option is to run the analysis for all four scenarios (or rather, for the three scenarios that are different), and use the range of the results to characterize uncertainty due to model selection.
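As a rough sketch of what that might look like, the following computes exact answers to both questions under each scenario and reports the range. The numbers are assumptions for illustration (prevalence 10%, sensitivity 90%, t equally likely to be 20% or 40%), and the formula for each scenario reflects one reading of it, spelled out in the comments. In particular, Scenario C is assumed to match Scenario A, on the reading that t is assigned independently to each patient before any filtering happens.

```python
from fractions import Fraction

# Assumed values for illustration only.
p = Fraction(1, 10)                      # prevalence
s = Fraction(9, 10)                      # sensitivity
ts = [Fraction(1, 5), Fraction(2, 5)]    # equally likely false positive rates


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def prob_pos(t):
    """P(positive | t) for a patient drawn at random."""
    return p * s + (1 - p) * t


def prob_sick(t):
    """P(sick | positive, t)."""
    return p * s / prob_pos(t)


# A: random sample, t drawn independently for each patient.
a_one = p * s / mean(prob_pos(t) for t in ts)

# B: random sample, one shared t; each positive test is evidence about t.
b_both = mean((p * s) ** 2 for t in ts) / mean(prob_pos(t) ** 2 for t in ts)

# D: filtered, one shared t; a positive test says nothing about t,
# because every patient we see tested positive.
d_one = mean(prob_sick(t) for t in ts)
d_both = mean(prob_sick(t) ** 2 for t in ts)

results = {
    "A": (a_one, a_one ** 2),
    "B": (a_one, b_both),
    "C": (a_one, a_one ** 2),  # assumed equal to A; see the note above
    "D": (d_one, d_both),
}

for name, (one, both) in results.items():
    print(f"Scenario {name}: one sick = {one}, both sick = {both}")

ones = [one for one, _ in results.values()]
boths = [both for _, both in results.values()]
print(f"one sick:  {float(min(ones)):.3f} to {float(max(ones)):.3f}")
print(f"both sick: {float(min(boths)):.3f} to {float(max(boths)):.3f}")
```

With these inputs the four scenarios produce three distinct pairs of answers, and the spread between the smallest and largest values gives a sense of how much the choice of model matters.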
[May 11, 2016] In this comment on Reddit, /u/ppcsf shows how to solve this problem using a probabilistic programming language based on C#. The details are in this gist.