Abstract: My two favorite topics in probability and statistics are Bayes’s theorem and logistic regression. Because there are similarities between them, I have always assumed that there is a connection. In this note, I demonstrate the connection mathematically, and (I hope) shed light on the motivation for logistic regression and the interpretation of the results.
1 Bayes’s theorem

I’ll start by reviewing Bayes’s theorem, using an example that came up when I was in grad school. I signed up for a class on Theory of Computation. On the first day of class, I was the first to arrive. A few minutes later, another student arrived. Because I was expecting most students in an advanced computer science class to be male, I was mildly surprised that the other student was female. Another female student arrived a few minutes later, which was sufficiently surprising that I started to think I was in the wrong room. When another female student arrived, I was confident I was in the wrong place (and it turned out I was).
As each student arrived, I used the observed data to update my belief that I was in the right place. We can use Bayes’s theorem to quantify the calculation I was doing intuitively.
I’ll use H to represent the hypothesis that I was in the right room, and F to represent the observation that the first other student was female. Bayes’s theorem provides an algorithm for updating the probability of H:
    P(H|F) = P(H) P(F|H) / P(F)

where
- P(H) is the prior probability of H before the other student arrived.
- P(H|F) is the posterior probability of H, updated based on the observation F.
- P(F|H) is the likelihood of the data, F, assuming that the hypothesis is true.
- P(F) is the likelihood of the data, independent of H.
When I was in grad school, most advanced computer science classes were 90% male, so if I was in the right room, the likelihood of the first female student was only 10%. And the likelihood of three female students was only 0.1%.
If we don’t assume I was in the right room, then the likelihood of the first female student was more like 50%, so the likelihood of all three was 12.5%.
Assuming a prior probability of 90% that I was in the right room, plugging those numbers into Bayes’s theorem yields P(H|F) = 0.64 after one female student, P(H|FF) = 0.26 after the second, and P(H|FFF) = 0.07 after the third.
[UPDATE: An earlier version of this article had incorrect values in the previous sentence. Thanks to David Burger for catching the error.]
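To make the arithmetic concrete, here is a minimal Python sketch of the same sequential update (the 90% prior and the 10%/50% likelihoods are the assumptions stated above):

    def update(prior, like_h, like_not_h):
        """Return P(H|data) given the prior and the likelihoods under H and not-H."""
        numer = prior * like_h
        denom = numer + (1 - prior) * like_not_h
        return numer / denom

    p = 0.9                        # prior probability of being in the right room
    for i in range(3):
        p = update(p, like_h=0.1, like_not_h=0.5)
        print(f"after female student {i + 1}: P = {p:.2f}")

    # prints 0.64, 0.26, 0.07, matching the values above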
2 Logistic regression

Logistic regression is based on the following functional form:
    logit(p) = β0 + β1 x1 + ... + βn xn
where p is a probability, the x’s are explanatory variables, and the β’s are coefficients to be estimated. The logit function is the log-odds function:

    logit(p) = ln( p / (1 − p) )

Presented this way, the model raises a few questions:
- Why is logit(p) the right choice for the dependent variable?
- Why should we expect the relationship between logit(p) and the explanatory variables to be linear?
- How should we interpret the estimated parameters?
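As a quick reference, here is a minimal Python sketch of the logit function defined above, together with its inverse (the helper names logit and expit are my own, not from the note):

    import math

    def logit(p):
        """Log-odds of a probability p."""
        return math.log(p / (1 - p))

    def expit(lo):
        """Inverse of logit: convert log-odds back to a probability."""
        return 1 / (1 + math.exp(-lo))

    print(logit(0.5))            # 0.0: even odds
    print(logit(0.9))            # about 2.197, which is ln(9)
    print(expit(logit(0.64)))    # round trip recovers 0.64 (up to floating point)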
On notation: I’ll use P(H) for the probability that some hypothesis, H, is true. O(H) is the odds of the same hypothesis, defined as

    O(H) = P(H) / (1 − P(H))

and LO(H) is the log-odds:

    LO(H) = ln O(H)

I’ll also use LR for a likelihood ratio, LLR for a log likelihood ratio, and LOR for a log odds ratio; these appear below.
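For example, with the 90% prior from the classroom scenario, the prior odds and log-odds are

    O(H) = 0.9 / (1 − 0.9) = 9
    LO(H) = ln 9 ≈ 2.20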
3 Making the connection

To demonstrate the connection between Bayes’s theorem and logistic regression, I’ll start with the odds form of Bayes’s theorem. Continuing the previous example, I could write
    O(H|F) = O(H) LR(F|H)    (1)

where
- O(H) is the prior odds that I was in the right room,
- O(H|F) is the posterior odds after seeing one female student,
- LR(F|H) is the likelihood ratio of the data, P(F|H) / P(F|not H).
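Here is a minimal check, using the numbers from the classroom example, that the odds form gives the same posterior as the probability form computed earlier:

    prior_odds = 0.9 / 0.1            # O(H) = 9, from the 90% prior
    likelihood_ratio = 0.1 / 0.5      # LR(F|H) = P(F|H) / P(F|not H) = 0.2

    posterior_odds = prior_odds * likelihood_ratio      # Eqn 1: O(H|F) = 1.8
    posterior_prob = posterior_odds / (1 + posterior_odds)
    print(posterior_prob)             # about 0.64, matching the earlier result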
Noticing that logistic regression is expressed in terms of log-odds, my next move is to write the log-odds form of Bayes’s theorem by taking the log of Eqn 1:
    LO(H|F) = LO(H) + LLR(F|H)    (2)

If the first student to arrive had been male, I would have written

    LO(H|M) = LO(H) + LLR(M|H)    (3)

More generally, if we let X represent the sex of the observed student, coded 0 for female and 1 for male, we can write

    LO(H|X) = LO(H) + LLR(X|H)    (4)

Since X is binary, the log likelihood ratio takes one of two values:

    LLR(X|H) = (1 − X) LLR(F|H) + X LLR(M|H)    (5)

Collecting the terms that involve X yields

    LLR(X|H) = LLR(F|H) + X [LLR(M|H) − LLR(F|H)]    (6)
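A quick numerical check of Eqn 6, using the likelihoods from the example (0.9 and 0.1 for a male or female student if I’m in the right room, 0.5 either way if I’m not):

    import math

    llr_f = math.log(0.1 / 0.5)    # LLR(F|H), about -1.609
    llr_m = math.log(0.9 / 0.5)    # LLR(M|H), about  0.588

    def llr(x):
        """Eqn 6: log likelihood ratio as a function of X (0 = female, 1 = male)."""
        return llr_f + x * (llr_m - llr_f)

    print(llr(0), llr_f)    # both about -1.609: X = 0 recovers the female case
    print(llr(1), llr_m)    # both about  0.588: X = 1 recovers the male case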
4 Odds ratios

The next move is to recognize that the part of Eqn 6 in brackets is the log-odds ratio of H. To see that, we need to look more closely at odds ratios.
Odds ratios are often used in medicine to describe the association between a disease and a risk factor. In the example scenario, we can use an odds ratio to express the odds of the hypothesis H if we observe a male student, relative to the odds if we observe a female student:

    OR_X(H) = O(H|M) / O(H|F)

Applying Bayes’s theorem (in the odds form of Eqn 1) to the top and bottom of the previous expression, the prior odds O(H) cancel, leaving

    OR_X(H) = LR(M|H) / LR(F|H)

and taking the log of both sides gives the log-odds ratio:

    LOR_X(H) = LLR(M|H) − LLR(F|H)    (7)
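Plugging in the likelihoods from the classroom example gives

    LOR_X(H) = ln(0.9 / 0.5) − ln(0.1 / 0.5) = ln 9 ≈ 2.20

In other words, observing a male student multiplies the odds of H by a factor of 9 relative to observing a female student.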
5 Conclusion

Now we have all the pieces we need; we just have to assemble them. Substituting Eqn 7 into Eqn 6 yields

    LLR(X|H) = LLR(F|H) + X LOR_X(H)    (8)

and combining that with Eqn 4 yields

    LO(H|X) = LO(H) + LLR(F|H) + X LOR_X(H)    (9)

The first two terms on the right are the posterior log-odds after observing a female student (Eqn 2), so

    LO(H|X) = LO(H|F) + X LOR_X(H)

This has the same form as a logistic regression model with a single explanatory variable:

    logit(p) = β0 + X β1

Matching the two expressions term by term:
- The predicted value, logit(p), is the posterior log odds of the hypothesis, given the observed data.
- The intercept, β0, is the log-odds of the hypothesis if X=0.
- The coefficient of X, β1, is a log-odds ratio that represents the odds of H when X=1, relative to when X=0.
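To close the loop, here is a sketch that simulates the classroom scenario and fits a logistic regression, so the estimated coefficients can be compared with the quantities derived above (the simulation setup is mine, and it assumes NumPy and statsmodels are available):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200_000

    # H = 1 if I'm in the right room (prior 0.9); X = 1 if the observed
    # student is male: P(male) is 0.9 in the right room, 0.5 otherwise.
    h = (rng.random(n) < 0.9).astype(float)
    p_male = np.where(h == 1, 0.9, 0.5)
    x = (rng.random(n) < p_male).astype(float)

    # Fit logit(P(H)) = beta0 + beta1 * X.
    result = sm.Logit(h, sm.add_constant(x)).fit(disp=False)
    beta0, beta1 = result.params
    print(beta0, beta1)

    # Values derived in the text:
    #   beta0 = LO(H|F) = ln(0.9 * 0.1 / (0.1 * 0.5)) = ln 1.8, about 0.588
    #   beta1 = LOR_X(H) = ln 9, about 2.197
    print(np.log(1.8), np.log(9))

With a large number of simulated rooms, the estimates should come out close to ln 1.8 ≈ 0.59 and ln 9 ≈ 2.20.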