## Tuesday, January 4, 2011

### Proofiness and elections

I am enjoying Charles Seife’s *Proofiness*, but I have to point out what I think is a flaw.  A major part of the book addresses errors in elections and election polls, and Seife presents a detailed analysis of two contested elections: Franken vs. Coleman in Minnesota and Gore vs. Bush in Florida.

In both cases the margin of victory was (much) smaller than the margin of error, and Seife concludes that the result should be considered a tie.  He points out:

> “...both states, by coincidence, happen to break a tie in exactly the same way.  In the case of a tie vote, the winner shall be determined by lot.  In other words, flip a coin.  It’s hard to swallow, but the 2008 Minnesota Senate race and, even more startling, the 2000 presidential election should have been settled with the flip of a coin.”

So far, I agree, but Seife implies that this solution would avoid the legal manipulations that follow close elections (which he describes in entertaining detail, including a contested ballot in Minnesota full of write-in votes for “Lizard People”).

But Seife doesn’t solve the problem; he only moves it.  Instead of two outcomes (A wins and B wins) there are three (A wins, statistical tie, B wins), but the lines between outcomes are just as sharp, so any election that approaches them will be just as sharply contested.

As always, the correct solution is easy if you just apply Bayesian statistics.  In order to compute the margin of error, we need a model of the error process (for example, there is a chance that any vote will be lost, and a chance that any vote will be double counted).  With a model like that, we can use the observed vote count to compute the probability that either candidate received more votes.  Then we toss a biased coin, with the computed probability, to determine the winner.
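Here is a minimal Python sketch of that procedure. It is not the method from the post or the book, just one way to make it concrete under a stated assumption: instead of modelling lost and double-counted votes individually, it assumes each candidate's true total is the reported count plus independent Gaussian error with a known standard deviation, and that the prior on the true totals is flat. Under those assumptions the posterior for the true margin is Gaussian, so the probability comes straight from a normal CDF. The function names `prob_b_more_votes` and `choose_winner` are hypothetical.

```python
import random
from statistics import NormalDist


def prob_b_more_votes(count_a, count_b, sd_a, sd_b):
    """Posterior probability that candidate B really received more votes than A.

    ASSUMED model (illustrative only): true_x = count_x + error_x, where the
    errors are independent Gaussians with standard deviations sd_a and sd_b
    (both must be positive), and the prior on the true totals is flat.
    The true margin (A minus B) is then Gaussian, centred on the counted margin.
    """
    margin = count_a - count_b
    sd_margin = (sd_a ** 2 + sd_b ** 2) ** 0.5
    # B actually got more votes when the true margin is below zero.
    return NormalDist(mu=margin, sigma=sd_margin).cdf(0)


def choose_winner(p_b, rng=random):
    """Toss a biased coin that comes up 'B' with probability p_b."""
    return "B" if rng.random() < p_b else "A"


if __name__ == "__main__":
    # Hypothetical counts and error sizes, chosen only for illustration.
    p_b = prob_b_more_votes(count_a=500_200, count_b=500_000, sd_a=300, sd_b=300)
    print(f"P(B received more votes) = {p_b:.3f}")
    print("winner:", choose_winner(p_b, random.Random(2011)))
```

Passing a seeded `random.Random` to `choose_winner` makes a draw reproducible, which might matter if the toss has to be audited.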

For example, the final count in the 2008 Minnesota Senate race was 1,212,629 votes for Franken and 1,212,317 votes for Coleman.  But the actual number of votes cast for each candidate could easily be 1000 more or less than the count, so it is possible that Coleman actually received more votes; if we do the math, we might find that the probability is 30%.  In that case, I suggest, we should choose a random number between 1 and 100, and if it is 30 or less, Coleman wins.
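Plugging in the Minnesota numbers makes the example concrete. The standard deviation of 1000 votes per candidate is an assumption, a loose reading of "1000 more or less," so the probability it produces is illustrative rather than a real estimate. The draw picks an integer from 1 to 100, and Coleman wins on any value up to the threshold.

```python
import random
from statistics import NormalDist

# Reported 2008 Minnesota Senate totals, from the post.
FRANKEN = 1_212_629
COLEMAN = 1_212_317

# ASSUMPTION (hypothetical): each true total is the reported count plus an
# independent Gaussian error with this standard deviation, with a flat prior.
SD_PER_CANDIDATE = 1000


def prob_coleman_more_votes(sd=SD_PER_CANDIDATE):
    """Posterior probability that Coleman actually received more votes."""
    margin = FRANKEN - COLEMAN            # 312 votes in Franken's favour
    sd_margin = (2 * sd ** 2) ** 0.5      # std. dev. of the true margin
    return NormalDist(margin, sd_margin).cdf(0)


def draw_winner(p_coleman, rng=random):
    """Pick an integer from 1 to 100; Coleman wins if it is <= 100 * p_coleman."""
    threshold = round(100 * p_coleman)
    return "Coleman" if rng.randint(1, 100) <= threshold else "Franken"


p = prob_coleman_more_votes()
print(f"P(Coleman received more votes) ~ {p:.2f}")
print("winner:", draw_winner(p, random.Random(0)))
```

With this particular assumption the probability comes out a little above 0.4 rather than 0.3, and it shrinks quickly if the assumed error is smaller. Drawing `rng.random() < p` directly would avoid rounding the probability to a whole percentage.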

This system has a few nice properties: it solves the problem Seife calls “disestimation,” which is the tendency to pretend that measurements are more precise than they are.  And it is fair in the sense that the probability of winning equals the probability of receiving more votes.

There are drawbacks.  One is that, in order to model the error process, we have to acknowledge that there are errors, which is unpalatable.  The other is that it is possible for a candidate with a higher vote count to lose.  That outcome would be strange, but certainly no stranger than the result of the 2000 presidential election.
-----
If you find this sort of thing interesting, you might like my free statistics textbook, Think Stats. You can download it or read it at thinkstats.com.

1. Do you suggest always flipping a biased coin? If yes, there's the unsettling possibility of getting a clearly disfavored candidate. Otherwise the boundary between "decided" and "contested" elections is still just as arbitrary as with Seife's coin flip. I'm not sure there's any authoritative way to set a margin of error, except maybe to commit to one ahead of time based on past experience.

2. Yes, I am suggesting that the outcome should always be determined by a biased coin toss, but to be clear, the bias would be based on posterior probabilities, not the fraction of ballots. With the very large sample sizes in most elections, the chance of a clearly disfavored candidate winning would be astronomically small.

The bias would be effectively 0 (or 1) for any outcome other than a statistical tie. But the advantage of what I am proposing is a smooth transition from A wins to tie to B wins, without any thresholds that would be a point of contention.
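To illustrate the smooth transition, the sketch below tabulates the coin's bias as the counted margin grows. It assumes a simplified Gaussian error model in which the true margin has a hypothetical standard deviation of about 1414 votes. The bias moves continuously from 0.5 at an exact tie toward 0 or 1, with no threshold anywhere in the calculation.

```python
from statistics import NormalDist

# ASSUMPTION (hypothetical): the true margin A - B is Gaussian, centred on the
# counted margin, with this standard deviation (about sqrt(2) * 1000 votes).
SD_MARGIN = 1414


def prob_b_wins(counted_margin, sd_margin=SD_MARGIN):
    """Bias of the coin: posterior probability that B actually got more votes."""
    return NormalDist(counted_margin, sd_margin).cdf(0)


# Counted margins (A minus B), from a comfortable B lead to a comfortable A lead.
MARGINS = (-20_000, -5_000, -1_000, 0, 1_000, 5_000, 20_000)

for m in MARGINS:
    print(f"counted margin {m:>+7,}: P(B wins) = {prob_b_wins(m):.3g}")
```

For margins of many thousands of votes under this model, the bias is so close to 0 or 1 that the toss is a formality.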

3. I assume that Franken's vote count follows a Poisson distribution with lambda = 1,212,629, and that Coleman's distribution is similar. Then I sample from the two Poisson distributions 1000 times and compare the votes. The result is about 41% for Coleman to win. Is there something wrong with my assumptions? Why does my result differ from yours? Thank you.

1. I think your 41% differs from my 30% because you actually computed your answer, while I just made mine up as an example. I'm actually mildly surprised that I was as close as I was.
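For comparison, the calculation in comment 3 can be sketched with Python's standard library. The `random` module has no Poisson sampler, so this sketch substitutes the normal approximation to the Poisson (mean λ, standard deviation √λ), which is very accurate for λ around 1.2 million. It also computes the same probability in closed form, which avoids the sampling noise of 1,000 draws. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

FRANKEN = 1_212_629
COLEMAN = 1_212_317


def closed_form():
    """P(Coleman > Franken) using the normal approximation to each Poisson."""
    margin = FRANKEN - COLEMAN
    sd_margin = math.sqrt(FRANKEN + COLEMAN)  # variances of the two Poissons add
    return NormalDist(margin, sd_margin).cdf(0)


def simulate(n_trials=1000, seed=None):
    """Monte Carlo version: sample both totals, count how often Coleman is ahead."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        # Normal approximation to Poisson(lambda): mean lambda, sd sqrt(lambda).
        franken = rng.gauss(FRANKEN, math.sqrt(FRANKEN))
        coleman = rng.gauss(COLEMAN, math.sqrt(COLEMAN))
        wins += coleman > franken
    return wins / n_trials


print(f"closed form: {closed_form():.3f}")
print(f"simulation:  {simulate(seed=1):.3f}")
```

The closed form gives roughly 42%; with only 1,000 samples the simulated estimate has a standard error of about 1.5 percentage points, so a result near 41% is what you would expect.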
