Probably Overthinking It: Problematic presentation of probabilistic predictions

Theoretically, people should not be surprised by the results of the 2016 election; several credible forecasters predicted that it would be a close race. But a lot of people were surprised anyway. In my previous article I explained one reason: many people misinterpret probabilistic predictions.

But it is not entirely our fault. In many cases, the predictions were presented in forms that contributed to the confusion. In this article, I review the forecasts and suggest ways we can present them better.

The two forecasters I followed in the weeks prior to the election were FiveThirtyEight and the New York Times's Upshot. If you visited the FiveThirtyEight forecast page, you saw something like this:

And if you visited the Upshot page, you saw

FiveThirtyEight and Upshot use similar methods, so they generate the same kind of result: predictions expressed as probabilities.

The problem with probability

The problem with predictions in this form is that people do not have a good sense for what probabilities mean. In my previous article I explained part of the issue -- probabilities do not behave like other kinds of measurements -- but there might be a more basic problem: our brains naturally represent uncertainty in the form of frequencies, not probabilities.

If this frequency format hypothesis is true, it suggests that people understand predictions like "Trump's chances are 1 in 4" better than "Trump's chances are 25%".

As an example, suppose one forecaster predicts that Trump has a 14% chance and another says he has a 25% chance. At first glance, those predictions seem consistent with each other: they both think Clinton is likely to win. But in terms of frequencies, one of those forecasts means 1 chance in 7; the other means 1 chance in 4. When we put it that way, it is clearer that there is a substantial difference.
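To make the conversion concrete, here is a minimal sketch that turns a probability into the nearest simple frequency. The function name `as_frequency` and the cap of 10 on the denominator are assumptions chosen for illustration; they are not part of any forecaster's method.

```python
from fractions import Fraction

def as_frequency(probability, max_denominator=10):
    """Approximate a probability by the closest fraction with a small denominator."""
    return Fraction(probability).limit_denominator(max_denominator)

for p in (0.14, 0.25):
    freq = as_frequency(p)
    print(f"{p:.0%} is about {freq.numerator} in {freq.denominator}")
```

With the cap at 10, 14% comes out as 1 in 7 and 25% as 1 in 4, matching the frequencies in the example above. A larger cap gives a closer approximation but a less memorable fraction.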

The Upshot tried to help people interpret probabilities by expressing predictions in the form of American football field goal attempts:

Obviously this analogy doesn't help people who don't watch football, but even for die-hard fans, it doesn't help much. In my viewing experience, people are just as surprised when a kicker misses -- even though they shouldn't be -- as they were by the election results.

One of the best predictions I saw came in the form of frequencies, sort of. On November 8, Nate Silver tweeted this:

"Basically", he wrote, "these 3 cases are equally likely". The implication is that Trump had about 1 chance in 3. That's a little higher than the actual forecast, but I think it communicates the meaning of the prediction more effectively than a probability like 28.6%.

The problem with pink

I think Silver's tweet would be even better if it dropped the convention of showing swing states in pink (for "leaning Republican") and light blue (for "leaning Democratic"). One of the reasons people are unhappy with the predictions is that the predictive maps look like this one from FiveThirtyEight:

And the results look like this (from RealClearPolitics):

The results don't look like the predictions because in the results all the states are either dark red or dark blue. There is no pink in the electoral college.

Now suppose you saw a prediction like this (which I generated here):

These three outcomes are equally likely;

Trump wins one of them, Clinton wins two.

With a presentation like this, I think:

1) Before the election, people would understand the predictions better, because they are presented in terms of frequencies like "1 in 3".

2) After the election, people would evaluate the accuracy of the forecasts more fairly, because the results would look like one of the predicted possibilities.

The problem with asymmetry

Consider these two summaries of the predictions:

1) "FiveThirtyEight gives Clinton a 71% chance and The Upshot gives her an 85% chance."

2) "The Upshot gives Trump a 15% chance and FiveThirtyEight gives him a 29% chance."

These two forms are mathematically equivalent, but they are not interpreted the same way. The first form makes it sound like everyone agrees that Clinton is likely to win. The second form makes it clearer that Trump has a substantial chance of winning, and makes it more apparent that the two predictions are substantially different.

If the way we interpret probabilities is asymmetric, as it seems to be, we should be careful to present them both ways. But most forecasters and media reported the first form, in terms of Clinton's probabilities, more prominently, and sometimes exclusively.

The problem with histograms

One of the best ways to explain probabilistic predictions is to run simulations and report the results. The Upshot presented simulation results using a histogram, where the x-axis is the number of electoral college votes for Clinton (notice the asymmetry) and the y-axis is the fraction of simulations where Clinton gets that number:

This visualization has one nice property: the blue area is proportional to Clinton's chances and the red area is proportional to Trump's. To the degree that we assess those areas visually, this is probably more comprehensible than numerical probabilities.

But there are a few problems:

1) I am not sure how many people understand this figure in the first place.

2) Even for people who understand it, the jagginess of the results makes it hard to assess the areas.

3) This visualization tries to do too many things. Is it meant to show that some outcomes are more likely than others, as the title suggests, or is it meant to represent the probabilities of victory visually?

4) Finally, this representation fails to make the possible outcomes concrete. Even if I estimate the areas accurately, I still walk away expecting Clinton to win, most likely in a landslide.

The problem with proportions

One of the reasons people have a hard time with probabilistic predictions is that they are relatively new. The 2004 election was the first where we saw predictions in the form of probabilities, at least in the mass media. Prior to that, polling results were reported in terms of proportions, like "54% of likely voters say they will vote for Alice, 40% for Bob, with 6% undecided, and a margin of error of 5%".

Reports like this didn't provide probabilities explicitly, but over time, people developed a qualitative sense of likelihood. If a candidate was ahead by 10 points in the poll, they were very likely to win; if they were only ahead by 2, it could go either way.

The problem is that proportions and probabilities are reported in the same units: percentages. So even if we know that probabilities and proportions are not the same thing, our intuition can mislead us.

For example, if a candidate is ahead in the polls by 70% to 30%, there is almost no chance they will lose. But if the probabilities are 70% and 30%, that's a very close race. I suspect that when people saw that Clinton had a 70% probability, some of their intuition for proportions leaked in.
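The gap between the two kinds of percentage can be sketched with a rough calculation. The code below assumes a two-candidate race, a single poll of 1,000 respondents (a made-up sample size), and a normal approximation that accounts for sampling error only. Real forecasting models include other sources of uncertainty, so treat this as an illustration, not a forecast. The function name `win_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def win_probability(poll_share, sample_size):
    """Rough chance that the true share exceeds 50%, given one poll.

    Assumes a two-way race and sampling error only (normal approximation).
    """
    standard_error = sqrt(poll_share * (1 - poll_share) / sample_size)
    return 1 - NormalDist(mu=poll_share, sigma=standard_error).cdf(0.5)

for share in (0.70, 0.51):
    prob = win_probability(share, sample_size=1000)
    print(f"poll share {share:.0%} -> win probability {prob:.1%}")
```

Under these assumptions, a 70% share in the poll corresponds to a win probability that is indistinguishable from certainty, while a win probability a little over 70% corresponds to a poll share of only about 51%.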

Solution: publish the simulations

I propose a simple solution that addresses all of these problems: forecasters should publish one simulated election each day, with state-by-state results.

As an example, here is a fake FiveThirtyEight page I mocked up:

If people saw predictions like this every day, they would experience the range of possible outcomes, including close finishes and landslides. By election day, nothing would surprise them.
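As a rough sketch of what generating one simulated election per day might involve, the code below draws a week of outcomes from a toy model. Everything in it is an assumption made for illustration: the five "states", their electoral votes and mean margins, and the sizes of the national and state-level errors are invented, and this is not how FiveThirtyEight or the Upshot build their models.

```python
import random

# Hypothetical toy map: name -> (electoral votes, Clinton's mean margin in points).
# These numbers are invented for illustration; they are not real forecast inputs.
TOY_STATES = {
    "Safe Blue": (200, 15.0),
    "Lean Blue": (70, 4.0),
    "Toss-up": (60, 1.0),
    "Lean Red": (48, -4.0),
    "Safe Red": (160, -15.0),
}

def simulate_election(rng):
    """Simulate one election and return (state winners, Clinton's votes, winner)."""
    # A shared national error moves every state in the same direction.
    national_swing = rng.gauss(0, 4.0)
    results = {}
    for state, (votes, mean_margin) in TOY_STATES.items():
        margin = mean_margin + national_swing + rng.gauss(0, 3.0)
        # Each simulated state has exactly one winner: no pink, no light blue.
        results[state] = "Clinton" if margin > 0 else "Trump"
    clinton_votes = sum(
        TOY_STATES[state][0] for state, winner in results.items() if winner == "Clinton"
    )
    # 270 of 538 electoral votes wins; a 269-269 tie cannot occur on this toy map.
    winner = "Clinton" if clinton_votes >= 270 else "Trump"
    return results, clinton_votes, winner

rng = random.Random()
for day in range(1, 8):
    results, clinton_votes, winner = simulate_election(rng)
    print(f"Day {day}: {winner} wins, {clinton_votes} to {538 - clinton_votes}")
```

Each run prints a different week. The point of the sketch is the shape of the output: every day shows one concrete map with a definite winner, not a summary probability.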

Publishing simulations in this form solves the problems I identified:

The problem with probabilities: If you checked this page daily for a week, you would see Clinton win 5 times and Trump win twice. If the frequency format hypothesis is true, this would get the prediction into your head in a way you understand naturally.

The problem with pink: The predictions would look like the results, because in each simulation the states are red or blue; there would be no pink or light blue.

The problem with asymmetry: This way of presenting results doesn't break symmetry, especially if the winner of each simulation is shown on the left.

The problem with histograms: By showing only one simulation per day, we avoid the difficulty of summarizing large numbers of simulations.

The problem with proportions: If we avoid reporting probabilities of victory, we avoid confusing them with proportions of the vote.

Publishing simulations also addresses two problems I haven't discussed:

The problem with precision: Some forecasters presented predictions with three significant digits, which suggests an unrealistic level of precision. The intuitive sense of likelihood you get from watching simulations is not precise, but the imprecision in your head is an honest reflection of the imprecision in the models.

The problem with the popular vote: Most forecasters continue to predict the popular vote and some readers still follow it, despite the fact (underscored by this election) that the popular vote is irrelevant. If it is consistent with the electoral college, it's redundant; otherwise it's just a distraction.

In summary, election forecasters used a variety of visualizations to report their predictions, but they were prone to misinterpretation. Next time around, we can avoid many of the problems by publishing the results of one simulated election each day.

3 comments:

vonjd, November 23, 2016 at 3:10 AM
I think the biggest problem was the bogus precision: "71.4%" implies that the model is precise to one part in a thousand, which is of course bs.

Stan, November 29, 2016 at 3:46 AM
What do you think about the criticism that insists the probability must be 1/2 until the election approaches, since polls are so stochastic that we can never predict the future in the long term?

Tuesday, November 22, 2016
