Tuesday, November 25, 2014

The World Cup Problem Part 2: Germany v. Argentina

This is the second of two articles about Bayesian analysis applied to World Cup soccer.  The previous article is here.

Earlier this semester I posed this problem to my Bayesian statistics class at Olin College:

In the final match of the 2014 FIFA World Cup, Germany defeated Argentina 1-0. How much evidence does this victory provide that Germany had the better team? What is the probability that Germany would win a rematch?

Before you can answer a question like this, you have to make some modeling decisions.  As I suggested to my class, scoring in games like soccer and hockey can be well modeled by a Poisson process, which assumes that each team, against a given opponent, will score goals at some goal-scoring rate, λ, and that this rate is stationary; in other words, the probability of scoring a goal is about the same at any point during the game.

If this model holds, we expect the distribution of time between goals to be exponential, and the distribution of goals per game to be Poisson.
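To make the model concrete, here is a minimal sketch of a grid-based Bayesian update for this problem. It is not the solution from the notebook. The exponential-shaped prior with a mean of about 1.4 goals per game is an illustrative assumption, the final is treated as one game's worth of scoring time, and the helper names (`poisson_pmf`, `update`, `predictive`) are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals in one game at rate lam (goals per game)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def normalize(weights):
    """Scale a list of non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def update(prior, lams, goals):
    """Posterior over lam after seeing `goals` goals in one game."""
    return normalize([p * poisson_pmf(goals, lam) for p, lam in zip(prior, lams)])

def predictive(posterior, lams, max_goals=15):
    """Distribution of goals in a future game, averaged over the posterior."""
    return [
        sum(p * poisson_pmf(k, lam) for p, lam in zip(posterior, lams))
        for k in range(max_goals + 1)
    ]

# Grid of candidate scoring rates, in goals per game.
lams = [0.05 * i for i in range(1, 201)]

# ASSUMPTION: exponential-shaped prior with mean near 1.4 goals per game.
prior = normalize([math.exp(-lam / 1.4) for lam in lams])

germany = update(prior, lams, goals=1)
argentina = update(prior, lams, goals=0)

# Probability that Germany's scoring rate is higher than Argentina's.
p_germany_better = sum(
    pg * pa
    for i, pg in enumerate(germany)
    for j, pa in enumerate(argentina)
    if i > j
)

# Outcome probabilities for a rematch, from the two predictive distributions.
g_goals = predictive(germany, lams)
a_goals = predictive(argentina, lams)
p_win = sum(pg * pa for i, pg in enumerate(g_goals)
            for j, pa in enumerate(a_goals) if i > j)
p_tie = sum(pg * pa for pg, pa in zip(g_goals, a_goals))
p_loss = sum(pg * pa for i, pg in enumerate(g_goals)
             for j, pa in enumerate(a_goals) if i < j)

print("P(Germany has the higher rate):", round(p_germany_better, 3))
print("Rematch win / tie / loss:", round(p_win, 3), round(p_tie, 3), round(p_loss, 3))
```

The numbers depend heavily on the assumed prior, so treat them as a way to check your own reasoning, not as the answer.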

My solution to this problem uses the computational framework from my book, Think Bayes.  The framework is described in this notebook.  If you have read Think Bayes or attended one of my workshops, you might want to attempt this problem before you look at my solution.

If you solve this problem analytically, or use MCMC, and you want to share your solution, please let me know and I will post it here.

And when you are ready, you can see my solution in this notebook.

I will post more of the exercises from my class over the next few weeks.

One technical note: Last week on reddit, someone asked about my use of "fake data" to construct the prior distribution.  This move is not as bogus as it sounds; it is just a convenient way to construct a prior that has the right mean (or whatever other statistics are known) and a shape that is consistent with background information.
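As a rough illustration of the idea, and not necessarily the construction used in the notebook, the sketch below starts from a uniform prior over λ and updates it with a single made-up gap between goals. Under the exponential likelihood, a fake gap of t games gives a prior with mean close to 2/t, so the gap can be chosen to hit a target mean. The target of 1.4 goals per game and the name `prior_from_fake_gaps` are assumptions for the example.

```python
import math

def prior_from_fake_gaps(lams, fake_gaps):
    """Build a prior over lam by updating a uniform prior with made-up gaps.

    Each gap is a time between goals, in units of games, and is scored with
    the exponential likelihood lam * exp(-lam * gap).
    """
    weights = [1.0] * len(lams)  # uniform starting point
    for gap in fake_gaps:
        weights = [w * lam * math.exp(-lam * gap) for w, lam in zip(weights, lams)]
    total = sum(weights)
    return [w / total for w in weights]

# Grid of candidate scoring rates, in goals per game.
lams = [0.05 * i for i in range(1, 301)]

# ASSUMPTION: we want a prior mean of about 1.4 goals per game.
target_mean = 1.4
fake_gap = 2 / target_mean  # one fake gap of t games gives mean near 2 / t

prior = prior_from_fake_gaps(lams, [fake_gap])
prior_mean = sum(p * lam for p, lam in zip(prior, lams))
print("prior mean:", round(prior_mean, 3))
```

The resulting prior rises from zero, peaks, and has a long right tail, which is a plausible shape for a scoring rate.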

UPDATE November 25, 2014:  Cameron Davidson-Pilon has once again risen to the challenge and implemented a solution using PyMC.  You can see his solution here.

If you want to learn more about PyMC, an excellent place to start is Cameron's online book Bayesian Methods for Hackers.

Tuesday, November 18, 2014

The World Cup Problem: Germany v. Brazil

Earlier this semester I posed this problem to my Bayesian statistics class at Olin College:

In the 2014 FIFA World Cup, Germany played Brazil in a semifinal match. Germany scored after 11 minutes and again at the 23-minute mark. At that point in the match, how many goals would you expect Germany to score after 90 minutes? What was the probability that they would score 5 more goals (as, in fact, they did)?

Before you can answer a question like this, you have to make some modeling decisions.  As I suggested to my class, scoring in games like soccer and hockey can be well modeled by a Poisson process, which assumes that each team, against a given opponent, will score goals at some goal-scoring rate, λ, and that this rate is stationary; in other words, the probability of scoring a goal is about the same at any point during the game.

If this model holds, we expect the distribution of time between goals to be exponential, and the distribution of goals per game to be Poisson.
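One way to turn these two facts into an answer is sketched below; it is not the solution from the notebook. The two observed gaps between goals (11 and 12 minutes) update a prior over λ through the exponential likelihood, and the Poisson distribution then predicts goals in the remaining 67 minutes. The prior with mean near 1.4 goals per game is an illustrative assumption, and "5 more goals" is computed both as exactly 5 and as at least 5.

```python
import math

GAME_MINUTES = 90

def exponential_pdf(t, lam):
    """Density of a gap of t games between goals at rate lam (goals per game)."""
    return lam * math.exp(-lam * t)

def poisson_pmf(k, mu):
    """Probability of exactly k goals when mu goals are expected."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def normalize(weights):
    """Scale a list of non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Grid of candidate scoring rates, in goals per game.
lams = [0.05 * i for i in range(1, 301)]

# ASSUMPTION: exponential-shaped prior with mean near 1.4 goals per game.
prior = normalize([math.exp(-lam / 1.4) for lam in lams])
prior_mean = sum(p * lam for p, lam in zip(prior, lams))

# Observed gaps: kickoff to first goal (11 min), first to second goal (12 min).
gaps = [11 / GAME_MINUTES, 12 / GAME_MINUTES]
posterior = prior
for gap in gaps:
    posterior = normalize(
        [p * exponential_pdf(gap, lam) for p, lam in zip(posterior, lams)]
    )
posterior_mean = sum(p * lam for p, lam in zip(posterior, lams))

# Fraction of the game left after the 23rd minute.
remaining = (GAME_MINUTES - 23) / GAME_MINUTES

# Expected additional goals, and the chance of 5 more, averaged over lam.
expected_more = sum(p * lam * remaining for p, lam in zip(posterior, lams))
p_exactly_5 = sum(
    p * poisson_pmf(5, lam * remaining) for p, lam in zip(posterior, lams)
)
p_at_least_5 = sum(
    p * (1 - sum(poisson_pmf(k, lam * remaining) for k in range(5)))
    for p, lam in zip(posterior, lams)
)

print("expected additional goals:", round(expected_more, 2))
print("P(exactly 5 more):", round(p_exactly_5, 4))
print("P(at least 5 more):", round(p_at_least_5, 4))
```

Two quick goals pull the posterior mean well above the prior mean, which is why the expected final score is higher than a typical game would suggest.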

My solution to this problem uses the computational framework from my book, Think Bayes.  The framework is described in this notebook.  If you have read Think Bayes or attended one of my workshops, you might want to attempt this problem before you look at my solution.

If you solve this problem analytically, or use MCMC, and you want to share your solution, please let me know and I will post it here.

And when you are ready, you can see my solution in this notebook.

I will post more of the exercises from my class over the next few weeks.  Coming next: The World Cup Problem Part 2: Germany v. Argentina.


UPDATE November 19, 2014:  Cameron Davidson-Pilon kindly (and quickly!) responded to my request for a solution to this problem using PyMC.  You can see his solution here.  If you want to learn more about PyMC, an excellent place to start is Cameron's online book Bayesian Methods for Hackers.