Probably Overthinking It: Bayesian Statistics for Undergrads

Yesterday Sanjoy Mahajan and I led a workshop on teaching Bayesian statistics for undergraduates. The participants were college teachers from around New England, including Norwich University in Vermont and Wesleyan University in Connecticut, as well as our neighbors, Babson College and Wellesley College.

The feedback we got was enthusiastic, and we hope the workshop will help the participants design new classes that make Bayesian methods accessible to their students.

Materials from the workshop are in this GitHub repository. And here are the slides:

The goal of the workshop was to show that teaching Bayesian statistics to undergrads is possible and desirable. To show that it's possible, we presented three approaches:

A computational approach, based on my class at Olin, Computational Bayesian Statistics, and the accompanying book, Think Bayes. This material is appropriate for students with basic programming skills, although a lot of it could be adapted for use with spreadsheets.
An analytic approach, based on Sanjoy's class, called Bayesian Inference. This material is appropriate for students who are comfortable with mathematics including calculus.
We also presented core material that does not depend on programming or advanced math -- really just arithmetic.

Why Bayes?

Reasons the participants gave for teaching Bayes included:

Some of them work and teach in areas like psychology and biology where the limitations of classical methods have become painfully apparent, and interest in alternatives is high.
Others are interested in applications like business intelligence and data analytics where Bayesian methods are a hot topic.
Some participants teach introductory classes that satisfy requirements in quantitative reasoning, and they are looking for material to develop students' ability to reason with and about uncertainty.

I think these are all good reasons. At the introductory level, Bayesian methods are a great opportunity for students who might not be comfortable with math to gradually build confidence with mathematical methods as tools for better thinking.

Bayes's theorem provides a divide-and-conquer strategy for solving difficult problems by breaking them into smaller, simpler pieces. And many of the classic applications of Bayes's theorem -- like interpreting medical tests and weighing courtroom evidence -- are real-world problems where careful thinking matters and mistakes have consequences!
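To make the divide-and-conquer idea concrete, here is a minimal sketch of the medical-test problem in Python. The numbers (1% prevalence, 90% sensitivity, 5% false positive rate) and the function name are hypothetical, chosen only for illustration; they are not from the workshop materials. The point is that each piece is easy on its own, and Bayes's theorem tells us how to combine them.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Probability of disease given a positive test, via Bayes's theorem.

    All inputs are probabilities between 0 and 1 (hypothetical values here).
    """
    # Piece 1: how likely is a positive test if the patient is sick?
    like_sick = sensitivity
    # Piece 2: how likely is a positive test if the patient is healthy?
    like_healthy = false_positive_rate
    # Piece 3: total probability of a positive test, over both cases
    p_positive = prior * like_sick + (1 - prior) * like_healthy
    # Combine the pieces: P(sick | positive)
    return prior * like_sick / p_positive


# Hypothetical test: 1% prevalence, 90% sensitivity, 5% false positives
p = posterior_given_positive(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
print(round(p, 3))  # prints 0.154
```

Even with a fairly accurate test, a positive result here leaves the probability of disease at about 15%, which is the kind of result that surprises students and rewards careful thinking.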

For students who only take a few classes in mathematics, I think Bayesian statistics is a better choice than calculus, which the vast majority of students will never use again; and better than classical statistics, which (based on my observation) often leaves students more confused about quantitative reasoning than when they started.

At the more advanced level, Bayesian methods are appealing because they can be applied in a straightforward way to real-world decision making processes, unlike classical methods, which generally fail to answer the questions we actually want to answer.

For example, if we are considering several hypotheses about the world, it is useful to know the probability that each is true. You can use that information to guide decision making under uncertainty. But classical statistical inference refuses to answer that question, and under the frequentist interpretation of probability, you are not even allowed to ask it.

As another example, the result you get from Bayesian statistics is generally a posterior distribution for a parameter, or a joint distribution for several parameters. From these results, it is straightforward to compute a distribution that predicts almost any quantity of interest, and this distribution encodes not only the most likely outcome or central tendency; it also represents the uncertainty of the prediction and the spread of the possible outcomes.

Given a predictive distribution, you can answer whatever questions are relevant to the domain, like the probability of exceeding some bound, or the range of values most likely to contain the true value (another question classical inference refuses to answer). And it is straightforward to feed the entire distribution into other analyses, like risk-benefit analysis and other kinds of optimization, that directly guide decision making.
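As a rough sketch of what this looks like in practice, the following Python code estimates a proportion using a grid approximation, then computes a predictive distribution and answers two of the questions mentioned above. The data (7 successes in 10 trials), the uniform prior, and the grid resolution are all assumptions made for illustration; this is one simple way to do it, not the only one.

```python
from math import comb

# Hypothetical data: 7 successes in 10 trials
successes, trials = 7, 10

# Grid of candidate values for the unknown proportion, with a uniform prior
grid = [i / 200 for i in range(201)]
prior = [1.0] * len(grid)

# Posterior is proportional to prior times likelihood, then normalized
likelihood = [p**successes * (1 - p) ** (trials - successes) for p in grid]
unnorm = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]


def binom_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Predictive distribution for the number of successes in 10 future trials:
# average the binomial distribution over the posterior
future = 10
predictive = [
    sum(w * binom_pmf(k, future, p) for p, w in zip(grid, posterior))
    for k in range(future + 1)
]

# Question 1: probability of exceeding a bound (at least 8 successes)
prob_at_least_8 = sum(predictive[8:])


def credible_interval(grid, posterior, mass=0.9):
    """Central interval containing roughly `mass` of the posterior probability."""
    tail = (1 - mass) / 2
    cum = 0.0
    low = high = None
    for p, w in zip(grid, posterior):
        cum += w
        if low is None and cum >= tail:
            low = p
        if high is None and cum >= 1 - tail:
            high = p
    return low, high


# Question 2: range of values most likely to contain the true proportion
low, high = credible_interval(grid, posterior)

print(prob_at_least_8)
print(low, high)
```

Because `predictive` is a full distribution, it could be passed along to a cost or risk calculation without first collapsing it to a single number.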

I mention these advantages in part to address one of the questions that came up in the workshop. Several of the participants are currently teaching traditional introductory statistics classes, and they would like to introduce Bayesian methods, but are also required to cover certain topics in classical statistics, notably null-hypothesis significance testing (NHST).

So they want to know how to design a class that covers these topics and also introduces Bayesian statistics. This is an important challenge, and I was frustrated that I didn't have a better answer to offer at the workshop. But with some time to organize my thoughts, I have two suggestions:

Avoid direct competition

I don't recommend teaching a class that explicitly compares classical and Bayesian statistics. Pedagogically, it is likely to be confusing. Strategically, it is asking for intra-departmental warfare. And importantly, I think it misrepresents Bayesian methods, and undersells them, if you present them as a tool-for-tool replacement for classical methods.

The real problem with classical inference is not that it gets the wrong answer; the problem is that it asks the wrong questions. For example, a fundamental problem with NHST is that it requires a binary decision: either we reject the null hypothesis or we fail to reject it (whatever that means). An advantage of the Bayesian approach is that it helps us represent and work with uncertainty; expressing results in terms of probability is more realistic, and more useful, than trying to cram the world into one of two holes.

If you use Bayesian methods to compute the probability of a hypothesis, and then apply a threshold to decide whether the theory is true, you are missing the point. Similarly, if you compute a posterior distribution, and then collapse it to a single point estimate (or even an interval), you are throwing away exactly the information that makes Bayesian results more useful.

Bayesian methods don't do the same things better; they do different things, which are better. If you want to demonstrate the advantages of Bayesian methods, do it by solving practical problems and answering the questions that matter.

As an example, this morning my colleague Jon Adler sent me a link to this paper, Bayesian Benefits for the Pragmatic Researcher, which is a model of what I am talking about.

Identify the goals

As always, it is important to be explicit about the learning goals of the class you are designing. Curriculum problems that seem impossible can sometimes be simplified by unpacking assumptions about what needs to be taught and why. For example, if we think about why NHST is a required topic, we get some insight into how to present it: if you want to make sure students can read papers that report p-values, you might take one approach; if you imagine they will need to use classical methods, that might require a different approach.

For classical statistical inference, I recommend "The New Statistics", an approach advocated by Geoff Cumming (I am not sure to what degree it is original to him). The fundamental idea is that statistical analysis should focus on estimating effect sizes, and should express results in terms that emphasize practical consequences, as contrasted with statistical significance.

If "The New Statistics" is what we should teach, computational simulation is how. Many of the ideas that take the most time, and seem the hardest, in a traditional stats class, can be taught much more effectively using simulation. I wrote more about this just last week, in this post, There is Still Only One Test, and there are links there to additional resources.
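As an illustration of the simulation approach, here is a minimal sketch of a permutation test for a difference in means between two groups. The data, function names, and number of iterations are hypothetical, and this is one common way to simulate a null hypothesis rather than a summary of the linked post. The steps are: choose a test statistic (here also the effect size), simulate the null hypothesis by shuffling group labels, and count how often the simulated statistic is at least as large as the observed one.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def diff_in_means(a, b):
    """Test statistic: absolute difference in group means (the effect size)."""
    return abs(mean(a) - mean(b))


def permutation_p_value(a, b, iters=10000, seed=0):
    """Estimate a p-value by shuffling the pooled data.

    Under the null hypothesis, group labels are arbitrary, so each shuffle
    simulates one experiment in a world where there is no real difference.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = diff_in_means(a, b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        simulated = diff_in_means(pooled[: len(a)], pooled[len(a):])
        if simulated >= observed:
            count += 1
    return observed, count / iters


# Hypothetical measurements from two groups
group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
group_b = [11.0, 11.6, 10.8, 11.9, 11.2, 10.9]

effect, p_value = permutation_p_value(group_a, group_b)
print(effect, p_value)
```

Reporting `effect` alongside `p_value` keeps the emphasis on the size of the difference, in the spirit of "The New Statistics".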

But if the goal is to teach classical statistical inference better, I would leave Bayes out of it. Even if it's tempting to use a Bayesian framework to explain the problems with classical inference, it would be more likely to confuse students than help them.

If you only have space in the curriculum to teach one paradigm, and you are not required to teach classical methods, I recommend a purely Bayesian course. But if you have to teach classical methods in the same course, I suggest keeping them separated.

I experienced a version of this at PyCon this year, where I taught two tutorials back to back: Bayesian statistics in the morning and computational statistical inference in the afternoon. I joked that I spent the morning explaining why the afternoon was wrong. But the reality is that the two topics hardly overlap at all. In the morning I used Bayesian methods to formulate real-world problems and answer practical questions. In the afternoon, I helped people understand classical inference, including its limitations, and taught them how to do it well, if they have to.

I think a similar balance (or compromise?) could work in the undergraduate statistics curriculum at many colleges and universities.

2 comments:

Jessica, June 15, 2016 at 5:32 AM
Thank you, Allen (and Sanjoy)! Your workshop gave me a lot to think about.
Jerzy Wieczorek, June 18, 2016 at 8:46 AM
"Bayesian methods don't do the same things better; they do different things, which are better."
...
They do different things, indeed. So why insist that one is universally better?

One thing I've been mulling over lately: Classical methods were invented for, and by, people in the business of designing and running experiments. "Hypothesis tests" aren't really testing hypotheses about the sample you collected, or the population you sampled, but about *the design of the experiment* you ran. In the simplest case, the question isn't "Is theta 0 or not?" but rather "Is this sample big enough to tell if theta is positive or negative?"
This is not always "the wrong question," as you call it. Sometimes yes, but other times it's crucial.

Perhaps the main benefit of inventing hypothesis tests was so you could *imagine* doing them as you make power calculations to choose the sample size (and other design details).
Engineers find it useful to know their instrument's operating characteristics before they choose which lathe or radio or oscilloscope to buy/use. Same with doing classical power calculations to design a study for science, or A/B testing, or medical trials, or what have you.

I don't mean to say the Bayes framework can't do this too (I know nothing about Bayesian experimental design) but just that traditional classical methods have legitimate uses different from, not worse than, traditional Bayesian methods.

Tuesday, June 14, 2016
