Wednesday, December 17, 2014

Statistics tutorials at PyCon 2015

I am happy to announce that I will offer two statistics tutorials at PyCon 2015 on April 9 in Montreal.  In the morning session I am teaching Bayesian Statistics Made Simple, which I have taught several times before, including the last three PyCons.  In the afternoon I am offering a new tutorial, Statistical Inference with Computational Methods.

The whole tutorial schedule is here, along with registration information.  And here are some details about the tutorials:

Bayesian statistics made simple

Audience level:


An introduction to Bayesian statistics using Python.  Bayesian statistics are usually presented mathematically, but many of the ideas are easier to understand computationally.  People who know Python can get started quickly and use Bayesian analysis to solve real problems.  This tutorial is based on material and case studies from Think Bayes (O’Reilly Media).


Bayesian statistical methods are becoming more common and more important, but there are not many resources to help beginners get started.  People who know Python can use their programming skills to get a head start.
I will present simple programs that demonstrate the concepts of Bayesian statistics, and apply them to a range of example problems.  Participants will work hands-on with example code and practice on example problems.
Attendees should have at least basic Python skills and some basic statistics.  If you learned about Bayes’s theorem and probability distributions at some time, that’s enough, even if you don’t remember it!
Attendees should bring a laptop with Python and matplotlib.  You can work in any environment; you just need to be able to download a Python program and run it.  I will provide code to help attendees get set up ahead of time.
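To give a flavor of what "computational" means here, the following is a minimal sketch of a Bayesian update in plain Python.  It is not taken from the tutorial materials; the function name `bayes_update` and the two-bowl setup (Bowl 1 holds 30 vanilla and 10 chocolate cookies, Bowl 2 holds 20 of each) are assumptions chosen for illustration.

```python
def bayes_update(prior, likelihood):
    """Return the posterior as a dict mapping hypothesis -> probability.

    prior and likelihood are dicts with the same keys (hypothetical helper,
    not part of any library).
    """
    # Multiply each prior probability by the likelihood of the data.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Divide by the total so the posterior probabilities sum to 1.
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumed setup: a cookie is drawn from one of two bowls chosen at random.
prior = {"Bowl 1": 0.5, "Bowl 2": 0.5}

# Probability of drawing a vanilla cookie from each bowl.
likelihood_vanilla = {"Bowl 1": 30 / 40, "Bowl 2": 20 / 40}

posterior = bayes_update(prior, likelihood_vanilla)
for hypothesis, prob in sorted(posterior.items()):
    print(hypothesis, round(prob, 3))
```

After seeing one vanilla cookie, the posterior probability of Bowl 1 is 0.6 and of Bowl 2 is 0.4.  The whole computation is a multiplication followed by a normalization, which is why the ideas can be easier to follow in code than in notation.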

Statistical inference with computational methods

Audience level:


Statistical inference is a fundamental tool in science and engineering, but it is often poorly understood.  This tutorial uses computational methods, including Monte Carlo simulation and resampling, to explore estimation, hypothesis testing and statistical modeling.  Attendees will develop understanding of statistical concepts and learn to use real data to answer relevant questions.


Do you know the difference between standard deviation and standard error?  Do you know what statistical test to use for any occasion?  Do you really know what a p-value is?  How about a confidence interval?
Most students don’t really understand these concepts, even after taking several statistics classes.  The problem is that these classes focus on mathematical methods that bury the concepts under a mountain of details.
This tutorial uses Python to implement simple statistical experiments that develop deep understanding.  Attendees will learn about resampling and related tools that use random simulation to perform statistical inference, including estimation and hypothesis testing.  We will use pandas, which provides structures for data analysis, along with NumPy and SciPy.
I will present examples using real-world data to answer relevant questions.  The tutorial material is based on my book, Think Stats, a class I teach at Olin College, and my blog, “Probably Overthinking It.”
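As a rough illustration of the resampling ideas mentioned above, here is a sketch using NumPy.  It is not from the tutorial itself: the data are synthetic, and the function names `bootstrap_standard_error` and `permutation_p_value` are made up for this example.  The first function estimates a standard error by resampling; the second estimates a p-value for a difference in means by shuffling group labels.

```python
import numpy as np

rng = np.random.default_rng(17)

# Synthetic data, invented for illustration only.
group_a = rng.normal(loc=10.0, scale=2.0, size=50)
group_b = rng.normal(loc=11.0, scale=2.0, size=50)


def bootstrap_standard_error(sample, iters=2000):
    """Estimate the standard error of the mean by resampling with replacement."""
    n = len(sample)
    means = [rng.choice(sample, size=n, replace=True).mean()
             for _ in range(iters)]
    # The spread of the resampled means approximates the standard error.
    return np.std(means)


def permutation_p_value(a, b, iters=2000):
    """Estimate a two-sided p-value for the difference in means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(iters):
        # Under the null hypothesis the group labels are interchangeable.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    return count / iters


std_dev = group_a.std(ddof=1)                    # spread of the data
se_formula = std_dev / np.sqrt(len(group_a))     # textbook standard error
se_boot = bootstrap_standard_error(group_a)      # resampling estimate
p_value = permutation_p_value(group_a, group_b)

print("standard deviation:", std_dev)
print("standard error (formula):", se_formula)
print("standard error (bootstrap):", se_boot)
print("permutation p-value:", p_value)
```

The standard deviation describes how much individual values vary, while the standard error describes how much the estimated mean would vary from one sample to another; the bootstrap estimate should land close to the formula.  The p-value here is simply the fraction of shuffled datasets that show a difference at least as large as the observed one.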

More information and registration here.

Thursday, December 4, 2014

The Rock Hyrax Problem

This is the third in a series of articles about Bayesian analysis.  The previous article is here.

Earlier this semester I posed this problem to my Bayesian statistics class at Olin College:
Suppose I capture and tag 10 rock hyraxes.  Some time later, I capture another 10 hyraxes and find that two of them are already tagged.  How many hyraxes are there in this environment?
This is an example of a mark and recapture experiment, which you can read about on Wikipedia.  The Wikipedia page also includes the photo of a tagged hyrax shown above.

As always with problems like this, we have to make some modeling assumptions.

1) For simplicity, you can assume that the environment is reasonably isolated, so the number of hyraxes does not change between observations.

2) And you can assume that each hyrax is equally likely to be captured during each phase of the experiment, regardless of whether it has been tagged.  In reality, it is possible that tagged animals would avoid traps in the future, or possible that the same behavior that got them caught the first time makes them more likely to be caught again.  But let's start simple.

My solution to this problem uses the computational framework from my book, Think Bayes.  The framework is described in this notebook.  If you have read Think Bayes or attended one of my workshops, you might want to attempt this problem before you look at my solution.

If you solve this problem analytically, or use MCMC, and you want to share your solution, please let me know and I will post it here.

And when you are ready, you can see my solution in this notebook.
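For readers who would like a starting point, here is one possible sketch of the model under the two assumptions above.  It is not the solution in the notebook and does not use the Think Bayes framework.  It assumes a uniform prior over population sizes up to an arbitrary cap (`MAX_POPULATION`, a value picked only for illustration) and a hypergeometric likelihood for the number of tagged animals in the second capture.

```python
from math import comb

TAGGED = 10            # hyraxes tagged in the first capture
CAPTURED = 10          # hyraxes caught in the second capture
RECAPTURED = 2         # of those, how many were already tagged
MAX_POPULATION = 1000  # assumption: arbitrary upper bound for the prior


def likelihood(n_total):
    """Hypergeometric probability of the data if the population is n_total."""
    untagged = n_total - TAGGED
    ways_tagged = comb(TAGGED, RECAPTURED)
    ways_untagged = comb(untagged, CAPTURED - RECAPTURED)
    return ways_tagged * ways_untagged / comb(n_total, CAPTURED)


# At least 10 tagged plus 8 untagged animals have been seen.
min_population = TAGGED + CAPTURED - RECAPTURED
hypotheses = range(min_population, MAX_POPULATION + 1)

# With a uniform prior, the posterior is the normalized likelihood.
unnormalized = [likelihood(n) for n in hypotheses]
total = sum(unnormalized)
posterior = {n: p / total for n, p in zip(hypotheses, unnormalized)}


def posterior_quantile(q):
    """Smallest population size whose cumulative probability reaches q."""
    cumulative = 0.0
    for n in hypotheses:
        cumulative += posterior[n]
        if cumulative >= q:
            return n
    return MAX_POPULATION


most_likely = max(posterior, key=posterior.get)
print("most likely population:", most_likely)
print("90% credible interval:",
      (posterior_quantile(0.05), posterior_quantile(0.95)))
```

The likelihood peaks around 50, which matches the classical estimate 10 × 10 / 2 = 50 (the values 49 and 50 are exactly tied).  The right tail falls off slowly, so the credible interval and the posterior mean depend on the choice of `MAX_POPULATION`; a different prior would give different answers.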

I will post more of the exercises from my class over the next few weeks.

UPDATE December 5, 2014: João Neto posted a solution to this problem in BUGS using a Jeffreys prior.