Sunday, September 14, 2014

Regression with Python, pandas and StatsModels

I was at Boston Data-Con 2014 this morning, which was a great event.  The organizer, John Verostek, seems to have created this three-day event single-handedly, so I am hugely impressed.

Imran Malek started the day with a very nice iPython tutorial.  The description is here, and his slides are here.  He grabbed passenger data from the MBTA and generated heat maps showing the number of passengers at each stop in the system during each hour.  The tutorial covered a good range of features, and it seemed like many of the participants were able to download the data and follow along in iPython.

And Imran very kindly let me use his laptop to project slides for my talk, which was next.  The description of my talk is here:
Regression is a powerful tool for fitting data and making predictions. In this talk I present the basics of linear regression and logistic regression and show how to use them in Python. I demonstrate pandas, a Python module that provides structures for data analysis, and StatsModels, a module that provides tools for regression and other statistical analysis. 
As an example, I will use data from the National Survey of Family Growth to generate predictions for the date of birth, weight, and sex of an expected baby. This presentation is based on material from the recent revision of Think Stats, an introduction to data analysis and statistics with Python.
This talk is appropriate for people with no prior experience with regression. Basic familiarity with Python is recommended but not required.
 And here are my slides:

The material for this talk is from the second edition of Think Stats, which is in production now and scheduled for release in early November.  My draft is available here, and you can pre-order a paper copy here.

As I expected, I prepared way more material than I could present.  The audience had some excellent questions, so we spent more time on linear regression and did not get to logistic regression.

The nice people at O'Reilly Media sent over 25 copies of my book, Think Python, so we had a book signing after the talk.  I had a chance to chat with everyone who got a book, which is always a lot of fun.

I believe video of the talk will be available soon.  I will post it here when it is.

Many thanks to John Verostek for organizing this conference, and to the sponsors for breakfast and lunch.  Looking forward to next year!


  1. Thx for this article with the links. Is Imran Malek's presentation that short? Only 10 slides?

    1. I think he only had a few slides, but he has links to the data and the code. Most of his presentation was a live tutorial.