Thursday, May 8, 2014

Implementing PMFs in Python

Last year I gave a keynote talk at PyCon Taiwan called "Python Epistemology," and I wrote this blog article about it.  The video is here, but unfortunately the sound quality is poor.  In the talk, I demonstrate the use of a Counter, one of the data structures in Python's collections module;  specifically, I use a Counter to implement a probability mass function (PMF) and a suite of Bayesian hypotheses.
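
To give a flavor of the idea, here is a minimal sketch of a PMF built on Counter, followed by a small Bayesian update.  It is an illustration written for this post, not necessarily the code from the talk: the class name `Pmf`, the `normalize` method, and the dice example are assumptions, and the sketch assumes Python 3 division.

```python
from collections import Counter


class Pmf(Counter):
    """A Counter whose values are treated as probabilities (illustrative sketch)."""

    def normalize(self):
        """Rescale the values in place so they sum to 1."""
        total = sum(self.values())
        for key in self:
            self[key] /= total


# A PMF for a fair six-sided die: Counter counts each outcome once,
# and normalize() turns the counts into probabilities.
d6 = Pmf(range(1, 7))
d6.normalize()

# A suite of hypotheses: the die has 4, 6, or 8 sides, equally likely a priori.
suite = Pmf({4: 1, 6: 1, 8: 1})

# Bayesian update after observing a roll of 6: multiply each prior by the
# likelihood of the data under that hypothesis, then renormalize.
observed = 6
for sides in suite:
    likelihood = 0 if observed > sides else 1 / sides
    suite[sides] *= likelihood
suite.normalize()

for sides, prob in sorted(suite.items()):
    print(sides, prob)
```

Subclassing Counter means the PMF inherits dictionary-style lookup and construction from a sequence or a mapping for free; the only thing added here is normalization.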

This year I was at PyCon 2014 in Montreal and three things happened that led to this post:
  1. I talked with Fernando Pérez, who gave an excellent keynote talk about doing open science with IPython.  He convinced me to give IPython notebooks another chance,
  2. I talked with Travis Oliphant, co-founder of Continuum Analytics, who convinced me to try Wakari for hosting IPython notebooks, and
  3. I shared a taxi to the airport with Raymond Hettinger, who received an award at PyCon this year for his contributions to core Python modules including collections.
Raymond told me that he heard about my talk from David Beazley, who was in the audience in Taiwan, and asked if I would send him my code to use as an example of what you can do with Counters.  I agreed, of course, but it has taken several weeks to get it done.

Since then, I created this GitHub repository, which contains the code examples from my talk.  I also put the code into an IPython notebook, which I posted on nbviewer.  I found nbviewer incredibly easy to use; I pasted in the URL of my GitHub repo, and it generated this static view of the notebook.

Wakari is similar, but it generates a dynamic view of the notebook where anyone can execute and modify the code.  I set up an account, uploaded the notebook, and shared this dynamic notebook, all in less than 10 minutes.  I was very impressed.

If you are interested, please read either the static or dynamic version, then come back here if you have comments.

I want to thank everyone at PyCon for an excellent conference, and especially Fernando Pérez, Travis Oliphant, Raymond Hettinger, and David Beazley for taking the time to talk with me about this example.

Finally, here's an excerpt from the notebook: