Saturday, April 12, 2014

Think X, Y and Z: What's in the pipeline?

Greetings from PyCon 2014 in Montreal!  I did a book signing yesterday at the O'Reilly Media booth.  I had the pleasure of working side by side with David Beazley, who was signing copies of The Python Cookbook, now updated for Python 3 and, I assume, including all of the perverse things Prof. Beazley does with Python generators.

Normally O'Reilly provides 25 copies of the book for signing, but due to a mix-up, they brought extra copies of all of my books, so I ended up signing about 50 books.  That was fun, but tiring.

Several people asked me what I am working on now, so I thought I would answer here:

1) This semester I am revising Think Stats and using it to teach Data Science at Olin College.  Think Stats was the first book in the Think X series to get published.  There are still many parts I am proud of, but a few parts that make me cringe.  So I am fixing some errors and adding new sections on multiple linear regression, logistic regression, and survival analysis.  The revised edition should be done in June 2014.  I will post my draft, real soon, at

2) I am also teaching Software Systems this semester, a class that covers system-level programming in C along with topics in operating systems, networks, databases, and embedded systems.  We are using Head First C to learn the language, and I am writing Think OS to present the OS topics.  The current (very rough) draft is at  I am planning to incorporate some material from The Little Book of Semaphores.  Then I will finish it up this summer.

3) Next semester I am teaching two half-semester classes: Bayesian statistics during the first half, based on Think Bayes; and digital signal processing during the second half.  I am taking a computational approach to DSP (surprise!), with emphasis on applications like sound and image processing.  The goal is for students to understand spectral analysis using FFT and to wrap their heads around time-domain and frequency-domain representations of time series data.  Working with sound takes advantage of human hearing, which is a pretty good spectral analysis system.  Applications will include pitch tracking as in Rock Band, and maybe we will reimplement Auto-Tune.

I have drafted a few chapters and posted them at I'll be working on it over the summer, then the students will work with it in the fall. If it all goes well, I'll revise it in January 2015, or maybe that summer.
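To make the spectral-analysis idea in item 3 a little more concrete, here is a minimal pure-Python sketch. It is an illustration only, not material from the book. It assumes a recursive radix-2 Cooley-Tukey FFT, which needs a power-of-two number of samples, and the helper names `fft` and `peak_frequency` are hypothetical. It converts a time-domain frame into frequency-domain magnitudes and reports the strongest frequency, which is the crudest possible pitch estimate.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT (hypothetical helper).

    Assumes len(x) is a power of 2; returns a list of complex coefficients.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed samples and transform each half.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0j] * n
    for k in range(n // 2):
        # Combine the halves with the twiddle factor e^(-2*pi*i*k/n).
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def peak_frequency(samples, framerate):
    """Return the frequency (Hz) of the strongest non-DC FFT bin.

    Assumes real-valued samples and len(samples) >= 4.
    """
    n = len(samples)
    spectrum = fft(samples)
    # For real input the upper half mirrors the lower half, so only
    # bins 1 .. n/2 - 1 are searched (bin 0 is the DC offset).
    magnitudes = [abs(c) for c in spectrum[1:n // 2]]
    k = magnitudes.index(max(magnitudes)) + 1
    # Bin k corresponds to k * framerate / n Hz.
    return k * framerate / n


if __name__ == "__main__":
    framerate = 8000  # samples per second (assumed for the example)
    n = 1024          # frame length, a power of 2
    tone = [math.sin(2 * math.pi * 440 * t / framerate) for t in range(n)]
    print(peak_frequency(tone, framerate))  # prints 437.5, the bin nearest 440 Hz
```

With 1024 samples at 8000 frames per second, each bin is 8000/1024 = 7.8125 Hz wide, so a 440 Hz tone shows up in the nearest bin, 437.5 Hz. Real pitch trackers refine this with windowing and interpolation between bins, and in practice a library routine such as `numpy.fft.rfft` would replace the hand-rolled FFT.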

One open question is what environment I will develop the book in. My usual work environment is LaTeX for the book, with Makefiles that generate PDF, HTML, and DocBook. I write code in Emacs and run it on the Linux command line. So, that's a pretty old-school toolkit.

For a long time I've been thinking about switching to IPython, which would allow me to combine text and code examples, and give users the ability to run the code while reading.

But two things have kept me from making the leap:

a) I like editing scripts, and I hate notebooks.  I find the interface awkward, and I don't want to keep track of which cells have to execute when I make a change.

b) I have been paralyzed by the number of options for distributing the results. For each of my books I have a Python module that contains my library code, so I need to distribute the module along with the notebook. Notebook viewers like nbviewer would not work for me (I think). At the other extreme, I could package my development environment in a virtual machine and let readers run my VM. But that seems like overkill, and many of my readers are not comfortable with virtual machines.

But I just listened to Fernando Perez's keynote at PyCon, where he talked about his work on IPython. I grabbed him after the talk and he was very helpful; in particular, he suggested the option of using Wakari Cloud. Continuum Analytics, which runs Wakari, is a PyCon sponsor, so I will try to talk to them later today (and find out whether I am misunderstanding what they do).

4) The other class I am teaching in the fall is Modeling and Simulation, which I helped develop about six years ago, along with my colleagues John Geddes and Mark Somerville. For the last few years we've been using my book, Physical Modeling in MATLAB.

The book presents the computational part of the class, but doesn't include all of the other material, especially the modeling part of Modeling and Simulation! Prof. Somerville has drafted additional chapters to present that material, and Prof. Geddes and I have talked about expanding some of the math topics.

This summer the three of us are hoping to draft a major revision of the book. We'll try it out in the fall, and then finish it off in January.

The current version of the book uses MATLAB, but the features we use are also available in Python with SciPy. So at some point we might switch the class and the book over to Python.


That's what's in the pipeline! I'm not sure I will actually be able to finish four books in the next year, so this schedule might be "aspirational" (that's Olin-speak for things that might not be completely realistic).

Let me know what you think. Which ones should I work on first?


  1. Hello, my vote is for Modeling and Simulation using SciPy. Best Regards,

  2. Hi,
    You can use one of those online services for your LaTeX writing; they can also help you get feedback:

    This is one; there are many like it.

    One thing, could you add release notes for every update of the books in the webpage dedicated to each book?

    1. Thanks for the suggestion. And yes, I should do release notes. I'll try!

  3. I would love to read about the parts of Think Stats that make you cringe. It is incredibly beneficial to see such reflections.

    I had not looked at IPython before and was going to recommend literate-programming, but it seems that is what IPython is. I think it is a great model to create something with.

    IPython does seem a bit on the interactive side for me. If you like the idea, but want something a bit more like just typing up a file and then having transformation magic happen, you can see if my version of literate programming works out for you: (new version in the works, but I have been using the current one for a year, quite happily). It uses named sections to include snippets and allows arbitrary transformations of the code blocks. I imagine for a project like yours, you could have code blocks that get included in the final document as well as code blocks that give full working versions as well as tests, etc.

    1. I will look into literate programming. Thanks. And maybe I'll write a post about the regrettable parts of Think Stats.

    2. Along the same lines, have you looked at org-mode in Emacs, which allows you to write your LaTeX and include code in any language (and basically organise your life in plain text, to borrow the title of a seminal article on the topic)?

    3. Hi Ben. I will look into it. Although at this point I have so much experience with my current tool chain that the cost of exploring new tools might be prohibitive, even if they would be better!

  4. This comment has been removed by the author.

  5. Hello Allen,

    First of all, I would like to say thanks for all the things you do for the whole community with your books.

    As for your next books, I think Think DSP and the Think X book on Modeling and Simulation would be awesome.

    As for your tools, I understand how costly it can be to switch to new ones when you have gone so far with your current ones. But I encourage you to try Wakari: it has a good feature for sharing a bundle, which can include your custom libraries preloaded, as well as solutions and working examples to try.

    There are some books out there made and shared that way, for example Bayesian Methods for Hackers: Chapter 4.

    And similar things can be found in the Wakari Gallery.

    1. Hi Flemming. Thanks for these suggestions. I checked out Wakari while I was at PyCon, and got a chance to talk about it with Travis Oliphant himself. It definitely sounds like a good option, especially for Think DSP.

  6. Hey Allen,

    Just a quick note because it caught my eye that you are considering moving ModSim over to Python. While I much prefer Python over MATLAB and did not particularly enjoy the MATLAB portion of ModSim, looking back I think it was a valuable experience and MATLAB is a good tool to have under your belt throughout many of Olin's courses. I would advise against making that change.

    I also hope that you continue to keep the cat on the cover of the physical copies. :)


    1. Thanks! We're still kicking this decision around, so thank you for your thoughts.