Wednesday, September 19, 2018

Two hour marathon in 2031, maybe

On Sunday (September 16, 2018) Eliud Kipchoge ran the Berlin Marathon in 2:01:39, smashing the previous world record by more than a minute and taking a substantial step in the progression toward a two hour marathon.

In a previous article, I noted that the marathon record pace since 1970 has been progressing linearly over time, and I proposed a model that explains why we might expect it to continue.  Based on a linear extrapolation of the data so far, I predicted that someone would break the two hour barrier in 2041, plus or minus a few years.

Now it is time to update my predictions in light of the new record.  The following figure shows the progression of world record pace since 1970 (orange line), a linear fit to the data (blue line) and a 90% predictive confidence interval (shaded area).  The dashed lines show the two hour marathon pace (13.1 mph) and lower and upper bounds for the year we will reach it.


Since the previous record was broken in 2014, we have been slightly behind the long-term trend.  But the new record more than makes up for it, putting us at the upper edge of the predictive interval.

This model predicts that we might see a two hour marathon as early as 2031, and probably will before 2041.

Note that this model is based on data from races.  It is possible that we will see a two hour marathon sooner in under time trial conditions, as in the Nike Breaking2 project.

Thursday, September 13, 2018

Tom Bayes and the case of the double dice


The double dice problem

Suppose I have a box that contains one each of 4-sided, 6-sided, 8-sided, and 12-sided dice. I choose a die at random, and roll it twice without letting you see the die or the outcome. I report that I got the same outcome on both rolls.

1) What is the posterior probability that I rolled each of the dice?
2) If I roll the same die again, what is the probability that I get the same outcome a third time?

You can see the complete solution in this Jupyter notebook, or read the HTML version here.

Solution

Here's a BayesTable that represents the four hypothetical dice.
In [3]:
hypo = [Fraction(sides) for sides in [4, 6, 8, 12]]
table = BayesTable(hypo)
Out[3]:
hypo prior likelihood unnorm posterior
0 4 1 NaN NaN NaN
1 6 1 NaN NaN NaN
2 8 1 NaN NaN NaN
3 12 1 NaN NaN NaN

Since we didn't specify prior probabilities, the default value is equal priors for all hypotheses. They don't have to be normalized, because we have to normalize the posteriors anyway.
Now we can specify the likelihoods: if a die has n sides, the chance of getting the same outcome twice is 1/n.
So the likelihoods are:
In [4]:
table.likelihood = 1/table.hypo
table
Out[4]:
hypo prior likelihood unnorm posterior
0 4 1 1/4 NaN NaN
1 6 1 1/6 NaN NaN
2 8 1 1/8 NaN NaN
3 12 1 1/12 NaN NaN
Now we can use update to compute the posterior probabilities:
In [5]:
table.update()
table
Out[5]:
hypo prior likelihood unnorm posterior
0 4 1 1/4 1/4 2/5
1 6 1 1/6 1/6 4/15
2 8 1 1/8 1/8 1/5
3 12 1 1/12 1/12 2/15
In [6]:
table.posterior.astype(float)
Out[6]:
0    0.400000
1    0.266667
2    0.200000
3    0.133333
Name: posterior, dtype: float64
The 4-sided die is most likely because you are more likely to get doubles on a 4-sided die than on a 6-, 8-, or 12- sided die.

Part two

The second part of the problem asks for the (posterior predictive) probability of getting the same outcome a third time, if we roll the same die again.
If the die has n sides, the probability of getting the same value again is 1/n, which should look familiar.
To get the total probability of getting the same outcome, we have to add up the conditional probabilities:
P(n | data) * P(same outcome | n)
The first term is the posterior probability; the second term is 1/n.
In [7]:
total = 0
for _, row in table.iterrows():
    total += row.posterior / row.hypo
    
total
Out[7]:
Fraction(13, 72)
This calculation is similar to the first step of the update, so we can also compute it by
1) Creating a new table with the posteriors from table.
2) Adding the likelihood of getting the same outcome a third time.
3) Computing the normalizing constant.
In [8]:
table2 = table.reset()
table2.likelihood = 1/table.hypo
table2
Out[8]:
hypo prior likelihood unnorm posterior
0 4 2/5 1/4 NaN NaN
1 6 4/15 1/6 NaN NaN
2 8 1/5 1/8 NaN NaN
3 12 2/15 1/12 NaN NaN
In [9]:
table2.update()
Out[9]:
Fraction(13, 72)
In [10]:
table2
Out[10]:
hypo prior likelihood unnorm posterior
0 4 2/5 1/4 1/10 36/65
1 6 4/15 1/6 2/45 16/65
2 8 1/5 1/8 1/40 9/65
3 12 2/15 1/12 1/90 4/65
This result is the same as the posterior after seeing the same outcome three times.
This example demonstrates a general truth: to compute the predictive probability of an event, you can pretend you saw the event, do a Bayesian update, and record the normalizing constant.
(With one caveat: this only works if your priors are normalized.)




Tuesday, July 10, 2018

The Physics of Bungee Jumping

Bungee jumping turns out to be more complicated than I realized.  I use bungee jumping as an example in Modeling and Simulation in Python, which I am revising this summer.  The book is not done, but you can see the current draft here.

During the first phase of the jump, before the cord is fully extended, I treat the jumper as if they are in free fall, including the effect of gravity and air resistance, but ignoring the interaction between the jumper and the cord.

It turns out that this interaction is non-negligible.  As the cord drops from its folded initial condition to its extended final condition, it loses potential energy.  Where does that energy go?  It is transferred to the jumper!

The following diagram shows the scenario, courtesy of this web page on the topic:


The acceleration of the jumper turns out to be


where a is the net acceleration of the jumper, g is acceleration due to gravity, v is the velocity of the jumper, y is the position of the jumper relative to the starting point, L is the length of the cord, and μ is the mass ratio of the cord and jumper.

For a bungee jumper with mass 75 kg, I've computed the trajectory of a jumper with and without the effect of the cord.  The difference is more than two meters, which could be the difference between a successful jump and a bad day.

The details are in this Jupyter notebook.

Friday, June 22, 2018

Inference in three hours

I am preparing a talk for the Joint Statistical Meetings (JSM 2018) in August.  It's part of a session called "Bringing Intro Stats into a Multivariate and Data-Rich World"; my talk is called "Inference in Three Hours, and More Time for the Good Stuff".

Here's what I said I would talk about:
Teaching statistical inference using mathematical methods takes too much time, emphasizes the least important material, and leaves many students unprepared to apply statistics in the real world. Simple computer simulations can demonstrate the fundamental ideas of statistical inference quickly, clearly, and memorably. Computational methods are also robust and flexible, making it possible to work with a wider range of data and experiments. And by teaching statistical inference better and faster, we leave time for the most important goals of statistics education: preparing students to use data to answer questions and guide decision making under uncertainty. In this talk, I discuss problems with current approaches and present educational material I have developed based on computer simulations in Python.
I have slides for the talk now:



And here's the Jupyter notebook they are based on.

I have a few weeks until the conference, so comments and suggestions are welcome.

----

Coincidentally, I got question on Twitter today that's related to my talk:
Very late to this post by @AllenDowney, but quite informative: http://allendowney.blogspot.com/2015/11/recidivism-and-single-case-probabilities.html  …
Have one question though: seems a lot of the single case reasoning here is similar to what I was taught was a mistaken conclusion: “that there is a 95% prob that a parameter lies within the given 95% CI.” What is the difference? Seems I am missing some nuance?
The post @cutearguments asks about is "Recidivism and single-case probabilities", where I make an argument that single-case probabilities are not a special problem, even under the frequentist interpretation of probability; they only seem like a special problem because they make the reference class problem particularly salient.

So what does that have to do with confidence intervals?  Let me start with the example in my talk: suppose you are trying to estimate the average height of men in the U.S.  You collect a sample and generate an estimate, like 178 cm, and a 95% confidence interval, like (177, 179) cm.

Naively, it is tempting to say that there is a 95% chance that the true value (the actual average height of every male resident in the population) falls in the 95% confidence interval.  But that's not true.

There are two reasons you might hear for why it's not true:

1) The true value is unknown, but it is not a random quantity, so it is either in the interval or it's not.  You can't assign a probability to it.

2) The 95% confidence interval does not have a 95% chance of containing the true value because that's just not what it means.  A confidence interval quantifies variability due to random sampling; that's all.

The first argument is bogus; the second is valid.

If you are a Bayesian, the first argument is bogus because it is entirely unproblematic to make probability statements about unknown quantities, whether they are considered random or not.

If you are a frequentist, the first argument is still bogus because even if the true value is not a random quantity, the confidence interval is.  And furthermore, it belongs to a natural reference class, the set of confidence intervals we would get by running the experiment many times.  If we agree to treat it as a member of that reference class, we should have no problem giving it a probability of containing the true value.

But that probability is not 95%.   If you want an interval with a 95% chance of containing the true value, you need a Bayesian credible interval.

Friday, June 1, 2018

Bayesian Zig Zag

Almost two years ago, I had the pleasure of speaking at the inaugural meeting of the Boston Bayesians, where I presented "Bayesian Bandits from Scratch" (the notebook for that talk is here).  Since then, the group has flourished, thanks to the organizers, Jordi Diaz and Colin Carroll.

Last night I made my triumphant return for the 21st meeting, where I presented a talk I called "Bayesian Zig Zag".  Here's the abstract:
Tools like PyMC and Stan make it easy to implement probabilistic models, but getting started can be challenging.  In this talk, I present a strategy for simultaneously developing and implementing probabilistic models by alternating between forward and inverse probabilities and between grid algorithms and MCMC.  This process helps developers validate modeling decisions and verify their implementation. 
As an example, I will use a version of the "Boston Bruins problem", which I presented in Think Bayes, updated for the 2017-18 season. I will also present and request comments on my plans for the second edition of Think Bayes.
When I wrote the abstract, I was confident that the Bruins would be in the Stanley Cup final, but that is not how it worked out.  I adapted, using results from the first two games of the NHL final series to generate predictions for the next game.

Here are the slides from the talk:



And here is the Jupyter notebook I presented.  If you want to follow along, you'll see that there is a slide that introduces each section of the notebook, and then you can read the details.  If you have a Python installation with PyMC, you can download the notebook from the repository and try it out.

The talk starts with basic material that should be accessible for beginners, and ends with a hierarchical Bayesian model of a Poisson process, so it covers a lot of ground!  I hope you find it useful.

For people who were there, thank you for coming (all the way from Australia!), and thanks for the questions, comments, and conversation.  Thanks again to Jordi and Colin for organizing, to WeWork for hosting, and to QuantumBlack for sponsoring.

Thursday, May 3, 2018

Some people hate custom libraries

For most of my books, I provide a Python module that defines the functions and objects I use in the book.  That makes some people angry.

The following Amazon review does a nice job of summarizing the objections, and it demonstrates the surprising passion this issue evokes:





March 29, 2018
Format: Paperback
Echoing another reviewer, the custom code requirement means you learn their custom code rather than, you know, the standard modules numpy and scipy. For example, at least four separate classes are required, representing hundreds of lines of code, are required just to execute the first six lines of code in the book. All those lines do is define two signals, a cosine and a sine, sums them, then plots them. This, infuriatingly, hides some basic steps. Here's how you can create a cosine wave with frequency 440Hz:

duration = 0.5
framerate = 11025
n = round(duration*framerate)
ts = np.arange(n)/framerate
amp = 1.0
freq = 440
offset = 0.0
cos_sig = amp * numpy.cos( 2*numpy.pi*ts*freq + offset)
freq = 880
sin_sig = amp * numpy.sin( 2*numpy.pi*ts*freq + offset)

Instead, these clowns have

cos_sig = thinkdsp.CosSignal(freq=440,amp=1.0,offset=0)
sin_sig = thinkdsp.SinSignal(freq=440,amp=1.0,offset=0)
mix = cos_sig + sin_sig

where CosSignal and SinSignal are custom classes, not functions, which inherits four separate classes, NONE of which are necessary, and all of which serve to make things more complex than necessary, on the pretense this makes things easier. The classes these class inherit are a generic Sinusoid and SumSignal classes, which inherits a Signal class, which depends on a Wave class, which performs plotting using pyplot in matplotlib. None of which make anything really any easier, but does serve to hide a lot of basic functionality, like hiding how to use numpy, matplotlib, and pyplot.

In short, just to get through the first two pages, you have to have access to github to import their ridiculous thinkdsp, thinkplot, and thinkstats, totalling around 5500 lines of code, or you are just screwed and can't use this book. All decent teaching books develops code you need as necessary and do NOT require half a dozen files with thousands of lines of custom code just to get to page 2. What kind of clown does this when trying to write a book to show how to do basic signal processing? Someone not interested in teaching you DSP, but trying to show off their subpar programming skills by adding unnecessary complexity (a sure sign of a basic programmer, not a good).

The authors openly admit their custom code is nothing more than wrappers in numpy and scipy, so the authors KNEW they were writing a crappy book and filling it with a LOT of unnecessary complexity. Bad code is bad code. Using bad code to teach makes bad teaching. It's obvious Allen B. Downey has spent his career in academia, where writing quality code doesn't matter.



Well, at least he spelled my name right.

Maybe I should explain why I think it's a good idea to provide a custom library along with a book like Think DSP.  Importantly, the goal of the book is to help people learn the core ideas of signal processing; the software is a means to this end.

Here's what I said in the preface:
The premise of this book is that if you know how to program, you can use that skill to learn other things, and have fun doing it. 
With a programming-based approach, I can present the most important ideas right away. By the end of the first chapter, you can analyze sound recordings and other signals, and generate new sounds. Each chapter introduces a new technique and an application you can apply to real signals. At each step you learn how to use a technique first, and then how it works.
For example, in the first chapter, I introduce two objects defined in thinkdsp.py: Wave and Spectrum.  Wave provides a method called make_spectrum that creates a Spectrum object, and Spectrum provides make_wave, which creates a Wave.

When readers use these objects and methods, they are implicitly learning one of the fundamental ideas of signal processing: that a Wave and its Spectrum are equivalent representations of the same information -- given one, you can always compute the other.

This example demonstrates one reason I use custom libraries in my books: The API is the lesson.  As you learn about these objects and how they interact, you are also learning the core ideas of the topic.

Another reason I think these libraries are a good idea is that they let me introduce ideas top-down: that is, I can show what a method does -- and why it is useful -- first; then I can present details when they necessary or most useful.

For example, I introduce the Spectrum object in Chapter 1.  I use it to apply a low pass filter, and the reader can hear what that sounds like.  You can too, by running the Chapter 1 notebook on Binder.

In Chapter 2, I reveal that my make_spectrum function is a thin wrapper on two NumPy functions, and present the source code:

from np.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)

At this point, anyone who prefers to use NumPy directly, rather than my wrappers, knows how.

In Chapter 7, I unwrap one more layer and show how the FFT algorithm works.  Why Chapter 7?  Because I introduce correlation in Chapter 5, which helps me explain the Discrete Cosine Transform in Chapter 6, which helps me explain the Discrete Fourier Transform.

Using custom libraries lets me organize the material in the way I think works best, based on my experience working with students and seeing how they learn.

This example demonstrates another benefit of defining my own objects: data encapsulation.  When you use NumPy's rfft to compute a spectrum, you get an array of amplitudes, but not the frequencies they correspond to.  You can call rfftfreq to get the frequencies, and that's fine, but now you have two arrays that represent one spectrum.  Wouldn't it be nice to wrap them up in an object?  That's what a Spectrum object is.

Finally, I think these examples demonstrate good software engineering practice, particularly bottom-up design.  When you work with libraries like NumPy, it is common and generally considered a good idea to define functions and objects that encapsulate data, hide details, eliminate repeated code, and create new abstractions.  Paul Graham wrote about this idea in one of his essays on software:
[...] you don't just write your program down toward the language, you also build the language up toward your program. [...] the boundary between language and program is drawn and redrawn, until eventually it comes to rest along [...] the natural frontiers of your problem. In the end your program will look as if the language had been designed for it.
That's why, in the example that makes my correspondent so angry, it takes just three lines to create and add the signals; and more importantly, those lines contain exactly the information relevant to the operations and no more.  I think that's good quality code.

In summary, I provide custom libraries for my books because:

1) They demonstrate good software engineering practice, including bottom-up design and data encapsulation.

2) They let me present ideas top-down, showing how they are used before how they are implemented.

3) And as readers learn the APIs I defined, they are implicitly learning the key ideas.

I understand that not everyone agrees with this design decision, and maybe it doesn't work for everyone.  But I am still surprised that it makes people so angry.



Wednesday, April 18, 2018

Computing at Olin Q&A


I was recently interviewed by Sally Phelps, the Director of Postgraduate Planning at Olin.  We talked about computer science at Olin, which is something we are often asked to explain to prospective students and their parents, employers, and other external audiences.

Afterward, I wrote the following approximation of our conversation, which I have edited to be much more coherent than what I actually said.

I should note: My answers to the following questions are my opinions.  I believe that other Olin professors who teach software classes would say similar things, but I am sure we would not all say the same things.

Photo Credit: Sarah Deng

Q: What is the philosophy of Olin when it comes to training software engineers of the future?

To understand computer science at Olin, you have to understand that Olin really has one curriculum, and it's engineering.

We have degrees in Engineering, Mechanical Engineering, and Electrical and Computer Engineering.  But everyone sees the same approach to engineering: it starts with people and it ends with people.  That means you can't wait for someone to hand you a well-formulated problem from a textbook; you have to understand the people you are designing for, and the context of the problem.  You have to know when an engineering solution can help and when it might not.  And then when you have a solution, you have to be able to get it out of the lab and into the world.


Q: That sounds very different from a traditional computer science degree.  

It is.  Because we already have a lot of computer scientists who know how data structures work; we don't have as many who can identify opportunities, work on open-ended problems, work on teams with people from other disciplines, work on solutions that might involve electrical and mechanical systems as well as software.

And we don't have a lot of computer scientists who can communicate clearly about their work; to have impact, they have to be able to explain the value of what they are doing.  Most computer science programs don't teach those things very well.

Also most CS programs don't do a great job of preparing students to work as software engineers.  A lot of classes are too theoretical, too mathematical, and too focused on the computer itself, not the things you want to do with it, the applications.

At Olin, we've got some theory, some mathematical foundations, some focus on the design of software systems.  But we've turned that dial down because the truth is that a lot of that material is not relevant to practice.  I always get a fight when I say that, because you can never take anything out of the curriculum.  There's always someone who says you have to know how to balance a red-black tree or you can't be a computer scientist; or you have to know about Belady's anomaly, or you have to know X, Y, and Z.

Well, you don't.  For the vast majority of our students, for all the things they are going to do, a big chunk of the traditional curriculum is irrelevant.  So we look at the traditional curriculum with some skepticism, and we make cuts.

We have to; there's only so much time.  In four years, students take about 32 classes.  We have to spend them wisely.  We have to think about where they are going after graduation.  Some will go to grad school, some will start companies, some will work in industry,  Some of them will be software engineers, some will be product managers, some will work in other fields; they might develop software, or work with software developers. 


Q: So how do you prepare people for all of that?

It depends what "prepare" means.  If it means teach them everything they need to know, it's impossible.  But you can identify the knowledge, skills and attitudes they are most likely to need.

It helps if you have faculty with industry experience.  A lot of professors go straight to grad school and straight into academics, and then they have long arguments about what software engineers need to know.  Sometimes they don't know what they are talking about.

If you're designing a curriculum, just like a good engineer, you have to understand the people you're designing for and the context of the solution.  Who are your students, where are they going, and what are they going to need?  Then you can decide what to teach.


Q: So if a student is interested in CS and they're deciding between Olin and another school, what do you tell them?

I usually tell them about the Olin curriculum, what I just explained.  And I suggest they look at our graduation requirements.  Students at Olin who do the Engineering major with a concentration in computing, they take a relatively small number of computer science classes, usually around seven.  And they take a lot of other engineering classes.

In the first semester, everyone takes the same three engineering classes, so everyone does some mechanical design, some circuits and measurement, and some computational modeling.

Everyone takes a foundation class in humanities, and another in entrepreneurship.  Everyone takes Principles of Engineering, where they design and build a mechatronic system.

In the fourth semester everyone takes user-centric design, and finally, in the senior year, everyone does a two-semester engineering capstone, which is usually interdisciplinary.

If a prospective student looks at those classes and they're excited about doing design and engineering -- and several kinds of engineering -- along with computer science, then Olin is probably a good choice for them.

If they look at the requirements and they dread them -- if the requirements are preventing them from doing what they really want -- then maybe Olin's not the right place.


Q: I understand there are student-taught software classes – can you tell us more about that?

We do, and a lot of them have been related to software, because that's an area where we have students doing internships, and sometimes starting companies, and they get a lot of industry experience.

And they come back with skills and knowledge they can share with their peers.  Sometimes that happens in classes, especially on projects.  But it can also be a student-led class where student instructors propose a class, and they they work with faculty advisors to develop and present the material.  As an advisor, I can help with curriculum design and the pedagogy, and sometimes I have a good view of the context or the big picture.  And then a lot of times the students have a better view of the details.  They've spent the summer working in a particular domain, or with a particular technology, and they can help their peers get a jump start.

They also bring some of the skills and attitudes of software engineering.  For example, we teach students about testing, and version control, and code quality.  But in a class it can be artificial; a lot of times students want to get code working and they have to move on to the next thing.  They don't want to hear from me about coding "style".

It can be more effective when it's coming from peers, and when it's based on industry experience.  The student instructors might say they worked at Pivotal, and they had to do pair programming, or they worked at Google, and all of their code was reviewed for readability before they could check it in.  Sometimes that's got more credibility than I do.


Q: What does the future look like for computing at Olin?

A big part of it is programming in context.  For example, the first software class is Modeling and Simulation, which is about computational models in science, including physics, chemistry, medicine, ecology…  So right from the beginning, we're not just learning to program, we're applying it to real world problems.

Programming isn't just a way of translation well understood solutions into code.  It's a way of communicating, teaching, learning, and thinking.  Students with basic programming skills can use coding as a "pedagogic lever" to learn other topics in engineering, math, natural and social science, arts and humanities.

I think we are only starting to figure out what that looks like.  We have some classes that use computation in these ways, but I think there are a lot more opportunities.  A lot of ideas that we teach mathematically, we could be doing computationally, maybe in addition to, or maybe instead of the math.

One of my examples is signal processing, where probably the most important idea is the Fourier transform.  If you do that mathematically, you have to start with complex numbers and work your way up.  It takes a long time before you get to anything interesting.

With a computational approach, I can give you a program on the first day to compute the Fourier transform, and you can use it, and apply it to real problems, and see what it does, and run experiments and listen to the results, all on day one.  And now that you know what it's good for, maybe you'll want to know how it works.  So we can go top-down, start with applications, and then open the hood and look at the engine.

I'd like to see us apply this approach throughout the curriculum, especially engineering, math, and science, but also arts, humanities and social science.