Olin College is Hiring

I teach at Olin College, a new undergraduate engineering college with the mission to fix engineering education. If you're interested in joining our team, here is information about the Faculty Search at Olin College.

Wednesday, March 23, 2011

Predicting marathon times

I ran the New Bedford Half Marathon on Sunday in 1:34:08, which is a personal record for me.  Like a lot of runners in New England, I ran New Bedford as a training run for the Boston Marathon, coming up in about 4 weeks.

In addition to the training, the half marathon is also useful for predicting your marathon time.  There are online calculators where you can type in a recent race time and predict your time at different distances.  Most of them are based on the Riegel formula; here is the explanation from Runner's World:

The Distance Finish Times calculator ... uses the formula T2 = T1 x (D2/D1)^1.06 where T1 is the given time, D1 is the given distance, D2 is the distance to predict a time for, and T2 is the calculated time for D2. 
The formula was developed by Pete Riegel and published first in a slightly different form in Runner's World, August 1977, in an article in that issue entitled "Time Predicting." The formula was refined for other sports (swimming, bicycling, walking,) in an article "Athletic Records and Human Endurance," also written by Pete Riegel, which appeared in American Scientist, May-June 1981.
Based on my half marathon, the formula says I should be able to run a marathon in 3:16:09.  Other predictors use different parameters or different formulas, but even the most conservative prediction is under 3:20, which just happens to be my qualifying time.  So I should be able to qualify, right?
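For readers who want to check the arithmetic, here is a minimal Python sketch of the Riegel formula as quoted above. The function names are mine, not from any of the online calculators. With the default exponent of 1.06 the result comes out near 3:16, within a few seconds of the figure above; calculators may round differently.

```python
def riegel_predict(t1_seconds, d1, d2, exponent=1.06):
    """Predict the time at distance d2 from a time t1_seconds at distance d1."""
    return t1_seconds * (d2 / d1) ** exponent


def parse_hms(text):
    """Convert an 'H:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Convert seconds to 'H:MM:SS', rounding to the nearest second."""
    total = int(round(total_seconds))
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


half = parse_hms("1:34:08")                    # New Bedford time
predicted = riegel_predict(half, 13.1, 26.2)   # distances in miles
print(format_hms(predicted))
```

Passing a different `exponent` shows how sensitive the prediction is to that one parameter.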

There are a few caveats.  First, you have to train for the distance.  If you have never run farther than 13.1 miles, you will have a hard time with the marathon, no matter what your half marathon time is.  For me, this base is covered.  I have been following the FIRST training program since January, including several runs over 20 miles (and another coming up on Sunday).

Second, weather is a factor.  New Bedford this weekend was 40 degF and sunny, which is pretty close to optimal.  But if Marathon Monday is 80 degF, no one is going to hit their target time.

Finally, terrain is a factor.  Boston is a net-downhill course, which would normally make it fast, but with the hills, especially in miles 17-21, Boston is considered a tough course.

So that raises my question-of-the-week: how well does New Bedford predict Boston?

The 2010 results from New Bedford are here, and they are available in an easy-to-parse format.  Results from Boston are here, but you can't download the whole thing; you have to search by name, etc.  So I wrote a program that loops through the New Bedford results and searches for someone in Boston with the same name and age (or age+1 for anyone born under Aries).
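The matching step might look something like the sketch below. It is not the program described above: it assumes both result sets have already been collected into lists of hypothetical `(name, age, seconds)` tuples, whereas the real Boston results had to be queried by name. Skipping names that match more than one Boston runner is my own choice, to reduce false matches.

```python
def match_runners(new_bedford, boston):
    """Pair up records that share a name and have compatible ages.

    Each record is a (name, age, seconds) tuple; this layout is hypothetical.
    A runner may be one year older in Boston if a birthday fell between races.
    Returns a list of (half_seconds, marathon_seconds) pairs.
    """
    # Index Boston finishers by (lowercased name, age) for fast lookup.
    boston_index = {}
    for name, age, seconds in boston:
        boston_index.setdefault((name.lower(), age), []).append(seconds)

    pairs = []
    for name, age, half_seconds in new_bedford:
        for boston_age in (age, age + 1):
            times = boston_index.get((name.lower(), boston_age), [])
            if len(times) == 1:  # skip ambiguous matches
                pairs.append((half_seconds, times[0]))
                break
    return pairs
```

Even with the age check, two different people can share a name and age, which is one source of the outliers discussed below.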

I found 520 people who ran both races and I matched up their times.  This scatter plot shows the results:

Not surprisingly, the results are correlated: R^2 is 0.86.  There are a few outliers, which might be different people with the same name and age, or people who had one bad day and one good day.

Now let's fit a curve.  The marathon is twice the half marathon distance, so D2/D1 = 2.  Writing A for the exponent and taking the log of both sides of the Riegel formula, we get

log(T2) = log(T1) + A log(2)

So if we plot T2 vs T1 on a log-log scale, we should see a line with slope 1 and intercept A log(2), and we can estimate the intercept.  This figure shows the result:

The fitted line has slope 1.0 and the intercept that minimizes the squared error.  The estimated intercept corresponds to an exponent in the Riegel model of 1.16, substantially higher than the value used by the race time predictors (1.06).
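With the slope fixed at 1, the least-squares intercept is just the mean of log(T2) - log(T1), and dividing by log(2) turns it back into a Riegel exponent. Here is a sketch of that calculation. The function name is mine, and the numbers in the usage lines are synthetic, generated from a known exponent to check the code; they are not the race data.

```python
import math


def estimate_exponent(half_times, marathon_times, distance_ratio=2.0):
    """Estimate the Riegel exponent from paired race times.

    Fits log(T2) = log(T1) + b with the slope fixed at 1. The b that
    minimizes squared error is the mean of log(T2) - log(T1), and the
    exponent is b / log(distance_ratio).
    """
    diffs = [math.log(t2) - math.log(t1)
             for t1, t2 in zip(half_times, marathon_times)]
    intercept = sum(diffs) / len(diffs)
    return intercept / math.log(distance_ratio)


# Synthetic check: times generated with exponent 1.16 should recover 1.16.
halves = [80.0, 94.0, 110.0]                    # minutes, made up
marathons = [t * 2 ** 1.16 for t in halves]
print(round(estimate_exponent(halves, marathons), 2))
```

Any time unit works, as long as both races use the same one, because only the ratio T2/T1 enters the estimate.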

Visually, the line is not a particularly good fit for the data, which suggests that this model is not capturing the shape of the relationship.  We could certainly look for better models, but instead let's apply Downey's Law, which says "The vast majority of statistical questions in the real world can be answered by counting."

What are my chances of qualifying for Boston?  Let's zoom in on the people who finished near me in New Bedford (minus one year):

The red line isn't a fitted curve; it's the 3:20 line.  Of 50 people who ran my time in New Bedford (plus or minus two minutes) only 13 ran a 3:20 or better in Boston.  So my chances are about 25%.  Or maybe less -- most of those 13 were faster than me.
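The counting itself takes only a few lines. This sketch assumes the matched results are a list of hypothetical `(half_seconds, marathon_seconds)` pairs; the function name and defaults are mine.

```python
def qualifying_fraction(pairs, my_half, window=120, cutoff=3 * 3600 + 20 * 60):
    """Fraction of runners near my half marathon time who made the cutoff.

    pairs: (half_seconds, marathon_seconds) tuples (hypothetical layout).
    window: how close, in seconds, a half marathon time must be to mine.
    cutoff: qualifying marathon time in seconds (3:20:00 by default).
    Returns None if nobody falls inside the window.
    """
    cohort = [marathon for half, marathon in pairs
              if abs(half - my_half) <= window]
    if not cohort:
        return None
    qualified = sum(1 for marathon in cohort if marathon <= cutoff)
    return qualified / len(cohort)
```

With the counts above, 13 qualifiers in a cohort of 50, the function would return 0.26.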

Why are the race predictors so optimistic?  One possibility is weather.  If Boston was hot last year, that would slow people down.  But according to Wolfram|Alpha, New Bedford on March 21, 2010 was 55-60 degF and sunny.  And Boston on April 18 was 40-50 degF and cloudy.  So if anything we might expect times in Boston to be slightly faster.

The most likely explanation (or at least the only one left) is terrain.  Curse you, Heartbreak Hill!


Update March 24, 2011:  Here's one more figure showing percentiles of marathon times as a function of half marathon time.  The blue line is at 94 minutes; for that time the percentiles are 192, 200, 207, 216 and 239 minutes.  So the median time for people in my cohort is 3:27:00.
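Percentiles for a cohort can be computed along these lines. The post does not say which five percentiles are plotted, so the default levels here (5, 25, 50, 75, 95) are an assumption, as are the function names and the linear interpolation rule.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) of sorted values, with linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def cohort_percentiles(marathon_minutes, levels=(5, 25, 50, 75, 95)):
    """Marathon-time percentiles for one cohort of half marathon finishers.

    The default levels are an assumption, not taken from the figure.
    """
    ordered = sorted(marathon_minutes)
    return [percentile(ordered, p) for p in levels]
```

Repeating this for a sliding window of half marathon times would give one curve per percentile level.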

Update April 19, 2011: I ran with a group of friends aiming for a 3:25 pace.  We hit the halfway mark in 1:45, so we revised our goal to 3:30.  But I thrashed my quads in the first half and finished in 3:45, pretty close to that dotted blue line.  Still, it was an exciting day and a lot of fun, for some definition of fun.

Update May 17, 2011: It turns out I don't get credit for Downey's Law; Francis Galton beat me to it.  And to make matters worse, he said it better: "Whenever you can, count."

If you find this sort of thing interesting, you might like my free statistics textbook, Think Stats. You can download it or read it at thinkstats.com.

1 comment:

  1. Your post makes me think you might be interested in my new book. Overthinking the Marathon is like having me as your partner for a season of training, 17 weeks that culminate in the 2012 Cape Cod Marathon. Some days I talk about the nitty-gritty details, other days, it's about the things that make running interesting and fun, even – no, especially – when it hurts. That includes analysis of the data that all runners collect as they train.

    Amby Burfoot, 1968 Boston Marathon winner and Runner's World editor-at-large, says, "Ray Charbonneau insists he hasn't written a marathon guide, and he's right. Instead, he's loaning himself out as a thoughtful, veteran, and funny training partner. You couldn't find a better one as you get ready for your next 26.2-miler."

    If you're interested in reading Overthinking the Marathon, here's the link:

    Feel free to share the heck out of it :-)