Probably Overthinking It: March 2011

Thursday, March 31, 2011

Freshman hordes more godless than ever!

Since 1978 the percentage of college freshmen whose religious preference is "None" has nearly tripled from 8% to 23%, and the trend is accelerating.

In 2007 I wrote this article for Free Inquiry magazine that reports on survey results from the Cooperative Institutional Research Program (CIRP) of the Higher Education Research Institute (HERI). Here is some background on the survey, from my article:

The CIRP survey includes questions about students’ backgrounds, activities, and attitudes. In one question, students were asked their “current religious preference” and given a choice of seventeen common religions and Christian denominations, “Other Christian,” “Other religion,” or “None.” Another question asked students how often they “attended a religious service” in the last year. The choices were “Frequently,” “Occasionally,” and “Not at all.” The instructions directed students to select “Occasionally” if they attended one or more times, so a nonobservant student who attended a wedding and a funeral (and follows instructions) would not be counted among the apostates.

The following figure shows students' responses over the history of the survey (updated with the most recent data):

Here's what I said about this figure 4 years ago:

The number of students with no religious preference has been increasing steadily since the low point ... in 1978. ... The rate of growth from 1980 to the present has been between 0.25 and 0.35 percentage points per year... Since 1997, the rate may have increased to 0.6 or 0.7 percentage points per year. At that rate, the class of 2056 will have an atheist majority.

A linear extrapolation from data like this is mostly ridiculous, but as it turns out the next four points are pretty much in line. And finally, I claimed:

Both curves show a possible acceleration between 2005 and 2006. This jump may be due to increased visibility of atheism following the publication of books by Sam Harris, Daniel C. Dennett, and Richard Dawkins.

That last point is pure speculation on my part, but it is also what you might call a testable hypothesis. Which gives me an excuse to talk about another article of mine, "A novel changepoint detection algorithm," which you can download from arXiv. Here's the abstract:

We [my imaginary co-author and I] propose an algorithm for simultaneously detecting and locating changepoints in a time series, .... The kernel of the algorithm is a system of equations that computes, for each index i, the probability that the last (most recent) change point occurred at i.

In a time series, a changepoint is a time where the behavior of the system changes abruptly. By applying my algorithm to the CIRP data, we can test whether there are changepoints and when they are likely to have occurred.

My conjecture is that the rate of increase changed in 1997 and maybe again in 2006. Since this is a hypothesis about rates, I'll start by computing differences between successive elements as an estimate of the first derivative. Where there is missing data, I use the average yearly change. This figure shows the result:
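Here is a minimal sketch of that differencing step, assuming the data are stored as a dictionary from year to percentage (the function name and the sample values are mine, made up for illustration; they are not CIRP data). When a year is missing, the total change across the gap is spread evenly over the years it covers:

```python
def yearly_changes(series):
    """Estimate the year-over-year change for a series that may have gaps.

    series: dict mapping year -> percentage; some years may be missing.
    Returns a dict mapping each year after the first to an estimated change.
    Across a gap, every year gets the average yearly change over that gap.
    """
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        span = curr - prev
        avg = (series[curr] - series[prev]) / span
        for year in range(prev + 1, curr + 1):
            changes[year] = avg
    return changes


# Made-up values for illustration only (2002 is missing).
example = {2000: 10.0, 2001: 10.5, 2003: 12.5}
print(yearly_changes(example))
```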

As usual, taking differences amplifies noise and makes it harder to see patterns. But that's exactly what my algorithm (which is mine) is good for. Here are the results:

The y-axis is the probability that the last (most recent) change point occurred during a given year, accumulated from right to left. The gray lines indicate years with a relatively high probability of being the last changepoint. So, reading from right to left, there is a 5% chance of a changepoint in 2006 and a 5% chance for 1998. The most likely location of the last changepoint is 1984 (about 20%) or 1975 (25%). So that provides little if any evidence for my conjecture, which is pretty much what I deserve.

A simpler, and more likely, hypothesis is that the trend is accelerating; that is, the slope is changing continuously, not abruptly. And that's easy to test by fitting a line to the yearly changes.

The red line shows the linear least squares fit, with slope 0.033; the p-value (chance of seeing an absolute slope as big as that) is 0.006, so you can either reject the null hypothesis or update your subjective degree of belief accordingly.
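For readers who want to try this at home, here is one way to get a slope and a p-value using only the standard library. This is a sketch under my own assumptions, not necessarily how the numbers above were computed: it estimates the p-value with a permutation test, shuffling the yearly changes and counting how often the absolute slope comes out at least as big as the observed one. The data in the example are made up.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, iters=1000, seed=0):
    """Fraction of shuffles whose absolute slope is >= the observed one."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    count = 0
    for _ in range(iters):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            count += 1
    return count / iters


# Made-up yearly changes, for illustration only (not CIRP data).
years = list(range(2000, 2010))
changes = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.5, 0.8, 0.7, 0.9]
print(slope(years, changes), permutation_p_value(years, changes))
```

A permutation test makes no assumption about the distribution of the noise, which is why I reach for it here; a standard regression routine would report a similar quantity from a t-distribution.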

The fitted value for the current rate of growth is 0.9 percentage points per year, accelerating at 0.033 percentage points per year^2. So here's my prediction: in 2011 the percentage of freshmen who report no religious affiliation will be 23.0 + 0.9 + 0.03 = 23.9%.

Wednesday, March 23, 2011

Predicting marathon times

I ran the New Bedford Half Marathon on Sunday in 1:34:08, which is a personal record for me. Like a lot of runners in New England, I ran New Bedford as a training run for the Boston Marathon, coming up in about 4 weeks.

In addition to the training, the half marathon is also useful for predicting your marathon time. There are online calculators where you can type in a recent race time and predict your time at different distances. Most of them are based on the Riegel formula; here is the explanation from Runner's World:

The Distance Finish Times calculator ... uses the formula T2 = T1 x (D2/D1)^1.06 where T1 is the given time, D1 is the given distance, D2 is the distance to predict a time for, and T2 is the calculated time for D2.

The formula was developed by Pete Riegel and published first in a slightly different form in Runner's World, August 1977, in an article in that issue entitled "Time Predicting." The formula was refined for other sports (swimming, bicycling, walking,) in an article "Athletic Records and Human Endurance," also written by Pete Riegel, which appeared in American Scientist, May-June 1981.
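The formula is easy to compute yourself. Here is a small sketch (the function names are mine, and I am assuming distances of 13.1 and 26.2 miles, so the ratio is exactly 2). With my half marathon time of 1:34:08 it gives a marathon prediction a little over 3:16; an online calculator may differ by a few seconds depending on how it rounds.

```python
def riegel_predict(t1, d1, d2, exponent=1.06):
    """Predict the time at distance d2 from time t1 at distance d1.

    Implements T2 = T1 * (D2 / D1) ** exponent. Distances can be in any
    unit as long as d1 and d2 agree; the result has the same unit as t1.
    """
    return t1 * (d2 / d1) ** exponent


def format_hms(minutes):
    """Format a time in minutes as h:mm:ss, rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    hours, rem = divmod(total_seconds, 3600)
    mins, secs = divmod(rem, 60)
    return f"{hours}:{mins:02d}:{secs:02d}"


half_minutes = 94 + 8 / 60  # 1:34:08 expressed in minutes
marathon_minutes = riegel_predict(half_minutes, 13.1, 26.2)
print(format_hms(marathon_minutes))
```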

Based on my half marathon, the formula says I should be able to run a marathon in 3:16:09. Other predictors use different parameters or different formulas, but even the most conservative prediction is under 3:20, which just happens to be my qualifying time. So I should be able to qualify, right?

There are a few caveats. First, you have to train for the distance. If you have never run farther than 13.1 miles, you will have a hard time with the marathon, no matter what your half marathon time is. For me, this base is covered. I have been following the FIRST training program since January, including several runs over 20 miles (and another coming up on Sunday).

Second, weather is a factor. New Bedford this weekend was 40 degF and sunny, which is pretty close to optimal. But if Marathon Monday is 80 degF, no one is going to hit their target time.

Finally, terrain is a factor. Boston is a net-downhill course, which would normally make it fast, but with the hills, especially in miles 17-21, Boston is considered a tough course.

So that raises my question-of-the-week: how well does New Bedford predict Boston?

The 2010 results from New Bedford are here, and they are available in an easy-to-parse format. Results from Boston are here, but you can't download the whole thing; you have to search by name, etc. So I wrote a program that loops through the New Bedford results and searches for someone in Boston with the same name and age (or age+1 for anyone born under Aries).
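The matching step looks roughly like this. It is a sketch, not the program I ran: it assumes both sets of results are already in memory as lists of dictionaries with hypothetical keys 'name', 'age', and 'minutes' (the real Boston results had to be fetched through the search form), and it skips ambiguous matches, which is a choice I am making for the example.

```python
def match_runners(half_results, marathon_results):
    """Pair half marathon and marathon results by name and age.

    Each result is a dict with hypothetical keys 'name', 'age', 'minutes'.
    A marathon entry matches if the name is the same (ignoring case) and
    the age is the same or one year older. Runners with zero or several
    candidate matches are skipped.
    Returns a list of (half_minutes, marathon_minutes) pairs.
    """
    index = {}
    for r in marathon_results:
        key = (r["name"].lower(), r["age"])
        index.setdefault(key, []).append(r)

    pairs = []
    for h in half_results:
        name = h["name"].lower()
        candidates = index.get((name, h["age"]), []) + index.get(
            (name, h["age"] + 1), []
        )
        if len(candidates) == 1:
            pairs.append((h["minutes"], candidates[0]["minutes"]))
    return pairs


# Made-up runners, for illustration only.
half = [
    {"name": "Pat Lee", "age": 40, "minutes": 94.0},
    {"name": "Sam Roe", "age": 30, "minutes": 100.0},
]
marathon = [
    {"name": "pat lee", "age": 41, "minutes": 205.0},
    {"name": "Sam Roe", "age": 30, "minutes": 215.0},
    {"name": "Sam Roe", "age": 31, "minutes": 230.0},
]
print(match_runners(half, marathon))
```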

I found 520 people who ran both races and I matched up their times. This scatter plot shows the results:

Not surprisingly, the results are correlated: R^2 is 0.86. There are a few outliers, which might be different people with the same name and age, or people who had one bad day and one good day.

Now let's fit a curve. Taking the log of both sides of the Riegel formula, we get

log(T2) = log(T1) + A log(2)

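Here A is the exponent (1.06 in the Riegel formula) and 2 is D2/D1, the ratio of the marathon distance to the half marathon distance. So if we plot T2 vs T1 on a log-log scale, we should see a line with slope 1, and we can estimate the intercept. This figure shows the result: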

The fitted line has slope 1.0 and the intercept that minimizes least squared error. The estimated intercept corresponds to an exponent in the Riegel model of 1.16, substantially higher than the value used by the race time predictors (1.06).
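With the slope fixed at 1, the least-squares intercept is just the mean of log(T2) - log(T1), and dividing by log(2) turns it back into an exponent. A minimal sketch, assuming the matched times are available as (half, marathon) pairs (the function name and the synthetic check are mine):

```python
import math


def estimate_exponent(pairs, distance_ratio=2.0):
    """Estimate the exponent A in log(T2) = log(T1) + A * log(D2 / D1).

    pairs: iterable of (t1, t2) times in the same unit.
    With the slope fixed at 1, the intercept that minimizes squared error
    is the mean of log(t2) - log(t1); A is that intercept over log(ratio).
    """
    residuals = [math.log(t2) - math.log(t1) for t1, t2 in pairs]
    intercept = sum(residuals) / len(residuals)
    return intercept / math.log(distance_ratio)


# Synthetic check: times generated with a known exponent should recover it.
synthetic = [(t, t * 2 ** 1.16) for t in (80.0, 94.0, 110.0)]
print(estimate_exponent(synthetic))
```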

Visually, the line is not a particularly good fit for the data, which suggests that this model is not capturing the shape of the relationship. We could certainly look for better models, but instead let's apply Downey's Law, which says "The vast majority of statistical questions in the real world can be answered by counting."

What are my chances of qualifying for Boston? Let's zoom in on the people who finished near me in New Bedford (minus one year):

The red line isn't a fitted curve; it's the 3:20 line. Of 50 people who ran my time in New Bedford (plus or minus two minutes) only 13 ran a 3:20 or better in Boston. So my chances are about 25%. Or maybe less -- most of those 13 were faster than me.
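In the spirit of answering questions by counting, the computation is only a few lines. This sketch assumes matched (half, marathon) times in minutes; the function name, the default window, and the sample pairs are mine, and the pairs are made up.

```python
def qualifying_fraction(pairs, my_half, window=2.0, cutoff=200.0):
    """Count runners near my half marathon time who met the marathon cutoff.

    pairs: iterable of (half_minutes, marathon_minutes).
    window: how close (in minutes) a half marathon time must be to mine.
    cutoff: qualifying time in minutes (200 minutes is 3:20).
    Returns (qualified, cohort_size, fraction), or None for an empty cohort.
    """
    cohort = [t2 for t1, t2 in pairs if abs(t1 - my_half) <= window]
    if not cohort:
        return None
    qualified = sum(1 for t2 in cohort if t2 <= cutoff)
    return qualified, len(cohort), qualified / len(cohort)


# Made-up pairs, for illustration only.
sample = [(93.0, 195.0), (94.0, 205.0), (95.0, 199.0), (96.5, 190.0), (92.0, 210.0)]
print(qualifying_fraction(sample, my_half=94.13))
```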

Why are the race predictors so optimistic? One possibility is weather. If Boston was hot last year, that would slow people down. But according to Wolfram|Alpha, New Bedford on March 21, 2010 was 55-60 degF and sunny. And Boston on April 18 was 40-50 degF and cloudy. So if anything we might expect times in Boston to be slightly faster.

The most likely explanation (or at least the only one left) is terrain. Curse you, Heartbreak Hill!

-----

Update March 24, 2011: Here's one more figure showing percentiles of marathon times as a function of half marathon time. The blue line is at 94 minutes; for that time the percentiles are 192, 200, 207, 216 and 239 minutes. So the median time for people in my cohort is 3:27:00.

Update April 19, 2011: I ran with a group of friends aiming for a 3:25 pace. We hit the halfway mark in 1:45, so we revised our goal to 3:30. But I thrashed my quads in the first half and finished in 3:45, pretty close to that dotted blue line. Still, it was an exciting day and a lot of fun, for some definition of fun.

Update May 17, 2011: It turns out I don't get credit for Downey's Law; Francis Galton beat me to it. And to make matters worse, he said it better: "Whenever you can, count."

If you find this sort of thing interesting, you might like my free statistics textbook, Think Stats. You can download it or read it at thinkstats.com.

Wednesday, March 2, 2011

BQ is unfair to women

Last week I wrote about the changes the BAA is making in the qualifying times for the Boston Marathon. Based on a sample of qualifiers from the Chicago Marathon, I predicted that the proportion of women in the open division (ages 18-34) will drop in the next two years.

This raises an obvious question: are the new standards fair? BAA executive director Tom Grilk explained, “Looking back at the data that we have... we found that the fairest way to deal with this is to have a uniform reduction in qualifying standards across the board.”

But he didn’t explain what he means by “fair.” There are several possibilities.

1) E-fairness (E for elite): By this definition, a standard is fair if the gender gap for qualifiers is the same as for elite runners. I discussed this standard in the previous post, and showed two problems: (a) elite women are farther from the pack than elite men, so qualifying times would be determined by a small number of outliers; and as a result, (b) this standard would disqualify 47% of the women in the open division.

A variation of E-fairness uses the relative difference in speeds rather than the absolute difference in times. This option reduces the impact, but doesn’t address what I think is the basic problem: it doesn’t make sense to base qualifying times on the performance of elite runners.

2) R-fairness (R for representative): By this definition, a standard is fair if the qualifiers are a representative sample of the population of marathoners. I don’t have good data to evaluate R-fairness for the open division, but for the field as a whole the current standard is R-fair: according to Running USA, 41% of marathoners are female, and in 2010 42% of Boston Marathon finishers were female.

[Note: this article in the Wall Street Journal claims that 42% is "higher than the percentage of all U.S. marathoners who are women," but I don't know what they are basing that on.]

A problem with R-fairness is that the population of marathoners includes some people who are competitive racers and others who are... not competitive racers. I don’t think it makes sense for the middle and the back of the pack to affect the standard.

3) C-fairness (C for contenders): Qualifying times should be determined by the most relevant population, runners who finish close to the standard. Specifically, I define a “contender” as someone who finishes within X minutes of the standard, where X is something like 20 minutes (we’ll look at some different values for X and see that it doesn’t matter very much).

And here’s what I propose: a standard is C-fair if the percentage of contenders who qualify is the same in each group. As an example, I'll compute a fair standard for men and women in the open division. Here’s how:

1) Like last week, I use data from the last three Chicago marathons as a sample of the population of marathoners. [Note: If this sample is not representative, that will affect my results, so I would like to get a more comprehensive dataset. I contacted marathonguide.com, but have not heard back.]

2) For the current male standard, 3:10, I select runners who finish within X minutes of the standard, and compute the percentage of these contenders that qualify.

3) Then I search for the female standard that yields the same percentage of qualifiers.

This figure shows the results:

The x-axis is the gender gap: the difference in minutes between male and female qualifying times. The y-axis is the difference in the percentage of contenders who qualify. The lines show results for values of X from 20 to 40 minutes. For smaller X, the results are noisier.

Where the lines cross through 0 is the gap that is C-fair. By that definition, the gap should be about 38 minutes. So if the male standard is 3:10, the female standard should be 3:48.
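The search for a C-fair gap can be sketched like this. It is an illustration under my own assumptions, not the code behind the figure: finish times are lists of minutes, the gap is searched on a whole-minute grid, and instead of locating the zero crossing on a plot it picks the gap with the smallest absolute difference in qualifying rates. All names are mine.

```python
def contender_rate(times, standard, x):
    """Fraction of contenders (within x minutes of the standard) who meet it."""
    contenders = [t for t in times if abs(t - standard) <= x]
    if not contenders:
        return None
    return sum(1 for t in contenders if t <= standard) / len(contenders)


def c_fair_gap(male_times, female_times, male_standard, x=20, max_gap=60):
    """Find the gap (in whole minutes) that best equalizes contender rates.

    Computes the qualifying rate among male contenders, then tries each
    female standard male_standard + gap and keeps the gap whose rate among
    female contenders is closest to the male rate.
    """
    target = contender_rate(male_times, male_standard, x)
    if target is None:
        return None
    best_gap, best_diff = None, None
    for gap in range(max_gap + 1):
        rate = contender_rate(female_times, male_standard + gap, x)
        if rate is None:
            continue
        diff = abs(rate - target)
        if best_diff is None or diff < best_diff:
            best_gap, best_diff = gap, diff
    return best_gap
```

Called with finish times in minutes and a male standard of 190 (3:10), it returns the gap to add to get the female standard. Trying several values of x, as in the figure, is a quick check on how sensitive the answer is to the definition of a contender.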

In 2013 the male standard will be 3:05. In that case, based on the same analysis, the gap should be 34 minutes, so the female standard should be 3:39.

C-fairness also has the property of “equal marginal impact,” which means that if we tighten the standard by 1 minute, we disqualify the same percentage of runners in all groups, which leaves the demographics of the field unchanged. Last week we saw that the current standard does not have this property -- tightening the qualifying times has a disproportionate impact on women.

In summary:

1) I think qualifying times should be based on the population of contenders -- runners near the standard -- not on the elites or the back of the pack.

2) A standard is fair if it qualifies the same proportion of contenders from each group.

3) By that definition, the gender gap in the open division should be 38 minutes in 2012 and 34 minutes in 2013.

4) The common belief that the standard for women is too easy is mistaken; by the definition of fair that I think is most appropriate, the standard for women is relatively hard.
