Tuesday, July 10, 2012

Secularization in America: part six

Summary so far

In Part One I described trends in market share of major religions in the U.S.: since 1988, the fraction of Protestants dropped from 60% to 51%, and the fraction of people with no religious affiliation increased from 8% to 18%.

In Part Two I used data from the 1988 General Social Survey (GSS) to model transmission of religion from parent to child, and found that the model failed to predict the decrease in Protestants and the increase in Nones that occurred between 1988 and 2010.

In Part Three I looked at changes, between 1988 and 2008, in the spouse tables (which describe the tendencies of people to marry within their religions), the environment table (which describes parents' decisions about their children's religious upbringing), and the transmission table (which describes the likely outcomes for children raised within each religion).  I found that the transmission table has changed substantially since 1988, and accounts for a large part of the observed increase in Nones, but not the decrease in Protestants.

In Part Four I looked at changes in religiosity over the lifetime of respondents.  I tentatively concluded that the differences between generations were larger than changes in affiliation, within generations, over time.

But in Part Five I looked more closely and saw that all generations were becoming more religious, or staying the same, prior to 1990, and that all generations began to disaffiliate during the 1990s, continuing into the 2000s.

Generational Model

Now I am ready to get back to the generational model I have been working up to.  The goal of the generational model is to separate these three effects:

  1. Changes in religious preference from one generation to the next.
  2. Changes in religious affiliation over the lifetime of respondents.
  3. Changes in the composition of the GSS cohort over time.
The model works by simulation.  Assuming that we are starting in 1988, here are the steps:
  1. Read the survey data from 1988 and resample it.  Compute and store the distribution of ages.
  2. For each respondent, generate a hypothetical child.  Use the BirthModel to determine year of birth, the UpbringingModel to determine what religion the child is raised in, and the TransmissionModel to determine what affiliation the child will have as an adult.  Details of these models follow.
  3. Form a combined cohort of parents and simulated children.  Since the cohort of parents is a representative sample of the US population, the cohort of simulated children is a representative sample of the population one generation later (based, for now, on the simplifying assumptions that all groups have the same number of children on average, and there is no immigration).
  4. In order to generate a cohort from a future survey year, draw a sample from the combined cohort, weighted so that the distribution of ages in the future year is the same as the original distribution of ages in 1988.  As the simulation goes forward in time, this generated cohort contains fewer of the parents and more of the simulated children.  After 20 years, about 25% of the "real" respondents have been replaced with "fake" respondents.
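Steps 1 and 4 can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind the results below. The record layout (dicts with hypothetical `birth_year` and `relig` keys), the function names, and the use of bootstrap resampling (sampling with replacement) are all assumptions. The central idea is to weight each person alive in the target year by target(age) / current(age), so that the expected age distribution of the drawn cohort matches the 1988 distribution.

```python
import random
from collections import Counter

# Hypothetical record layout: {"birth_year": int, "relig": str}

def resample(records, rng):
    """Step 1 (assumed bootstrap): draw len(records) rows with replacement."""
    return [rng.choice(records) for _ in records]

def age_pmf(records, year):
    """Distribution of ages in a given year, as a dict {age: probability}."""
    counts = Counter(year - r["birth_year"] for r in records)
    total = sum(counts.values())
    return {age: n / total for age, n in counts.items()}

def cohort_for_year(combined, target_pmf, year, n, rng):
    """Step 4: draw n people from the combined cohort of parents and
    simulated children, weighted so their ages in `year` follow target_pmf."""
    current = age_pmf(combined, year)
    weights = []
    for r in combined:
        age = year - r["birth_year"]
        # Ages outside the 1988 age distribution get weight 0.
        weights.append(target_pmf.get(age, 0.0) / current[age])
    return rng.choices(combined, weights=weights, k=n)
```

Because every person of age a gets the same weight, the total weight of that age group is proportional to target(a), which is why the drawn cohort reproduces the 1988 age distribution while gradually shifting from parents to simulated children.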
Now, where do all these auxiliary models come from?

BirthModel: This is just the distribution of parent's age when each child is born.  It is based on data from the 1994 GSS, which includes questions about children.  I had to do some work to correct for an obvious bias due to the ages of the respondents; I will skip the details here.
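As a rough sketch, a BirthModel of this kind can be represented as an empirical distribution of parents' ages at the birth of a child, from which each simulated child's birth year is drawn. The class below is hypothetical and deliberately omits the correction for respondents' ages mentioned above.

```python
import random

class BirthModel:
    """Hypothetical sketch: empirical distribution of the parent's age
    when a child is born. Omits the age-bias correction."""

    def __init__(self, ages_at_birth):
        # Observed parent ages at a child's birth (e.g., derived from survey data).
        self.ages = list(ages_at_birth)

    def child_birth_year(self, parent_birth_year, rng):
        # Choosing one observed age uniformly samples from the empirical distribution.
        return parent_birth_year + rng.choice(self.ages)
```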

UpbringingModel: This is a combination of the SpouseTable and the EnvironmentTable, described in Part Three.  It is a map from the parent's religion to the distribution of possible religions the child might be raised in.
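One way to combine the two tables is to marginalize over the spouse's religion: P(raised in r | parent) = sum over spouse religions s of P(s | parent) * P(r | parent, s). The sketch below assumes the SpouseTable maps a parent's religion to a distribution of spouse religions, and the EnvironmentTable maps a (parent, spouse) pair to a distribution of upbringings; the dict layout and function name are hypothetical.

```python
def upbringing_model(spouse_table, environment_table):
    """Hypothetical sketch: combine P(spouse | parent) with
    P(raised | parent, spouse) into P(raised | parent)."""
    model = {}
    for parent, spouse_dist in spouse_table.items():
        dist = {}
        for spouse, p_spouse in spouse_dist.items():
            for raised, p_raised in environment_table[parent, spouse].items():
                # Accumulate probability over all possible spouses.
                dist[raised] = dist.get(raised, 0.0) + p_spouse * p_raised
        model[parent] = dist
    return model
```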

TransmissionModel: This is the TransmissionTable described in Part Three.  It is a map from the religious environment of the child to the distribution of religious affiliation reported by the child as an adult.
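Chaining the two maps gives each simulated child both an upbringing and an adult affiliation. A minimal sketch, assuming both models are stored as nested dicts of the form {condition: {outcome: probability}}; the helper names are hypothetical.

```python
import random

def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

def child_affiliation(parent_relig, upbringing, transmission, rng):
    """Parent's religion -> religion the child is raised in -> adult affiliation."""
    raised = sample(upbringing[parent_relig], rng)
    adult = sample(transmission[raised], rng)
    return raised, adult
```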

The Upbringing and Transmission models come in two flavors:
Time invariant: We use all respondents to estimate the parameters of the model, and apply the same model to generate all simulated children.

Time variant: We estimate different parameters for each generation (partitioned by decade of birth) and use different models to generate simulated children, depending on the year they are born.

For the time variant model, we have to extrapolate from observed data into the future.  To keep this simple we copy the latest reliable data (based on sample size) and apply it to people born in later decades.
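The "copy the latest reliable data" rule might look like the following sketch. The reliability threshold `min_n` (100 here) is a hypothetical placeholder, as is the choice to fall back to the earliest reliable decade for people born before it.

```python
def model_for(birth_year, models_by_decade, sample_sizes, min_n=100):
    """Hypothetical sketch: pick the model for a child's birth decade,
    reusing the latest decade with at least min_n respondents."""
    decade = birth_year // 10 * 10
    reliable = sorted(d for d, n in sample_sizes.items() if n >= min_n)
    if not reliable:
        raise ValueError("no decade has enough respondents")
    earlier = [d for d in reliable if d <= decade]
    # Later (or unreliable) decades reuse the latest reliable estimate;
    # decades before the first reliable one use the earliest (an assumption).
    chosen = earlier[-1] if earlier else reliable[0]
    return models_by_decade[chosen]
```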

Ok, that's enough methodology for now.  Let's take a look at some...


The first step is to validate the model by showing that it can predict the observed changes using past data.  Here I mean "predict" in a peculiar sense, which is that I will use the entire dataset (including data after 1988) to build the auxiliary models, then use the simulator to generate trends from 1988 to 2010.

Here is what the results look like:

The thick lines are the observed data; the thin lines are simulations.  Here are my observations:
  1. For Jews and Catholics, the observed data falls within the bounds of the simulations, so the model validates.
  2. For Other, the observed data sometimes exceeds the bounds of the simulations, which may be due to immigration (not included in this model).
  3. For None, the observed data is at the high end of the range, and for Protestants it is at the low end.  This is most likely due to the disaffiliation we saw in Part Five, which this model captures only in part.
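The check in these observations can be sketched as a per-year envelope test. This assumes "the bounds of the simulations" means the minimum and maximum simulated fraction in each year, which is an interpretation; the function name and data layout ({year: fraction}) are hypothetical.

```python
def within_envelope(observed, simulations):
    """For each year, is the observed fraction between the lowest and
    highest fraction produced by the simulation runs?"""
    result = {}
    for year, value in observed.items():
        sims = [run[year] for run in simulations]
        result[year] = min(sims) <= value <= max(sims)
    return result
```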
I conclude that the model is capturing a large part of the observed changes since 1988, but of course I am cheating by using data from after 1988.  So these results validate my modeling decisions (what to include and what to leave out) but they don't test the predictive power of the model.

Predictive power

To make an honest test, we have to restrict ourselves to data from before 1988.  That way we can tell what part of the observed changes would have been predictable in 1988.

Here's what the result looks like:
So if we had used this model in 1988, we would have predicted a small decrease in the fraction of Protestants and a small increase in None, but we would have underestimated both trends.

This supports my conclusion in Part Five that something happened in the 1990s that changed trends in religious affiliation, and suggests that these changes were unpredictable based on data observable before 1988.


Finally, we can use all data to build the models, use 2010 as the starting place for the simulations, and make some predictions for the next 30 years:

So what should we expect?
  1. The decline in fraction of Protestants will continue.  The fraction of Catholics will also decrease, but more slowly.
  2. The fraction of Nones will increase, overtaking Catholics as the second-largest religious affiliation around 2030.
  3. The fraction of Others will increase slowly, about 1 percentage point in 30 years.  If immigration from Asia continues at current rates, that would add another percentage point, bringing the total to 6%.
  4. The fraction of Jews will decrease, possibly by half by 2040.
These predictions are likely to be conservative; that is, the rate of secularization will almost certainly be faster.  Why?
  1. Over the last several generations, the UpbringingModel and the TransmissionModel have changed substantially.  Parents are less likely to raise their children with religion, and those children are less likely to adopt the religion they are raised with.  The model captures these trends, but assumes that they will level off in 2010.  It would probably be more accurate to assume that they will continue.
  2. Rates of disaffiliation among adults are also increasing.  Again, the model includes trends that have already occurred, but it assumes that they will level off rather than continue.
So there are reasons to expect the growth in the fraction of Nones to accelerate.

Conversely, it is hard to imagine that the trends will be any slower than these predictions.  To a large extent, these results are not predictions about things that will happen in the future; rather, they are the future consequences of things that have already happened.  For example, in 2020, the GSS will include a cohort of people in their 40s.  What will they be like?  They will be a lot like the people in the 2010 survey who are in their 30s, but older.  Changes in the general population are slow because it takes a long time to replace each generation with the next; but as a result, they are also predictable.
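The point about aging cohorts can be made concrete with a small sketch. For illustration only, it assumes that nobody changes affiliation and nobody leaves the population, which the full model does not assume; the record keys `age` and `relig` and the function name are hypothetical.

```python
from collections import Counter

def projected_mix(records, survey_year, future_year, lo, hi):
    """Religious mix of the respondents who will be aged lo..hi in
    future_year, assuming no changes in affiliation and no attrition."""
    shift = future_year - survey_year
    group = [r["relig"] for r in records if lo <= r["age"] + shift <= hi]
    counts = Counter(group)
    return {relig: n / len(group) for relig, n in counts.items()}
```

For example, `projected_mix(records_2010, 2010, 2020, 40, 49)` selects the 2010 respondents in their 30s and reports their current mix as a first guess for the 40-somethings in 2020.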

Next time: Was Rick Santorum right?  Is college the #1 enemy of religious belief?  (Hint: no.)  I will look more closely at the TransmissionModel to see what factors make vertical transmission of religion more (or less) likely.


  1. What does "no religious affiliation" mean? I know what atheism and agnosticism mean. I even have a vague understanding of what having a "spirituality" means. But "no religious affiliation" seems problematically vague. Maybe if I saw it alongside the other options it would make more sense.

    1. The GSS variable I used is called RELIG. The specific question was "What is your religious preference? Is it Protestant, Catholic, Jewish, some other religion, or no religion?" The disaffiliated are the people who answered "none."

      I used the word "affiliation" because it seems to be the most commonly used term in the related literature.

  2. I have a suggestion based on a comment you made in an interview: "It is hard to imagine what that factor might be." This comment was made in reference to the possibility of an unknown factor affecting religious disaffiliation in tandem with increased Internet usage. Well, I'm not sure if you've tried this yet, but how about Internet pornography? I realize this is a subcategory of Internet usage, but it is one that I think would be more strongly tied to religious activity than others, for many reasons that could be discussed. It's something to consider, and I'm sure the numbers are out there.

    The interview was here: http://www.technologyreview.com/view/526111/how-the-internet-is-taking-away-americas-religion/

    1. Hi James. This topic came up in a blog post so I did some quick checking. The GSS includes a question, XMOVIE, that asks if the respondent has watched an X-rated movie in the last year. It is, in fact, correlated with both Internet use and non-affiliation. But the fraction of people answering yes did not increase between 1990 and 2010, so it can't explain the change in affiliation during this interval.

      Here's the data: http://tinyurl.com/mvbwdf6

      I am surprised that there was no increase. It's possible that when people were asked about "X-rated movies" they did not think about/count online pornography. Or it's possible that the respondents are simultaneously watching more and responding less accurately. Or, if the responses are accurate, that suggests that online pornography is replacing previous formats, but not increasing the number of consumers. But with this dataset, that's as far as we can go.

  3. Hi Allen, I read that the Internet accounts for only 20% of the decrease in religious affiliation. Yet the Internet affects one's context of knowledge, and this knowledge can be shared with others. Could it be that your model understates the effect the Internet is having on one's understanding of the world and existence? Or is it that you lack the data required to quantify this effect?

    1. Hi Fred. It's hard to say. The GSS asks only a few questions about Internet use, and they don't ask much about what things people are using the Internet for.

      Your point, if I'm getting it right, is that the Internet affects so many other things that (1) even people who don't use the Internet are affected by it, and (2) measuring Internet use directly only captures part of the effect.

      But you're right, I don't have data to explore those indirect effects.

    2. Hi Allen,

      I am intrigued by your research on Internet usage and religion and have a few questions. In the 2012 General Social Survey dataset, I am familiar with several variables related to Internet usage. Some of these variables are binary and others are interval-scaled, such as the WWWHR variable.

      To help me understand your research, can you provide the model specification you used for your analysis? I would like to replicate the results and would like to see the beta weights and pseudo R-squared values on which you are basing your interpretation. You also indicated that you used logistic regression to perform your analysis. Could you provide a little more detail about your statistical methodology? Did you perform a multinomial logistic regression or an ordinal regression?

      Thanks so much,

    3. Hi Larry,

      All of my code and the data are in this repository:


      You should be able to check it out and replicate my results easily, especially if you have a Python environment set up.

      The paper, with details about the methodology and the variables I used, is here:


      Let me know if you find anything interesting!


    4. Great, thanks Allen. I see your odds ratio but what was your overall pseudo r-squared?

  4. Allen, what about cell phone use? Is that available in the database? Isn't the temporal pattern about the same as Internet use in terms of increasing use? It would be interesting to know whether there is multicollinearity between cell phone use and Internet use.

    1. It does appear that cell phone use is contained in the General Social Survey. http://digitalcommons.iwu.edu/cgi/viewcontent.cgi?article=1152&context=respublica

    2. Interesting question. Do you have a theory about how cell phone use might cause disaffiliation?

  5. Grateful for your work here, Allen. I work at a church north of Boston, and this is incredibly helpful data and predictions about the seasons ahead for the Church. Thanks!

    1. Thanks -- glad to hear that you are finding it helpful.