Probably Overthinking It: July 2012

Wednesday, July 11, 2012

Secularization in America: part seven

Abstract

Based on 2000-2010 data from the General Social Survey (GSS), I present results of a logistic regression that measures the relationship between Internet use and religious affiliation, controlling for religious upbringing, income and socioeconomic index, year born (age), and education.

I find that moderate Internet use reduces the chance of religious affiliation by 2 percentage points (odds ratio 0.8); heavier Internet use reduces affiliation by an additional 5 percentage points (odds ratio 0.7). Four years of college reduces affiliation by an additional 2 percentage points (odds ratio 0.8).

All reported effects are statistically significant with N=8960 respondents.

Results of logistic regression can be difficult to interpret; it might help to imagine the following progression:

Start with a hypothetical baseline person raised in any religion, with moderate or high household income ($25,000 per year or more), born in 1960, with high school education but no college, and low Internet use (less than 2 hours per week): in the GSS survey, 91% of people in this category have a religious affiliation. Now we change one variable at a time.
If this person were born 10 years later (in 1970) the fraction would drop to 89%.
If this person went to college, the fraction would drop to 87%.
If this person used the Internet 2 or more hours per week, the fraction would drop to 85%.
If this person used the Internet 8 or more hours per week, the fraction would drop to 80%.

Taken together, college education and Internet use are associated with a decrease in religious affiliation of 9 percentage points.

Introduction

From 1990 to 2010 the fraction of Protestants in the U.S. population dropped from 62% to 51%; at the same time the fraction of people with no religious preference increased from 8% to 18%. The following graph shows these trends:

In a previous article I presented evidence that something happened in the 1990s, continuing through the 2000s, that is causing disaffiliation from religion across all generations, with the largest effect on the youngest generations in the survey, people born in the 1960s and 1970s.

There are many possible explanations, but for me, the Internet pops to the top of this list. First, the timing is at least approximately right. Here is data from the World Bank, showing number of Internet users per hundred people in the U.S.

Internet use increased rapidly from 1995 to 2010, which is the interval of steepest change in religious affiliation.

Regressions

To identify factors that contribute to disaffiliation, I ran logistic regressions with the following dependent variable:

has_relig: 1 if the respondent reported any religious affiliation when interviewed as an adult, or 0 if the respondent reported "None" (based on the GSS variable RELIG)

And these explanatory variables:

had_relig: 1 if the respondent reported being raised in a religion, 0 otherwise (based on RELIG16)

born_from_1960: year the respondent was born minus 1960 (based on AGE and survey year). Subtracting 1960 makes it easier to interpret the results of the regression.

educ_from_12: number of years of school completed, minus 12 (based on EDUC).

somewww: 1 if the respondent reported using the Internet 2 or more hours per week, 0 otherwise (based on WWWHR, with the threshold chosen near the median)

heavywww: 1 if the respondent uses the Internet more than 8 hours per week, 0 otherwise (threshold chosen near the 75th percentile)

SEI: Socioeconomic index (a measure of occupational prestige developed by the GSS).

high_income: 1 if the respondent reports annual household income of $25,000 or more, which includes 62% of respondents who answered the question.
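As a rough illustration of how these variables could be derived from one respondent's raw answers, here is a minimal Python sketch. The function name `recode`, the dictionary keys, the dollar-valued `income` field, and the code 4 for "None" are assumptions made for illustration and should be checked against the GSS codebook; this is not the code used for the analysis.

```python
def recode(resp, none_code=4):
    """Derive the regression variables from one respondent's raw answers.

    `resp` is a dict with hypothetical keys: year, age, educ, wwwhr,
    relig, relig16, income. `none_code` is the assumed code for "None".
    """
    born = resp["year"] - resp["age"]  # approximate year of birth
    return {
        "has_relig": int(resp["relig"] != none_code),
        "had_relig": int(resp["relig16"] != none_code),
        "born_from_1960": born - 1960,
        "educ_from_12": resp["educ"] - 12,
        "somewww": int(resp["wwwhr"] >= 2),   # 2 or more hours per week
        "heavywww": int(resp["wwwhr"] > 8),   # more than 8 hours per week
        "high_income": int(resp["income"] >= 25000),
    }


example = dict(year=2010, age=40, educ=16, wwwhr=10,
               relig=4, relig16=1, income=30000)
print(recode(example))
```

Note that a heavy user has both somewww and heavywww set to 1, so the heavywww coefficient measures the additional effect beyond moderate use.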

I used data from GSS survey years 2000, 2002, 2004, 2006, and 2010 (the relevant questions were not asked in 2008). I excluded respondents who were not asked or did not answer one or more of the questions I used in my analysis.

It turns out that SEI does not make a contribution that is either statistically or practically significant, so I omit it from the model.

Here are the results of the model as reported by R:

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -0.164434   0.094978  -1.731   0.0834 .
had_relig       2.318141   0.087372  26.532  < 2e-16 ***
high_income     0.166673   0.072345   2.304   0.0212 *
born_from_1960 -0.020161   0.002128  -9.474  < 2e-16 ***
educ_from_12   -0.051850   0.012228  -4.240 2.23e-05 ***
somewww        -0.178409   0.078490  -2.273   0.0230 *
heavywww       -0.336658   0.080546  -4.180 2.92e-05 ***
---

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 7860.3  on 8959  degrees of freedom
Residual deviance: 6872.5  on 8953  degrees of freedom
AIC: 6886.5

Number of Fisher Scoring iterations: 5

All explanatory variables are statistically significant, although high_income and somewww are borderline, both at about p=0.02.

The odds ratios and cumulative probabilities are:

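                  odds     cumulative
                 ratio  probability (%)
(Intercept)       0.85        46
had_relig        10.16        90
high_income       1.18        91
born_from_1960    0.82        89
educ_from_12      0.81        87
somewww           0.84        85
heavywww          0.71        80

The odds ratios for born_from_1960 and educ_from_12 are for a change of 10 years and 4 years, respectively; the others are for a change from 0 to 1. Each cumulative probability includes the effect of all rows above it.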

These results are summarized and interpreted in the Abstract, above.
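For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes odds ratios and cumulative probabilities from the coefficients reported by R. The function name `cumulative_table` and the `steps` list are illustrative; the step sizes (10 years for born_from_1960, 4 years for educ_from_12) are taken from the progression described in the Abstract.

```python
import math

# Coefficients (log odds) as reported in the regression output.
coef = {
    "(Intercept)": -0.164434,
    "had_relig": 2.318141,
    "high_income": 0.166673,
    "born_from_1960": -0.020161,
    "educ_from_12": -0.051850,
    "somewww": -0.178409,
    "heavywww": -0.336658,
}

# Each step changes one variable by the given amount.
steps = [
    ("(Intercept)", 1),
    ("had_relig", 1),
    ("high_income", 1),
    ("born_from_1960", 10),  # born in 1970 rather than 1960
    ("educ_from_12", 4),     # four years of college
    ("somewww", 1),
    ("heavywww", 1),
]


def cumulative_table(coef, steps):
    """Return (name, odds ratio, cumulative probability in %) per step."""
    rows = []
    log_odds = 0.0
    for name, delta in steps:
        change = coef[name] * delta
        log_odds += change
        odds_ratio = math.exp(change)
        prob = 1 / (1 + math.exp(-log_odds))  # inverse logit
        rows.append((name, round(odds_ratio, 2), round(100 * prob)))
    return rows


for row in cumulative_table(coef, steps):
    print(row)
```

Because the model is additive on the log-odds scale, the same odds ratio corresponds to different changes in percentage points depending on the starting probability, which is why the cumulative column depends on the order of the steps.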

Discussion

As always, statistical association does not prove causation, but in this case I think there are reasons to believe that Internet use causes disaffiliation from religion:

It is easy to imagine how Internet use could allow a person in a homogeneous community to find information about people of other religions (and none), and to interact with them personally. And there is anecdotal evidence that those interactions contribute to religious disaffiliation (for example, numerous personal reports on reddit.com/r/atheism).
Conversely, it is harder to imagine plausible reasons why disaffiliation might cause increased Internet use (except possibly on Sunday mornings).
Although it is possible that a third factor causes both disaffiliation and Internet use, that factor would also have to be new, coincidentally rising in prevalence, like the Internet, during the 1990s and 2000s.
Whatever causes disaffiliation has the strongest effect on the youngest generations, which is consistent with the hypothesis that Internet use during adolescence and young adulthood has the strongest effect on religious affiliation.

So with appropriate caution, I think there is a strong case here for causation, and not just statistical association.

Furthermore, the magnitude of the effect is large enough to explain a substantial part of the observed changes in religious affiliation. In my next article I will incorporate this regression model into the generational model I presented in Part Six, in order to estimate the effect of Internet use on these trends.

Summary of previous reports

In Part One I described trends in market share of major religions in the U.S.: since 1988, the fraction of Protestants dropped from 60% to 51%, and the fraction of people with no religious affiliation increased from 8% to 18%.

In Part Two I used data from the 1988 General Social Survey (GSS) to model transmission of religion from parent to child, and found that the model failed to predict the decrease in Protestants and the increase in Nones that occurred between 1988 and 2010.

In Part Three I looked at changes, between 1988 and 2008, in the spouse tables (which describe the tendencies of people to marry within their religions), the environment table (which describes parents' decisions about their children's religious upbringing), and the transmission table (which describes the likely outcomes for children raised within each religion). I found that the transmission table has changed substantially since 1988, and accounts for a large part of the observed increase in Nones, but not the decrease in Protestants.

In Part Four I looked at changes in religiosity over the lifetime of respondents. I tentatively concluded that the differences between generations were larger than changes in affiliation, within generations, over time.

But in Part Five I looked more closely and saw that all generations were becoming more religious, or staying the same, prior to 1990, and that all generations began to disaffiliate during the 1990s, continuing into the 2000s.

In Part Six I presented a generational model that retroactively "predicts" the changes we have seen since 1988, and used it to predict how those changes are likely to continue in the next 30 years. I expect the fraction of Protestants to continue to decrease, and the fraction of Nones to increase and overtake Catholics as the second-largest affiliation by 2030.

Tuesday, July 10, 2012

Secularization in America: part six

Generational Model

Now I am ready to get back to the generational model I have been working up to. The goal of the generational model is to separate these three effects:

Changes in religious preference from one generation to the next.
Changes in religious affiliation over the lifetime of respondents.
Changes in the composition of the GSS cohort over time.

The model works by simulation. Assuming that we are starting in 1988, here are the steps:

Read the survey data from 1988 and resample it. Compute and store the distribution of ages.
For each respondent, generate a hypothetical child. Use the BirthModel to determine year of birth, the UpbringingModel to determine what religion the child is raised in, and the TransmissionModel to determine what affiliation the child will have as an adult. Details of these models follow.
Form a combined cohort of parents and simulated children. Since the cohort of parents is a representative sample of the US population, the cohort of simulated children is a representative sample of the population one generation later (based, for now, on the simplifying assumptions that all groups have the same number of children on average, and there is no immigration).
In order to generate a cohort from a future survey year, draw a sample from the combined cohort, weighted so that the distribution of ages in the future year is the same as the original distribution of ages in 1988. As the simulation goes forward in time, this generated cohort contains fewer of the parents and more of the simulated children. After 20 years, about 25% of the "real" respondents have been replaced with "fake" respondents.
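The steps above can be sketched in Python as follows. This is a simplified illustration, not the actual simulation code: respondents are reduced to (year born, religion) pairs, each model is a plain dictionary mapping values to probabilities, and the function names `draw`, `make_child`, and `future_cohort` are hypothetical.

```python
import random
from collections import Counter


def draw(dist, rng):
    """Draw one value from a {value: probability} mapping."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def make_child(parent, birth_model, upbringing_model, transmission_model, rng):
    """Generate one hypothetical child for a (year_born, relig) parent."""
    year_born, relig = parent
    child_born = year_born + draw(birth_model, rng)  # parent's age at birth
    raised = draw(upbringing_model[relig], rng)      # religion raised in
    adult = draw(transmission_model[raised], rng)    # affiliation as an adult
    return (child_born, adult)


def future_cohort(combined, age_dist, year, n, rng):
    """Resample `combined` so that ages in `year` follow `age_dist`."""
    ages = [year - born for born, _ in combined]
    counts = Counter(ages)
    # Each person's weight is the target probability of their age,
    # shared equally among everyone in the cohort with that age.
    weights = [age_dist.get(a, 0) / counts[a] for a in ages]
    return rng.choices(combined, weights=weights, k=n)


rng = random.Random(1)
parents = [(1950, "prot"), (1955, "cath")]
birth_model = {25: 0.5, 30: 0.5}
upbringing_model = {"prot": {"prot": 0.9, "none": 0.1},
                    "cath": {"cath": 0.9, "none": 0.1}}
transmission_model = {"prot": {"prot": 0.8, "none": 0.2},
                      "cath": {"cath": 0.8, "none": 0.2},
                      "none": {"none": 1.0}}
children = [make_child(p, birth_model, upbringing_model, transmission_model, rng)
            for p in parents]
combined = parents + children
print(combined)
```

The weighting in `future_cohort` is one simple way to match a target age distribution; other resampling schemes would serve the same purpose.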

Now, where do all these auxiliary models come from?

BirthModel: This is just the distribution of parent's age when each child is born. It is based on data from the 1994 GSS, which includes questions about children. I had to do some work to correct for an obvious bias due to the ages of the respondents; I will skip the details here.

UpbringingModel: This is a combination of the SpouseTable and the EnvironmentTable, described in Part Three. It is a map from the parent's religion to the distribution of possible religions the child might be raised in.

TransmissionModel: This is the TransmissionTable described in Part Three. It is a map from the religious environment of the child to the distribution of religious affiliation reported by the child as an adult.

The Upbringing and Transmission models come in two flavors:

Time invariant: We use all respondents to estimate the parameters of the model, and apply the same model to generate all simulated children.

Time variant: We estimate different parameters for each generation (partitioned by decade born) and use different models to generate simulated children, depending on what year they are born.

For the time variant model, we have to extrapolate from observed data into the future. To keep this simple we copy the latest reliable data (based on sample size) and apply it to people born in later decades.
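One way to express this extrapolation rule in Python is sketched below. The function name `model_for_decade`, the `sample_sizes` mapping, and the `min_n` threshold of 100 respondents are hypothetical; the text above does not specify what sample size counts as reliable.

```python
def model_for_decade(models_by_decade, sample_sizes, decade, min_n=100):
    """Return the model for `decade`, falling back to the latest reliable one.

    A decade is treated as reliable if it has at least `min_n` respondents
    (an assumed threshold, for illustration only).
    """
    reliable = [d for d in models_by_decade if sample_sizes.get(d, 0) >= min_n]
    if not reliable:
        raise ValueError("no decade has enough respondents")
    # Use the most recent reliable decade that is not later than `decade`;
    # for decades earlier than all reliable data, use the earliest one.
    candidates = [d for d in reliable if d <= decade]
    chosen = max(candidates) if candidates else min(reliable)
    return models_by_decade[chosen]


models = {1960: "model_1960s", 1970: "model_1970s", 1980: "model_1980s"}
sizes = {1960: 500, 1970: 300, 1980: 20}
print(model_for_decade(models, sizes, 1990))  # falls back to the 1970s model
```

In this toy example the 1980s have too few respondents, so people born in the 1980s and later get the 1970s model.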

Ok, that's enough methodology for now. Let's take a look at some...

Results

The first step is to validate the model by showing that it can predict the observed changes using past data. Here I mean "predict" in a peculiar sense, which is that I will use the entire dataset (including data after 1988) to build the auxiliary models, then use the simulator to generate trends from 1988 to 2010.

Here is what the results look like:

The thick lines are the observed data; the thin lines are simulations. Here are my observations:

For Jews and Catholics, the observed data falls within the bounds of the simulations, so the model validates.
For Other, the observed data sometimes exceeds the bounds of the simulations, which may be due to immigration (not included in this model).
For None, the observed data is at the high end of the range, and for Prot it is at the low end of the range. This is most likely due to the disaffiliation we saw in Part Five, which is only partly captured in this model.

I conclude that the model is capturing a large part of the observed changes since 1988, but of course I am cheating by using data from after 1988. So these results validate my modeling decisions (what to include and what to leave out) but they don't test the predictive power of the model.
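The check of whether observed data "falls within the bounds of the simulations" can be made concrete with a small Python sketch. The function name `within_bounds` is hypothetical, and the numbers are made up for illustration, not GSS results.

```python
def within_bounds(observed, simulations):
    """For each survey year, check whether the observed value lies between
    the lowest and highest simulated values for that year."""
    lows = [min(vals) for vals in zip(*simulations)]
    highs = [max(vals) for vals in zip(*simulations)]
    return [lo <= obs <= hi for obs, lo, hi in zip(observed, lows, highs)]


# Made-up fractions for three survey years and three simulation runs.
observed = [0.25, 0.24, 0.26]
simulations = [
    [0.24, 0.23, 0.22],
    [0.26, 0.25, 0.24],
    [0.25, 0.26, 0.25],
]
print(within_bounds(observed, simulations))
```

With only a handful of runs, the minimum and maximum give a rough envelope; with many runs, percentiles would be a less noisy choice.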

Predictive power

To make an honest test, we have to restrict ourselves to data from before 1988. That way we can tell what part of the observed changes would have been predictable in 1988.

Here's what the result looks like:

So if we had used this model in 1988, we would have predicted a small decrease in the fraction of Protestants and a small increase in None, but we would have underestimated both trends.

This supports my conclusion in Part Five that something happened in the 1990s that changed trends in religious affiliation, and suggests that these changes were unpredictable based on data observable before 1988.

Predictions

Finally, we can use all data to build the models, use 2010 as the starting place for the simulations, and make some predictions for the next 30 years:

So what should we expect?

The decline in fraction of Protestants will continue. The fraction of Catholics will also decrease, but more slowly.
The fraction of Nones will increase, overtaking Catholics as the second-largest religious affiliation around 2030.
The fraction of Others will increase slowly, about 1 percentage point in 30 years. If immigration from Asia continues at current rates, that would add another percentage point, bringing the total to 6%.
The fraction of Jews will decrease, possibly by half by 2040.

These predictions are likely to be conservative; that is, the rate of secularization will almost certainly be faster. Why?

Over the last several generations, the UpbringingModel and the TransmissionModel have changed substantially. Parents are less likely to raise their children with religion, and those children are less likely to adopt the religion they are raised with. The model captures these trends, but assumes that they will level off in 2010. It would probably be more accurate to assume that they will continue.
Rates of disaffiliation among adults are also increasing. Again, the model includes trends that have already occurred, but it assumes that they will level off rather than continue.

So there are reasons to expect the growth in the fraction of Nones to accelerate.

Conversely, it is hard to imagine that the trends will be any slower than these predictions. To a large extent, these results are not predictions about things that will happen in the future; rather, they are the future consequences of things that have already happened. For example, in 2020, the GSS survey will include a cohort of people in their 40s. What will they be like? They will be a lot like the people in the 2010 survey who are in their 30s. But they will be older. Changes in the general population are slow because it takes a long time to replace each generation with the next; but as a result, they are also predictable.

Next time: Was Rick Santorum right? Is college the #1 enemy of religious belief? (Hint: no.) I will look more closely at the TransmissionModel to see what factors make vertical transmission of religion more (or less) likely.

Monday, July 9, 2012

Secularization in America: part five

Part Four revisited

In Part Four I looked at changes in religiosity over the lifetime of respondents. The GSS is not a longitudinal survey, so we can't follow individuals, but we can follow generations (which I partition by decade of birth) over time.

Last time I presented this figure, which shows religiosity (the fraction of respondents with any religious preference) as a function of respondent's age, partitioned by decade of birth, for people who were raised Protestant:

Each line represents a different generation. For example, the red line shows that people born in the 1920s were about 96% likely to report a religious preference when they were interviewed in their 40s, 50s, and 60s, and possibly less likely to be religious when they were in their 80s.

The conclusion I drew from this figure is that the differences between generations are larger than the changes, over time, within each generation. For purposes of modeling I concluded that religious disaffiliation accounts for only a small part of the observed changes in religious identity.

But I was bothered by one feature of these curves: many of them are concave down, and the maximum point in the curves is apparently shifting toward younger ages. I came to suspect that this picture of the data is "out of focus".

We can refocus the image by plotting the date of the survey (rather than the respondent's age) on the x-axis. Here's what that looks like:

In this figure, two trends are more apparent: before 1990, most generations were becoming more religious; after 1990, they all became less religious. So it seems clear that the explanation is something that affected all generations at a particular interval in time, not something that affects all people as they age.
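To make the "refocusing" concrete, here is a minimal Python sketch that computes religiosity for each generation at each survey date, rather than at each age. The function name `religiosity_by_cohort` and the tuple layout are assumptions for illustration, and the data are made up.

```python
from collections import defaultdict


def religiosity_by_cohort(respondents):
    """Fraction with any religious preference, keyed by
    (decade born, survey year).

    `respondents` is an iterable of (survey_year, year_born, has_relig)
    tuples, where has_relig is 1 or 0.
    """
    totals = defaultdict(lambda: [0, 0])  # [number religious, number total]
    for survey_year, year_born, has_relig in respondents:
        decade = year_born // 10 * 10     # e.g. 1965 -> 1960
        cell = totals[decade, survey_year]
        cell[0] += has_relig
        cell[1] += 1
    return {key: yes / n for key, (yes, n) in totals.items()}


# Made-up respondents: (survey year, year born, has_relig).
data = [
    (1990, 1965, 1), (1990, 1962, 1), (1990, 1968, 0),
    (2010, 1965, 1), (2010, 1961, 0),
]
print(religiosity_by_cohort(data))
```

Plotting these values against survey year, with one line per decade of birth, gives a figure of the kind described above.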

We can see these changes more clearly by normalizing each curve with its 1990 value:

Again, most generations were becoming more religious before 1990; after 1990, all of them became less religious. Among people born in the 1960s, more than 10% lost their religion between 1990 and 2010 (when they were in their 30s and 40s).
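The normalization itself is simple: divide every point on a generation's curve by that generation's 1990 value. Here is a minimal Python sketch; the function name `normalize` is hypothetical and the numbers are made up for illustration.

```python
def normalize(curve, base_year=1990):
    """Divide each value in a {survey year: fraction} curve by its
    value in `base_year`, so the curve equals 1.0 in that year."""
    base = curve[base_year]
    return {year: value / base for year, value in curve.items()}


# Made-up religiosity curve for one generation.
curve = {1988: 0.90, 1990: 0.95, 2010: 0.855}
print(normalize(curve))
```

A normalized value of 0.9 in 2010 means that the fraction with a religious preference fell by 10% relative to its 1990 level. This sketch assumes every curve has a value for 1990; a generation without a 1990 data point would need a different base year.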

Here's the same graph for people raised Catholic:

The general shape is the same: religious affiliation was flat or increasing prior to 1990, and decreasing for almost all generations after 1990.

Since the trends are similar for Catholics and Protestants, we can get a less noisy picture by combining them. Here is the same graph for respondents raised with any religion.

This figure makes it easier to compare across generations. It appears that more recent generations (born in the 1960s and 1970s) are disaffiliating at higher rates than earlier generations.

[As an aside, this result contradicts one of the primary (and widely-reported) claims of this article: Schwadel, Period and Cohort Effects on Religious Nonaffiliation and Religious Disaffiliation. Schwadel reports that people born in the 1960s and 1970s were disaffiliating at a slower rate than the previous generations. Some reasons my results might be different: Schwadel only had GSS data up to 2006, and he discards people under 30 years of age. So very little data about the youngest generations is included. Also, his results are based on statistical models that (if I understand correctly) don't include time as an explanatory variable, so they cannot account for an event that affects all generations during a particular interval.]

All right, it's audience participation time. What happened in the 1990s that caused widespread religious disaffiliation? Remember, idle speculations only. No evidence, please!