Monday, March 28, 2016

Engineering education is good preparation for a lot of things

I am a big fan of engineering education.  I think studying engineering for four years is intellectually challenging, exciting, and (when it is done right) fun.  An engineering education is good preparation for careers in many fields and (again, when it is done right) good preparation for citizenship.

So you can imagine my reaction to a click-bait headline like, "Does Engineering Education Breed Terrorists?", which appeared recently in the Chronicle of Higher Education.

The article presents results from a new book, Engineers of Jihad: The Curious Connection Between Violent Extremism and Education, by Diego Gambetta and Steffen Hertog.  If you don't want to read the book, it looks like you can get most of the results from this working paper, which is free to download.

I have not read the book, but I will summarize results from the paper.

Engineers are over-represented

Gambetta and Hertog compiled a list of 404 "members of violent Islamist groups".  Of those, they found educational information on 284, and found that 196 had some higher education.  For 178 of those cases, they were able to determine the subject of study, and of those, the most common fields were engineering (78), Islamic studies (34), medicine (14), and economics/business (12).

So the representation of engineers among this group is 78/178, or 44%.  Is that more than we should expect?  To answer that, the ideal reference group would be a random sample of males from the same countries, with the same age distribution, restricted to people with some higher education.
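The arithmetic, using the counts reported above, can be checked in a few lines of Python:

```python
# Counts reported by Gambetta and Hertog, as summarized above.
subject_known = 178   # cases where the subject of study is known
engineers = 78        # of those, the number who studied engineering

share = engineers / subject_known
print(f"{share:.0%}")  # 44%
```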

It is not easy to assemble a reference sample like that, but Gambetta and Hertog approximate it with data from several sources.  Their Table 4 summarizes the results, broken down by country:


In every country, the fraction of engineers in the sample of violent extremists is higher than the fraction in the reference group (with the possible exceptions of Singapore and Saudi Arabia).

As always in studies like this, there are methodological issues that could be improved, but taking the results at face value, it looks like people who studied engineering are over-represented among violent extremists by a substantial factor -- Gambetta and Hertog put it at "two to four times the size we would expect [by random selection]".

Why engineers?

In the second part of the paper, Gambetta and Hertog turn to the question of causation; as they formulate it, "we will try to explain only why engineers became more radicalized than people with other degrees".

They consider four hypotheses:

1) "Random appearance of engineers as first movers, followed by diffusion through their network", which means that if the first members of these groups were engineers by chance, and if they recruited primarily through their social networks, and if engineers are over-represented in those networks, the observed over-representation might be due to historical accident.

2) "Selection of engineers because of their technical skills"; that is, engineers might be over-represented because extremist groups recruit them specifically.

3) "Engineers' peculiar cognitive traits and dispositions"; which includes both selection and self-selection; that is, extremist groups might recruit people with the traits of engineers, and engineers might be more likely to have traits that make them more likely to join extremist groups.  [I repeat "more likely" as a reminder that under this hypothesis, not all engineers have the traits, and not all engineers with the traits join extremist groups.]

4) "The special social difficulties faced by engineers in Islamic countries"; which is the hypothesis that (a) engineering is a prestigious field of study, (b) graduates expect to achieve high status, (c) in many Islamic countries, their expectations are frustrated by lack of economic opportunity, and (d) the resulting "frustrated expectations and relative deprivation" explain why engineering graduates are more likely to become radicalized.

Their conclusion is that 1 and 2 "do not survive close scrutiny" and that the observed over-representation is best explained by the interaction of 3 and 4.

My conclusions

Here is my Bayesian interpretation of these hypotheses and the arguments Gambetta and Hertog present:

1) Initially, I didn't find this hypothesis particularly plausible.  The paper presents weak evidence against it, so I find it less plausible now.

2) I find it very plausible that engineers would be recruited by extremist groups, not only for specific technical skills but also for their general ability to analyze systems, debug problems, design solutions, and work on teams to realize their designs.  And once recruited, I would expect these skills to make them successful in exactly the ways that would cause them to appear in the selected sample of known extremists.  The paper addresses only a narrow version of this hypothesis (specific technical skills), and provides only weak evidence against it, so I still find it highly plausible.

3) Initially I found it plausible that there are character traits that make a person more likely to pursue engineering and more likely to join an extremist group.  This hypothesis could explain the observed over-representation even if both of those connections are weak; that is, even if few of the people who have the traits pursue engineering and even fewer join extremist groups.  The paper presents some strong evidence for this hypothesis, so I find it quite plausible.

4) My initial reaction to this hypothesis is that the causal chain sounds fragile.  And even if it is true, it is so hard to test that it verges on being unfalsifiable.  But Gambetta and Hertog were able to do more work with it than I expected, so I grudgingly upgrade it to "possible".

In summary, my interpretation of the results differs slightly from the authors': I think (2) and (3) are more plausible than (3) and (4).

But overall I think the paper is solid: it starts with an intriguing observation, makes good use of data to refine the question and to evaluate alternative hypotheses, and does a reasonable job of interpreting the results.

So at this point, you might be wondering "What about engineering education?  Does it breed terrorists, like the headline said?"

Good question!  But Gambetta and Hertog have nothing to say about it.  They did not consider this hypothesis, and they presented no evidence for or against it.  In fact, the phrase "engineering education" does not appear in their 90-page paper except in the title of a paper they cite.

Does bad journalism breed click bait?

So where did that catchy headline come from?  From the same place all click bait comes from: the intersection of dwindling newspaper subscriptions and eroding journalistic integrity.

In this case, the author of the article, Dan Berrett, might not be the guilty party.  He does a good job of summarizing Gambetta and Hertog's research.  He presents summaries of a related paper on the relationship between education and extremism, and a related book.  He points out some limitations of the study design.  So far, so good.

The second half of the article is weaker: Berrett reports speculation from researchers farther and farther from the topic.

For example, Donna M. Riley at Virginia Tech suggests that traditional [engineering] programs "reinforce positivist patterns of thinking" because "general-education courses [...] are increasingly being narrowed" and "engineers spend almost all their time with the same set of epistemological rules."

Berrett also quotes a 2014 study of American students in four engineering programs (MIT, UMass Amherst, Olin, and Smith), which concludes, "Engineering education fosters a culture of disengagement that defines public welfare concerns as tangential to what it means to practice engineering."

But even if these claims are true, they are about American engineering programs in the 2010s.  They are unlikely to explain the radicalization of the extremists in the study, who were educated in Islamic countries in the "mid 70s to late 90s".

At this point there is no reason to think that engineering education contributed to the radicalization of the extremists in the sample.  Gambetta and Hertog don't consider the hypothesis, and present no evidence that supports it.  To their credit, they "wouldn’t presume to make curricular recommendations to [engineering] educators."

In summary, Gambetta and Hertog provide several credible explanations, supported by evidence, that could explain the observed over-representation.  Speculating on other causes, assuming they are true—and then providing prescriptions for American engineering education—is gratuitous.  And suggesting in a headline that engineering education causes terrorism is hack journalism.

We have plenty of problems

Traditional engineering programs are broken in a lot of ways.  That's why Olin College was founded, and why I wanted to work here.  I am passionate about engineering education and excited about the opportunities to make it better.  Furthermore, some of the places we have the most room for improvement are exactly the ones highlighted in Berrett's article.

Yes, the engineering curriculum should "emphasize human needs, contexts, and interactions", as the executive director of the American Society for Engineering Education (ASEE) suggests.  As he explains, "We want engineers to understand what motivates humans, because that’s how you satisfy human needs.  You have to understand people as people."

And yes, we want engineers "who recognize needs, design solutions and engage in creative enterprises for the good of the world".  That's why we put it in Olin College's mission statement.

We may not be doing these things very well, yet.  We are doing our best to get better.

In the meantime, it is premature to add "preventing terrorism" to our list of goals.  We don't know the causes of terrorism, we have no reason to think engineering education is one of them, and even if we did, I'm not sure we would know what to do about it.

Debating changes in engineering education is hard enough.  Bringing terrorism into it—without evidence or reason—doesn't help.

Friday, March 18, 2016

The retreat from religion is accelerating

Updated results from the CIRP Freshman Survey were published last week, which means it's time for me to update my annual series on the increasing number of college students with no religious affiliation (the "nones").

Since I started following this trend in 2008, I have called it "one of the most under-reported stories of the decade". I'm not sure if that's true any more; there have been more stories recently about secularization in the U.S., and the general disengagement of Millennials from organized religion.

But one part of the story continues to surprise me: the speed of the transformation is remarkable, averaging almost 1 percentage point per year for the last 20 years, and there is increasing evidence that it is accelerating.

30% have no religion

Among first-year college students in Fall 2015, the fraction reporting no religious affiliation reached an all-time high of 29.6%, up more than 2 percentage points from 27.5% last year.

And the fraction reporting that they never attended a religious service also reached an all-time high at 30.5%, up from 29.3% last year.

These reports are based on survey results from the Cooperative Institutional Research Program (CIRP) of the Higher Education Research Institute (HERI).  In 2015, more than 141,000 full-time, first-time college students at 199 colleges and universities completed the CIRP Freshman Survey, which includes questions about students’ backgrounds, activities, and attitudes.

In one question, students select their “current religious preference,” from a choice of seventeen common religions, “Other religion,” "Atheist", "Agnostic", or “None.”  The options "Atheist" and "Agnostic" are new this year.  For consistency with previous years, I compare the "Nones" from previous years with the sum of "None", "Atheist" and "Agnostic" this year.

Another question asks students how often they “attended a religious service” in the last year. The choices are “Frequently,” “Occasionally,” and “Not at all.” Students are instructed to select “Occasionally” if they attended one or more times.

The following figure shows the fraction of Nones over more than 40 years of the survey.

The blue line shows actual data through 2014; the red line shows a quadratic fit to the data.  The dark gray region shows a 90% confidence interval, which represents uncertainty about the parameters of the model.  The light gray region shows a 90% predictive interval, which represents uncertainty about the predicted value.

The blue square shows the actual data point for 2015.  For the first time since I started running this analysis, the new data point falls outside the predictive interval.
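For readers who want to reproduce this kind of plot, here is a rough sketch of the approach. The data values below are invented for illustration (the real CIRP series is not reproduced here); the point is the distinction between the two intervals: the confidence interval reflects only parameter uncertainty, while the predictive interval adds the residual noise of a single new observation.

```python
import numpy as np

rng = np.random.default_rng(17)

# Invented data: survey year and percent "Nones" (not the CIRP values).
years = np.arange(1975, 2015)
percent = 8 + 0.005 * (years - 1975) ** 2 + rng.normal(0, 0.8, len(years))

# Quadratic fit, as in the figure.
coeffs = np.polyfit(years, percent, 2)
fitted = np.polyval(coeffs, years)
residuals = percent - fitted

# Bootstrap the fit to get the distribution of the prediction for 2015.
preds = []
for _ in range(1000):
    fake = fitted + rng.choice(residuals, size=len(years))
    preds.append(np.polyval(np.polyfit(years, fake, 2), 2015))
preds = np.array(preds)

# 90% confidence interval: parameter uncertainty only.
ci = np.percentile(preds, [5, 95])

# 90% predictive interval: add residual noise for one new observation.
noisy = preds + rng.normal(0, residuals.std(), len(preds))
pred_interval = np.percentile(noisy, [5, 95])
```

Because the predictive interval includes an extra source of noise, it is always wider than the confidence interval.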

30% have not attended a religious service

Here is the corresponding plot for attendance at religious services.

The new data point for 2015, at 30.5%, falls in the predicted range, although somewhat ahead of the long term trend.

The trend is accelerating

The quadratic model suggests that the trend is accelerating, but for a clearer test, I plotted annual changes from 1985 to the present:

Visually, it looks like the annual changes are getting bigger, and the slope of the fitted line is about 0.4 percentage points per decade.  The p-value of that slope is 3.6%, which is considered statistically significant, but barely.  You can interpret that as you like.
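A minimal version of that test might look like the following sketch. The annual changes here are invented, with a built-in trend of 0.4 percentage points per decade (they are not the survey values); and rather than assume a particular sampling distribution for the slope, I use a permutation test:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented annual changes (percentage points) with a built-in trend of
# 0.04 pp per year, i.e. 0.4 pp per decade, plus noise.
years = np.arange(1985, 2016)
changes = 0.5 + 0.04 * (years - 1985) + rng.normal(0, 0.3, len(years))

# Slope of the fitted line, scaled to percentage points per decade.
slope = np.polyfit(years, changes, 1)[0]
slope_per_decade = 10 * slope

# Permutation test: how often does shuffling the changes produce a
# slope at least as large in magnitude?
null = [np.polyfit(years, rng.permutation(changes), 1)[0]
        for _ in range(2000)]
p_value = np.mean(np.abs(null) >= abs(slope))
```

With the trend built into the fake data, the permutation p-value comes out small; with the real data, as noted above, it is only barely significant.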

The gender gap persists

As always, more males than females report no religious preference.

The gender gap closed slightly this year, but the long-term trend suggests it is still growing.

Predictions for 2016

Using the new 2015 data, we can run the same analysis to generate predictions for 2016.  Here is the revised plot for "Nones":


This year's measurement is ahead of the long-term trend, so next year's is predicted to regress to 28.6% (down 1.0 percentage points).

And here is the prediction for "No attendance":

Again, because this year's value is ahead of the long-term trend, the center of the predictive distribution is lower, at 29.8% (down 0.7 percentage points).

I'll be back next year to check on these predictions.


Comments

1) The categories "Atheist" and "Agnostic" are new this year.  Of the 29.6% I am treating as "no religion", 8.3% reported as agnostic, 5.9% as atheist, and 15.4% as none.  It's hard to say what effect the change in the question had on the results.

2) The number of schools and the number of students participating in the Freshman Survey has been falling for several years.  I wonder if the mix of schools represented in the survey is changing over time, and what effect this might have on the trends I am watching.  The percentage of "Nones" is different across different kinds of institutions (colleges, universities, public, private, etc.)  If participation rates are changing among these groups, that would affect the results.

3) Obviously college students are not representative of the general population.  Data from other sources indicate that the same trends are happening in the general population, but I haven't been able to make a quantitative comparison between college students and others.  Data from other sources also indicate that college graduates are slightly more likely to attend religious services, and to report a religious preference, than the general population.


Data Source
Eagan, K., Stolzenberg, E. B., Ramirez, J. J., Aragon, M. C., Suchard, M. R., & Hurtado, S. (2014). The American freshman: National norms fall 2014. Los Angeles: Higher Education Research Institute, UCLA.

This and all previous reports are available from the HERI publications page.

Friday, March 4, 2016

A quick Bayes problem in IPython

I've been busy with other projects, so I haven't posted anything new for a while.  But a reader posted a nice Bayes's theorem problem, which I am using as an excuse to experiment with embedding an IPython/Jupyter notebook in a blog, following these instructions. Let's see how it goes...

The following problem was submitted to my blog, Probably Overthinking It, by a user named Amit, who wrote:

The following data is about a poll that occurred in 3 states. In state1, 50% of voters support Party1, in state2, 60% of the voters support Party1, and in state3, 35% of the voters support Party1. Of the total population of the three states, 40% live in state1, 25% live in state2, and 35% live in state3. Given that a voter supports Party1, what is the probability that he lives in state2?

My solution follows. First I'll create a suite to represent our prior knowledge. If we know nothing about a voter, we would use the relative populations of the states to guess where they are from.

In [2]:
import thinkbayes2  # provides the Suite class, from the Think Bayes code

prior = thinkbayes2.Suite({'State 1': 0.4, 'State 2': 0.25, 'State 3': 0.35})
prior.Print()
State 1 0.4
State 2 0.25
State 3 0.35

Now if we know a voter supports Party 1, we can use that as data to update our belief. The following dictionary contains the likelihood of the data (supporting Party 1) under each hypothesis (which state the voter is from).

In [3]:
likelihood = {'State 1': 0.5, 'State 2': 0.60, 'State 3': 0.35}

To make the posterior distribution, I'll start with a copy of the prior.

The update consists of looping through the hypotheses and multiplying the prior probability of each hypothesis, hypo, by the likelihood of the data if hypo is true.

The result is a map from hypotheses to posterior likelihoods, but they are not probabilities yet because they are not normalized.

In [4]:
posterior = prior.Copy()
for hypo in posterior:
    posterior[hypo] *= likelihood[hypo]
posterior.Print()
State 1 0.2
State 2 0.15
State 3 0.1225

Normalizing the posterior distribution returns the total likelihood of the data, which is the normalizing constant.

In [5]:
posterior.Normalize()
Out[5]:
0.4725

Now the posterior is a proper distribution:

In [6]:
posterior.Print()
State 1 0.42328042328
State 2 0.31746031746
State 3 0.259259259259

And the probability that the voter is from State 2 is about 32%.
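The same update can be checked with plain dictionaries, without thinkbayes2:

```python
# The prior is the population share of each state; the likelihood is
# the fraction of each state's voters who support Party 1.
prior = {'State 1': 0.4, 'State 2': 0.25, 'State 3': 0.35}
likelihood = {'State 1': 0.5, 'State 2': 0.60, 'State 3': 0.35}

# Multiply prior by likelihood, then normalize.
unnorm = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnorm.values())           # 0.4725, as above
posterior = {h: p / total for h, p in unnorm.items()}

print(round(posterior['State 2'], 4))  # 0.3175
```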


Tuesday, December 8, 2015

Many rules of statistics are wrong

There are two kinds of people who violate the rules of statistical inference: people who don't know them and people who don't agree with them.  I'm the second kind.

The rules I hold in particular contempt are:

The interpretation of p-values: Suppose you are testing a hypothesis, H, so you've defined a null hypothesis, H0, and computed a p-value, which is the probability, under H0, of an effect as big as the one observed.

According to the conventional wisdom of statistics, if the p-value is small, you are allowed to reject the null hypothesis and declare that the observed effect is "statistically significant".  But you are not allowed to say anything about H, not even that it is more likely in light of the data.

I disagree.  If we were really not allowed to say anything about H, significance testing would be completely useless, but in fact it is only mostly useless.  As I explained in this previous article, a small p-value indicates that the observed data are unlikely under the null hypothesis.  Assuming that they are more likely under H (which is almost always the case), you can conclude that the data are evidence in favor of H and against H0.  Or, equivalently, that the probability of H, after seeing the data, is higher than it was before.  And it is reasonable to conclude that the apparent effect is probably not due to random sampling, but might have explanations other than H.
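To make the Bayesian version of this argument concrete, here is a toy update with invented numbers. A p-value is not exactly the probability of the observed data under H0, but the logic is the same: if the data are much more likely under H than under H0, they shift belief toward H.

```python
# All three inputs are invented for illustration.
p_H = 0.5          # prior probability of H
p_data_H0 = 0.03   # probability of the data under H0 (small, like a p-value)
p_data_H = 0.30    # assumed: the data are 10x more likely under H

# Bayes's theorem: P(H | data) = P(data | H) P(H) / P(data)
numer = p_data_H * p_H
denom = numer + p_data_H0 * (1 - p_H)
p_H_given_data = numer / denom

print(round(p_H_given_data, 3))  # 0.909: the data are evidence for H
```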

Correlation does not imply causation: If this slogan is meant as a reminder that correlation does not always imply causation, that's fine.  But based on responses to some of my previous work, many people take it to mean that correlation provides no evidence in favor of causation, ever.

I disagree.  As I explained in this previous article, correlation between A and B is evidence of some causal relationship between A and B, because you are more likely to observe correlation if there is a causal relationship than if there isn't.  The problem with using correlation to infer causation is that it does not distinguish among three possible relationships: A might cause B, B might cause A, or any number of other factors, C, might cause both A and B.

So if you want to show that A causes B, you have to supplement correlation with other arguments that distinguish among possible relationships.  Nevertheless, correlation is evidence of causation.

Regression provides no evidence of causation: This rule is similar to the previous one, but generalized to include regression analysis.  I posed this question to the reddit stats forum: the consensus view among the people who responded is that regression doesn't say anything about causation, ever.  (More about that in this previous article.)

I disagree.  I think regression provides evidence in favor of causation for the same reason correlation does, but in addition, it can distinguish among different explanations for correlation.  Specifically, if you think that a third factor, C, might cause both A and B, you can try adding a variable that measures C as an independent variable.  If the apparent relationship between A and B is substantially weaker after the addition of C, or if it changes sign, that's evidence that C is a confounding variable.

Conversely, if you add control variables that measure all the plausible confounders you can think of, and the apparent relationship between A and B survives each challenge substantially unscathed, that outcome should increase your confidence that either A causes B or B causes A, and decrease your confidence that confounding factors explain the relationship.
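Here is a small simulation of that logic, using only numpy. In this invented setup, C causes both A and B, and A has no direct effect on B; regressing B on A alone shows a spurious relationship, and adding C as a control makes it shrink toward zero:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Invented data: C causes both A and B; A has no direct effect on B.
C = rng.normal(size=n)
A = 0.8 * C + rng.normal(size=n)
B = 0.8 * C + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares coefficients of y on an intercept plus predictors."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

coef_naive = ols(B, A)[1]          # B on A alone
coef_controlled = ols(B, A, C)[1]  # B on A, controlling for C

print(coef_naive)       # substantially positive, though A does not cause B
print(coef_controlled)  # near zero once C is controlled for
```

In the opposite case, where A really does cause B, the coefficient on A would survive the addition of C, which is the kind of persistence described above.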

By providing evidence against confounding factors, regression provides evidence in favor of causation, but it is not clear whether it can distinguish between "A causes B" and "B causes A".  The received wisdom of statistics says no, of course, but at this point I hope you understand why I am not inclined to accept it.

In this previous article, I explore the possibility that running regressions in both directions might help.  At this point, I think there is an argument to be made, but I am not sure.  It might turn out to be hogwash.  But along the way, I had a chance to explore another bit of conventional wisdom...

Methods for causal inference, like matching estimators, have a special ability to infer causality:  In this previous article, I explored a propensity score matching estimator, which is one of the methods some people think have special ability to provide evidence for causation.  In response to my previous work, several people suggested that I try these methods instead of regression.

Causal inference, and the counterfactual framework it is based on, is interesting stuff, and I look forward to learning more about it.  And matching estimators may well squeeze stronger evidence from the same data, compared to regression.  But so far I am not convinced that they have any special power to provide evidence for causation.

Matching estimators and regression are based on many of the same assumptions and vulnerable to some of the same objections.  I believe (tentatively for now) that if either of them can provide evidence for causation, both can.

Quoting rules is not an argument

As these examples show, many of the rules of statistics are oversimplified, misleading, or wrong.  That's why, in many of my explorations, I do things experts say you are not supposed to do.  Sometimes I'm right and the rule is wrong, and I write about it here.  Sometimes I'm wrong and the rule is right; in that case I learn something and I try to explain it here.  In the worst case, I waste time rediscovering something everyone already "knew".

If you think I am doing something wrong, I'd be interested to hear why.  Since my goal is to test whether the rules are valid, repeating them is not likely to persuade me.  But if you explain why you think the rules are right, I am happy to listen.


Thursday, December 3, 2015

Internet use and religion, part six

In my work on Internet use and religion, one of the recurring questions is whether the analysis I am doing, primarily regression models using observational data, provides evidence that Internet use causes decreased religiosity, or only shows a statistical association between them.

I discuss this question in the next section, which is probably too long.  If you get bored, you can skip to the following section, which presents a different method for estimating effect size, a "propensity score matching estimator", and compares the results to the regression models.

What does "evidence" mean?

In the previous article I presented results from two regression models and made the following argument:

  1. Internet use predicts religiosity fairly strongly: its effect is stronger than the effects of education, income, and use of other media (but not as strong as the effect of age).
  2. Controlling for the same variables, religiosity predicts Internet use only weakly: the effect is weaker than the effects of age, date of interview, income, education, and television (and about the same as those of radio and newspaper).
  3. This asymmetry suggests that Internet use causes a decrease in religiosity, and the reverse effect (religiosity discouraging Internet use) is weaker or zero.
  4. It is still possible that a third factor could cause both effects, but the control variables in the model, and the asymmetry of the effect, make it hard to come up with plausible ideas for what the third factor could be.

I am inclined to consider these results as evidence of causation (and not just a statistical association).

When I make arguments like this, I get pushback from statisticians who assert that the kind of observational data I am working with cannot provide any evidence for causation, ever.  To understand this position better, I posted this query on reddit.com/r/statistics.  As always, I appreciate the thoughtful responses, even if I don't agree.  The top-rated comments came from /u/Data_Driven_Dude, who states this position:
Causality is almost wholly a factor of methodology, not statistics. Which variables are manipulated, which are measured, when they're measured/manipulated, in what order, and over what period of time are all methodological considerations. Not to mention control/confounding variables. 
So the most elaborate statistics in the world can't offer evidence of causation if, for example, a study used cross-sectional survey design. [...] 
Long story short: causality is a helluva lot harder than most people believe it to be, and that difficulty isn't circumvented by mere regression.
I believe this is a consensus opinion among many statisticians and social scientists, but to be honest I find it puzzling.  As I argued in this previous article, correlation is in fact evidence of causation, because observing a correlation is more likely if there is causation than if there isn't.

The problem with correlation is not that it is powerless to demonstrate causation, but that a simple bivariate correlation between A and B can't distinguish between A causing B, B causing A, or a confounding variable, C, causing both A and B.

But regression models can.  In theory, if you control for C, you can measure the causal effect of A on B.  In practice, you can never know whether you have identified and effectively controlled for all confounding variables.  Nevertheless, by adding control variables to a regression model, you can find evidence for causation.  For example:

  1. If A (Internet use, in my example) actually causes B (decreased religiosity), but not the other way around, and we run regressions with B as a dependent variable, and A as an explanatory variable, we expect to find that A predicts B, of course.  But we also expect the observed effect to persist as we add control variables.  The magnitude of the effect might get smaller, but if we can control effectively for all confounding variables, it should converge on the true causal effect size.
  2. On the other hand, if we run the regression the other way, using B to predict A, we expect to find that B predicts A, but as we add control variables, the effect should disappear, and if we control for all confounding variables, it should converge to zero.

For example, in this previous article, I found that first babies are lighter than others, by about 3 ounces.  However, the mothers of first babies tend to be younger, and babies of younger mothers tend to be lighter.  When I control for mother's age, the apparent effect is smaller, less than an ounce, and no longer statistically significant.  I conclude that mother's age explains the apparent difference between first babies and others, and that the causal effect of being a first baby is small or zero.

I don't think that conclusion is controversial, but here's the catch: if you accept "the effect disappears when I add control variables" as evidence against causation, then you should also accept "the effect persists despite effective control for confounding variables" as evidence for causation.

Of course the key word is "effective".  If you think I have not actually controlled for an important confounding variable, you would be right to be skeptical of causation.  If you think the controls are weak, you might accept the results as weak evidence of causation.  But if you think the controls are effective, you should accept the results as strong evidence.

So I don't understand the claim that regression models cannot provide any evidence for causation, at all, ever, which I believe is the position my correspondents took, and which seems to be taken as a truism among at least some statisticians and social scientists.

I put this question to /u/Data_Driven_Dude, who wrote the following:
[...]just because you have statistical evidence for a hypothesized relationship doesn't mean you have methodological evidence for it. Methods and stats go hand in hand; the data you gather and analyze via statistics are reflections of the methods used to gather those data.
Say there is compelling evidence that A causes B. So I hypothesize that A causes B. I then conduct multiple, methodologically-rigorous studies over several years (probably over at least a decade). This effectively becomes my research program, hanging my hat on the idea that A causes B. I become an expert in A and B. After all that work, the studies support my hypothesis, and I then suggest that there is overwhelming evidence that A causes B.
 
Now take your point. There could be nascent (yet compelling) research from other scientists, as well logical "common sense," suggesting that A causes B. So I hypothesize that A causes B. I then administer a cross-sectional survey that includes A, B, and other variables that may play a role in that relationship. The data come back, and huzzah! Significant regression model after controlling for potentially spurious/confounding variables! Conclusion: A causes B. 
Nope. In your particular study, ruling out alternative hypotheses by controlling for other variables and finding that B doesn't predict A when those other variables are considered is not evidence of causality. Instead, what you found is support for the possibility that you're on the right track to finding causality. You found that A tentatively predicts B when controlling for XYZ, within L population based on M sample across N time. Just because you started with a causal hypothesis doesn't mean you conducted a study that can yield data to support that hypothesis. 
So when I say that non-experimental studies provide no evidence of causality, I mean that not enough methodological rigor has been used to suppose that your results are anything but a starting point. You're tiptoeing around causality, you're picking up traces of it, you see its shadow in the distance. But you're not seeing causality itself, you're seeing its influence on a relationship: a ripple far removed from the source.
I won't try to summarize or address this point-by-point, but a few observations:

  1. One point of disagreement seems to be the meaning of "evidence".  I admit that I am using it in a Bayesian sense, but honestly, that's because I don't understand any alternatives.  In particular, I don't understand the distinction between "statistical evidence" and "methodological evidence".
  2. Part of what my correspondent describes is a process of accumulating evidence, starting with initial findings that might not be compelling and ending (a decade later!) when the evidence is overwhelming.  I mostly agree with this, but I think the process starts when the first study provides some evidence and continues as each additional study provides more.  If the initial study provides no evidence at all, I don't know how this process gets off the ground.  But maybe I am still stuck on the meaning of "evidence".
  3. A key feature of this position is the need for methodological rigor, which sounds good, but I am not sure what it means.  Apparently regression models with observational data lack it.  I suspect that randomized controlled trials have it.  But I'm not sure what's in the middle.  Or, to be more honest, I know what is considered to be in the middle, but I'm not sure I agree.
To pursue the third point, I am exploring methods commonly used in the social sciences to test causality.  


SKIP TO HERE!

Matching estimators of causal effects

I'm reading Morgan and Winship's Counterfactuals and Causal Inference, which is generally good, although it has the academic virtue of presenting simple ideas in complicated ways.  So far I have implemented one of the methods in Counterfactuals, a propensity score matching estimator.  Matching estimators work like this:

  1. To estimate the effect of a particular treatment, D, on a particular outcome, Y, we divide an observed sample into a treatment group that received D and a control group that didn't.
  2. For each member of the treatment group, we identify a member of the control group that is as similar as possible (I'll explain how soon), and compute the difference in Y between the matched pair.
  3. Averaging the observed differences over the pairs yields an estimate of the mean causal effect of D on Y.
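The three steps above can be sketched with toy data. This is a minimal illustration of matching on a single covariate, not the propensity-score version used below; all values here are hypothetical:

```python
import numpy as np

# Toy data: outcome Y and treatment indicator D for 8 respondents.
Y = np.array([5.0, 3.0, 6.0, 2.0, 4.0, 7.0, 1.0, 3.0])
D = np.array([1, 0, 1, 0, 1, 0, 0, 1])
# A single covariate X that we match on (age, say).
X = np.array([30, 31, 45, 44, 25, 27, 60, 59])

treated = np.where(D == 1)[0]
control = np.where(D == 0)[0]

# For each treated unit, find the control unit closest in X
# and record the pairwise difference in the outcome.
diffs = []
for i in treated:
    j = control[np.argmin(np.abs(X[control] - X[i]))]
    diffs.append(Y[i] - Y[j])

# The average pairwise difference estimates the effect of D on Y.
effect = np.mean(diffs)
```

With these made-up values, each treated respondent pairs with the control closest in age, and the mean of the four pairwise differences is the estimate.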

The hard part of this process is matching.  Ideally the matching process should take into account all factors that cause Y.  If the pairs are identical in all of these factors, and differ only in D, any average difference in Y must be caused by D.

Of course, in practice we can't identify all factors that cause Y.  And even if we could, we might not be able to observe them all.  And even if we could, we might not be able to find a perfect match for each member of the treatment group.

We can't solve the first two problems, but "propensity scores" help with the third.  The basic idea is

  1. Identify factors that predict D.  In my example, D is Internet use.
  2. Build a model that uses those factors to predict the probability of D for each member of the sample; this probability is the propensity score.  In my example, I use logistic regression with age, income, education, and other factors to compute propensity scores.  
  3. Match each member of the treatment group with the member in the control group with the closest propensity score.  In my example, the members of each pair have the same predicted probability of using the Internet (according to the model in step 2), so the only relevant difference between them is that one did and one didn't.  Any difference in the outcome, religiosity, should reflect the causal effect of the treatment, Internet use.
I implemented this method (details below) and applied it to the data from the European Social Survey (ESS).

In each country I divide respondents into a treatment group with Internet use above the median and a control group below the median.  We lose information by quantizing Internet use in this way, but the distribution tends to be bimodal, with many people at the extremes and few in the middle, so treating Internet use as a binary variable is not completely terrible.

To compute propensity scores, I use logistic regression to predict Internet use based on year of birth (including a quadratic term), year of interview, education and income (as in-country ranks), and use of other media (television, radio, and newspaper).

As expected, the average propensity in the treatment group is higher than in the control group.  But some members of the control group are matched more often than others (and some not at all).  After matching, the two groups have the same average propensity.


Finally, I compute the pair-wise difference in religiosity and the average across pairs.

For each country, I repeat this process 101 times using weighted resampled data with randomly filled missing values.  That way I can compute confidence intervals that reflect variation due to sampling and missing data.  The following figure shows the estimated effect size for each country and a 95% confidence interval:
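The confidence intervals come from that resampling loop. A minimal sketch of the final step, using simulated estimates in place of the real matching results (the values and seed here are hypothetical):

```python
import numpy as np

def percentile_ci(estimates, alpha=0.05):
    """Median and central (1-alpha) interval from resampled estimates."""
    estimates = np.asarray(estimates)
    low, median, high = np.percentile(
        estimates, [100 * alpha / 2, 50, 100 * (1 - alpha / 2)])
    return median, (low, high)

# With 101 resampled estimates, the 2.5th and 97.5th percentiles
# bracket the middle 95% of the sampling distribution.
rng = np.random.default_rng(17)
estimates = rng.normal(-0.28, 0.1, size=101)
median, (low, high) = percentile_ci(estimates)
```

Because each iteration resamples the data and refills missing values, the spread of the 101 estimates reflects both sources of uncertainty at once.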
The estimated effect of Internet use on religiosity is negative in 30 out of 34 countries; in 18 of them it is statistically significant.  In 4 countries the estimate is positive, but none of them are statistically significant.

The median effect size is 0.28 points on a 10 point scale.  The distribution of effect size across countries is similar to the results from the regression model, which has median 0.33:

The confidence intervals for the matching estimator are bigger.  Some part of this difference is because of the information we lose by quantizing Internet use.  Some part is because we lose some samples during the matching process.  And some part is due to the non-parametric nature of matching estimators, which makes fewer assumptions about the structure of the effect.

How it works

The details of the method are in this IPython notebook, but I'll present the kernel of the algorithm here.  In the following, group is a Pandas DataFrame for one country, with one row for each respondent, and the snippets assume the usual imports (numpy as np and statsmodels.formula.api as smf).

The first step is to quantize Internet use and define a binary variable, treatment:

    netuse = group.netuse_f
    thresh = netuse.median()
    if thresh < 1:
        thresh = 1
    # bracket assignment, so the new column is actually created
    group['treatment'] = (netuse >= thresh).astype(int)

The next step is to use logistic regression to compute a propensity for each respondent:


    formula = ('treatment ~ inwyr07_f + '
               'yrbrn60_f + yrbrn60_f2 + '
               'edurank_f + hincrank_f + '
               'tvtot_f + rdtot_f + nwsptot_f')

    model = smf.logit(formula, data=group)
    results = model.fit(disp=False)
    group['propensity'] = results.predict(group)
    
Next we divide into treatment and control groups:

    treatment = group[group.treatment == 1]
    control = group[group.treatment == 0]
    
And sort the controls by propensity:

    series = control.propensity.sort_values()

Then do the matching by bisection search:

    indices = series.searchsorted(treatment.propensity)
    # searchsorted returns insertion points in [0, len(series)];
    # clamp the top end so every index is a valid position
    indices[indices >= len(series)] = len(series) - 1
    
And select the matches from the controls:

    control_indices = series.index[indices]
    matches = control.loc[control_indices]

Extract the distance in propensity between each pair, and the difference in religiosity:

    distances = (treatment.propensity.values - 
                 matches.propensity.values)
    differences = (treatment.rlgdgr_f.values - 
                   matches.rlgdgr_f.values)
    
Select only the pairs that are a close match, using a caliper of 0.001 on the propensity difference:

    caliper = differences[abs(distances) < 0.001]

And compute the mean difference:

    delta = np.mean(caliper)

That's all there is to it.  There are better ways to do the matching, but I started with something simple and computationally efficient (it's n log n, where n is the size of the control or treatment group, whichever is larger).

Back to the philosophy

The agreement of the two methods provides some evidence of causation, because if the effect were spurious, I would expect different methodologies, which are more or less robust against the spurious effect, to yield different results.

But it is not very strong evidence, because the two methods are based on many of the same assumptions.  In particular, the matching estimator is only as good as the propensity model, and in this case the propensity model includes the same factors as the regression model.  If those factors effectively control for confounding variables, both methods should work.  If they don't, neither will.

The propensity model uses logistic regression, so it is based on the usual assumptions about linearity and the distribution of errors.  But the matching estimator is non-parametric, so it depends on fewer assumptions about the effect itself.  It seems to me that being non-parametric is a potential advantage of matching estimators, but it doesn't help with the fundamental problem, which is that we don't know if we have effectively controlled for all confounding variables.

So I am left wondering why a matching estimator should be considered suitable for causal inference if a regression model is not.  In practice one of them might do a better job than the other, but I don't see any difference, in principle, in their ability to provide evidence for causation: either both can, or neither.

Monday, November 30, 2015

Internet use and religion, part five

[If you are jumping into the middle of this series, you might want to start with this article, which explains the methodological approach I am taking.]

In the previous article, I show results from two regression models that predict religious affiliation and degree of religiosity.  I use the models to compare hypothetical respondents who are at their national means for all explanatory factors; then I vary one factor at a time, comparing someone at the 25th percentile with someone at the 75th percentile.  I compute the difference in the predicted probability of religious affiliation and the predicted level of religiosity:

1) In almost every country, a hypothetical respondent with high Internet use is less likely to report a religious affiliation.  The median effect size across countries is 3.5 percentage points.

2) In almost every country, a hypothetical respondent with high Internet use reports a lower degree of religiosity.  The median effect size is 0.36 points on a 10-point scale.

These results suggest that Internet use might cause religious disaffiliation and decreased religiosity, but they are ambiguous.  It is also possible that the direction of causation is the other way; that is, that religiosity causes a decrease in Internet use.  Or there might be other factors that cause both Internet use and religiosity.

I'll address the first possibility first.  If religiosity causes lower Internet use, we should be able to measure that effect by flipping the models, taking religious affiliation and religiosity as explanatory variables and trying to predict Internet use.

I did that experiment and found:

1) Religious affiliation (hasrelig) has no predictive value for Internet use in most countries, and only weak predictive value in others.

2) Degree of religiosity (rlgdgr) has some predictive value for Internet use in some countries, but the effect is weaker than other explanatory variables (like age and education), and weaker than the effect of Internet use on religiosity: the median across countries is 0.19 points on a 7-point scale.

Considering the two possibilities, that Internet use causes religious disaffiliation or the other way around, these results support the first possibility, although the second might make a smaller contribution.

Although it is still possible that a third factor causes increased Internet use and decreased religious affiliation, it would have to do so in a strangely asymmetric way to account for these results.  And since the model controls for age, income, education, and other media use, this hypothetical third factor would have to be uncorrelated with these controls (or only weakly correlated).

I can't think of any plausible candidates for this third factor.  So I tentatively conclude that Internet use causes decreased religiosity.  I present more detailed results below.

Summary of previous results

In the previous article, I computed an effect size for each factor and reported results for two models, one that predicts hasrelig (whether the respondent reports religious affiliations) and one that predicts rlgdgr (degree of religiosity on a 0-10 scale).  The following two graphs summarize the results.

Each figure shows the distribution of effect size across the 34 countries in the study.  The first figure shows the results for the first model as a difference in percentage points; each line shows the effect size for a different explanatory variable.

The factors with the largest effect sizes are year of birth (dark green line) and Internet use (purple).

For Internet use, a respondent who is average in every way, but falls at the 75th percentile of Internet use, is typically 2-7 percentage points less likely to be affiliated than a similar respondent at the 25th percentile of Internet use.  In a few countries, the effect is apparently the other way around, but in those cases the estimated effect size is not statistically significant.

 Overall, people who use the Internet more are less likely to be affiliated, and the effect is stronger than the effect of education, income, or the consumption of other media.

Similarly, when we try to predict degree of religiosity, people who use the Internet more (again comparing the 75th and 25th percentiles) report lower religiosity, typically 0.2 to 0.7 points on a 10 point scale.  Again, the effect size for Internet use is bigger than for education, income, or other media.
Of course, what I am calling an "effect size" may not be an effect in the sense of cause and effect.  What I have shown so far is that Internet users tend to be less religious, even when we control for other factors.  It is possible, and I think plausible, that Internet use actually causes this effect, but there are two other possible explanations for the observed statistical association:

1) Religious affiliation and religiosity might cause decreased Internet use.
2) Some other set of factors might cause both increased Internet use and decreased religiosity.

Addressing the first alternative explanation, if people who are more religious tend to use the Internet less (other things being equal), we would expect that effect to appear in a model that includes religiosity as an explanatory variable and Internet use as a dependent variable.

But it turns out that if we run these models, we find that religiosity has little power to predict levels of Internet use when we control for other factors.  I present the results below; the details are in this IPython notebook.

Model 1

The first model tries to predict level of Internet use taking religious affiliation (hasrelig) as an explanatory variable, along with the same controls I used before: year of birth (linear and quadratic terms), year of interview, education, income, and consumption of other media.
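The flipped model can be sketched in the same statsmodels style as the propensity model above. The DataFrame here is synthetic, standing in for one country's data, and the reduced set of column names is illustrative rather than the full formula from the notebook:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for one country's ESS data (hypothetical values).
rng = np.random.default_rng(0)
n = 500
group = pd.DataFrame({
    'netuse_f': rng.integers(1, 8, n).astype(float),
    'hasrelig_f': rng.integers(0, 2, n).astype(float),
    'yrbrn60_f': rng.normal(0, 15, n),
    'edurank_f': rng.uniform(0, 1, n),
    'hincrank_f': rng.uniform(0, 1, n),
    'tvtot_f': rng.integers(0, 8, n).astype(float),
})
group['yrbrn60_f2'] = group.yrbrn60_f ** 2

# The flipped model: Internet use is now the dependent variable, with
# religious affiliation among the explanatory variables.
formula = ('netuse_f ~ hasrelig_f + yrbrn60_f + yrbrn60_f2 + '
           'edurank_f + hincrank_f + tvtot_f')
results = smf.ols(formula, data=group).fit()
param = results.params['hasrelig_f']
```

The parameter associated with hasrelig then measures how much predictive value affiliation adds, once the controls are in the model.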

The following figure shows the effect size of religious affiliation on Internet use.
In most countries it is essentially zero; in a few, people who report a religious affiliation also report less Internet use, but always by less than 0.5 points on a 7-point scale.

The following figure shows the distribution of effect size for the other variables on the same scale.
If we are trying to predict Internet use for a given respondent, the most useful explanatory variables, in descending order of effect size, are year of birth, education, year of interview, income, and television viewing.  The effect sizes for religious affiliation, radio listening, and newspaper reading are substantially smaller.

The results of the second model are similar.

Model 2

The second model tries to predict level of Internet use taking degree of religiosity (rlgdgr) as an explanatory variable, along with the same controls I used before.

The following figure shows the estimated effect size in each country, showing the difference in Internet use of two hypothetical respondents who are at their national mean for all variables except degree of religiosity, where they are at the 25th and 75th percentiles.

In most countries, the respondent reporting a higher level of religiosity also reports a lower level of Internet use, in almost all cases less than 0.5 points on a 7-point scale.  Again, this effect is smaller than the apparent effect of the other explanatory variables.
Again, the variables that best predict Internet use are year of birth, education, year of interview, income, and television viewing.  The apparent effect of religiosity is somewhat less than television viewing, and more than radio listening and newspaper reading.

Next steps

As I present these results, I realize that I can make them easier to interpret by expressing the effect size in standard deviations, rather than raw differences.  Internet use is recorded on a 7-point scale, and religiosity on a 10-point scale, so it's not obvious how to compare them.

Also, variability of Internet use and religiosity is different across countries, so standardizing will help with comparisons between countries, too.
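Standardizing is a one-line transformation. A sketch, with made-up religiosity responses for one country and a hypothetical raw effect:

```python
import numpy as np

# Hypothetical religiosity responses (0-10 scale) for one country.
rlgdgr = np.array([0, 2, 5, 7, 10, 3, 8, 6], dtype=float)

raw_effect = -0.33                      # raw difference on the 10-point scale
std_effect = raw_effect / rlgdgr.std()  # effect in standard deviations
```

Dividing by the within-country standard deviation puts Internet use (7-point scale) and religiosity (10-point scale) on comparable footing, and corrects for countries where responses are more or less spread out.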

More results in the next installment.


Tuesday, November 24, 2015

Internet use and religion, part four

[If you are jumping into the middle of this series, you might want to start with this article, which explains the methodological approach I am taking.]

In the previous article, I presented preliminary results from a study of relationships between Internet use and religion.  Using data from the European Social Survey, I ran regressions to estimate the effect of media consumption (television, radio, newspapers, and Internet) on religious affiliation and degree of religiosity.

As control variables, I include year born, education and income (expressed as relative ranks within each country) and the year the data were gathered (between 2002 and 2010).

Some of the findings so far:

  1. In almost every country, younger people are less religious.
  2. In most countries, people with more education are less religious.
  3. In about half of the 34 countries, people with lower income are less religious.  In the rest, the effect (if any) is too small to be distinguished from noise with this sample size.
  4. In most countries, people who watch more television are less religious.
  5. In fewer than half of the countries, people who listen to more radio are less religious.
  6. The results for newspapers are similar: only a few countries show a negative effect, and in some countries the effect is positive.
  7. In almost every country, people who use the Internet are less religious.
  8. There is a weak relationship between the strength of the effect and the average degree of religiosity: the negative effect of Internet use on religion tends to be stronger in more religious countries.
In the previous article, I measured effect size using the parameters of the regression models: for logistic regression, the parameter is a log odds ratio, for linear regression it is a linear weight.  These parameters are not useful for comparing the effects of different factors, because they are not on the same scale, and they are not the best choice for comparing effect size between countries, because they don't take into account variation in each factor in each country.

For example, in one country the parameter associated with Internet use might be small, but if there is large variation in Internet use within the country, the net effect size might be greater than in another country with a larger parameter, but little variation.

Effect size

So my next step is to define effect size in terms that are comparable between factors and between countries.  To explain the methodology, I'll use the logistic model, which predicts the probability of religious affiliation.  I start by fitting the model to the data, then use the model to predict the probability of affiliation for a hypothetical respondent whose values for all factors are the national mean.  Then I vary one factor at a time, generating predictions for hypothetical respondents whose value for one factor is at the 25th percentile (within country) and at the 75th percentile.  Finally, I compute the difference in predicted values in percentage points.

As an example, suppose a hypothetically average respondent has a 45% chance of reporting a religious affiliation, as predicted by the model.  And suppose the 25th and 75th percentiles of Internet use are 2 and 7, on a 7 point scale.  A person who is average in every way, but with Internet use only 2 might have a 47% chance of affiliation.  The same person with Internet use 7 might have a 42% chance.  In that case I would report that the effect size is a difference of 5 percentage points.

As in the previous article, I run this analysis on about 200 iterations of resampled data, then compute a median and 95% confidence interval for each value.
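The prediction-based effect size can be sketched end to end on simulated data. Everything here is hypothetical: a single-predictor logit model stands in for the full model with controls, and affiliation is generated to depend negatively on Internet use:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: probability of affiliation declines with Internet use.
rng = np.random.default_rng(42)
n = 1000
netuse = rng.integers(1, 8, n).astype(float)
p = 1 / (1 + np.exp(-(0.5 - 0.2 * (netuse - 4))))
df = pd.DataFrame({'netuse_f': netuse,
                   'hasrelig': (rng.uniform(size=n) < p).astype(int)})

results = smf.logit('hasrelig ~ netuse_f', data=df).fit(disp=False)

# Hypothetical respondents at the 25th and 75th percentiles of Internet
# use; in the full model, all other factors stay at their national means.
low = pd.DataFrame({'netuse_f': [df.netuse_f.quantile(0.25)]})
high = pd.DataFrame({'netuse_f': [df.netuse_f.quantile(0.75)]})

# Difference in predicted probability, in percentage points.
effect = (results.predict(high)[0] - results.predict(low)[0]) * 100
```

The reported effect size is exactly this kind of difference: the model's predicted probability at the 75th percentile minus its prediction at the 25th, holding everything else fixed.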


Quadratic age model

Before I get to the results, there is one other change from the previous installment:  I added a quadratic term for year born.  The reason is that in preliminary results, I noticed that Internet had the strongest negative association with religiosity, followed by television, then radio and newspapers.  I wondered whether this pattern might be the result of correlation with age; that is, whether younger people are more likely to consume new media and be less religious.  I was already controlling for age using yrborn60 (year born minus 1960) but I worried that if the relationship with age is nonlinear, I might not be controlling for it effectively.

So I added a quadratic term to the model.  Here are the estimated parameters for the linear term and quadratic term:

In many countries, both parameters are statistically significant, so I am inclined to keep them in the model.  The sign of the quadratic term is usually positive, so the curves are convex up, which suggests that the age effect might be slowing down.

Anyway, including the quadratic term has almost no effect on the other results: the relative strengths of the associations are the same.
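Adding the quadratic term just means squaring the centered year-born column and including it in the formula. A sketch, with hypothetical birth years:

```python
import pandas as pd

# Hypothetical respondents; build the centered and squared age terms.
df = pd.DataFrame({'yrbrn': [1950, 1960, 1975, 1990]})
df['yrbrn60_f'] = df.yrbrn - 1960      # year born minus 1960
df['yrbrn60_f2'] = df.yrbrn60_f ** 2   # quadratic term

# The model formula then includes both terms (other controls elided):
formula = 'hasrelig ~ yrbrn60_f + yrbrn60_f2'
```

With both terms in the model, a nonlinear relationship between year of birth and religiosity can't masquerade as a media-consumption effect.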

Model 1 results

Again, the first model uses logistic regression with dependent variable hasrelig, which indicates whether the respondent reports a religious affiliation.

In the following figures, the x-axis is the percentage point difference in hasrelig between people at the 25th and 75th percentile for each explanatory variable.
 In most countries, people with more education are less religious.
 In most countries, the effect of income is small and not statistically significant.
 The effect of television is negative in most countries.
The effect of radio is usually small.
 The effect of newspapers is usually small.
In most countries, Internet use is associated with substantial decreases in religious affiliation.

Pulling together the results so far, the following figure shows the distribution (CDF) of effect size across countries:
 Overall, the effect size for Internet use is the largest, followed by education and television.  The effect sizes for income, radio, and newspaper are all small, and centered around zero.

Model 2 results

The second model uses linear regression with dependent variable rlgdgr, which indicates degree of religiosity on a 0-10 scale.


In the following figures, the x-axis shows the difference in rlgdgr between people at the 25th and 75th percentile for each explanatory variable.

In most countries, people with more education are less religious.
The effect size for income is smaller.
 People who watch more television are less religious.
The effect size for radio is small.
The effect size for newspapers is small.
 In most countries, people who use the Internet more are less religious.
Comparing the effect sizes for the different explanatory variables, again, Internet use has the biggest effect, followed by education and television.  Effect sizes for income, radio, and newspaper are smaller and centered around zero.

That's all for now.  I have a few things to check out, and then I should probably wrap things up.