Since then, I have been planning to replicate the study using data from the European Social Survey, but I didn't get back to it until a few weeks ago. I was reminded about it because they recently released data from Round 7, conducted in 2014. I am always excited about new data, but in this case it doesn't help me: in Rounds 6 and 7 they did not ask about Internet use.
But they did ask in Rounds 1 through 5, collected between 2002 and 2010. So that's something. I downloaded the data, loaded it into Pandas dataframes, and started cleaning, validating, and doing some preliminary exploration. This IPython notebook shows what the first pass looks like.
In the interest of open, replicable science, I am posting preliminary work here, but at this point we should not take the results too seriously.
Data inventory
The dependent variables I plan to study are
rlgblg: Do you consider yourself as belonging to any particular religion or denomination?
rlgdgr: Regardless of whether you belong to a particular religion, how religious would you say you are?
The explanatory variables are
yrbrn: And in what year were you born?
hinctnta: Using this card, please tell me which letter describes your household's total income, after tax and compulsory deductions, from all sources? If you don't know the exact figure, please give an estimate. Use the part of the card that you know best: weekly, monthly or annual income.
eduyrs: About how many years of education have you completed, whether full-time or part-time? Please report these in full-time equivalents and include compulsory years of schooling.
tvtot: On an average weekday, how much time, in total, do you spend watching television?
rdtot: On an average weekday, how much time, in total, do you spend listening to the radio?
nwsptot: On an average weekday, how much time, in total, do you spend reading the newspapers?
netuse: Now, using this card, how often do you use the internet, the World Wide Web or e-mail - whether at home or at work - for your personal use?
Recodes
Income: I created a variable, hinctnta5, which subtracts 5 from hinctnta, so the mean is near 0. This shift makes the parameters of the model easier to interpret.
Year born: Similarly, I created yrbrn60, which subtracts 1960 from yrbrn.
Years of education: The distribution of eduyrs includes some large values that might be errors, and the question was posed differently in the first few rounds. I will investigate more carefully later, but for now I am replacing values greater than 25 years with 25, and subtracting off the mean, 12, to create eduyrs12.
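As a rough illustration of the three recodes described above, here is a minimal pandas sketch. The function name `recode` and the three sample rows are hypothetical and are not taken from the notebook; only the column names and the constants (5, 1960, 25, 12) come from the text.

```python
import pandas as pd

def recode(df):
    """Return a copy of df with the shifted variables described in the text."""
    df = df.copy()
    # Shift the income decile so values are centered near 0.
    df["hinctnta5"] = df["hinctnta"] - 5
    # Shift year of birth so that 1960 maps to 0.
    df["yrbrn60"] = df["yrbrn"] - 1960
    # Replace values above 25 years with 25, then subtract 12.
    df["eduyrs12"] = df["eduyrs"].clip(upper=25) - 12
    return df

# Hypothetical rows for illustration only, not ESS data.
sample = pd.DataFrame({
    "hinctnta": [1, 5, 10],
    "yrbrn": [1945, 1960, 1988],
    "eduyrs": [9, 12, 40],
})
recoded = recode(sample)
print(recoded[["hinctnta5", "yrbrn60", "eduyrs12"]])
```

Missing values pass through these operations unchanged, so this sketch does not address the missing-data handling that still needs to be done.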
Results
Just to get a quick look at things, I ran a logistic regression with rlgblg as the dependent variable, using data from Rounds 1-5 and including all countries. The sample size is 229,307. Here are the estimated parameters (computed by StatsModels):
Variable | coef | std err | z | P>\|z\| | [95.0% Conf. Int.] |
---|---|---|---|---|---|
Intercept | 0.9811 | 0.014 | 70.014 | 0.000 | 0.954 1.009 |
yrbrn60 | -0.0078 | 0.000 | -27.826 | 0.000 | -0.008 -0.007 |
eduyrs12 | -0.0376 | 0.001 | -29.619 | 0.000 | -0.040 -0.035 |
hinctnta5 | -0.0220 | 0.002 | -12.934 | 0.000 | -0.025 -0.019 |
tvtot | -0.0161 | 0.002 | -7.205 | 0.000 | -0.021 -0.012 |
rdtot | -0.0149 | 0.002 | -8.826 | 0.000 | -0.018 -0.012 |
nwsptot | -0.0320 | 0.004 | -8.924 | 0.000 | -0.039 -0.025 |
netuse | -0.0758 | 0.002 | -42.062 | 0.000 | -0.079 -0.072 |
The parameters are all statistically significant with very small p-values. And they are all negative, which indicates:
- Younger people are less likely to report a religious affiliation.
- More educated people are less likely...
- People with higher income are less likely...
- People who consume more media (television, radio, newspaper) are less likely...
- People who use the Internet more are less likely...
The effect of Internet use (per hour per week) appears to be about twice the effect of reading the newspaper, which is about twice the effect of television or radio.
The effect of the Internet is comparable to about a decade of age, two years of education, or 3 deciles of income.
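These comparisons are ratios of coefficients from the logistic regression table above. The following sketch reproduces the arithmetic; the dictionary and variable names are only for illustration.

```python
# Coefficients copied from the logistic regression table above.
coef = {
    "yrbrn60": -0.0078,
    "eduyrs12": -0.0376,
    "hinctnta5": -0.0220,
    "tvtot": -0.0161,
    "rdtot": -0.0149,
    "nwsptot": -0.0320,
    "netuse": -0.0758,
}

# Number of units of each variable whose effect matches one unit of netuse.
equivalents = {
    name: coef["netuse"] / value
    for name, value in coef.items()
    if name != "netuse"
}

for name, ratio in equivalents.items():
    print(f"{name}: {ratio:.1f}")
```

The ratios come out to roughly 9.7 for year of birth, 2.0 for years of education, and 3.4 for income decile, in line with the comparison in the text.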
Most of these results are consistent with what I saw in my previous study and what other studies have shown. One exception is income: in other studies, the usual pattern is that people in the lowest income groups are less likely to be affiliated, and after that, income has no effect. We'll see if this preliminary result holds up.
I ran a similar model using rlgdgr (degree of religiosity) as the dependent variable:
Variable | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
---|---|---|---|---|---|
Intercept | 5.6668 | 0.019 | 300.071 | 0.000 | 5.630 5.704 |
yrbrn60 | -0.0180 | 0.000 | -47.352 | 0.000 | -0.019 -0.017 |
eduyrs12 | -0.0688 | 0.002 | -40.159 | 0.000 | -0.072 -0.065 |
hinctnta5 | -0.0266 | 0.002 | -11.397 | 0.000 | -0.031 -0.022 |
tvtot | -0.0801 | 0.003 | -26.334 | 0.000 | -0.086 -0.074 |
rdtot | -0.0179 | 0.002 | -7.791 | 0.000 | -0.022 -0.013 |
nwsptot | -0.0531 | 0.005 | -10.873 | 0.000 | -0.063 -0.044 |
netuse | -0.1020 | 0.002 | -40.942 | 0.000 | -0.107 -0.097 |
The results are similar. Again, this IPython notebook has the details.
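For readers who want to try models of this form, here is a minimal sketch using the StatsModels formula interface. It runs on synthetic data, not ESS data, and all the generated values are made up. It also assumes rlgblg has been coded as 0/1, and that the second model is an ordinary least squares regression, which the t statistics in the table suggest but the text does not state.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the recoded data (assumption: not real ESS data).
rng = np.random.default_rng(17)
n = 5000
df = pd.DataFrame({
    "yrbrn60": rng.integers(-40, 35, size=n),
    "eduyrs12": rng.integers(-12, 14, size=n),
    "hinctnta5": rng.integers(-4, 6, size=n),
    "tvtot": rng.integers(0, 8, size=n),
    "rdtot": rng.integers(0, 8, size=n),
    "nwsptot": rng.integers(0, 8, size=n),
    "netuse": rng.integers(0, 8, size=n),
})

# Made-up outcomes so the models have something to fit.
log_odds = 1.0 - 0.01 * df["yrbrn60"] - 0.08 * df["netuse"]
df["rlgblg"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)
noise = rng.normal(0, 2.5, size=n)
df["rlgdgr"] = np.clip(np.round(5.5 - 0.1 * df["netuse"] + noise), 0, 10)

predictors = "yrbrn60 + eduyrs12 + hinctnta5 + tvtot + rdtot + nwsptot + netuse"

# Logistic regression for the binary affiliation variable.
logit_results = smf.logit(f"rlgblg ~ {predictors}", data=df).fit(disp=False)

# Linear regression for the degree-of-religiosity scale.
ols_results = smf.ols(f"rlgdgr ~ {predictors}", data=df).fit()

print(logit_results.params)
print(ols_results.params)
```

Calling `summary()` on either result object prints a table with the same columns as the ones shown above.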
Limitations
Again, we should not take these results too seriously yet:
- So far I am not taking into account the weights associated with respondents, either within or across countries. So for now I am oversampling people in small countries, as well as some groups within countries.
- At this point I haven't done anything careful to fill missing values, so the results will change when I get that done.
- And I think it will be more meaningful to break the results down by country.
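One possible way to take respondent weights into account, offered only as a sketch and not necessarily the approach the later analysis will use, is to resample rows with probability proportional to their weights before fitting. The helper `resample_by_weight`, the column name `weight`, and the example rows are all hypothetical.

```python
import pandas as pd

def resample_by_weight(df, weight_col, seed=None):
    """Draw len(df) rows with replacement, with probability proportional to weight_col."""
    return df.sample(n=len(df), replace=True, weights=weight_col, random_state=seed)

# Hypothetical respondents; "weight" stands in for a survey weight column.
respondents = pd.DataFrame({
    "cntry": ["A", "A", "B", "B"],
    "rlgblg": [1, 0, 1, 0],
    "weight": [0.5, 0.5, 3.0, 0.0],
})
resampled = resample_by_weight(respondents, "weight", seed=1)
print(resampled["cntry"].value_counts())
```

Rows with larger weights appear more often in the resampled frame, and rows with zero weight never appear.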
Stay tuned. More coming soon!
What does Intercept stand for? Is it for neutral (rlgblg, rlgdgr)? And what is its coef.?
Intercept is the value of the dependent variable that the model gives when the independent variables are zero.
For rlgblg I used logistic regression, so the intercept is in log odds. For someone born in 1960, with 12 years of education, income 5 (on a 10 point scale), with minimal media use and no access to the Internet, the value is close to 1, which corresponds to a probability of about 73%.
For rlgdgr, the same hypothetical person would be expected to report 5.7 on a 10 point scale.
But don't take that too seriously, because it's an unweighted average across different countries, and it's at the extreme end of some independent variables.
At this point I'm just warming up the models.
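For anyone who wants to check the conversion from log odds to probability for the rlgblg intercept, the inverse logit function does it in one line. The function name below is only for illustration.

```python
import math

def log_odds_to_probability(log_odds):
    """Inverse of the logit function: map log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

intercept = 0.9811  # Intercept from the rlgblg table.
probability = log_odds_to_probability(intercept)
print(round(probability, 3))

# Log odds of 0 (odds of 1) correspond to a probability of one half.
print(log_odds_to_probability(0.0))
```

Log odds near 1 give a probability of roughly 0.73, while log odds of 0 give exactly 0.5.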