## Thursday, May 3, 2018

### Some people hate custom libraries

For most of my books, I provide a Python module that defines the functions and objects I use in the book.  That makes some people angry.

The following Amazon review does a nice job of summarizing the objections, and it demonstrates the surprising passion this issue evokes:

March 29, 2018
Format: Paperback
Echoing another reviewer, the custom code requirement means you learn their custom code rather than, you know, the standard modules numpy and scipy. For example, at least four separate classes are required, representing hundreds of lines of code, are required just to execute the first six lines of code in the book. All those lines do is define two signals, a cosine and a sine, sums them, then plots them. This, infuriatingly, hides some basic steps. Here's how you can create a cosine wave with frequency 440Hz:

duration = 0.5
framerate = 11025
n = round(duration*framerate)
ts = np.arange(n)/framerate
amp = 1.0
freq = 440
offset = 0.0
cos_sig = amp * numpy.cos( 2*numpy.pi*ts*freq + offset)
freq = 880
sin_sig = amp * numpy.sin( 2*numpy.pi*ts*freq + offset)

cos_sig = thinkdsp.CosSignal(freq=440,amp=1.0,offset=0)
sin_sig = thinkdsp.SinSignal(freq=440,amp=1.0,offset=0)
mix = cos_sig + sin_sig

where CosSignal and SinSignal are custom classes, not functions, which inherits four separate classes, NONE of which are necessary, and all of which serve to make things more complex than necessary, on the pretense this makes things easier. The classes these class inherit are a generic Sinusoid and SumSignal classes, which inherits a Signal class, which depends on a Wave class, which performs plotting using pyplot in matplotlib. None of which make anything really any easier, but does serve to hide a lot of basic functionality, like hiding how to use numpy, matplotlib, and pyplot.

In short, just to get through the first two pages, you have to have access to github to import their ridiculous thinkdsp, thinkplot, and thinkstats, totalling around 5500 lines of code, or you are just screwed and can't use this book. All decent teaching books develops code you need as necessary and do NOT require half a dozen files with thousands of lines of custom code just to get to page 2. What kind of clown does this when trying to write a book to show how to do basic signal processing? Someone not interested in teaching you DSP, but trying to show off their subpar programming skills by adding unnecessary complexity (a sure sign of a basic programmer, not a good).

The authors openly admit their custom code is nothing more than wrappers in numpy and scipy, so the authors KNEW they were writing a crappy book and filling it with a LOT of unnecessary complexity. Bad code is bad code. Using bad code to teach makes bad teaching. It's obvious Allen B. Downey has spent his career in academia, where writing quality code doesn't matter.

Well, at least he spelled my name right.

Maybe I should explain why I think it's a good idea to provide a custom library along with a book like Think DSP.  Importantly, the goal of the book is to help people learn the core ideas of signal processing; the software is a means to this end.

Here's what I said in the preface:
The premise of this book is that if you know how to program, you can use that skill to learn other things, and have fun doing it.
With a programming-based approach, I can present the most important ideas right away. By the end of the first chapter, you can analyze sound recordings and other signals, and generate new sounds. Each chapter introduces a new technique and an application you can apply to real signals. At each step you learn how to use a technique first, and then how it works.
For example, in the first chapter, I introduce two objects defined in thinkdsp.py: Wave and Spectrum.  Wave provides a method called make_spectrum that creates a Spectrum object, and Spectrum provides make_wave, which creates a Wave.

When readers use these objects and methods, they are implicitly learning one of the fundamental ideas of signal processing: that a Wave and its Spectrum are equivalent representations of the same information -- given one, you can always compute the other.
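That equivalence can be checked with NumPy directly. The following is a minimal sketch, not the thinkdsp implementation; the framerate, duration, and frequency are values I made up for illustration.

```python
import numpy as np

framerate = 11025                        # samples per second (assumed value)
ts = np.arange(framerate // 2) / framerate
ys = np.cos(2 * np.pi * 440 * ts)        # a 440 Hz cosine, standing in for a Wave

hs = np.fft.rfft(ys)                     # wave -> spectrum
ys2 = np.fft.irfft(hs, n=len(ys))        # spectrum -> wave

# The round trip recovers the original samples up to floating-point error
print(np.allclose(ys, ys2))
```

Passing `n=len(ys)` to `irfft` matters when the number of samples is odd; without it, the reconstructed wave would come back one sample short.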

This example demonstrates one reason I use custom libraries in my books: The API is the lesson.  As you learn about these objects and how they interact, you are also learning the core ideas of the topic.

Another reason I think these libraries are a good idea is that they let me introduce ideas top-down: that is, I can show what a method does -- and why it is useful -- first; then I can present details when they are necessary or most useful.

For example, I introduce the Spectrum object in Chapter 1.  I use it to apply a low pass filter, and the reader can hear what that sounds like.  You can too, by running the Chapter 1 notebook on Binder.

In Chapter 2, I reveal that my make_spectrum function is a thin wrapper on two NumPy functions, and present the source code:

```
from numpy.fft import rfft, rfftfreq

# Method of class Wave
def make_spectrum(self):
    n = len(self.ys)
    d = 1 / self.framerate    # time between samples

    hs = rfft(self.ys)        # complex amplitudes
    fs = rfftfreq(n, d)       # frequency of each component

    return Spectrum(hs, fs, self.framerate)
```

At this point, anyone who prefers to use NumPy directly, rather than my wrappers, knows how.

In Chapter 7, I unwrap one more layer and show how the FFT algorithm works.  Why Chapter 7?  Because I introduce correlation in Chapter 5, which helps me explain the Discrete Cosine Transform in Chapter 6, which helps me explain the Discrete Fourier Transform.

Using custom libraries lets me organize the material in the way I think works best, based on my experience working with students and seeing how they learn.

This example demonstrates another benefit of defining my own objects: data encapsulation.  When you use NumPy's rfft to compute a spectrum, you get an array of amplitudes, but not the frequencies they correspond to.  You can call rfftfreq to get the frequencies, and that's fine, but now you have two arrays that represent one spectrum.  Wouldn't it be nice to wrap them up in an object?  That's what a Spectrum object is.
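To make the encapsulation idea concrete, here is a minimal sketch of an object that keeps the two arrays together. `SimpleSpectrum` and `peak_frequency` are hypothetical names for illustration; this is not the `Spectrum` class from thinkdsp.py.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SimpleSpectrum:
    """Hypothetical stand-in for a Spectrum-like object (not thinkdsp's class)."""
    hs: np.ndarray      # complex amplitudes from rfft
    fs: np.ndarray      # frequencies from rfftfreq, same length as hs
    framerate: int

    def peak_frequency(self):
        """Return the frequency of the component with the largest magnitude."""
        return self.fs[np.argmax(np.abs(self.hs))]


framerate = 11025
n = framerate                            # one second of samples
ts = np.arange(n) / framerate
ys = np.sin(2 * np.pi * 880 * ts)        # an 880 Hz sine

spectrum = SimpleSpectrum(np.fft.rfft(ys),
                          np.fft.rfftfreq(n, 1 / framerate),
                          framerate)
peak = spectrum.peak_frequency()
print(peak)
```

Because the amplitudes and frequencies travel together, a method like `peak_frequency` never has to be told which frequency array goes with which amplitude array.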

Finally, I think these examples demonstrate good software engineering practice, particularly bottom-up design.  When you work with libraries like NumPy, it is common and generally considered a good idea to define functions and objects that encapsulate data, hide details, eliminate repeated code, and create new abstractions.  Paul Graham wrote about this idea in one of his essays on software:
[...] you don't just write your program down toward the language, you also build the language up toward your program. [...] the boundary between language and program is drawn and redrawn, until eventually it comes to rest along [...] the natural frontiers of your problem. In the end your program will look as if the language had been designed for it.
That's why, in the example that makes my correspondent so angry, it takes just three lines to create and add the signals; and more importantly, those lines contain exactly the information relevant to the operations and no more.  I think that's good quality code.

In summary, I provide custom libraries for my books because:

1) They demonstrate good software engineering practice, including bottom-up design and data encapsulation.

2) They let me present ideas top-down, showing how they are used before how they are implemented.

3) And as readers learn the APIs I defined, they are implicitly learning the key ideas.

I understand that not everyone agrees with this design decision, and maybe it doesn't work for everyone.  But I am still surprised that it makes people so angry.

## Wednesday, April 18, 2018

### Computing at Olin Q&A

I was recently interviewed by Sally Phelps, the Director of Postgraduate Planning at Olin.  We talked about computer science at Olin, which is something we are often asked to explain to prospective students and their parents, employers, and other external audiences.

Afterward, I wrote the following approximation of our conversation, which I have edited to be much more coherent than what I actually said.

I should note: My answers to the following questions are my opinions.  I believe that other Olin professors who teach software classes would say similar things, but I am sure we would not all say the same things.

 Photo Credit: Sarah Deng

Q: What is the philosophy of Olin when it comes to training software engineers of the future?

To understand computer science at Olin, you have to understand that Olin really has one curriculum, and it's engineering.

We have degrees in Engineering, Mechanical Engineering, and Electrical and Computer Engineering.  But everyone sees the same approach to engineering: it starts with people and it ends with people.  That means you can't wait for someone to hand you a well-formulated problem from a textbook; you have to understand the people you are designing for, and the context of the problem.  You have to know when an engineering solution can help and when it might not.  And then when you have a solution, you have to be able to get it out of the lab and into the world.

Q: That sounds very different from a traditional computer science degree.

It is.  Because we already have a lot of computer scientists who know how data structures work; we don't have as many who can identify opportunities, work on open-ended problems, work on teams with people from other disciplines, work on solutions that might involve electrical and mechanical systems as well as software.

And we don't have a lot of computer scientists who can communicate clearly about their work; to have impact, they have to be able to explain the value of what they are doing.  Most computer science programs don't teach those things very well.

Also most CS programs don't do a great job of preparing students to work as software engineers.  A lot of classes are too theoretical, too mathematical, and too focused on the computer itself, not the things you want to do with it, the applications.

At Olin, we've got some theory, some mathematical foundations, some focus on the design of software systems.  But we've turned that dial down because the truth is that a lot of that material is not relevant to practice.  I always get a fight when I say that, because you can never take anything out of the curriculum.  There's always someone who says you have to know how to balance a red-black tree or you can't be a computer scientist; or you have to know about Belady's anomaly, or you have to know X, Y, and Z.

Well, you don't.  For the vast majority of our students, for all the things they are going to do, a big chunk of the traditional curriculum is irrelevant.  So we look at the traditional curriculum with some skepticism, and we make cuts.

We have to; there's only so much time.  In four years, students take about 32 classes.  We have to spend them wisely.  We have to think about where they are going after graduation.  Some will go to grad school, some will start companies, some will work in industry.  Some of them will be software engineers, some will be product managers, some will work in other fields; they might develop software, or work with software developers.

Q: So how do you prepare people for all of that?

It depends what "prepare" means.  If it means teach them everything they need to know, it's impossible.  But you can identify the knowledge, skills and attitudes they are most likely to need.

It helps if you have faculty with industry experience.  A lot of professors go straight to grad school and straight into academics, and then they have long arguments about what software engineers need to know.  Sometimes they don't know what they are talking about.

If you're designing a curriculum, just like a good engineer, you have to understand the people you're designing for and the context of the solution.  Who are your students, where are they going, and what are they going to need?  Then you can decide what to teach.

Q: So if a student is interested in CS and they're deciding between Olin and another school, what do you tell them?

I usually tell them about the Olin curriculum, what I just explained.  And I suggest they look at our graduation requirements.  Students at Olin who do the Engineering major with a concentration in computing, they take a relatively small number of computer science classes, usually around seven.  And they take a lot of other engineering classes.

In the first semester, everyone takes the same three engineering classes, so everyone does some mechanical design, some circuits and measurement, and some computational modeling.

Everyone takes a foundation class in humanities, and another in entrepreneurship.  Everyone takes Principles of Engineering, where they design and build a mechatronic system.

In the fourth semester everyone takes user-centric design, and finally, in the senior year, everyone does a two-semester engineering capstone, which is usually interdisciplinary.

If a prospective student looks at those classes and they're excited about doing design and engineering -- and several kinds of engineering -- along with computer science, then Olin is probably a good choice for them.

If they look at the requirements and they dread them -- if the requirements are preventing them from doing what they really want -- then maybe Olin's not the right place.

Q: I understand there are student-taught software classes – can you tell us more about that?

We do, and a lot of them have been related to software, because that's an area where we have students doing internships, and sometimes starting companies, and they get a lot of industry experience.

And they come back with skills and knowledge they can share with their peers.  Sometimes that happens in classes, especially on projects.  But it can also be a student-led class where student instructors propose a class, and then they work with faculty advisors to develop and present the material.  As an advisor, I can help with curriculum design and the pedagogy, and sometimes I have a good view of the context or the big picture.  And then a lot of times the students have a better view of the details.  They've spent the summer working in a particular domain, or with a particular technology, and they can help their peers get a jump start.

They also bring some of the skills and attitudes of software engineering.  For example, we teach students about testing, and version control, and code quality.  But in a class it can be artificial; a lot of times students want to get code working and they have to move on to the next thing.  They don't want to hear from me about coding "style".

It can be more effective when it's coming from peers, and when it's based on industry experience.  The student instructors might say they worked at Pivotal, and they had to do pair programming, or they worked at Google, and all of their code was reviewed for readability before they could check it in.  Sometimes that's got more credibility than I do.

Q: What does the future look like for computing at Olin?

A big part of it is programming in context.  For example, the first software class is Modeling and Simulation, which is about computational models in science, including physics, chemistry, medicine, ecology…  So right from the beginning, we're not just learning to program, we're applying it to real world problems.

Programming isn't just a way of translating well-understood solutions into code.  It's a way of communicating, teaching, learning, and thinking.  Students with basic programming skills can use coding as a "pedagogic lever" to learn other topics in engineering, math, natural and social science, arts and humanities.

I think we are only starting to figure out what that looks like.  We have some classes that use computation in these ways, but I think there are a lot more opportunities.  A lot of ideas that we teach mathematically, we could be doing computationally, maybe in addition to, or maybe instead of the math.

One of my examples is signal processing, where probably the most important idea is the Fourier transform.  If you do that mathematically, you have to start with complex numbers and work your way up.  It takes a long time before you get to anything interesting.

With a computational approach, I can give you a program on the first day to compute the Fourier transform, and you can use it, and apply it to real problems, and see what it does, and run experiments and listen to the results, all on day one.  And now that you know what it's good for, maybe you'll want to know how it works.  So we can go top-down, start with applications, and then open the hood and look at the engine.

I'd like to see us apply this approach throughout the curriculum, especially engineering, math, and science, but also arts, humanities and social science.

## Thursday, March 22, 2018

### Generational changes in support for gun laws

This is the fourth article in a series about changes in support for gun control laws over the last 50 years.

In the first article I looked at data from the General Social Survey and found that young adults are less likely than previous generations to support gun control.

In the second article I looked at data from the CIRP Freshman Survey and found that even the youngest adults, who grew up with lockdown drills and graphic news coverage of school shootings, are LESS likely to support strict gun control laws.

In the third article, I ran graphical tests to distinguish age, period, and cohort effects.  I found strong evidence for a period effect: support for gun control among all groups increased during the 1980s and 90s, and has been falling in all groups since 2000.  I also saw some evidence of a cohort effect: people born in the 1980s and 90s are less likely to support strict gun control laws.

In this article, I dive deeper, using logistic regression to estimate the sizes of these effects separately, while controlling for demographic factors like sex, race, urban or rural residence, etc.

### Variables

As in the previous articles, I am using data from the General Social Survey (GSS), and in particular the variable 'gunlaw', which records responses to the question:
Would you favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun?
I characterize respondents who answer "favor" as more likely to support strict gun control laws.

The explanatory factors I consider are:

'nineties', 'eighties', 'seventies', 'fifties', 'forties', 'thirties', 'twenties':  These variables encode the respondent's decade of birth; the 1960s serve as the baseline.

'female': indicates that the respondent is female.

'black': indicates that the respondent is black.

'otherrace': indicates that the respondent is neither white nor black (most people in this category are mixed race).

'hispanic': indicates that the respondent is Hispanic.

'conservative', 'liberal': indicates that the respondent reports being conservative or liberal (not moderate).

'lowrealinc', 'highrealinc': indicates that the respondent's household income is in the bottom or top 25%, based on self-report, converted to constant dollars.

'college': indicates whether the respondent attended any college.

'urban', 'rural': indicates whether the respondent lives in an urban or rural area (not suburban).

'gunhome': indicates whether the respondent reports that they "have in [their] home or garage any guns or revolvers".

'threatened': indicates whether the respondent reports that they have "ever been threatened with a gun, or shot at".

These factors are all binary.  In addition, I also estimate the period effect by including the following  variables: 'yminus10', 'yminus20', 'yminus30', and 'yminus40', to indicate respondents surveyed 10, 20, 30, and 40 years prior to 2016.
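As an illustration of how binary factors like these can be built, here is a small pandas sketch for the decade-of-birth variables. The respondents are made up, and the column name `cohort` (year of birth) is an assumption about how the data is laid out.

```python
import pandas as pd

# Made-up respondents; 'cohort' is assumed to hold the year of birth
df = pd.DataFrame({"cohort": [1925, 1948, 1963, 1985, 1994]})

# The 1960s are the baseline, so they get no indicator column
decade_names = {1920: "twenties", 1930: "thirties", 1940: "forties",
                1950: "fifties", 1970: "seventies", 1980: "eighties",
                1990: "nineties"}

decade = df["cohort"] // 10 * 10          # e.g. 1985 -> 1980
for start, name in decade_names.items():
    df[name] = (decade == start).astype(int)

print(df)
```

A respondent born in the 1960s has zeros in every decade column, which is what makes that decade the reference group in the regression.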

### Results

I used logistic regression to estimate the effect of each of these variables.  The regression also includes a cubic model of time, intended to capture the period effect.  You can see the period effect in the following figure, which shows actual changes in support for a gun permit law over the history of the GSS (in gray) and the retroactive predictions of the model (in red).

To present the results in an interpretable form, I define a collection of hypothetical people with different attributes and estimate the probability that each one would favor a gun permit law.

As a baseline, I start with a white, non-Hispanic male born in the 1960s who is politically moderate, in the middle 50% of the income range, who attended college, lives in a suburb, has never been threatened with a gun or shot at, and does not have a gun at home.  People like that interviewed in 2016 have a 74% chance of favoring "a law which would require a person to obtain a police permit before he or she could buy a gun".

The following table shows results for people with different attributes: the first row, which is labeled 'baseline' is the baseline person from the previous paragraph; the second row, labeled 'nineties', is identical to the baseline in every way, except born in the 1990s rather than the 1960s.  The line labeled 'female' is identical to the baseline, but female.

These results are generated by drawing 201 random samples from the GSS data, fitting the model to each one, and computing the median, 2.5th and 97.5th percentiles of the predictions.  The range from 'low2.5' to 'high97.5' forms a 95% confidence interval.

| Hypothetical person | low2.5 | median | high97.5 |
|---|---|---|---|
| baseline | 71.5 | 73.6 | 75.1 |
| nineties | 60.1 | 63.8 | 68.5 |
| eighties | 64.9 | 67.7 | 69.8 |
| seventies | 67.8 | 70.1 | 72.0 |
| fifties | 70.7 | 72.8 | 74.8 |
| forties | 71.4 | 73.7 | 75.4 |
| thirties | 69.9 | 72.0 | 73.8 |
| twenties | 70.7 | 72.9 | 74.8 |
| female | 83.2 | 84.7 | 85.6 |
| black | 75.9 | 78.0 | 79.5 |
| otherrace | 78.2 | 80.4 | 82.9 |
| hispanic | 70.7 | 73.5 | 76.0 |
| conservative | 64.1 | 65.8 | 67.8 |
| liberal | 75.9 | 77.7 | 79.3 |
| lowrealinc | 69.1 | 71.1 | 72.9 |
| highrealinc | 73.2 | 75.2 | 76.7 |
| college | 73.2 | 75.1 | 76.3 |
| urban | 66.0 | 68.3 | 69.9 |
| rural | 59.5 | 62.0 | 64.0 |
| threatened | 68.8 | 71.1 | 73.1 |
| gunhome | 51.1 | 53.6 | 55.8 |
| yminus10 | 84.1 | 85.3 | 86.2 |
| yminus20 | 84.3 | 85.3 | 86.4 |
| yminus30 | 80.7 | 81.5 | 82.8 |
| yminus40 | 78.8 | 79.9 | 81.4 |
| lowest combo | 16.5 | 19.2 | 22.1 |
| highest combo | 91.3 | 92.3 | 93.4 |

Again, the hypothetical baseline person has a 74% chance of favoring a gun permit law.  A nearly identical person born in the 1990s has only a 64% chance.

To see this and the other effects more clearly, I computed the difference between each hypothetical person and the baseline, then sorted by the magnitude of the apparent effect.

| Hypothetical person | low2.5 | median | high97.5 |
|---|---|---|---|
| lowest combo | -57.7 | -54.5 | -49.5 |
| gunhome | -21.1 | -20.0 | -18.6 |
| rural | -13.4 | -11.5 | -9.9 |
| nineties | -13.5 | -9.8 | -4.5 |
| conservative | -8.9 | -7.6 | -6.5 |
| eighties | -7.9 | -5.9 | -2.7 |
| urban | -6.6 | -5.4 | -4.1 |
| seventies | -5.6 | -3.2 | -1.4 |
| lowrealinc | -3.7 | -2.5 | -1.2 |
| threatened | -3.6 | -2.4 | -1.1 |
| thirties | -3.2 | -1.5 | -0.1 |
| fifties | -2.2 | -0.8 | 0.8 |
| twenties | -2.8 | -0.6 | 1.0 |
| forties | -1.7 | -0.0 | 1.5 |
| baseline | 0.0 | 0.0 | 0.0 |
| hispanic | -1.5 | 0.2 | 2.0 |
| college | 0.6 | 1.6 | 2.6 |
| highrealinc | 0.7 | 1.7 | 2.6 |
| liberal | 2.9 | 4.2 | 5.2 |
| black | 3.1 | 4.5 | 5.8 |
| yminus40 | 4.9 | 6.6 | 8.0 |
| otherrace | 4.8 | 6.9 | 9.6 |
| yminus30 | 6.6 | 8.1 | 9.8 |
| female | 10.1 | 11.1 | 12.4 |
| yminus20 | 10.2 | 11.7 | 14.0 |
| yminus10 | 10.1 | 11.7 | 13.7 |
| highest combo | 16.9 | 18.8 | 21.1 |

All else being equal, someone who owns a gun is about 20 percentage points less likely to favor gun permit laws.

Compared to people born in the 1960s, people born in the 1990s are 10 points less likely.  People born in the 1980s and 1970s are also less likely, by 6 and 3 points.  People born in previous generations are not substantially different from people born in the 1960s (and the effect is not statistically significant).

Compared to suburbanites, people in rural and urban communities are less likely, by 12 and 5 points.

People in the lowest 25% of household income are less likely by 2.5 points; people in the highest 25% are more likely by 2 percentage points.

Blacks and other non-whites are more likely to favor gun permit laws, by 4.5 and 7 percentage points.

Compared to political moderates, conservatives are 8 points less likely and liberals are 4 points more likely to favor gun permit laws.

Compared to men, women are 11 points more likely to favor gun permit laws.

Controlling for all of these factors, the period effect persists: people with the same attributes surveyed 10, 20, 30, and 40 years ago would have been about 12, 12, 8, and 7 points more likely to favor gun permit laws.

In these results, Hispanics are not significantly different from non-Hispanic whites.  But because of the way the GSS asked about Hispanic background, this variable is missing a lot of data; these results might not mean much.

Surprisingly, people who report that they have been "threatened with a gun or shot at" are 2 percentage points LESS likely to favor gun permit laws.  This effect is small but statistically significant, and it is consistent in many versions of the model.  A possible explanation is that this variable captures information about the respondent's relationship with guns that is not captured by other variables.  For example, if a respondent does not have a gun at home, but spends time around people who do, they might be more likely to have been threatened and also more likely to share cultural values with gun owners.  Alternatively, since this question was only asked until 1996, it's possible that it is capturing a period effect, at least in part.

"Lowest combo" represents a hypothetical person with all attributes associated with lower support for gun laws: a white conservative male born in the 1990s, living in a rural area, with household income in the lowest 25%, who has not attended college, who owns a gun, and has been threatened with a gun or shot at.  Such a person has a 19% chance of favoring a gun permit law, 54 points lower than the baseline.

"Highest combo" represents a hypothetical person with all attributes associated with higher support for gun laws: a mixed race liberal woman born in the 1960s or before, living in a suburb, with household income in the highest 25%, who has attended college, does not own a gun, and has not been threatened with one.  Such a person has a 92% chance of favoring a gun permit law, 19 points higher than the baseline.

[You might be surprised that these results are asymmetric: that is, that the lowest combo is farther from the baseline than the highest combo.  The reason is that the "distance" between probabilities is not linear.  For more about that, see my previous article on the challenges of interpreting probabilistic predictions.]
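A small worked example may help with the asymmetry. The sketch below starts from the baseline probability in the first table, moves the same distance up and down on the log-odds scale, and converts back to probabilities. The shift of 1.5 is an arbitrary value chosen for illustration, not a parameter of the model.

```python
import math


def logit(p):
    """Convert a probability to log odds."""
    return math.log(p / (1 - p))


def expit(x):
    """Convert log odds back to a probability."""
    return 1 / (1 + math.exp(-x))


baseline = 0.736    # baseline median from the first table
shift = 1.5         # arbitrary log-odds shift, for illustration only

up = expit(logit(baseline) + shift) - baseline
down = baseline - expit(logit(baseline) - shift)

print(f"+{shift} log odds: {100 * up:.1f} points up")
print(f"-{shift} log odds: {100 * down:.1f} points down")
```

Because the baseline is already well above 50%, there is less room to move up than down, so equal steps in log odds produce a noticeably larger drop than rise in probability.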

### Methodology

The steps of the analysis are:

2) For each year of the survey, use weighted bootstrap to select a random sample that accounts for the stratified sampling in the GSS design.

3) Fill missing values in each column by drawing random samples from the valid responses.

4) Convert some numerical and categorical variables to boolean; for example 'conservative' and 'liberal' are based on the categorical variable 'polviews'; and 'lowrealinc' and 'highrealinc' are based on the numerical variable 'realinc'.

5) Use logistic regression to estimate model parameters, which are in terms of log odds.

6) Use the model to make predictions for each of the hypothetical people in the tables, in terms of probabilities.

7) Compute the predicted difference between each hypothetical person and the baseline.

By repeating steps (2) through (7) about 200 times, we get a distribution of estimates that accounts for uncertainty due to random sampling and missing values.  From these distributions, we can select the median and a 95% confidence interval, as reported in the tables above.
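The loop over steps (2) through (7) can be sketched as follows. This is a simplified illustration under several assumptions: the data is synthetic rather than GSS data, there is a single survey year and only two explanatory factors, the weight column `wt` is a hypothetical name, and the logistic regression is a hand-rolled Newton's method standing in for whatever library the real analysis uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for one survey year (NOT GSS data)
n = 2000
df = pd.DataFrame({
    "gunhome": rng.integers(0, 2, n).astype(float),
    "female": rng.integers(0, 2, n).astype(float),
    "wt": rng.uniform(0.5, 2.0, n),              # hypothetical sampling weight
})
true_logit = 1.0 - 0.9 * df["gunhome"] + 0.7 * df["female"]
df["favor"] = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)
df.loc[rng.random(n) < 0.05, "gunhome"] = np.nan  # some missing responses


def fill_missing(col, rng):
    """Replace NaNs with random draws from the valid responses."""
    col = col.copy()
    missing = col.isna()
    col[missing] = rng.choice(col[~missing].to_numpy(), size=missing.sum())
    return col


def fit_logistic(X, y, iters=25):
    """Newton's method for logistic regression; returns log-odds coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta


def predict(beta, gunhome, female):
    """Percent chance that a hypothetical person favors the law."""
    x = np.array([1.0, gunhome, female])
    return 100 / (1 + np.exp(-x @ beta))


probs = (df["wt"] / df["wt"].sum()).to_numpy()
diffs = []
for _ in range(201):
    # Weighted bootstrap: resample rows with replacement
    idx = rng.choice(len(df), size=len(df), replace=True, p=probs)
    sample = df.iloc[idx].reset_index(drop=True)
    # Fill missing values from the valid responses
    sample["gunhome"] = fill_missing(sample["gunhome"], rng)
    # Fit the model (parameters are in log odds)
    X = np.column_stack([np.ones(len(sample)), sample["gunhome"], sample["female"]])
    beta = fit_logistic(X, sample["favor"].to_numpy())
    # Predict for hypothetical people and take the difference
    diffs.append(predict(beta, 1, 0) - predict(beta, 0, 0))

low, median, high = np.percentile(diffs, [2.5, 50, 97.5])
print(f"gunhome vs. baseline: {median:.1f} ({low:.1f}, {high:.1f})")
```

Each pass through the loop draws a new sample and new fill-in values, so the spread of `diffs` reflects both sources of uncertainty.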

## Thursday, March 1, 2018

### Support for gun control is decreasing in all age groups

This is the third article in a series about changes in support for gun control over the last 50 years.

In the first article I looked at data from the General Social Survey and found that young adults are less likely than previous generations to support gun control.

In the second article I looked at data from the CIRP Freshman Survey and found that even the youngest adults, who grew up with lockdown drills and graphic news coverage of school shootings, are still LESS likely to support gun control.

### Untangling age, period, and cohort effects

In this article, I do some age-period-cohort analysis to see if the changes over the last 50 years are due to age, period, or cohort effects:

Age effect: People's views might change over the course of their lives.  For example, they might be more likely to support gun rights when they are young, and more likely to support gun control when they have children. (This turns out not to be true.)

Period effect: People's views might change due to an external factor that affects all age groups and cohorts over the same time period.  For example, if gun crime rates increase, we might expect support for gun control to increase.  (There is some evidence for this.)

Cohort effect: People's views might be different from one generation to the next, due to differences in the environment.  For example, current teenagers might support gun control because of their experiences with school shootings. (This turns out not to be true.)

As the composition of the population changes over time, it can be hard to untangle these effects, but the design of the General Social Survey (GSS) makes it possible.  From 1972 to 2016, the GSS asked respondents:
Would you favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun?
The following figure shows the fraction of respondents who would favor this law:

In the 1970s and 80s, support for this policy was near 75%.  It increased during the 1990s, peaking near 85% around 2000, and has been declining ever since.  In the most recent survey year, it is at 71%.

### Testing for age effects

To test for age effects, we can group respondents into cohorts by decade of birth and plot support for gun control as a function of age.

If there is an age effect we would expect all cohorts to follow a similar trajectory as they age.  For example, if people are more likely to support gun control during their child-bearing years, we would expect these lines to generally increase from left to right.

Here are the results:

There are no obvious patterns here, which suggests that there is no age effect.
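The grouping behind this kind of plot can be sketched with pandas. The data below is randomly generated, not GSS data, and the column names and 5-year age bins are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic respondents (NOT GSS data); column names are assumptions
n = 5000
df = pd.DataFrame({
    "year": rng.integers(1972, 2017, n),       # survey year, 1972 through 2016
    "age": rng.integers(18, 90, n),
    "favor": rng.integers(0, 2, n),            # 1 if the respondent favors the law
})
df["cohort"] = df["year"] - df["age"]           # year of birth
df["decade"] = df["cohort"] // 10 * 10          # decade of birth
df["age_group"] = df["age"] // 5 * 5            # 5-year age bins

# Rows: age group; columns: birth decade; values: percent in favor
by_age = df.pivot_table(index="age_group", columns="decade",
                        values="favor", aggfunc="mean") * 100
print(by_age.round(1))
```

Calling `by_age.plot()` would draw one line per birth decade as a function of age. Using `index="year"` instead of `index="age_group"` gives the same cohorts plotted over time, which is the view needed to look for a period effect.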

### Testing for period effects

To test for period effects, we group by decade of birth again, and plot the results over time.  If there is a period effect, we expect all cohorts to follow a similar trajectory.

Here are the results:

This figure shows clear evidence for a period effect: all cohorts follow a similar trajectory over the same period.  (Don't be distracted by the extreme first points in the green and purple curves; they are based on a small number of respondents.)

Looking at the last few points in each cohort, it looks like people born in the 1980s and 90s are less likely to support gun control than previous generations, but this figure does not show strong evidence for a cohort effect.

In summary, there is strong evidence for a period effect: support for gun control increased among all groups during the 1980s and 90s, and has been falling in all groups since 2000.

### Violent crime rates

A possible explanation is that these trends are driven by changes in violent crime, especially gun violence, which increased during the 1980s, peaked in 1993, and has been falling ever since, according to this study from the Pew Research Center.

To investigate this more carefully, I would like to see a graph of people's perception of violent crime rates, which does not always track reality.

### Breakdown by political views

In general, liberals are more likely to support gun control than conservatives; we might expect a period effect to have different impact on different groups.  The following figure shows support for gun control over time, grouped by self-reported political identity:

Whatever external forces caused the increase and subsequent decrease in support for gun control, they affected all groups over the same period.  The most recent decrease seems to be bigger among conservatives, so the gap may be growing.

### Breakdown by race

Nonwhites are more likely to support gun control than whites by about 8 percentage points.  The following figure shows how this difference has changed over time:

Both groups were affected similarly over the same period.  Among nonwhites, support for gun control might have increased sooner, in the 1980s, and might be falling more slowly now.

## Wednesday, February 28, 2018

### Post-Columbine students do not support gun control

In their coverage of the Parkland school shooting, The Economist writes:
Though polling suggests that young people are only slightly more in favour of gun-control measures than their elders, those surveys focus on those aged 18 and above. There may be a pre- and post-Columbine divide within that group.
Based on my analysis of data from the General Social Survey (GSS) and the CIRP Freshman Survey, I think the first sentence is false and the second is unlikely: young people are substantially less in favor of gun-control measures than their elders.

Here's the figure, from my previous article, showing these trends:

The blue line shows the fraction of respondents in the GSS who would favor "a law which would require a person to obtain a police permit before he or she could buy a gun?"

Among people born before 1980, support for this form of gun control is strong: around 75% for people born between 1910 and 1940, and approaching 80% for people born between 1950 and 1980.
But among people born in the 1980s and 90s, support for gun control is below 70%.

The orange line shows the fraction of respondents to the CIRP Freshman Survey who "Agree" or "Strongly agree" that
The federal government should do more to control the sale of handguns.
This dataset does not go back as far, but shows the same pattern: a large majority of people born before 1980 supported gun control (when they were surveyed as college freshmen); among people born after 1980, far fewer support gun control.

However, these results are based on people who are currently young adults.  Maybe, as The Economist speculates:
The pupils, in their late teens, started their education after a massacre at Columbine High School in Colorado in 1999, in which 13 were killed. That means they have been practising active-shooter drills in the classroom since kindergarten. Seeing a school shooting as an event to prepare for, rather than an awful aberration, seems to have fuelled the students’ anger.
They may be angry, but at least so far, their anger has not led them to support gun control.  Data from the Freshman Survey makes this clear.  The following figure shows, for survey respondents from 1989 to 2013, the fraction that agree or strongly agree that:
The federal government should do more to control the sale of handguns.
And for respondents in 2016, the fraction that agree or strongly agree that:
The federal government should have stricter gun control laws.

The change in wording makes it hard to compare the last data point with the previous trend, but it is clear at least that college freshmen in 2013 were substantially less likely than previous generations to support gun control: at 64%, they were 20 percentage points down from the peak, at 84%.

A large majority of the 2013 respondents were born in 1995.  They were 3 when Columbine was in the news, 10 during the Red Lake shootings, 11 during the West Nickel Mines school shooting, 12 during the Virginia Tech shooting, and 13 during the Northern Illinois University shooting.

They were 17 during the Chardon High School shooting, the Oikos University shooting, and the Sandy Hook Elementary School shooting.

And when they were surveyed in 2013, less than a year after Sandy Hook, more than 33% of them did not agree that the federal government should do more to control the sale of handguns, more than in any previous year of the survey.

Seeing these horrific events in the news, during their entire conscious lives, with increasingly dramatic and graphic coverage, might have made these students angry, but it did not make more of them support gun control.

Practicing active-shooter drills since kindergarten might have made these students angry, but it did not make more of them support gun control.

Maybe, as The Economist suggests, these students see a school shooting as "an event to prepare for, rather than an awful aberration".   But that does not make them more likely to support gun control.

## Tuesday, February 27, 2018

### Support for gun control is lower among young adults

In current discussions of gun policies, many advocates of gun control talk as if time is on their side; that is, they assume that young people are more likely than old people to support gun control.

This letter to the editor of the Economist summarizes the argument:
It is unlikely that a generation raised on lockdown drills, with access to phone footage of gun rampages and a waning interest in hunting, will grow up parroting the National Rifle Association’s rhetoric as enthusiastically as today's political leaders. Change is coming.
And in a recent television interview, a survivor of the Parkland school shooting told opponents of gun control:
You might as well stop now, because we are going to outlive you.
But these assumptions turn out to be false.  In fact, young adults are substantially less likely to support gun control than previous generations.

The following figure shows results I generated from the General Social Survey (GSS) and the CIRP Freshman Survey, plotting support for gun control by year of birth.

The blue line shows the fraction of respondents in the GSS who answered "Favor" to the following question:
Would you favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun?
Among people born before 1980, support for this form of gun control is strong: around 75% for people born between 1910 and 1940, and approaching 80% for people born between 1950 and 1980.

But among people born in the 1980s and 90s, support for gun control is below 70%.

The orange line shows the fraction of respondents to the CIRP Freshman Survey who "Agree" or "Strongly agree" that
The federal government should do more to control the sale of handguns.
This dataset does not go back as far, but shows the same pattern: a large majority of people born before 1980 supported gun control (when they were surveyed as college freshmen); among people born after 1980, far fewer support gun control.

### Other studies

I am not the only one to notice these patterns.  This Vox article from last week reports on similar results from a 2015 Pew Survey and a 2015 Gallup Poll.

The Pew survey found that young adults are less likely than other age groups to support a ban on assault weapons (although they are also more likely to support a federal database of gun sales, and not substantially different from other age groups on some other policy proposals):

This page from the Pew Research Center shows responses to the question
What do you think is more important – to protect the right of Americans to own guns, OR to control gun ownership?
Here are the results:

Before 2007, young adults were the least likely group to choose gun rights over gun control (see the orange line).  Since then, successive cohorts of young adults have shifted substantially away from gun control.

This Gallup poll shows that current young adults are more likely than previous generations to believe that more concealed weapons would make the U.S. safer:

Each of these sources is based on different questions asked of different groups, but they show remarkably consistent results.

The GSS is based on a representative sample of the adult U.S. population.  It includes people of different ages, so it provides insight into the effect of birth year and age.  The Freshman Survey includes only first-year college students, so it is not representative of the general population.  But because all respondents are observed at the same age, it gives the clearest picture of generational changes.

### The NRA regime

A possible explanation for these changes is that since the NRA created its lobbying branch in 1975 and its political action committee in 1976, it has succeeded in making gun rights (and opposition to gun control) part of the conservative identity.

We should expect their efforts to have the biggest effect on the generation raised in the 1980s and 90s, and we should expect them to have a stronger effect on conservatives than liberals.

The following figure shows the same data from the GSS, grouped by political self-identification:

As expected, support for gun control has dropped most among people who identify as conservative.

Among moderates, it might have dropped, but not by as much.  The last data point, for people born around 1995, might be back up, but it is based on a small sample, and may not be reliable.

Support among liberals has been mostly unchanged, except for the last point in the series which, again, may not be reliable, as indicated by the wide error bars.

These results suggest that the decrease in support for gun control has been driven primarily by changing views among young conservatives.

UPDATE: NPR has a related story from a few days ago.  They report that "Millennials are no more liberal on gun control than their parents or grandparents — despite diverging from their elders on the legalization of marijuana, same-sex marriage and other social issues."

## Friday, February 23, 2018

### The six stages of computational science

This is the second in a series of articles related to computational science and education.  The first article is here.

### The Six Stages of Computational Science

When I was in grad school, I collaborated with a research group working on computational fluid dynamics.  They had accumulated a large, complex code base, and it was starting to show signs of strain.  Parts of the system, written by students who had graduated, had become black magic: no one knew how they worked, and everyone was afraid to touch them.  When new students joined the group, it took longer and longer for them to get oriented.  And everyone was spending more time debugging than developing new features or generating results.

When I inspected the code, I found what you might expect: low readability, missing documentation, large functions with complex interfaces, poor organization, minimal error checking, and no automated tests.  In the absence of version control, they had many versions of every file, scattered across several machines.

I'm not sure if anyone could have helped them, but I am sure I didn't.  To be honest, my own coding practices were not much better than theirs, at the time.

The problem, as I see it now, is that we were caught in a transitional form of evolution: the nature of scientific computing was changing quickly; professional practice, and the skills of the practitioners, weren't keeping up.

To explain what I mean, I propose a series of stages describing practices for scientific computing.
• Stage 1, Calculating:  Mostly plugging numbers into formulas, using a computer as a glorified calculator.
• Stage 2, Scripting: Short programs using built-in functions, mostly straight-line code, few user-defined functions.
• Stage 3, Hacking: Longer programs with poor code quality, usually lacking documentation.
• Stage 4, Coding: Good quality code which is readable, demonstrably correct, and well documented.
• Stage 5, Architecting: Code organized in functions, classes (maybe), and libraries with well-designed APIs.
• Stage 6, Engineering: Code under version control, with automated tests, build automation, and configuration management.
These stages are, very roughly, historical.  In the earliest days of computational science, most projects were at Stages 1 and 2.  In the last 10 years, more projects are moving into Stages 4, 5, and 6.  But that project I worked on in grad school was stuck at Stage 3.

#### The Valley of Unreliable Science

These stages trace a U-shaped curve of reliability:

By "reliable", I mean science that provides valid explanations, correct predictions, and designs that work.

At Stage 1, Calculating, the primary scientific result is usually analytic.  The correctness of the result is demonstrated in the form of a proof, using math notation along with natural and technical language.  Reviewers and future researchers are expected to review the proof, but no one checks the calculation.  Fundamentally, Stage 1 is no different from pre-computational, analysis-based science; we should expect it to be as reliable as our ability to read and check proofs, and to press the right buttons on a calculator.

At Stage 2, Scripting, the primary result is still analytic, the supporting scripts are simple enough to be demonstrably correct, and the libraries they use are presumed to be correct.

But Stage 2 scripts are not always made available for review, making it hard to check their correctness or reproduce their results.  Nevertheless, Stage 2 was considered acceptable practice for a long time; and in some fields, it still is.

Stage 3, Hacking, has the same hazards as Stage 2, but at a level that's no longer acceptable.  Small, simple scripts tend to grow into large, complex programs.  Often, they contain implementation details that are not documented anywhere, and there is no practical way to check their correctness.

Stage 3 is not reliable because it is not reproducible.  Leek and Peng define reproducibility as "the ability to recompute data analytic results given an observed dataset and knowledge of the data analysis pipeline."

Reproducibility does not guarantee reliability, as Leek and Peng acknowledge in the title of their article, "Reproducible research can still be wrong". But without reproducibility as a requirement of published research, there is no way to be confident of its reliability.

#### Climbing out of the valley

Stages 4, 5, and 6 are the antidote to Stage 3.  They describe what's needed to make computational science reproducible, and therefore more likely to be reliable.

At a minimum, reviewers of a publication and future researchers should be able to:

2) Run tests and review source code to verify correctness.

3) Run a build process to execute the computation.

To achieve these goals, we need the tools of software engineering:

1) Version control makes it possible to maintain an archived version of the code used to produce a particular result.  Examples include Git and Subversion.

2) During development, automated tests make programs more likely to be correct; they also tend to improve code quality.  During review, they provide evidence of correctness, and for future researchers they provide what is often the most useful form of documentation.  Examples include unittest and nose for Python and JUnit for Java.

3) Automated build systems document the high-level structure of a computation: which programs process which data, what outputs they produce, etc.  Examples include Make and Ant.

4) Configuration management tools document the details of the computational environment where the result was produced, including the programming languages, libraries, and system-level software the results depend on.  Examples include package managers like Conda that document a set of packages, containers like Docker that also document system software, and virtual machines that actually contain the entire environment needed to run a computation.

These are the ropes and grappling hooks we need to climb out of the Valley of Unreliable Science.
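As a small illustration of the automated tests in item 2, here is a sketch using Python's `unittest`.  The function `celsius_to_kelvin` and its test cases are hypothetical, chosen only to show the shape of a test; they are not taken from any project discussed here.

```python
import unittest

def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvins."""
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15

class TestCelsiusToKelvin(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)

    def test_rejects_impossible_input(self):
        # Error checking is part of the documented behavior, so test it too.
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)
```

Saved in a file such as `test_units.py` (a hypothetical name), these tests can be run with `python -m unittest test_units`.  For a reviewer, the test cases double as documentation: they state what inputs the function accepts and what it returns.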

Unfortunately, most people working in computational science did not learn these tools in school, and they are not easy to learn.  For example, Git, which has emerged as the dominant version control system, is notoriously hard to use.  Even with GitHub and graphical clients, it's still hard.  We have a lot of work to do to make these tools better.

Nevertheless, it is possible to learn basic use of these tools with a reasonable investment of time.  Software Carpentry offers a three-hour workshop on Git and a 4.5-hour workshop on automated build systems.  You could do both in a day (although I'm not sure I'd recommend it).

#### Implications for practitioners

There are two ways to avoid getting stuck in the Valley of Unreliable Science:

1) Navigate Through It: One common strategy is to start with simple scripts; if they grow and get too complex, you can improve code quality as needed, add tests and documentation, and put the code under version control when it is ready to be released.

2) Jump Over It: The alternative strategy is to maintain good quality code, write documentation and tests along with the code (or before), and keep all code under version control.

Naively, it seems like Navigating is better for agility: when you start a new project, you can avoid the costs of over-engineering and test ideas quickly.  If they fail, they fail fast; and if they succeed, you can add elements of Stages 4, 5, and 6 on demand.

Based on that thinking, I used to be a Navigator, but now I am a Jumper.  Here's what changed my mind:

1) The dangers of over-engineering during the early stages of a project are overstated.  If you are in the habit of creating a new repository for each project (or creating a directory in an existing repository), and you start with a template project that includes a testing framework, the initial investment is pretty minimal.  It's like starting every program with a copy of "Hello, World".

2) The dangers of engineering too late are much greater: if you don't have tests, it's hard to refactor code; if you can't refactor, it's hard to maintain code quality; when code quality degrades, debugging time goes up; and if you don't have version control, you can't revert to a previous working (?) version.

3) Writing documentation saves time you would otherwise spend trying to understand code.

4) Writing tests saves time you would otherwise spend debugging.

5) Writing documentation and tests as you go along also improves software architecture, which makes code more reusable, and that saves time you (and other researchers) would otherwise spend reimplementing the wheel.

6) Version control makes collaboration more efficient.  It provides a record of who changed what and when, which facilitates code and data integrity.  It provides mechanisms for developing new code without breaking the old.  And it provides a better form of file backup, organized in coherent changes, rather than by date.

Maybe surprisingly, using software engineering tools early in a project doesn't hurt agility; it actually facilitates it.
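To make the "template project" idea from item 1 concrete, here is one possible sketch of a script that creates a project skeleton with a module, a passing test, and a README.  The function name `create_project` and the file layout are assumptions for illustration, not a standard or a recommendation of any particular tool.

```python
from pathlib import Path

def create_project(root, name):
    """Create a minimal project skeleton: a module, a test file, and a README."""
    project = Path(root) / name
    # Fails with FileExistsError rather than overwriting an existing project.
    (project / "tests").mkdir(parents=True, exist_ok=False)

    (project / "README.md").write_text(f"# {name}\n")
    (project / f"{name}.py").write_text(
        '"""Top-level module for the project."""\n\n'
        "def main():\n"
        '    return "Hello, World"\n'
    )
    # A first test that passes, so the testing framework is in place from day one.
    (project / "tests" / f"test_{name}.py").write_text(
        "import unittest\n\n"
        f"import {name}\n\n"
        "class TestMain(unittest.TestCase):\n"
        "    def test_main(self):\n"
        f'        self.assertEqual({name}.main(), "Hello, World")\n'
    )
    return project
```

For example, `create_project(".", "demo")` creates a `demo` directory; running `git init` inside it puts the project under version control, and `python -m unittest discover -s tests` runs the starter test.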

#### Implications for education

For computational scientists, I think it's better to jump over the Valley of Unreliable Science than try to navigate through it.  So what does that imply for education?  Should we teach the tools and practices of software engineering right from the beginning?  Or do students have to spend time navigating the Valley before they learn to jump over it?

I'll address these questions in the next article.