Eric Ries wrote an article for TechCrunch last week, talking about racism and meritocracy among Silicon Valley entrepreneurs. It's a good article; you should read it and then come back.
Although I mostly agree with him, Ries undermines his argument with a statistical bait-and-switch: he starts out talking about race, but most of the article (and the slide deck he refers to) is about gender. Unfortunately, for both his argument and the world, the race gap is bigger than the gender gap, and it is compounded because racial minorities, unlike women, are minorities.
To quantify the size of the gap, I use data from Academically Adrift, a recent book that reports the results from the Collegiate Learning Assessment (CLA) database, collected by the Council for Aid to Education, "a national nonprofit organization ... established in 1952 to advance corporate support of education and to conduct policy research on higher education..."
Here is a description of the CLA:
"The CLA consists of three types of prompts within two types of task: the Performance Task and the Analytic Writing Task... The Analytic Writing Task includes a pair of prompts called Make-an-Argument and Critique-an-Argument.
"The CLA uses direct measures of skills in which students perform cognitively demanding tasks... All CLA measures are administered online and contain open-ended prompts that require constructed responses. There are no multiple-choice questions. The CLA tasks require that students integrate critical thinking and written communication skills. The holistic integration of these skills on the CLA tasks mirrors the requirements of serious thinking and writing tasks faced in life outside of the classroom."
This is not your father's SAT. The exam simulates realistic workplace tasks and assesses skills that are relevant to many jobs, including (maybe especially) entrepreneurship.
On this assessment, the measured differences between black and white college students are stark. For white college students, the mean and standard deviation are 1170 ± 179. For black students, they are 995 ± 167.
To get a sense of what that difference looks like, suppose there are just two groups, which I call "blue" and "green" as a reminder that I am presenting an abstract model and not a realistic description. This figure shows Gaussian distributions with the parameters reported in Academically Adrift:
The difference in means is 175 points, which is about one standard deviation. If we select people from the upper tail, the majority are blue. But the situation is even worse if greens are a minority. If greens make up 20% of the population, the picture looks like this:
The fraction of greens in the upper tail is even smaller. If, as Ries suggests, "Here in Silicon Valley, we’re looking for the absolute best and brightest, the people far out on the tail end of aptitude," the number of greens in that tail is very small.
How small? That depends on where we draw the line. If we select people who score above 1200, which includes 37% of the population, we get 6% greens (remember that they are 20% of the hypothetical population). Above 1300 the proportion of greens is 3%, and above 1400 only 2%.
And that's not very "far out on the tail end of aptitude." Above 1500, we are still talking about 3% of the general population, but more than 99% of them are blue. So in this hypothetical world of blues and greens, perfect meritocracy does not lead to proportional representation.
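For readers who want to check these percentages, here is a minimal sketch of the calculation. It assumes exactly the model described above: two Gaussian distributions with the reported means and standard deviations, and a population that is 20% green. The names `tail_shares`, `BLUE`, `GREEN`, and `GREEN_SHARE` are mine, chosen for illustration, and the output may differ slightly from the rounded figures quoted in the text.

```python
from statistics import NormalDist

# Parameters reported in Academically Adrift (mean, standard deviation).
BLUE = NormalDist(mu=1170, sigma=179)
GREEN = NormalDist(mu=995, sigma=167)
GREEN_SHARE = 0.20  # hypothetical share of greens in the population


def tail_shares(cutoff, green_share=GREEN_SHARE):
    """Return (fraction of the population above cutoff,
    fraction of that upper tail that is green)."""
    blue_tail = (1 - green_share) * (1 - BLUE.cdf(cutoff))
    green_tail = green_share * (1 - GREEN.cdf(cutoff))
    total = blue_tail + green_tail
    return total, green_tail / total


for cutoff in (1200, 1300, 1400, 1500):
    total, green = tail_shares(cutoff)
    print(f"above {cutoff}: {total:.1%} of population, {green:.1%} green")
```

Passing a different `green_share` shows how the composition of the tail changes with the makeup of the hypothetical population.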
Ries suggests that blind screening of applicants might help. I think the system he proposes is a good idea, because it improves fairness and also the perception of fairness. But if the racial gap in Y Combinator's applicant pool is similar to the racial gap in CLA scores, making the selection process more meritocratic won't make a big difference.
These numbers are bad. I'm sorry to be reporting them, and if I know the Internet, some people are going to call me a racist for doing it. But I didn't make them up, and I'm pretty sure I did the math right. Of course, you are welcome to disagree with my conclusions.
Here are some of the objections I expect:
1) The CLA does not capture the full range of skills successful entrepreneurs need.
Of course it doesn't; no test could. But I chose the CLA because I think it assesses thinking skills better than other standardized tests, and because the database includes "over 200,000 student results across hundreds of colleges." I can't think of a better way to estimate the magnitude of the racial gap in the applicant pool.
2) The application process is biased against racial minorities and women.
The statistics I am reporting here, and my analysis of them, don't say anything about whether or not the application process is biased. But they do suggest that (a) we should not assume that because racial minorities are underrepresented among Silicon Valley entrepreneurs, racial bias explains a large part of the effect, and (b) we should not assume that eliminating bias from the process will have a large effect.
Of course, trying to eliminate bias is the right thing to do, whether the effect is big or small.
NOTE: The range of scores for the CLA was capped at 1600 until 2007, which changed the shape of the distribution at the high end. For those years, the Gaussian distributions in the figures are not exactly right, but I don't think it affects my analysis much. Since 2007, scores are no longer capped, but I don't know what the tail of the distribution looks like now.
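One rough way to gauge how much the 1600 cap could matter is to ask how much probability the uncapped Gaussian model places above 1600 in each group. This is only a sketch under the same assumed parameters as the figures; it says nothing about the real shape of the capped or uncapped score distribution, and the names `CAP` and `mass_above_cap` are hypothetical.

```python
from statistics import NormalDist

# Same Gaussian parameters as the figures (mean, standard deviation).
BLUE = NormalDist(mu=1170, sigma=179)
GREEN = NormalDist(mu=995, sigma=167)
CAP = 1600  # score cap in effect until 2007


def mass_above_cap(dist, cap=CAP):
    """Fraction of a Gaussian group that the model places above the cap."""
    return 1 - dist.cdf(cap)


print(f"blue above {CAP}: {mass_above_cap(BLUE):.2%}")
print(f"green above {CAP}: {mass_above_cap(GREEN):.3%}")
```

If both fractions come out small, the Gaussian curves and the capped data can only disagree in a thin slice of the upper tail.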
EDIT 11-28-11: I revised a few sentences to clarify whether I was talking about representation or absolute numbers. The fraction of greens in the population affects the absolute numbers in the tail but not their representation.