I will post my findings as I go in order to solicit comments from professionals and demonstrate methods for students. If I can get permission, I will also post my data and code so you can follow along at home.
Previously: In the previous installment I reviewed the first batch of data I'll work with, and ran some tests to confirm that Poisson regression is appropriate for modeling the number of accidents in a given day.
Poisson regressions
Traffic volume
To measure the effect of traffic volume on the number of accidents, I ran a Poisson regression with one row of data per day from January 1, 1992 through March 31, 2002, which is 3742 days.
- Dependent variable: number of accidents in a day. "Accidents" includes all accidents, "injury" includes only accidents involving an injury or fatality, and "fatal" includes only fatal accidents.
- Explanatory variable: AADT, which is annualized average daily traffic, in units of 1000s of cars.
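For readers following along, here is a minimal sketch of what a one-variable Poisson regression does, fit by Newton's method using only the Python standard library. It is not the code behind this analysis: the function name `fit_poisson` and the toy numbers are made up for illustration, and a statistics package would be the usual choice for real work.

```python
import math

def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method.

    x: explanatory values (e.g. AADT in 1000s of cars), one per day
    y: non-negative counts (e.g. accidents), one per day
    Returns the estimated (b0, b1).
    """
    n = len(x)
    # Start from the intercept-only model: constant rate, zero slope.
    b0 = math.log(sum(y) / n)
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

# Hypothetical toy data, NOT the study data: five days of traffic and counts.
aadt = [20, 25, 30, 35, 40]      # 1000s of cars
accidents = [0, 1, 0, 1, 2]      # accidents per day
b0, b1 = fit_poisson(aadt, accidents)
print(f"intercept={b0:.3f}, slope per 1000 cars={b1:.3f}")
```

The slope `b1` plays the role of the coefficient on AADT: it is the change in the log of the expected daily count for each additional 1000 cars.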
Here are the results:
The columns are:
- sig: statistical significance. * indicates borderline significance (p-values near 0.05); ** is significant (p-values less than 0.01); *** is highly significant (very small p-values).
- coeff: the estimated coefficient of the regression. For example, the coefficient 0.03 means that an increase of one unit of AADT (1000 cars) yields an increase of 0.03 in the log of the expected count, where count is the number of accidents.
- % incr is the coefficient converted to percentage increase per unit of AADT. In this example, the coefficient 0.03 indicates that for an increase of 1000 cars per day, we expect an increase in the number of accidents of roughly 3%.
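The conversion from coefficient to percentage increase is just exponentiation: a coefficient c multiplies the expected count by exp(c) for each additional unit. A small sketch (the helper name `percent_increase` is mine, for illustration only):

```python
import math

def percent_increase(coeff, units=1.0):
    """Percentage change in the expected count when the explanatory
    variable goes up by `units`, given a Poisson regression coefficient."""
    return 100.0 * (math.exp(coeff * units) - 1.0)

print(round(percent_increase(0.03), 2))      # 3.05
print(round(percent_increase(0.03, 10), 2))  # 34.99
```

For small coefficients the percentage is close to 100 times the coefficient, which is why 0.03 reads as "roughly 3%". Over larger changes the effect compounds: 10,000 additional cars would mean about 35% more accidents under this model, not 30%.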
Not surprisingly, increased traffic volume is associated with more accidents (of all types). In the control and treatment directions, the increase is about the same size, 3-4% for each additional 1000 cars.
For fatal accidents the association is less statistically significant, most likely because the number of fatal accidents is much smaller. There were 1900 accidents, total, during the observation period; 705 involved injuries but no fatalities; only 26 were fatal.
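A rough way to see why the small number of fatal accidents matters: for a Poisson count N, the standard deviation is sqrt(N), so the relative uncertainty is about 1/sqrt(N). The sketch below applies that rule of thumb to the counts quoted above. It is a back-of-the-envelope heuristic, not the regression's actual standard errors.

```python
import math

def relative_uncertainty(n):
    """Approximate relative uncertainty of a Poisson count n: sqrt(n) / n."""
    return 1.0 / math.sqrt(n)

# Counts quoted in the text for the observation period.
counts = {"all accidents": 1900, "injury, non-fatal": 705, "fatal": 26}

for label, n in counts.items():
    pct = 100.0 * relative_uncertainty(n)
    print(f"{label:18s} n={n:5d}  relative uncertainty ~ {pct:.1f}%")
```

With only 26 fatal accidents the relative uncertainty is roughly 20%, compared with about 2% for all accidents, so an effect of the same size is much harder to detect.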
I conclude that traffic volume has a substantial effect on the number of accidents, so
- It should be included as an explanatory variable in subsequent models, and
- It might be important to get additional traffic data, broken down by direction of travel and at a finer scale than annual!
The next set of explanatory variables I considered is:
- Fog: binary variable, whether fog was detected at the airport weather station during the day.
- Heavy fog: same as above, but apparently based on a different visibility threshold.
- Precip: total precipitation for the day in 0.1 mm units.
And here's the first surprise. Controlling for traffic volume, there is no significant relationship between fog and the number of accidents (of any kind).
For heavy fog, there is generally no significant relationship, but:
- In the control directions only, heavy fog has a significant effect, but the coefficient is negative. If this effect is real, heavy fog decreases the number of accidents by about 30%.
- If we break the data set into before and after November 1996, the effect disappears in the "before" dataset.
- There is no apparent effect on fatal accidents.
There are several possible conclusions:
- This effect is real, and for some reason heavy fog actually decreases the number of accidents, but only in the control direction.
- This effect is a statistical fluke, and the fog variables have no explanatory power. In that case, it is possible that fog in the study area does cause accidents, but measurements at the airport do not reflect conditions in the study area (8 miles away).
On the other hand, the effect of precipitation is consistent, significant, and (as expected) dangerous. Here are the results for precipitation (controlling for traffic volume):
Each millimeter of precipitation increases the number of accidents by about half a percent. [I am not sure how seriously to take that interpretation, since this relationship is probably non-linear. It might be better to make binary variables like "rain" and "heavy rain".]
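Here is one way the binary variables could be built from the daily precipitation column, as a sketch. The function name and both thresholds (1 mm for "rain", 10 mm for "heavy rain") are hypothetical placeholders, not values I have chosen for the analysis.

```python
def rain_indicators(precip_tenths_mm, rain_mm=1.0, heavy_mm=10.0):
    """Turn daily precipitation (in 0.1 mm units) into two 0/1 variables.

    The thresholds are hypothetical; sensible values would need to be
    chosen by looking at the data.
    """
    mm = precip_tenths_mm / 10.0          # convert 0.1 mm units to mm
    rain = 1 if mm >= rain_mm else 0
    heavy_rain = 1 if mm >= heavy_mm else 0
    return rain, heavy_rain

# Hypothetical daily values in 0.1 mm units: dry, drizzle, rain, downpour.
daily_precip = [0, 5, 30, 250]
print([rain_indicators(p) for p in daily_precip])
# [(0, 0), (0, 0), (1, 0), (1, 1)]
```

In this version the indicators are nested: a heavy-rain day also counts as a rain day, so in a regression the "heavy rain" coefficient would measure the additional effect on top of ordinary rain.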
Here's what we have so far:
- As expected, more traffic yields more accidents.
- Surprisingly, there is no statistically significant relationship between our fog measurements and accident rates.
- There is a consistent relationship between precipitation and accidents, but I might have to come back and quantify it more carefully.
Before going further, I want to get more specific information about when the speed limits were changed on these road segments and when the warning system was deployed. So that's all for now.