Probably Overthinking It: Fog warning system: part three

Background: I am trying to evaluate the effect on traffic safety of a fog warning system deployed in California in November 1996. The system was installed by CalTrans on a section of I-5 and SR-120 near Stockton where the accident rate is generally high, particularly during the morning commute when ground fog is common. The warning system consists of (1) weather monitoring stations that detect fog and (2) changeable message signs that warn drivers to reduce speed.

I will post my findings as I go in order to solicit comments from professionals and demonstrate methods for students. If I can get permission, I will also post my data and code so you can follow along at home.

Previously: In the first installment I reviewed the first batch of data I am working with, and ran some tests to confirm that Poisson regression is appropriate for modeling the number of accidents in a given day. In part two I ran Poisson regressions to identify factors that influence the number of accidents per day.

Critical events

I have been waiting to get more details about several events that affected traffic safety during the observation period. I was able to get in touch with a Transportation Engineer in the Traffic Safety Branch of Caltrans District 10, which includes the study area. According to Caltrans records, the speed limit on the relevant section of I-5 was increased from 55 to 70 mph on March 25, 1996. The speed limit on SR-120 was increased from 55 to 65 mph about a month later, on April 22, 1996. Many thanks to my correspondent for this information!

The automated warning system was activated in November 1996. My collaborator has collected data on the weather measurements made by the system and the warnings it displayed. I hope to process this data soon.

Accidents per million vehicles

In the previous article, I ran models with raw accident counts as the dependent variable, and found that traffic volume is a significant explanatory variable. Not surprisingly, more cars yield more accidents.

Rather than use volume as an explanatory variable, an alternative is to express the dependent variable in terms of accidents per million vehicles. As a reminder, here's what the traffic volume (in thousands of cars per day) looks like during the observation period:

And here are the raw accident counts:

I divided counts by volume and converted to accidents per million cars. At the same time I smoothed the curves by aggregating quarterly. Here's what that looks like:

The vertical red lines show major events expected to affect traffic safety: increased speed limits in March and April 1996, and the activation of the warning system in November 1996.
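The rate calculation described above (divide counts by volume, scale to accidents per million vehicles, aggregate by quarter) can be sketched roughly as follows. This is a minimal illustration, not the code behind the graph: the function names and the sample numbers are made up, and it assumes that accidents and vehicles are summed within each quarter before dividing, which may differ from what was actually done.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d):
    """Return (year, quarter) for a date, with quarters numbered 1-4."""
    return d.year, (d.month - 1) // 3 + 1


def quarterly_rate_per_million(daily_accidents, daily_volume_thousands):
    """Aggregate daily data by quarter; return accidents per million vehicles.

    daily_accidents: dict mapping date -> accident count on that day
    daily_volume_thousands: dict mapping date -> vehicles that day, in thousands
    (Both structures are hypothetical; the real data layout is not shown here.)
    """
    accidents = defaultdict(int)
    vehicles = defaultdict(float)
    for d, count in daily_accidents.items():
        q = quarter_of(d)
        accidents[q] += count
        vehicles[q] += daily_volume_thousands[d] * 1000  # thousands -> vehicles
    return {q: accidents[q] / vehicles[q] * 1e6 for q in sorted(accidents)}


# Made-up numbers: two days in Q1 1996 and one day in Q2 1996.
accidents = {date(1996, 1, 10): 1, date(1996, 2, 20): 2, date(1996, 4, 5): 1}
volume = {date(1996, 1, 10): 50.0, date(1996, 2, 20): 50.0, date(1996, 4, 5): 40.0}
rates = quarterly_rate_per_million(accidents, volume)
```

Summing within the quarter before dividing weights each day by its traffic, and it avoids averaging many daily rates that are zero or based on a single accident.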

This graph suggests several observations:

In the control directions, the accident rate was flat from 1992 through 1994, increased quickly in 1995 (before the speed limits were increased) and has been flat ever since.
In the treatment directions, the accident rate was trending down until late 1996, including three quarters after the speed limit was increased. The accident rate increased sharply in 1997 and possibly again in 2000.
The accident rate in both directions was unusually low during the third quarter of 1996, when the warning system was activated. Other than that, there is no obvious relationship between accident rates and the events of 1996.

Since we don't expect the warning system to have much effect on the control directions (that's why they're called "control"), the speed limit changes are by far the most likely explanation for the accident rate changes. But it is puzzling that a large part of the change occurred before the new speed limits went into effect. One possibility is that as new speed limits were rolled out throughout California, drivers became accustomed to higher speeds and drove faster even on roads where the new limits were not in effect. But if that's true, it doesn't explain the continuing decline in the treatment directions.

My collaborator has some data on actual driving speeds before and after 1996. Once I process that data, I will be able to get back to this puzzle.

Injuries and fatal accidents

In response to a previous post, a reader suggested that if the warning system causes drivers to slow down, it might affect the severity of accidents more than the raw number. To investigate that possibility, I also plotted the rates for injury accidents (including fatalities) and fatal accidents.

Here is the graph for injury accidents:

The patterns we saw in the previous graph appear here, too. In addition, this graph suggests, more strongly, the possibility of a second changepoint in late 1999 or 2000.

And here is the graph for fatal accidents:

The number of fatal accidents is, fortunately, small. During more than 10 years of observation, there were only 26 in the study area. The trends in the other graphs are not apparent here, other than the general increase in the rate of fatal accidents in the second half of the observation period.

Summary

Accident rates in the control and treatment directions increased sharply around 1996, but neither effect is related in an obvious way to increased speed limits or deployment of the warning system.
Accident rates were unusually low in the quarter the warning system was activated; other than that, no effect of the warning system is apparent.
It looks like there was a second increase in accident rates in late 1999 or 2000. I will ask my correspondent at Caltrans if he has an explanation.

Next steps

There's not much more I want to do with this data. Now I need more numbers! In particular, I will be able to get data from the warning system itself, including:

Conditions measured at roadside weather stations, which should be better than the data I have from the airport 8 miles away, and
Messages displayed when the warning system was active.

If the warning system has an effect, it should be apparent on the days it is active. By comparing the treatment and control directions, it should be possible to quantify the effect.
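One possible way to set up that comparison is a ratio of rate ratios: the treatment-to-control rate ratio on days the system was active, divided by the same ratio on days it was not. The sketch below is only an illustration of that idea under my own assumptions. The function names and all of the numbers are hypothetical, and it says nothing about statistical uncertainty.

```python
def rate_per_million(accidents, vehicles):
    """Accidents per million vehicles."""
    return accidents / vehicles * 1e6


def relative_effect(treat_active, ctrl_active, treat_inactive, ctrl_inactive):
    """Ratio of rate ratios (treatment vs. control, active vs. inactive days).

    Each argument is a hypothetical (accidents, vehicles) pair.
    A result below 1 would suggest fewer accidents in the treatment
    directions on active days than the inactive-day baseline predicts.
    """
    active_ratio = rate_per_million(*treat_active) / rate_per_million(*ctrl_active)
    inactive_ratio = rate_per_million(*treat_inactive) / rate_per_million(*ctrl_inactive)
    return active_ratio / inactive_ratio


# Made-up counts, purely for illustration.
effect = relative_effect(
    treat_active=(6, 2_000_000),
    ctrl_active=(10, 2_000_000),
    treat_inactive=(40, 20_000_000),
    ctrl_inactive=(40, 20_000_000),
)
```

Dividing by the inactive-day ratio is meant to cancel any standing difference between the treatment and control directions that has nothing to do with the warning system.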

Also, I have permission now to share the data; I will try to get it posted, along with my code, before the next update.

[UPDATE April 26, 2012]

A reader asked

I can think of two ways that overall traffic volume affects accident rates: (1) more cars = more accidents overall, which you control for by measuring accident rates, and now you're seeing rising accident rates per car. So this raises the next thought, (2) more cars = more traffic density, which raises accident rates per car for each car on the road.

What happens if you regress on traffic volume squared, or include traffic volume as an independent variable in the accident rate regression? The density effect is likely nonlinear but it's a thought.

This is a great question. If there is a non-linear relationship between traffic volume and the raw number of accidents, then even after we switch to accident rates, there might still be a positive relationship between traffic volume and accident rates.
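To make that concrete, suppose (purely as an illustrative assumption, not something estimated from this data) that the raw number of accidents $A$ grows as a power of traffic volume $V$, with constants $c$ and $k$:

```latex
A = c\,V^{k}
\quad\Longrightarrow\quad
\frac{A}{V} = c\,V^{k-1}
```

If $k = 1$, dividing by volume removes the dependence entirely. If $k > 1$, which is what a density effect would imply, the accident rate $A/V$ still increases with volume.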

I ran these regressions, and in fact there is a relationship, but with the limitations of the data I have, I don't think it means much. Specifically, I only have annual estimates for traffic volume, so there's no fluctuation over time; traffic volume increases at a nearly constant rate for the entire observation period (see the figure above).

So traffic volume will have a positive relationship with anything else that's increasing, and a negative relationship with anything decreasing. And that's what I see in the regressions:

All of the relationships are statistically significant, but notice that in the treatment directions, before 1996 when the accident rate was declining, the relationship with traffic volume is negative!

I don't think this variable has any explanatory content; any other ramp function would behave the same way. If I can get finer-grained data on traffic volume, I might be able to look for a more meaningful effect.
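A small simulation shows why a ramp carries so little information. The series below are invented for illustration (they are not the study data): a steadily rising "volume", a second ramp that is just calendar time, and two noisy rates, one trending up and one trending down.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


quarters = list(range(16))
volume = [50 + 0.5 * t for t in quarters]    # hypothetical ramp: rising volume
calendar = [1992 + t / 4 for t in quarters]  # another ramp: time itself

# Fixed, made-up noise so the example is reproducible.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3,
         -0.3, 0.1, 0.2, -0.2, 0.0, 0.4, -0.1, -0.3]
rising_rate = [1.0 + 0.05 * t + 0.1 * e for t, e in zip(quarters, noise)]
falling_rate = [2.0 - 0.05 * t + 0.1 * e for t, e in zip(quarters, noise)]

r_up = pearson(volume, rising_rate)      # strongly positive
r_down = pearson(volume, falling_rate)   # strongly negative
r_time = pearson(calendar, rising_rate)  # matches r_up up to rounding
```

Because `volume` and `calendar` are both linear in time, they give the same correlation with any series, so in this setup volume cannot be distinguished from the passage of time.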

10 comments:

Ben U, April 25, 2012 at 12:10 PM
Fascinating project. Could more widespread use of cell phones leading to more distracted drivers cause the jump you see at the end of the 1990's?
Gary, April 25, 2012 at 2:26 PM
Allen - I can think of two ways that overall traffic volume affects accident rates: (1) more cars = more accidents overall, which you control for by measuring accident rates, and now you're seeing rising accident rates per car. So this raises the next thought, (2) more cars = more traffic density, which raises accident rates per car for each car on the road.

What happens if you regress on traffic volume squared, or include traffic volume as an independent variable in the accident rate regression? The density effect is likely nonlinear but it's a thought.
Tom Campbell-Ricketts, April 27, 2012 at 2:56 AM
Really nice idea to blog on the research as it unfolds.

Speaking of Poisson processes, perhaps some of the readers here might like this simple puzzle I recently posted.

Wednesday, April 25, 2012