Collision Patterns and Reporting Blind Spots in 970 California Autonomous Vehicle Crash Reports

Alam, M. S., Zhang, L., Li, Z., Dou, F., Bazilinskyy, P.

Submitted (2026)

ABSTRACT Autonomous vehicle (AV) collision reports offer a rare view of how autonomous driving systems perform in mixed traffic, but they are difficult to analyse at scale because they combine structured form fields, visually marked elements, and free-text narratives. We analysed 970 publicly available California Department of Motor Vehicles collision reports, with dated reports spanning October 2014 to March 2026, using a ChatGPT 5.4 Thinking extraction pipeline to derive a structured dataset for empirical analysis. The largest scenario classes were rear-end crashes with the AV stopped (266, 27.4%), intersection lateral conflicts (180, 18.6%), and lane-change or merge conflicts (156, 16.1%). Reports captured coarse scenario structure much more reliably than fine interaction detail, with a mean coarse context score of 0.97 versus 0.48 for fine context. The results suggest that many reported collisions occur in mixed-traffic situations where prediction, coordination, and road-user expectations are difficult to align.