
Joni Nunnery and Helmut Schneider

Using Data Mining and Bootstrapping to Develop Simple Models for Obtaining Confidence Intervals for the Percentage of Alcohol Related Crashes. Joni Nunnery and Helmut Schneider.


Presentation Transcript


  1. Using Data Mining and Bootstrapping to Develop Simple Models for Obtaining Confidence Intervals for the Percentage of Alcohol Related Crashes Joni Nunnery and Helmut Schneider

  2. Why Data Mining? • NHTSA estimate is for the USA • State estimates are not readily available • Need for reliable standard errors for states • 0.3% for USA, 2% for LA • State estimate may be affected by local variables • Non-crash independent variables may change over time • DWI versus pretrial diversion • IM estimates complicated statistical technique • Data Mining tools are used in various applications

  3. Approach • Analysis of Louisiana Crash Data 1999-2002 • Data mining model is used to predict alcohol involvement • Estimation of standard error via bootstrap type simulation
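The approach on this slide can be illustrated with a minimal sketch: drivers with a known alcohol test result count as 0 or 1, and drivers without a result contribute a model-predicted probability instead. The record layout, the `night_rule` stand-in model, and all numbers below are hypothetical and are not taken from the Louisiana data or from the authors' model.

```python
def estimated_alcohol_share(records, predict_proba):
    """Percentage of alcohol-related crashes.

    Known test results count as 0 or 1; records with an unknown
    result (None) contribute the model's predicted probability.
    """
    total = 0.0
    for rec in records:
        if rec["alcohol"] is None:
            total += predict_proba(rec)
        else:
            total += 1.0 if rec["alcohol"] else 0.0
    return 100.0 * total / len(records)


# Hypothetical toy records: two tested drivers, two untested drivers.
crashes = [
    {"hour": 2, "alcohol": True},
    {"hour": 14, "alcohol": False},
    {"hour": 23, "alcohol": None},
    {"hour": 10, "alcohol": None},
]


def night_rule(rec):
    """Stand-in for a fitted classifier (assumption, not the authors' model)."""
    return 0.5 if rec["hour"] >= 21 or rec["hour"] <= 4 else 0.05


share = estimated_alcohol_share(crashes, night_rule)
print(f"Estimated alcohol-related share: {share:.2f}%")
```

In practice `predict_proba` would be whichever classification model performs best on held-out data; the simple rule here only keeps the sketch self-contained.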

  4. KNOWN ALCOHOL TEST RESULTS, LOUISIANA 1999-2002

  5. ROW PERCENTAGES

  6. All Drivers in Crashes, Louisiana 1999-2002

  7. Using Insightful Miner Data Mining Software

  8. Classification Models • Logistic Regression • Naive Bayes • Neural Network • Classification Tree

  9. Classification Tree • Fit model to half the data • Tree model • What did we learn? • Importance of variables
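As a rough illustration of fitting on half the data and ranking variables, the sketch below splits a synthetic data set in two and scores each variable by its Gini impurity reduction, which is one common criterion a classification tree can use to pick a split. The slides do not say which criterion Insightful Miner applied, and the variables `night` and `weekend` and their effect sizes are invented for the example.

```python
import random
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(rows, feature, target="alcohol"):
    """Impurity reduction from splitting rows on one categorical feature."""
    parent = gini([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


# Synthetic data (assumption): alcohol depends on "night" but not on "weekend".
rng = random.Random(0)
rows = []
for _ in range(2000):
    night = rng.random() < 0.3
    weekend = rng.random() < 0.3
    p_alcohol = 0.4 if night else 0.05
    rows.append({"night": night, "weekend": weekend,
                 "alcohol": rng.random() < p_alcohol})

# Fit on one half; keep the other half for error estimation.
rng.shuffle(rows)
half = len(rows) // 2
train, holdout = rows[:half], rows[half:]

importance = {f: gini_gain(train, f) for f in ("night", "weekend")}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking, importance)
```

A full tree repeats this scoring recursively inside each branch; the single-level version is enough to show how a variable ranking can arise.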

  10. Classification Results

  11. Violation • Hour of Day • Vehicle Type • Age • Injury • Parish • Number of Vehicles • Belt Usage • Day of Week • Gender

  12. Alcohol in Injury and Property Damage Crashes

  13. Standard Error • Using simulation on second half of data set to get estimated error • Evaluate combined standard error • The resulting standard error is 1% for 900 crashes
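A bootstrap-type simulation of the kind named on this slide might look like the following sketch: resample 900 crashes with replacement from a held-out set many times and take the standard deviation of the resulting percentages. The 10% alcohol share, the holdout size, and the number of replicates are arbitrary toy values, not figures from the slides, and the sketch covers sampling error only, not the model's prediction error that a combined standard error would also include.

```python
import random
import statistics


def bootstrap_se(outcomes, sample_size, n_boot=2000, seed=0):
    """Bootstrap standard error (in percentage points) of a percentage.

    outcomes: list of 0/1 flags (1 = alcohol-related) from a holdout set.
    sample_size: number of crashes drawn per bootstrap replicate.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw with replacement, then record the percentage in this replicate.
        resample = rng.choices(outcomes, k=sample_size)
        estimates.append(100.0 * sum(resample) / sample_size)
    return statistics.stdev(estimates)


# Hypothetical holdout: 5,000 crashes, 10% flagged as alcohol-related.
holdout_flags = [1] * 500 + [0] * 4500
se = bootstrap_se(holdout_flags, sample_size=900)
print(f"Bootstrap standard error for 900 crashes: {se:.2f} percentage points")
```

For this toy share the binomial formula sqrt(p(1-p)/n) gives about 1 percentage point, which serves as a sanity check on the simulation rather than as a reconstruction of the authors' result.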

  14. Conclusion • Data mining is a simple and useful tool to predict missing observations • The best predictor for alcohol-related crashes is the judgment of a well-trained police officer on the scene

  15. Alcohol-Related Crashes in Louisiana by Highway
