Today’s program

Today’s program: Econometrics is better suited for accident analysis than for economics. Why? Accident modelling implications. Remedies against overfitting. Structure of the TRULS model. INRETS, Arcueil, 30-31 May 2007.

Presentation Transcript


  1. Today’s program • Econometrics is better suited for accident analysis than for economics. Why? • Accident modelling implications • Remedies against overfitting • Structure of the TRULS model. INRETS, Arcueil, 30-31 May 2007

  2. Aim of workshop: Explore the shared interest in monitoring, explaining and forecasting road safety developments at the national or regional level. Approach: Multivariate, structural modelling, • focusing on substantive, causal relationships, • distinguishing exposure from risk and • accident frequency from severity, • while acknowledging the importance of estimating functional form as well as first derivatives.

  3. Today’s program
     10.00: The TRULS model for Norway
     10.30: The DRAG-3 model for Quebec
     11.00: The KILOM-2 and TAG-2 models for France
     11.30: The Intercity Traffic Model for France
     11.45: Coffee
     12.00: Developing a model for Spain
     12.45: The DRAG-Algeria model
     13.10: Lunch
     14.15: The national model for Belgium
     14.30: The regional model for Stockholm, Sweden
     14.45: Prospects for a Danish model
     15.00: Modelling overall Dutch safety performance
     15.30: On simultaneous traffic-accident structures
     15.55: Coffee
     16.10: Accounting for spatial correlation in classical regression
     16.30: The NERDS-RSVP Consortium position paper
     16.45: End

  4. The TRULS model for Norway – and other issues in accident modelling, by Lasse Fridstrøm, Managing Director, Institute of Transport Economics (TØI), Oslo, Norway. lef@toi.no, www.toi.no

  5. Outline • Econometrics is better suited for accident analysis than for economics. Why? • Accident modelling implications • Remedies against overfitting • Structure of the TRULS model

  6. 1. “…although econometrics was originally developed as a toolbox for economic research, it may … be even better suited for accident analysis”. (Fridstrøm 1999a) • In most econometric applications, the “error” term is random only in the sense of being unknown to the analyst. It is epistemically (subjectively) random. • Accident counts, on the other hand, are ontologically (objectively) random. Their distribution in a perfectly specified model is known: Poisson. • Had the individual accident been anticipated, it would not have happened! It is thus logically unpredictable. We are dealing with the whitest noise in behavioural science. • Thus, accident counts lend themselves to a natural and clear-cut distinction between the causal and the casual: systematic vs. random variation.

  7. The linear probability model: $y = \sum_j \beta_j x_j + u$, where $\sum_j \beta_j x_j$ is the systematic (causal) part and $u$ is the random part. Usually, u is random only in the sense of being unobservable to the analyst. It is epistemically random, as in random utility theory.

  9. Random and systematic variation coexist. In the accident modelling case, we know that if all systematic variation has been accounted for through the x terms, then the y terms are independent Poisson variates. While the u terms are probabilistically independent, the terms are functionally dependent on certain common factors and hence empirically correlated.

  11. Eeyore is right: “I’m not saying there won’t be an Accident now, mind you. They’re funny things, Accidents. You never have them till you’re having them.” (A.A. Milne: The House at Pooh Corner)

  12. The (generalized) Poisson distribution. In the Poisson distribution, the variance equals the mean. Once we have estimated the mean, we also know the amount of objectively random variation. If the model is not perfectly specified, or if the individual events counted are not probabilistically independent, overdispersion must be expected, i.e. the variance exceeds the mean. Thus, victim counts will typically be overdispersed. It may be preferable to work with accident counts in combination with severity measures capturing victims per accident. To allow for overdispersion, or to test for it, use the generalized Poisson model, i.e. the negative binomial regression model.

  13. [Figure: Poisson distribution, 95 per cent probability bounds around the expected value. Axes: expected number and observed number.]
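The probability bounds described on this slide can be recomputed from the Poisson distribution alone. The following is a minimal sketch, not the author's code; the function names `poisson_pmf` and `poisson_bounds` are hypothetical, and the equal-tailed construction is an assumption about how such bounds are usually drawn.

```python
import math


def poisson_pmf(k, mean):
    """P(Y = k) for a Poisson variate with the given mean (mean > 0)."""
    # Computed in log space to stay stable for large k.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_bounds(mean, coverage=0.95):
    """Equal-tailed bounds (lo, hi) such that P(lo <= Y <= hi) >= coverage."""
    tail = (1.0 - coverage) / 2.0
    cdf = 0.0
    k = 0
    lo = None
    while True:
        cdf += poisson_pmf(k, mean)
        if lo is None and cdf > tail:
            lo = k  # smallest k with P(Y <= k) > tail
        if cdf >= 1.0 - tail:
            return lo, k  # smallest k with P(Y <= k) >= 1 - tail
        k += 1


if __name__ == "__main__":
    for expected in (1.0, 10.0, 100.0):
        print(expected, poisson_bounds(expected))
```

Relative to the expected value, the interval narrows as the expected number grows, which is why purely random variation dominates in small accident counts.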

  14. The negative binomial distribution. Suppose the Poisson parameter is itself random, and drawn from a gamma distribution with shape parameter $\xi$ (say). In this case the observed number of accidents can be shown to follow a negative binomial distribution with expected value $\omega$ (say) and variance $\omega(1 + \theta\omega)$, $\theta = 1/\xi$ being the overdispersion parameter. • Two interpretations: • Unobserved heterogeneity (Greenwood & Yule 1920) • True contagion (Eggenberger and Pólya 1923)
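As a numerical check on the gamma-Poisson mixture, the sketch below evaluates the negative binomial probabilities directly and recovers the mean and variance by summation. It writes ω for the expected value and θ for the overdispersion parameter (the inverse of the gamma shape parameter); these symbol names, the function names and the example values are assumptions for illustration.

```python
import math


def negbin_pmf(k, mean, theta):
    """Negative binomial P(Y = k) with expected value `mean` and overdispersion `theta`.

    This is the distribution of a Poisson count whose parameter is gamma
    distributed with shape 1/theta and expected value `mean`.
    """
    shape = 1.0 / theta
    p = shape / (shape + mean)
    log_pmf = (
        math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
        + shape * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def moments(pmf, upper=2000):
    """Mean and variance of a count distribution, summed over 0..upper-1."""
    mean = sum(k * pmf(k) for k in range(upper))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(upper))
    return mean, var


if __name__ == "__main__":
    omega, theta = 5.0, 0.2  # hypothetical illustration values
    m, v = moments(lambda k: negbin_pmf(k, omega, theta))
    print(m, v, omega * (1.0 + theta * omega))
```

With θ approaching zero the variance falls back to the mean, i.e. the pure Poisson case.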

  15. Misspecification may show up as overdispersion Suppose one relevant variable has been left out. In this case some systematic variation is indeed contained in the error term:

  16. Is (generalized) Poisson regression the only way to go? No. The limiting distribution of the Poisson is the normal. The approximation is already good for means of 10 and above. But the dependent variable should be log-transformed. Since the variance of a Poisson variable equals its mean, “objective” heteroskedasticity can be accounted for through appropriate weighting. This requires iteration and sometimes cumbersome transformations. Box-Cox regression models are useful, since for many partial relationships, curvature is not known a priori.

  17. For large Poisson counts y, the variance of ln(y) is inversely proportional to the expected value ω. The Box-Tukey constant is needed, since the log of a Poisson variate has infinite variance.

  18. The variance of ln(y+a), where y is Poisson distributed with parameter ω. Source: TØI report 457
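The variance of ln(y+a) can be computed exactly by summing over the Poisson probabilities, which makes it easy to compare with the large-count approximation 1/ω. The sketch below assumes a positive constant a; the value 0.1 and the function names are hypothetical choices for illustration, not taken from the report.

```python
import math


def poisson_pmf(k, mean):
    """P(Y = k) for a Poisson variate with the given mean (mean > 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def var_log_shifted(mean, a):
    """Exact Var[ln(y + a)] for Poisson y with expected value `mean`, a > 0."""
    # Truncate far out in the upper tail, where the remaining mass is negligible.
    upper = int(mean + 12.0 * math.sqrt(mean) + 30.0)
    probs = [poisson_pmf(k, mean) for k in range(upper)]
    first = sum(p * math.log(k + a) for k, p in enumerate(probs))
    second = sum(p * math.log(k + a) ** 2 for k, p in enumerate(probs))
    return second - first ** 2


if __name__ == "__main__":
    a = 0.1  # hypothetical Box-Tukey constant
    for omega in (0.5, 2.0, 10.0, 100.0):
        print(omega, var_log_shifted(omega, a), 1.0 / omega)
```

For small ω the exact variance departs markedly from 1/ω, which is the situation an improved variance approximation for small counts has to handle.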

  19. The IRPOSKML method of estimation: improved error variance approximation for small accident counts. ω values ranging from 0.000248 to 692. Source: TØI report 457

  20. 2. Implications for accident modelling • Concentrate on substance • Concentrate on exposure - estimate exposure elasticities • Use multiplicative models • Use cumulative severity categories • Avoid autoregressive models

  21. Concentrate on substance. Accident models are useful in estimating a variety of policy relevant parameters, including • the marginal external accident cost, • the (marginal) contribution of various road user categories to risk, • the effect of accident countermeasures, and • the importance of behavioral response (risk compensation). The policy relevant, explanatory factors are in the systematic part, not in the random term. There is no need to further explore the random term, whose properties are already better known than in any other econometric application! Advanced filtering and transformation will drain the juice out of the pattern of co-variation, without adding substantive knowledge. Avoid differencing! Let the levels speak! Spend your resources on identifying systematic, causal factors, and on estimating their curvature and elasticities!

  22. Concentrate on exposure The most important explanatory factor in any accident model is going to be exposure. Give priority to its measurement and estimation! Note that exposure is multidimensional! There are cars, trucks, buses, tramways, motorcycles, bicycles and pedestrians. Their interaction in producing accidents is of prime interest. The elasticity of accident frequency w. r. t. exposure is not necessarily 1. Estimate it! The marginal (external) cost of accidents depends on it. The relationships are not necessarily (log-)linear. Estimate their curvatures!

  23. Risk and accident frequency are non-negative magnitudes A minimum logical requirement is that models do not predict negative accident frequencies or risk. Additive linear regression models are therefore ruled out. Multiplicative models are the canonical form. Risk factors compound. The systematic part of the regression should be decomposable as the product of various factors. Models for non-negative integer-valued variables are a natural choice.
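A multiplicative specification can be sketched in a few lines: the expected accident count is the exponential of a linear combination of logged factors, so it can never turn negative, and each coefficient is directly the elasticity with respect to its factor. The variable names and coefficient values below are hypothetical and are not TRULS estimates.

```python
import math


def expected_accidents(beta0, betas, x):
    """Multiplicative model: exp(beta0) * product of x[name] ** betas[name]."""
    return math.exp(beta0 + sum(b * math.log(x[name]) for name, b in betas.items()))


def elasticity(beta0, betas, x, name, rel_step=1e-6):
    """Numerical elasticity: relative change in expected accidents per relative change in x[name]."""
    bumped = dict(x)
    bumped[name] = x[name] * (1.0 + rel_step)
    base = expected_accidents(beta0, betas, x)
    return (expected_accidents(beta0, betas, bumped) / base - 1.0) / rel_step


if __name__ == "__main__":
    # Hypothetical exposure measures (vehicle kilometres) and coefficients.
    betas = {"light_vehicle_km": 0.9, "heavy_vehicle_km": 0.3}
    x = {"light_vehicle_km": 1.2e6, "heavy_vehicle_km": 1.5e5}
    print(expected_accidents(-12.0, betas, x))
    print(elasticity(-12.0, betas, x, "light_vehicle_km"))
```

In this log-linear form the elasticities are constant; letting them vary with the level of exposure requires a more flexible functional form such as Box-Cox.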

  24. Use cumulative severity categories • Unless severity is defined cumulatively, models may provide counterintuitive effect parameters that are hard to interpret. • Road safety measures may inflate any category except the uppermost, e.g. by shifting would-be fatalities into a less severe category. Hence, always include all accidents more serious than the ones considered. Severity scale, from most to least serious: fatal injuries, critical injuries, serious injuries, slight injuries, property damage only.

  25. Avoid autoregressive models. Autoregressive models try to explain the causal part in period t by means of the white noise in periods t-1, t-2, etc. This is obfuscation, not explanation. But do not confuse autoregression with autocorrelation: autocorrelated models are OK.

  26. 3. Remedies against overfitting • Use specialized goodness-of-fit measures • Use casualty subset tests • Splitting the sample – and out-of-sample prediction

  27. The upper bound on explanatory power is computable • On account of the Poisson assumption, it is possible, for a given accident data set, to calculate the normal amount of random variation and hence also the maximal amount of explainable, systematic variation. Using this information, one may calculate goodness-of-fit measures for the systematic variation of interest, thus comparing the explained to the explainable. See AA&P vol 27, pp 1-20 (1995)

  28. Specialized goodness-of-fit • Consider the well-known coefficient of determination • An observable upper bound on the coefficient of determination is given by • Compute the coefficient of determination for systematic variation
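The formulas on this slide did not survive in the transcript. The sketch below shows one plausible reading of the three steps, under the assumption that the purely random variation is estimated by the sum of the fitted values (Poisson variance equals mean). The function name and the example data are hypothetical.

```python
def fit_measures(observed, fitted):
    """Return (R2, upper bound on R2, R2 for systematic variation).

    Assumes Poisson disturbances, so the expected random sum of squares
    is approximated by the sum of the fitted expected values.
    """
    n = len(observed)
    ybar = sum(observed) / n
    total = sum((y - ybar) ** 2 for y in observed)
    residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    random_part = sum(fitted)  # Poisson: variance equals mean
    r2 = 1.0 - residual / total
    upper = 1.0 - random_part / total
    return r2, upper, r2 / upper


if __name__ == "__main__":
    # Made-up accident counts and fitted values, for illustration only.
    observed = [12, 30, 55, 80, 140]
    fitted = [20, 25, 45, 95, 130]
    print(fit_measures(observed, fitted))
```

A ratio close to 1 would indicate that nearly all explainable variation has been captured; a ratio above 1 would mean the fit is closer than pure Poisson noise allows, which is a hint of overfitting.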

  29. Randomness accounts for a large part of variation in smaller accident counts Source: AA&P 27 (1):1-20 (1995)

  30. Victim counts are overdispersed Source: AA&P 27 (1):1-20 (1995)

  31. The casualty subset test. The affirmative casualty subset test: For any explanatory factor operating through its presumed effect on a particular subset of casualties, the effect should be extra strong as applied to this subset. Ex.: seat belts and car occupants. The complement casualty subset test: For any explanatory factor not affecting a particular subset of casualties, the effect should be zero as applied to this subset. Ex.: seat belts and pedestrians. The converse casualty subset test: For any explanatory factor having an opposite effect on a particular subset of casualties, the effect should be sign-reversed as applied to this subset. Ex.: seat belts and seat belt users killed or injured.

  32. Out-of-sample prediction A model can provide a perfect fit inside its own sample, and yet quite bad out-of-sample predictions. The proof of the pudding is in the eating! Source: Partyka (1991) (AA&P 23:423-430), quoted by Elvik (2007)

  33. 4. The TRULS model for Norway – a member of the DRAG family Recursive system of equations at the county and month level: 19 counties x 264 months (22 years) = 5016 observations. Observations cover 1973-94. The model has not been updated. Equations: • Car ownership • Exposure: light and heavy vehicle road use, MCs, and public transport • Seat belt use • Injury accident frequency • Severity: fatalities, dangerously injured, severely injured • Various casualty subset equations: • single vs multiple vehicle crashes; • heavy vehicle crashes; • car occupant, bicyclist, and pedestrian victims; • (non-)seat belt users injured

  34. The TRULS model Injury accident frequency: Severity:

  35. The TRULS model for Norway Estimated elasticities w r t exposure, by severity. Source: TØI report 457

  36. The TRULS model for Norway Estimated elasticities w r t exposure, by road user category. Source: TØI report 457

  37. The TRULS model for Norway: relative injury accident risk as a function of traffic density. 5016 sample points (19 counties x 264 months).

  38. The TRULS model for Norway: relative accident elasticities with respect to road use, as a function of traffic density. 5016 sample points (19 counties x 264 months).

  39. The TRULS model for Norway: relative injury accident frequency as a function of aggregate seat belt use. 5016 sample points (19 counties x 264 months).

  40. According to TRULS, heavy vehicles are 3.8 times (= 1.321/0.345) more dangerous than light ones. Light vehicle road users generate a positive external accident cost only if their own share of the accident cost is less than 34 %.
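The arithmetic behind this slide can be retraced as follows. The sketch assumes that 1.321 and 0.345 are the estimated accident elasticities with respect to heavy and light vehicle road use, and that the marginal accident cost equals the elasticity times the average cost; both are interpretive assumptions, and the function name is hypothetical.

```python
HEAVY = 1.321  # figure quoted on the slide, read here as the heavy vehicle elasticity
LIGHT = 0.345  # figure quoted on the slide, read here as the light vehicle elasticity


def external_cost_share(elasticity, own_share):
    """External marginal accident cost as a fraction of the average accident cost.

    Marginal cost is elasticity * average cost; the road user already bears
    own_share * average cost, so the remainder is external.
    """
    return elasticity - own_share


if __name__ == "__main__":
    print(round(HEAVY / LIGHT, 1))
    for own_share in (0.20, 0.34, 0.50):
        print(own_share, external_cost_share(LIGHT, own_share))
```

Under this reading, the external cost turns negative as soon as light vehicle users bear more than about 34 % of the accident cost themselves.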

  41. Thank you for listening! Read more: • TØI report 457/1999 • Acc. Anal. & Prev. 27 (1):1-20 (1995)

  42. The Poisson distribution. There are compelling theoretical and empirical reasons to assume that accident counts are Poisson distributed. The Poisson is a one-parameter distribution: when we know the mean, we also know how much variance to expect around it! The coefficient of variation decreases with the mean: $\sqrt{\omega}/\omega = 1/\sqrt{\omega}$, where $\omega$ is the mean.

  43. Generalized Poisson variates • Integer valued: 0, 1, 2, … • Zero occurrences OK. • Poisson invariance under summation • Non-negative outcome and positive expected value. Suggests multiplicative structure of cofactors/independent variables. • Estimable through maximum likelihood (ML) methods. • ML implicitly takes account of heteroskedasticity

  44. Probabilistic theories are complete. Einstein: “He [God] does not play dice.” Salmon (1984): Certain laws are “irreducibly statistical”, i.e. they include an inevitable, objectively random component. Single events may occur at random intervals, but with an almost constant overall frequency in the long run. Such laws are common in particle physics, but rare in behavioral science. Although the single event is all but impossible to predict, the collection of such events may very well behave in a perfectly predictable way, amenable to description by means of precise mathematical-statistical relationships. Ex.: radioactive decay (C14 method), die tossing, road accidents.

  45. 95 per cent, overdispersed probability interval around trend-fitted annual road fatalities in Norway. Source: Elvik (2005), TØI report 792

  46. The law of rare events. Consider a time-varying random variable Y(t) such that Then i.e., the number of events occurring during any interval of length t (say) has a Poisson distribution with mean
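The equations of this slide are missing from the transcript. As an illustration of the textbook form of the law of rare events, the sketch below splits an interval of length t into n short subintervals, each with a small, independent probability of containing one event, and shows the resulting binomial distribution approaching a Poisson distribution with mean rate × t. The symbol `rate` and the function names are assumptions, not the slide's notation.

```python
import math


def binomial_pmf(k, n, p):
    """P(K = k) for k events in n independent trials with probability p each."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P(Y = k) for a Poisson variate with the given mean (mean > 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def max_gap(rate, t, n, k_max=30):
    """Largest pointwise gap between Binomial(n, rate*t/n) and Poisson(rate*t)."""
    p = rate * t / n  # probability of an event in one short subinterval
    top = min(k_max, n)
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, rate * t))
        for k in range(top + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, max_gap(rate=2.0, t=1.0, n=n))
```

The gap shrinks as the subintervals get shorter, which is the sense in which many rare, independent opportunities for an accident add up to a Poisson count.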
