Models: Do You Trust Them?

2003 CAS Annual Meeting. Louise Francis, FCAS, MAAA (Louise_Francis@msn.com), Francis Analytics and Actuarial Data Mining, Inc. Overview: Data Quality, Data Cleaning, Software Errors, Model Assumptions.


Presentation Transcript


  1. Models: Do You Trust Them? 2003 CAS Annual Meeting Louise Francis, FCAS, MAAA Louise_Francis@msn.com Francis Analytics and Actuarial Data Mining, Inc.

  2. Overview • Data Quality • Data Cleaning • Software Errors • Model Assumptions • Questions About Key Assumptions Underlying Popular Models in Finance • Option Pricing Theory • Value at Risk • CAPM

  3. Data Mining Models • Advanced modeling techniques applied to large databases • Many records • Many variables • Some uses • Credit scoring • Fraud detection • Pricing

  4. Data Issues • “Misplaced faith in black boxes: Data Mining is sometimes perceived as a black box, where you feed the data in and interesting results and patterns emerge. Such an approach is particularly misleading when no prior knowledge or experience is used to validate the results of the mining exercise” • Exploratory Data Mining and Data Cleaning, by Dasu and Johnson

  5. Data Exploration and Cleaning • The overwhelming majority of the effort in data modeling is expended on understanding and cleaning data • Generally 85% or more of the effort is spent on data issues • This gets the modeler to the point of applying a modeling technique

  6. Dirty Data • A fact of life for actuaries • Even more of a problem when working with large, complex databases • Variables that are not used to produce key financial numbers are often inaccurately or incompletely recorded

  7. Examples of Data Problems • Examples are based on actual problems encountered in Data Mining projects • Examples use simulated data

  8. Dirty Data – Incomplete Data

  9. Dirty Data: Errors – Claim Number vs. Report Date

  10. Detecting Unusual Data: Box and Whisker Plot of Workers’ Compensation Payments

  11. Detecting Unusual Data: Histogram

  12. Detecting Unusual Data: Descriptive Statistics
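
The plots and tables from slides 10 to 12 did not survive in this transcript. As a rough sketch of the same idea, the snippet below computes a few descriptive statistics and the usual box-and-whisker fences (quartiles extended by 1.5 times the interquartile range). The payment amounts, the function names, and the 1.5 multiplier are assumptions made for illustration, not values from the presentation.

```python
import statistics

def whisker_fences(values, k=1.5):
    """Return (lower, upper) fences: the quartiles extended by k times the IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def describe(values):
    """Summary statistics of the kind shown in a descriptive-statistics table."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }

# Invented payment amounts (assumption); the last value is a planted error.
payments = [1200, 850, 430, 2100, 975, 1600, 720, 1340, 510, 250000]

lower, upper = whisker_fences(payments)
unusual = [p for p in payments if p < lower or p > upper]
summary = describe(payments)

print(summary)
print("values outside the fences:", unusual)
```

A mean far above the median, as here, is often the first hint that a few extreme records dominate the data.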

  13. Frequency of Unusual Observations

  14. Data Challenges • Heterogeneity and Diversity of Data • Join Keys • Scale • Metadata

  15. The Fraud Study Data • 1993 AIB closed PIP claims • Dependent Variables • Suspicion Score • Expert assessment of likelihood of fraud or abuse • Predictor Variables • Red flag indicators • Claim file variables • Errors were introduced into data for two variables, suspicion score and claimant age

  16. Data Cubes: Pivot Table Example
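
The pivot table itself is not reproduced in the transcript. A minimal sketch of the technique, assuming hypothetical records of (age band, suspicion score), is to count claims in every cell of a two-way table so that impossible categories stand out. The bands, scores, and function names below are invented for illustration.

```python
from collections import Counter

# Hypothetical (age band, suspicion score) records; "150+" and 99 are planted errors.
records = [
    ("18-34", 2), ("35-54", 0), ("18-34", 7), ("35-54", 2),
    ("55+", 0), ("150+", 2), ("18-34", 2), ("55+", 99),
]

def pivot_counts(rows):
    """Count records in each (row, column) cell, like a two-way pivot table."""
    return Counter(rows)

def print_pivot(cells):
    """Print the cell counts as a small text table."""
    row_keys = sorted({r for r, _ in cells})
    col_keys = sorted({c for _, c in cells})
    print("age band".ljust(10) + "".join(str(c).rjust(6) for c in col_keys))
    for r in row_keys:
        counts = "".join(str(cells.get((r, c), 0)).rjust(6) for c in col_keys)
        print(r.ljust(10) + counts)

cells = pivot_counts(records)
print_pivot(cells)
```

Cells with a count of one in an otherwise empty row or column are the ones worth checking against the source records.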

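  17. Data Spheres • Applied to numeric data • Can apply to a number of variables simultaneously to detect outliers • Compute standardized value for each variable, yi • Compute Mahalanobis distance: $D = \sqrt{(y - \mu)^{T} \Sigma^{-1} (y - \mu)}$, where $y$ is the vector of values for a record, $\mu$ is the vector of means, and $\Sigma$ is the covariance matrix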

  18. Data Spheres • More typical values on variables will fall at the center of the data sphere • Less typical values and outliers will be in outer layers • Can look at which variables most influence the Mahalanobis distance
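
The data sphere steps can be sketched in plain Python. This is an illustrative sketch, not the presenter's implementation: the (age, suspicion score) pairs are invented, the helper names are hypothetical, and splitting records into three equal-count layers by distance rank is an assumption. The Mahalanobis distance gives the same result on raw or standardized values, so the standardization step is omitted here.

```python
import math

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mu):
    """Sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(rows):
    """Distance of every record from the centre: sqrt(d' S^-1 d)."""
    mu = mean_vector(rows)
    cov = covariance_matrix(rows, mu)
    out = []
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        z = solve(cov, d)  # z = S^-1 d without forming the inverse
        out.append(math.sqrt(sum(di * zi for di, zi in zip(d, z))))
    return out

def sphere_layers(distances, n_layers=3):
    """Assign layer 1 (centre) to n_layers (outermost) by rank of distance."""
    order = sorted(range(len(distances)), key=distances.__getitem__)
    layers = [0] * len(distances)
    for rank, idx in enumerate(order):
        layers[idx] = 1 + rank * n_layers // len(distances)
    return layers

# Invented (age, suspicion score) pairs; the last record has an implausible age.
records = [(25, 2), (31, 3), (38, 1), (42, 4), (47, 2),
           (53, 5), (29, 3), (36, 2), (199, 4)]
distances = mahalanobis(records)
layers = sphere_layers(distances)

for rec, dist, layer in zip(records, distances, layers):
    print(rec, round(dist, 2), "layer", layer)
```

The record with age 199 lands in the outermost layer, which is the behavior slide 18 describes for outliers.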

  19. Distribution of Age by Data Sphere Layer

  20. Distribution of Suspicion Score by Data Sphere Layer

  21. Spreadsheet Errors • A large percentage of spreadsheets contain errors. One study found errors in 86% of spreadsheets • From Raymond Panko, “What We Know About Spreadsheet Errors” • Methods for finding and correcting errors are fairly well developed for programming in computer languages • Such methods are much less frequently applied when the model is in a spreadsheet

  22. Questioning Model Assumptions • Option Pricing Theory

  23. Option Pricing Theory • Option Pricing Formula widely used in finance in pricing options and other derivatives • The formula assumes asset distributions are normal or lognormal • Evidence that asset return data does not follow the normal distribution is widely available • 1976 Fama paper in Journal of the American Statistical Association

  24. Normal Distribution Assumption • The normality assumption is common in other finance applications • Value at risk • CAPM

  25. Test of Normal Distribution Assumption

  26. Test of Normal Distribution Assumption

  27. Test of Normal Distribution Assumption
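
The test results on slides 25 to 27 were images and are not recoverable from the transcript, so the method used there is unknown. As one common way to check the assumption, the sketch below computes sample skewness, excess kurtosis, and the Jarque-Bera statistic. The simulated returns, the mixture parameters, and the function names are all assumptions chosen for illustration.

```python
import math
import random

def skewness_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis (both 0 for a normal)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def jarque_bera(xs):
    """JB = n/6 * (S^2 + K^2/4), roughly chi-square with 2 df under normality."""
    s, k = skewness_and_excess_kurtosis(xs)
    jb = len(xs) / 6.0 * (s ** 2 + k ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-square(2)
    return jb, p_value

rng = random.Random(2003)
# Simulated daily returns (assumption): 5% of days have four times the volatility.
fat_tailed = [rng.gauss(0, 0.04 if rng.random() < 0.05 else 0.01)
              for _ in range(5000)]
normal = [rng.gauss(0, 0.01) for _ in range(5000)]

jb_fat, p_fat = jarque_bera(fat_tailed)
jb_norm, p_norm = jarque_bera(normal)
print("fat-tailed sample: JB =", round(jb_fat, 1), "p =", p_fat)
print("normal sample:     JB =", round(jb_norm, 1), "p =", p_norm)
```

A very large statistic with a p-value near zero says the sample is unlikely to come from a normal distribution, which is what the evidence on asset returns cited in slide 23 suggests.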

  28. Consequences of Assuming Normality • The frequency of extreme events is underestimated, often by a lot • Example: Long-Term Capital Management • “Theoretically, the odds against a loss such as August’s had been prohibitive, such a debacle was, according to mathematicians, an event so freakish as to be unlikely to occur even once over the entire life of the universe and even over numerous repetitions of the universe” • When Genius Failed by Roger Lowenstein, p. 159
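
To see how quickly a normal model rules out extreme losses, the sketch below computes the probability of a one-day loss at least k standard deviations below the mean, and the implied average wait between such days. This is not the calculation behind the quotation: the choice of k values and the 252 trading days per year are assumptions for illustration.

```python
import math

def normal_tail_probability(k):
    """P(Z <= -k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def expected_wait_in_years(k, trading_days_per_year=252):
    """Average number of years between daily losses of at least k standard deviations."""
    return 1.0 / (normal_tail_probability(k) * trading_days_per_year)

for k in (2, 3, 5, 10):
    p = normal_tail_probability(k)
    years = expected_wait_in_years(k)
    print(f"{k:>2} sigma: p = {p:.3e}, about one day in {years:.3g} years")
```

Under the normal assumption, the wait for a 10 standard deviation day comes out many orders of magnitude longer than the age of the universe. If actual returns have fatter tails, events of that size arrive far sooner than the model implies.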
