Introduction to Generalized Linear Models

Introduction to Generalized Linear Models. Prepared by Louise Francis Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com September 18, 2005. Objectives. Gentle introduction to Linear Models and Generalized Linear Models Illustrate some simple applications

    Presentation Transcript
    1. Introduction to Generalized Linear Models Prepared by Louise Francis Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com September 18, 2005

    2. Objectives • Gentle introduction to Linear Models and Generalized Linear Models • Illustrate some simple applications • Show examples in commonly available software • Address some practical issues

    3. Predictive Modeling Family

    4. A Brief Introduction to Regression • Fits line that minimizes squared deviation between actual and fitted values

    5. Simple Formula for Fitting Line
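
The closed-form least-squares fit for a single predictor can be sketched as follows. This is a minimal illustration, not the slide's own worksheet: the function name `fit_line` and the data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # The fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: year index and average severity (illustrative only)
years = [1, 2, 3, 4, 5]
severity = [102.0, 109.0, 121.0, 130.0, 138.0]
a, b = fit_line(years, severity)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
```

The slope and intercept computed this way are the values that minimize the sum of squared deviations between actual and fitted severities.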

    6. Excel Does Regression • Install the Analysis ToolPak add-in that comes with Excel • Click Tools, Data Analysis, Regression

    7. Analysis of Variance Table

    8. Goodness of Fit Statistics • R²: (SS Regression / SS Total) • percentage of variance explained • F statistic: (MS Regression / MS Residual) • significance of regression • t statistics: use the standard error of each coefficient to determine whether it is significant • significance of coefficients • It is customary to drop a variable if its coefficient is not significant • Note: SS = sum of squares, MS = mean square
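
As a rough sketch of how these statistics come out of the sums of squares, the following computes R², the F statistic, and the t statistic of the slope for a one-predictor regression. The function name `regression_stats` and the data are hypothetical; a statistics package or Excel would also report p-values, which are omitted here.

```python
def regression_stats(x, y):
    """Fit y = a + b*x and return R^2, F, and the t statistic of the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x

    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_resid

    ms_reg = ss_reg / 1            # one predictor, so 1 regression df
    ms_resid = ss_resid / (n - 2)  # n observations minus 2 fitted parameters
    se_b = (ms_resid / sxx) ** 0.5  # standard error of the slope
    return {
        "r2": ss_reg / ss_total,
        "f": ms_reg / ms_resid,
        "t_slope": b / se_b,
    }


# Hypothetical data (illustrative only)
years = [1, 2, 3, 4, 5]
severity = [102.0, 109.0, 121.0, 130.0, 138.0]
stats = regression_stats(years, severity)
print(stats)
```

With a single predictor the squared t statistic of the slope equals the F statistic, which is a handy consistency check on the output.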

    9. Output of Excel Regression Procedure

    10. Other Diagnostics: Residual Plot • Points should scatter randomly around zero • If not, a straight line is probably not appropriate

    11. Other Diagnostics: Normal Plot • Plot should be a straight line • Otherwise the residuals are not from a normal distribution

    12. Non-Linear Relationships • The model fit was of the form: • Severity = a + b*Year • A more common trend model is: • Severity_Year = Severity_Year0 * (1 + t)^(Year - Year0) • t is the trend rate • This is an exponential trend model • Cannot fit it with a straight line
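
Although the exponential trend is not a straight line in severity, taking logs gives ln(Severity) = ln(Severity_Year0) + (Year - Year0) * ln(1 + t), which is linear in the year. The sketch below assumes that log-transform approach, which is one common option and not necessarily how the deck fits the model. The data are synthetic, generated with a 5% trend so the recovered rate can be checked.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Synthetic severities (assumed values): 1000 in the base year, 5% annual trend
base_year = 2000
years = list(range(2000, 2010))
severity = [1000.0 * 1.05 ** (yr - base_year) for yr in years]

# Regress log severity on years elapsed since the base year
x = [yr - base_year for yr in years]
log_sev = [math.log(s) for s in severity]
a, b = fit_line(x, log_sev)

trend = math.exp(b) - 1.0       # slope is ln(1 + t)
base_severity = math.exp(a)     # intercept is ln(Severity_Year0)
print(f"trend rate = {trend:.4f}, base severity = {base_severity:.1f}")
```

On real data the log-scale fit minimizes squared errors of log severity rather than of severity itself, so the two approaches can give somewhat different trend estimates.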

    13. Exponential Trend – Cont. • R² declines and residuals indicate poor fit

    14. A More Complex Model • Use more than one variable in the model (econometric model) • In this case we use a medical cost index, the consumer price index, and employment data (number employed, unemployment rate, change in number employed, change in unemployment rate) to predict workers compensation severity

    15. Multivariable Regression

    16. One Approach: Regression With All Variables • Many variables are not significant • Over-parameterization issue • How do we get the best-fitting, most parsimonious model?

    17. Multiple Regression Statistics

    18. Stepwise Regression • Partial correlation • Correlation of dependent variable with predictor after all other variables are in model • F – contribution • Amount of change in F-statistic when variable is added to model

    19. Stepwise regression-kinds • Forward stepwise • Start with best one variable regression and add • Backward stepwise • Start with full regression and delete variables • Exhaustive

    20. Stepwise regression for Severity Data

    21. Stepwise Regression-Excluded Variables

    22. Econometric Model Assessment • Standardized residuals are more evenly spread around the zero line • R² is .91 vs .52 for the simple trend regression

    23. Correlation of Predictor Variables: Multicollinearity

    24. Multicollinearity • Predictor variables are assumed uncorrelated • Assess with correlation matrix
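
A correlation matrix for a handful of predictors can be sketched as below. The predictor names echo the variables mentioned in the deck, but every number is made up for illustration, and the 0.9 flagging threshold is an arbitrary assumption rather than a rule from the presentation.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predictor values (illustrative only)
predictors = {
    "medical_cost_index": [100, 104, 109, 115, 122, 130],
    "consumer_price_index": [100, 102, 104, 106, 108, 110],
    "unemployment_rate": [5.5, 6.1, 5.8, 4.9, 5.2, 6.0],
}

names = list(predictors)
corr = {(p, q): pearson(predictors[p], predictors[q]) for p in names for q in names}

# Flag each distinct pair whose absolute correlation exceeds the assumed threshold
high = [(p, q, r) for (p, q), r in corr.items() if p < q and abs(r) > 0.9]
for p, q, r in high:
    print(f"{p} vs {q}: r = {r:.3f}")
```

Pairs flagged this way are candidates for the remedies on the next slide; pairwise correlations can still miss collinearity that involves three or more variables at once.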

    25. Remedies for Multicollinearity • Drop one or more of the highly correlated variables • Use factor analysis or principal components to produce a new variable which is a weighted average of the correlated variables • Use stepwise regression to select variables to include

    26. Degrees of Freedom • Related to number of observations • One rule of thumb: subtract the number of parameters estimated from the number of observations • The greater the number of parameters estimated, the lower the number of degrees of freedom

    27. Degrees of Freedom • “Degrees of freedom for a particular sum of squares is the smallest number of terms we need to know in order to find the remaining terms and thereby compute the sum” • Iverson and Norpoth, Analysis of Variance • We want to keep the df as large as possible to avoid overfitting • This concept becomes particularly important with complex data mining models

    28. Regression Output cont. • Standardized residuals are more evenly spread around the zero line, but a pattern is still present • R² is .84 vs .52 for the simple trend regression • We might want other variables in the model (e.g., unemployment rate), but at some point overfitting becomes a problem

    29. Tail Development Factors: Another Regression Application • Typically involve non-linear functions: • Inverse Power Curve: • Hoerl Curve: • Probability distribution such as Gamma, Lognormal

    30. Non-Linear Regression • Use it to fit a non-linear function where transformation to a linear model will not work • Example • LDF = (Cumulative % Incurred at t+1) / (Cumulative % Incurred at t) • Assume gamma cumulative distribution

    31. Example: Inverse Power Curve • Can use transformation of variables to fit simplified model: LDF = 1 + k/t^a • ln(LDF - 1) = ln(k) + a*ln(1/t) • Use nonlinear regression to solve for a and c • Uses numerical algorithms, such as gradient descent, to solve for parameters • Most statistics packages let you do this
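
The transformed version of the simplified curve, ln(LDF - 1) = ln(k) + a*ln(1/t), is an ordinary straight-line regression of ln(LDF - 1) on ln(1/t). A minimal sketch follows, using synthetic development factors generated from assumed values k = 0.5 and a = 1.5 so that the fit can be verified.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic loss development factors (assumed parameters, illustrative only)
true_k, true_a = 0.5, 1.5
ages = [1, 2, 3, 4, 5, 6]
ldf = [1.0 + true_k / t ** true_a for t in ages]

# Transform both sides, then fit a straight line
x = [math.log(1.0 / t) for t in ages]
y = [math.log(f - 1.0) for f in ldf]
intercept, slope = fit_line(x, y)

k_hat = math.exp(intercept)  # intercept estimates ln(k)
a_hat = slope                # slope estimates a
print(f"k = {k_hat:.4f}, a = {a_hat:.4f}")
```

The transformation only works when every LDF exceeds 1, since ln(LDF - 1) is undefined otherwise.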

    32. Nonlinear Regression: Grid Search Method • Try out a number of different values for the parameters and pick the ones that minimize a goodness of fit statistic • You can use the Data Table capability of Excel to do this • Use the regression functions LINEST and INTERCEPT to get k and a • Try out different values for c until you find the best one
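
The grid search can be sketched outside Excel as well. The example below assumes the three-parameter form LDF = 1 + k/(t + c)^a; the slide does not show where c enters the curve, so that placement is an assumption. For each trial c, k and a come from a straight-line fit on the log scale, and the c with the smallest sum of squared errors is kept. The data are synthetic.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sse_for_c(c, ages, ldf):
    """For a trial c, fit k and a by log-linear regression; return (SSE, k, a)."""
    x = [math.log(1.0 / (t + c)) for t in ages]
    y = [math.log(f - 1.0) for f in ldf]
    intercept, slope = fit_line(x, y)
    k, a = math.exp(intercept), slope
    fitted = [1.0 + k / (t + c) ** a for t in ages]
    sse = sum((f - g) ** 2 for f, g in zip(ldf, fitted))
    return sse, k, a


# Synthetic factors from assumed parameters k = 2.0, a = 1.5, c = 0.5
ages = [1, 2, 3, 4, 5, 6, 7, 8]
ldf = [1.0 + 2.0 / (t + 0.5) ** 1.5 for t in ages]

# Trial values of c: 0.0, 0.1, ..., 2.0
grid = [i / 10 for i in range(21)]
best_c = min(grid, key=lambda c: sse_for_c(c, ages, ldf)[0])
best_sse, best_k, best_a = sse_for_c(best_c, ages, ldf)
print(f"c = {best_c}, k = {best_k:.4f}, a = {best_a:.4f}, SSE = {best_sse:.2e}")
```

A grid search only finds the best value among the candidates tried, so a finer grid around the first winner is a natural follow-up.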

    33. Fitting non-linear function

    34. Using Data Tables in Excel

    35. Use Model to Compute the Tail

    36. Claim Count Triangle Model • Chain ladder is common approach

    37. Claim Count Development • Another approach: additive model • This model is the same as a one factor ANOVA

    38. ANOVA Model for Development

    39. ANOVA: Two Groups • With only two groups we test the significance of the difference between their means • Many years ago in a statistics course we learned how to use a t-test for this

    40. ANOVA: More than 2 Groups

    41. Correlation Measure: Eta

    42. ANOVA Model for Development

    43. Regression With Dummy Variables • Let Devage24=1 if development age = 24 months, 0 otherwise • Let Devage36=1 if development age = 36 months, 0 otherwise • Need one less dummy variable than number of ages

    44. Regression with Dummy Variables: Design Matrix

    45. Equivalent Model to ANOVA
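
The equivalence between dummy-variable regression and one-factor ANOVA can be checked numerically: the intercept comes out as the mean of the base group, and each dummy coefficient as the difference between its group mean and the base mean. The sketch below assumes 12 months is the base development age and uses made-up claim counts; the small solver is written out only to keep the example dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical incremental claim counts by development age (illustrative only)
dev_age = [12, 12, 12, 24, 24, 24, 36, 36, 36]
claims = [100, 110, 120, 40, 50, 60, 10, 20, 30]

# Design matrix: intercept, Devage24, Devage36 (age 12 is the assumed base level)
X = [[1, 1 if a == 24 else 0, 1 if a == 36 else 0] for a in dev_age]
beta = ols(X, claims)
print(beta)  # intercept, Devage24 coefficient, Devage36 coefficient
```

Adding a third dummy for age 12 alongside the intercept would make the design matrix columns linearly dependent, which is why one fewer dummy than the number of ages is used.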

    46. Apply Logarithmic Transformation • It is reasonable to believe that variance is proportional to expected value • Claims can only have positive values • If we model the log of the claim values, the fitted claims cannot be negative • Regress log(Claims + .001) on dummy variables or do ANOVA on logged data

    47. Log Regression

    48. Poisson Regression • Log regression assumption: errors on the log scale are from a normal distribution • But these are claim counts, so a Poisson assumption might be reasonable • Poisson and Normal both belong to a more general class of distributions: the exponential family of distributions
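
A Poisson regression with a log link can be fitted by Newton-Raphson iterations on the log-likelihood, which is equivalent to the iteratively reweighted least squares used by most GLM software. The sketch below is a bare-bones illustration under stated assumptions: made-up claim counts, development-age dummies with 12 months as the base level, a fixed number of iterations, and no convergence checks or standard errors.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def poisson_regression(X, y, iterations=25):
    """Newton-Raphson for a Poisson model with log link: mu = exp(X beta)."""
    p = len(X[0])
    # Start from the overall mean; assumes column 0 of X is the intercept
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iterations):
        mu = [math.exp(sum(b * xij for b, xij in zip(beta, row))) for row in X]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(row[i] * row[j] * mi for row, mi in zip(X, mu))
                 for j in range(p)] for i in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical claim counts by development age (illustrative only)
dev_age = [12, 12, 12, 24, 24, 24, 36, 36, 36]
claims = [100, 110, 120, 40, 50, 60, 10, 20, 30]
X = [[1, 1 if a == 24 else 0, 1 if a == 36 else 0] for a in dev_age]

beta = poisson_regression(X, claims)
fitted_means = [math.exp(beta[0]), math.exp(beta[0] + beta[1]), math.exp(beta[0] + beta[2])]
print(fitted_means)  # fitted mean count at ages 12, 24, 36
```

With only group dummies as predictors the fitted means reproduce the group averages, and the coefficients are differences of log means, so effects are multiplicative rather than additive.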

    49. “Natural” Form of the Exponential Family
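
One standard way of writing the natural form, in the notation commonly used in GLM texts, is given below. The symbols θ (natural parameter), φ (dispersion parameter), and the functions a, b, c are that convention's names and may differ from the notation on the slide.

```latex
f(y;\theta,\phi) = \exp\!\left\{ \frac{y\,\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\},
\qquad
\mathrm{E}[Y] = b'(\theta),
\qquad
\mathrm{Var}[Y] = b''(\theta)\,a(\phi)
```

For the Poisson distribution with mean μ, for example, θ = ln μ, b(θ) = e^θ, a(φ) = 1, and c(y, φ) = -ln(y!), so the mean and the variance both equal μ.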

    50. Specific Members of the Exponential Family • Normal (Gaussian) • Poisson • Binomial • Negative Binomial • Gamma • Inverse Gaussian