Generalized Additive Models
Keith D. Holler
September 19, 2005
GLMs – The Challenge
• What to do with continuous variables?
  • E.g., age, credit score, amount of insurance
• Options
  • Categorize – but how?
    • Equal volume, tree, or judgment
    • See Appendix H of "A Practitioner's Guide to GLMs" by Duncan Anderson et al.
  • Treat as a polynomial
    • Justified by the Weierstrass Approximation Theorem: any continuous function on a closed interval can be approximated arbitrarily well by a polynomial
    • But high powers explode in scale, e.g., mileage: (2 miles)^4 = 16 while (25 miles)^4 = 390,625
  • Look at the categorical estimates, transform, and rerun
    • E.g., newage = age^3 if age < 20, plus age^2 if age < 80, plus min(age, 80)
• All functional forms must be decided BEFORE the model is run
• Obviously, no clear winner!
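The three pre-modelling options above can be sketched in a few lines. This is a minimal illustration, not the author's code: `equal_volume_bins` is a hypothetical helper for the "equal volume" categorization, and `newage` is one hypothetical reading of the slide's piecewise transform (each power term switched on by an indicator, then summed).

```python
"""Hypothetical sketches of pre-modelling options for a continuous variable."""


def equal_volume_bins(values, n_bins):
    """Assign each value to one of n_bins groups with roughly equal counts.

    Ties are not kept together: equal values near a cut point can land in
    different bins. A production version would cut on unique quantiles.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # rank * n_bins // n maps ranks 0..n-1 onto bins 0..n_bins-1
        bins[i] = rank * n_bins // len(values)
    return bins


def newage(age):
    """Hypothetical reading of the slide's transform:
    age^3 * 1[age < 20] + age^2 * 1[age < 80] + min(age, 80)."""
    cubic = age ** 3 if age < 20 else 0
    square = age ** 2 if age < 80 else 0
    return cubic + square + min(age, 80)


# Scale problem with raw polynomial terms: the quartic of mileage spans
# more than four orders of magnitude between 2 and 25 miles.
print(2 ** 4, 25 ** 4)                               # 16 390625
print(equal_volume_bins([5, 1, 9, 3, 7, 2], 3))      # [1, 0, 2, 1, 2, 0]
print(newage(16), newage(50), newage(85))            # 4368 2550 80
```

Every one of these forms is fixed before the GLM is fitted, which is exactly the limitation the slide points out.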
Generalized Additive Models – GAMs
• GLMs are a special case of GAMs
  • E.g., ln(E[PP]) = intercept + f1(age) + f2(gender) + f3(symbol) + f4(marital), where PP is pure premium
• The functions f1, f2, f3, f4 can be anything
  • GLM: categorical, polynomial, transforms
  • Non-parametric functional smoothers
  • Decision trees
• Better balance of degrees of freedom, amount of data, and functional form
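The additive predictor on this slide can be sketched directly. All coefficients, the category codes, and the shapes of the f's below are hypothetical, chosen only to show that each component can be a lookup table, a polynomial, or a smooth curve, and that the log link turns the additive predictor into a multiplicative premium.

```python
import math

# Hypothetical intercept and component functions: none of these numbers
# come from a fitted model. They only show that each f_j can take any form.
INTERCEPT = math.log(300.0)  # hypothetical base pure premium of 300


def f1_age(age):
    """Hypothetical smooth age curve that decays from age 16 onward."""
    return 0.5 * math.exp(-(age - 16) / 10.0)


F2_GENDER = {"F": 0.0, "M": 0.1}     # hypothetical categorical lookup


def f3_symbol(symbol):
    """Hypothetical quadratic in symbol (a GLM-style polynomial)."""
    return 0.03 * symbol - 0.001 * symbol ** 2


F4_MARITAL = {"M": -0.05, "S": 0.0}  # hypothetical categorical lookup


def expected_pure_premium(age, gender, symbol, marital):
    """E[PP] = exp(intercept + f1(age) + f2(gender) + f3(symbol) + f4(marital))."""
    eta = (INTERCEPT + f1_age(age) + F2_GENDER[gender]
           + f3_symbol(symbol) + F4_MARITAL[marital])
    return math.exp(eta)


print(round(expected_pure_premium(16, "F", 0, "S"), 2))  # 494.62
```

Because of the log link, changing one component multiplies the premium by a fixed factor, e.g., "M" versus "F" scales E[PP] by exp(0.1) whatever the other variables are.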
Smoothers – Partial List
• Locally weighted running-line smoother (LOESS)
• Regression splines
• Cubic smoothing splines
• Monotonic splines
• B-splines
• Kernel smoothers
• Running medians, means, lines
• GLM – categories or polynomials
• Decision trees
• Many can be extended to multiple dimensions
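Two of the simpler smoothers in the list can be written in a few lines each. This is a minimal sketch assuming a Gaussian kernel for the kernel smoother (the Nadaraya–Watson form) and windows truncated at the ends for the running median; neither choice is prescribed by the slide.

```python
import math
import statistics


def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel.

    Points far from x0 relative to the bandwidth get almost no weight,
    which is what makes the smoother 'local'.
    """
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def running_median(y, half_width):
    """Median of y over positions i-half_width..i+half_width.

    Assumes y is already sorted by the predictor; windows are truncated
    at the two ends of the data.
    """
    n = len(y)
    return [statistics.median(y[max(0, i - half_width): min(n, i + half_width + 1)])
            for i in range(n)]


print(kernel_smooth([-1, 0, 1], [0, 1, 2], 0, 1.0))  # 1.0
print(running_median([1, 9, 2, 3, 8, 4], 1))         # [5.0, 2, 3, 3, 4, 6.0]
```

Replacing the one-dimensional distance `(xi - x0) / bandwidth` with a scaled distance between points in the plane gives a two-dimensional kernel smoother, one way the "multiple dimensions" extension works.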
GAM – Keys
• Backfitting reduces the dimension of the problem
  • Residual Z = ln(E[PP]) – intercept – f1(age) – f2(gender) – f4(marital)
  • Fit Z = f3(symbol)
  • Now a 2-dimensional "Y vs X" problem
  • Repeat for each fj in turn until the fitted functions stop changing
• Data drives the shape
  • Not determined a priori
• Cross-validation is used to choose the smoothing parameter
• "Local": many of the smoothers use only the data points close to the point being predicted, instead of all of them
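The backfitting loop can be sketched as follows. This is a simplified illustration, assuming an identity link (a Gaussian additive model) rather than the log-link Poisson model on the slides, and using a group-mean smoother (the "GLM – categories" entry in the smoother list) so the result is easy to check. The function names are hypothetical.

```python
from collections import defaultdict


def group_mean_smoother(x, r):
    """Categorical smoother: fitted value = mean of r within each level of x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, ri in zip(x, r):
        sums[xi] += ri
        counts[xi] += 1
    return [sums[xi] / counts[xi] for xi in x]


def backfit(y, predictors, smoother, n_iter=20, tol=1e-10):
    """Backfitting for an identity-link additive model y = alpha + sum_j f_j(x_j).

    Each pass fits f_j to the partial residual y - alpha - sum_{k != j} f_k,
    which is a one-dimensional "Y vs X" smoothing problem.
    """
    n = len(y)
    alpha = sum(y) / n
    fits = [[0.0] * n for _ in predictors]
    for _ in range(n_iter):
        max_change = 0.0
        for j, x in enumerate(predictors):
            partial = [y[i] - alpha - sum(fits[k][i] for k in range(len(predictors)) if k != j)
                       for i in range(n)]
            new = smoother(x, partial)
            mean_new = sum(new) / n
            new = [v - mean_new for v in new]  # center each f_j for identifiability
            max_change = max(max_change, max(abs(u - v) for u, v in zip(new, fits[j])))
            fits[j] = new
        if max_change < tol:  # stop once no function moves
            break
    return alpha, fits


# Example on a balanced two-way grid with an exactly additive response.
x1 = [u for u in (0, 1, 2) for v in ("a", "b")]
x2 = [v for u in (0, 1, 2) for v in ("a", "b")]
y = [5.0 + {0: 0.0, 1: 1.0, 2: 3.0}[u] + {"a": 0.0, "b": 2.0}[v] for u, v in zip(x1, x2)]
alpha, (f1, f2) = backfit(y, [x1, x2], group_mean_smoother)
print(alpha, f1, f2)
```

A continuous predictor would use a spline or LOESS smoother in place of `group_mean_smoother`. For a log link, Hastie and Tibshirani's local scoring algorithm wraps this loop in an outer iteration that repeatedly refits a working response.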
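Example – SAS Code
proc gam data=all;
   class gender marital2;                     /* categorical predictors */
   model clclmonz = param(gender marital2)    /* parametric (GLM-style) terms */
                    spline(age2, df=4)        /* smoothing spline, 4 df */
                    spline(symbol, df=3)      /* smoothing spline, 3 df */
                    / dist=Poisson;
   output out=estall p;                       /* save predicted values */
run;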
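Smoothing Spline
• Penalized error criterion: $\sum_{i=1}^{n} \{Y_i - g(t_i)\}^2 + \lambda \int \{g''(t)\}^2 \, dt$
• λ is the smoothing parameter: as λ → 0 the fit interpolates the data; as λ → ∞ it becomes the least-squares straight line
• Reference: Nonparametric Regression and Generalized Linear Models, Green and Silverman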
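A discrete analogue of this criterion makes the role of λ easy to see. This sketch is not the cubic smoothing spline itself: assuming equally spaced t_i, it replaces the integral of g''(t)² with the sum of squared second differences of the fitted values, which is Whittaker–Henderson graduation with second differences. Setting the gradient to zero gives the linear system (I + λD'D)g = y, where D is the second-difference matrix. The helper names are hypothetical. The code also computes the trace of the smoother matrix, one common definition of a smoother's "degrees of freedom" (Hastie and Tibshirani), which relates λ to df settings like those in the SAS example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is n x n)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy; a is untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def penalty_system(n, lam):
    """Build I + lam * D'D, where row i of D gives g[i] - 2 g[i+1] + g[i+2]."""
    a = [[float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n - 2):
        row = {i: 1.0, i + 1: -2.0, i + 2: 1.0}  # nonzero entries of row i of D
        for p, dp in row.items():
            for q, dq in row.items():
                a[p][q] += lam * dp * dq
    return a


def whittaker_smooth(y, lam):
    """Minimize sum (y_i - g_i)^2 + lam * sum (second difference of g)^2."""
    return solve(penalty_system(len(y), lam), list(y))


def effective_df(n, lam):
    """Trace of the smoother matrix S = (I + lam * D'D)^{-1}."""
    a = penalty_system(n, lam)
    return sum(solve(a, [float(i == j) for i in range(n)])[j] for j in range(n))


print(whittaker_smooth([1.0, 3.0, 2.0, 5.0, 4.0], 1.0))
print(effective_df(8, 0.0), effective_df(8, 1.0), effective_df(8, 1e6))
```

With λ = 0 the fit reproduces the data and uses n degrees of freedom; as λ grows the fit is forced toward a straight line and the degrees of freedom fall toward 2, mirroring the limits of the continuous criterion.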
Example – Cross Validation
proc gam data=all;
   class gender marital2;
   model clclmonz = param(gender marital2)
                    spline(age2) spline(symbol)
                    / method=GCV dist=Poisson;  /* choose df by generalized cross-validation */
   output out=estGCV p;
run;
• Results in 17 and 14 degrees of freedom (for age2 and symbol, respectively).
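Generalized cross-validation (GCV) scores a linear smoother S by GCV = (RSS / n) / (1 − tr(S) / n)², a cheap approximation to leave-one-out cross-validation. The sketch below applies it to a running-mean smoother, a stand-in chosen because its smoother-matrix diagonal is simply 1/(window size); it does not reproduce PROC GAM's spline computations. The function names are hypothetical.

```python
def running_mean_fit(y, h):
    """Running mean over positions i-h..i+h (truncated at the ends).

    Assumes y is sorted by the predictor. Returns the fitted values and the
    diagonal of the smoother matrix, which is 1/w for a window of w points.
    """
    n = len(y)
    fitted, diag = [], []
    for i in range(n):
        window = y[max(0, i - h): min(n, i + h + 1)]
        fitted.append(sum(window) / len(window))
        diag.append(1.0 / len(window))
    return fitted, diag


def gcv_score(y, h):
    """GCV(h) = (RSS / n) / (1 - tr(S) / n)^2; requires h >= 1 and len(y) >= 2."""
    n = len(y)
    fitted, diag = running_mean_fit(y, h)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (rss / n) / (1.0 - sum(diag) / n) ** 2


def choose_half_width(y, candidates):
    """Return the candidate half-width with the smallest GCV score."""
    return min(candidates, key=lambda h: gcv_score(y, h))


y = [1.0, 1.4, 0.9, 2.1, 2.4, 2.0, 3.2, 2.9, 3.5, 4.1]
print({h: round(gcv_score(y, h), 4) for h in (1, 2, 3)})
print(choose_half_width(y, [1, 2, 3]))
```

The numerator rewards closeness to the data, while the denominator penalizes the degrees of freedom tr(S) the fit uses; the chosen smoothing level balances the two, which is how GCV arrives at degrees of freedom such as the 17 and 14 reported above.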
Miscellaneous
• Parameter estimates – one for each distinct value of a smoothed variable
• S-PLUS
• References
  • SAS PROC GAM documentation
  • Generalized Additive Models, Hastie and Tibshirani
Q & A
Keith D. Holler, PhD, FCAS, ASA, ARM
Personal Lines Research Department
St. Paul Travelers
kdholler@travelers.com
(860) 277-4808
Research paper in progress for the Ratemaking call