Chapter 6 Regression Algorithms in Data Mining


Presentation Transcript


  1. Chapter 6 Regression Algorithms in Data Mining • Fit data • Time-series data: Forecast • Other data: Predict

  2. Contents • Describes OLS (ordinary least squares) regression and logistic regression • Describes linear discriminant analysis and centroid discriminant analysis • Demonstrates the techniques on small data sets • Reviews real applications of each model • Shows the application of the models to larger data sets

  3. Use in Data Mining • Telecommunications industry: customer turnover (churn) • One of the major analytic models for classification problems • Linear regression • The standard: ordinary least squares (OLS) regression • Can be used for discriminant analysis • Can apply stepwise regression • Nonlinear regression • More complex (but less reliable) data fitting • Logistic regression • When data are categorical (usually binary)

  4. OLS Model

  5. OLS Regression • Uses an intercept and slope coefficients (b) to minimize the squared error terms over all observations i • Fits the data with a linear model • Time-series data: observations over past periods • Best-fit line (in the sense of minimizing the sum of squared errors)
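
A minimal sketch of such a fit in Python, using the closed-form OLS slope and intercept; the weekly request counts are hypothetical stand-ins for the book's data:

    import numpy as np

    # Hypothetical weekly request counts (stand-ins, not the book's data).
    weeks = np.array([1, 2, 3, 4, 5], dtype=float)
    requests = np.array([5.5, 10.9, 15.8, 21.1, 25.9])

    # Closed-form OLS: b1 = cov(x, y) / var(x), b0 = ybar - b1 * xbar.
    x_bar, y_bar = weeks.mean(), requests.mean()
    b1 = ((weeks - x_bar) * (requests - y_bar)).sum() / ((weeks - x_bar) ** 2).sum()
    b0 = y_bar - b1 * x_bar
    print(f"Requests = {b0:.3f} + {b1:.3f} * Week")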

  6. Regression Output • R2 = 0.987 • Intercept: 0.642 (t = 0.286, P = 0.776) • Week: 5.086 (t = 53.27, P ≈ 0) • Requests = 0.642 + 5.086*Week

  7. Example: computing R2 from SSE and SST (R2 = 1 - SSE/SST)

  8. Example

  9. A graph of the time-series model

  10. Time-Series Forecast
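
Using the slide-6 model, the forecast is just the fitted line extrapolated to a future period; for example (an illustrative week, not one from the book), week 30 gives Requests = 0.642 + 5.086*30 ≈ 153.2.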

  11. Regression Tests • FIT: • SSE – sum of squared errors • Synonym: SSR – sum of squared residuals • R2 – proportion of variance explained by the model • Adjusted R2 – adjusts the calculation to penalize for the number of independent variables • Significance: • F-test – test of overall model significance • t-test – test of whether a model coefficient differs significantly from zero • P – probability that the coefficient is zero (or at least on the other side of zero from the estimate) • See p. 103

  12. Regression Model Tests • SSE (sum of squared errors) • For each observation, subtract the model value from the observed value, square the difference, and total over all observations • By itself means nothing • Can compare across models (lower is better) • Can be used to evaluate the proportion of variance in the data explained by the model • R2 • Ratio of the explained sum of squares (MSR) to the total sum of squares (SST) • SST = MSR + SSE • 0 ≤ R2 ≤ 1 • See p. 104
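
Continuing the sketch after slide 5, SSE, SST, and R2 fall out directly from the fitted line (same hypothetical data):

    # Continues the OLS sketch above.
    predicted = b0 + b1 * weeks
    sse = ((requests - predicted) ** 2).sum()   # unexplained variation
    sst = ((requests - y_bar) ** 2).sum()       # total variation about the mean
    r2 = 1.0 - sse / sst                        # proportion explained; SST = MSR + SSE
    print(f"SSE = {sse:.3f}, SST = {sst:.3f}, R2 = {r2:.3f}")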

  13. Multiple Regression • Can include more than one independent variable • Trade-off: • Too many variables – many spurious relationships, overlapping information • Too few variables – miss important content • Adding variables will always increase R2 • Adjusted R2 penalizes for additional independent variables

  14. Example: Hiring Data • Dependent variable – Sales • Independent variables: • Years of Education • College GPA • Age • Gender • College Degree • See pp. 104-105

  15. Regression Model Sales = 269025 - 17148*YrsEd (P = 0.175) - 7172*GPA (P = 0.812) + 4331*Age (P = 0.116) - 23581*Male (P = 0.266) + 31001*Degree (P = 0.450) • R2 = 0.252, Adj R2 = -0.015 • Weak model: no coefficient significant at the 0.10 level

  16. Improved Regression Model Sales = 173284 - 9991*YrsEd (P = 0.098*) + 3537*Age (P = 0.141) - 18730*Male (P = 0.328) • R2 = 0.218, Adj R2 = 0.070
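
The adjusted values on these two slides follow the usual formula Adj R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). A quick check in Python; both slides' figures are consistent with n = 20 observations, which is an inference from the numbers rather than something stated in the transcript:

    def adjusted_r2(r2: float, n: int, k: int) -> float:
        # Penalizes R^2 for each of the k independent variables, given n observations.
        return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

    print(round(adjusted_r2(0.252, 20, 5), 3))  # -> -0.015 (matches slide 15)
    print(round(adjusted_r2(0.218, 20, 3), 3))  # -> 0.071, vs. 0.070 on slide 16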

  17. Logistic Regression • Data often ordinal or nominal • Regression based on continuous numbers is not appropriate • Need dummy variables • Binary – either are or are not • LOGISTIC REGRESSION (probability of either 1 or 0) • Two or more categories • DISCRIMINANT ANALYSIS (perform a regression for each outcome; pick the one that fits best)

  18. Logistic Regression • For dependent variables that are nominal or ordinal • Gives the probability of assigning case i to class j • Sigmoidal function (in English, an S curve from 0 to 1)
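
For the binary case, the standard logistic (sigmoid) form maps the regression score to a probability between 0 and 1:

    P(case i in class j) = 1 / (1 + e^-(b0 + b1*x1 + ... + bk*xk))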

  19. Insurance Claim Model Fraud = 81.824 - 2.778*Age (P = 0.789) - 75.893*Male (P = 0.758) + 0.017*Claim (P = 0.757) - 36.648*Tickets (P = 0.824) + 6.914*Prior (P = 0.935) - 29.362*AttySmith (P = 0.776) • Can get a probability by running the score through the logistic formula • See pp. 107-109
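
A sketch of that last step in Python, scoring one claim with the slide's coefficients; the claimant's attribute values are hypothetical, chosen only to show the mechanics:

    import math

    # Coefficients from the slide's insurance claim model.
    intercept = 81.824
    coef = {"Age": -2.778, "Male": -75.893, "Claim": 0.017,
            "Tickets": -36.648, "Prior": 6.914, "AttySmith": -29.362}

    # Hypothetical claimant (illustrative values only).
    case = {"Age": 30, "Male": 0, "Claim": 100, "Tickets": 0,
            "Prior": 0, "AttySmith": 0}

    score = intercept + sum(coef[k] * case[k] for k in coef)
    p_fraud = 1.0 / (1.0 + math.exp(-score))  # logistic formula
    print(f"score = {score:.3f}, P(fraud) = {p_fraud:.3f}")  # -> ~0.546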

  20. Linear Discriminant Analysis • Groups objects into a predetermined set of outcome classes • Regression is one means of performing discriminant analysis • 2 groups: find a cutoff for the regression score • More than 2 groups: multiple cutoffs

  21. Centroid Method (NOT regression) • Binary data • Divide the training set into two groups by binary outcome • Standardize the data to remove scale effects • Identify the mean of each independent variable by group (the CENTROID) • Calculate a distance function (sketched below)
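
A minimal sketch of the centroid method in Python; centroid_classify is an illustrative name, the feature matrix and labels are hypothetical placeholders, and the distance function used is ordinary Euclidean distance to each group centroid:

    import numpy as np

    def centroid_classify(X_train, y_train, X_new):
        # Standardize using the training data's means and standard deviations.
        mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
        Z = (X_train - mu) / sigma
        # Centroid of each outcome group = mean vector of its standardized rows.
        c0 = Z[y_train == 0].mean(axis=0)
        c1 = Z[y_train == 1].mean(axis=0)
        # Assign each new case to the nearer centroid.
        Z_new = (X_new - mu) / sigma
        d0 = np.linalg.norm(Z_new - c0, axis=1)
        d1 = np.linalg.norm(Z_new - c1, axis=1)
        return (d1 < d0).astype(int)

    # Hypothetical data: 6 cases, 2 features, binary fraud outcome.
    X = np.array([[20, 1], [25, 2], [30, 1], [50, 0], [55, 0], [60, 1]], float)
    y = np.array([1, 1, 1, 0, 0, 0])
    print(centroid_classify(X, y, np.array([[22, 2], [58, 0]], float)))  # -> [1 0]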

  22. Fraud Data

  23. Standardized & Sorted Fraud Data

  24. Distance Calculations

  25. Discriminant Analysis with Regression (standardized data, binary outcomes) • Intercept 0.430 (P = 0.670) • Age -0.421 (P = 0.671) • Gender 0.333 (P = 0.733) • Claim -0.648 (P = 0.469) • Tickets 0.584 (P = 0.566) • Prior Claims -1.091 (P = 0.399) • Attorney 0.573 (P = 0.607) • R2 = 0.804 • Cutoff (average of the two group averages): 0.429
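
Classification then amounts to comparing each case's regression score with that cutoff; a minimal sketch with the slide's coefficients (the standardized case values here are hypothetical):

    # Coefficients from the slide; inputs are assumed already standardized.
    intercept, cutoff = 0.430, 0.429
    b = {"Age": -0.421, "Gender": 0.333, "Claim": -0.648,
         "Tickets": 0.584, "PriorClaims": -1.091, "Attorney": 0.573}

    # Hypothetical standardized case values (illustrative only).
    case = {"Age": -1.0, "Gender": 1.0, "Claim": -0.5,
            "Tickets": 0.0, "PriorClaims": -0.5, "Attorney": 0.0}

    score = intercept + sum(b[k] * case[k] for k in b)
    group = 1 if score >= cutoff else 0  # assumes the fraud group is coded 1
    print(f"score = {score:.3f} -> group {group}")  # score ≈ 2.05 -> group 1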

  26. Case: Stepwise Regression • Stepwise regression: automatic selection of independent variables • Look at the F scores of the simple regressions • Add the variable with the greatest F statistic • Check partial F scores for adding each variable not yet in the model • Delete variables that are no longer significant • If no outside variable is significant, quit • Considered inferior to selection of variables by experts
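
A compact sketch of the forward part of that loop in Python (the deletion step from the slide is omitted for brevity); fit_r2, forward_stepwise, and the F-to-enter threshold f_in are illustrative names and values, not from the book:

    import numpy as np

    def fit_r2(X, y):
        # OLS by least squares; returns the R^2 of the fitted model.
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

    def forward_stepwise(X, y, f_in=4.0):
        # Greedy forward pass: add the candidate with the largest partial F
        # statistic until none exceeds the F-to-enter threshold f_in.
        n, k = X.shape
        selected, r2_old = [], 0.0
        while True:
            best = None
            for j in set(range(k)) - set(selected):
                r2_new = fit_r2(X[:, selected + [j]], y)
                p = len(selected) + 1                  # predictors in the new model
                f = (r2_new - r2_old) * (n - p - 1) / (1.0 - r2_new)
                if best is None or f > best[1]:
                    best = (j, f, r2_new)
            if best is None or best[1] < f_in:
                return selected
            selected.append(best[0])
            r2_old = best[2]

    # Hypothetical data: y depends on columns 0 and 2 only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))
    y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=40)
    print(forward_stepwise(X, y))  # typically selects [0, 2]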

  27. Credit Card Bankruptcy Prediction Foster & Stine (2004), Journal of the American Statistical Association • Data on 244,000 credit card accounts • 12-month period • 1 percent defaulted • Cost of granting a loan that defaults: almost $5,000 • Cost of denying a loan that would have been repaid: about $50
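
Those two costs imply a very low break-even threshold: denying is worthwhile once the predicted default probability exceeds roughly 50 / (5000 + 50) ≈ 1 percent, which is why the model must rank-order risk well in the extreme tail. A quick check of that arithmetic:

    # Expected-cost break-even: deny when p * cost_default > (1 - p) * cost_deny.
    cost_default, cost_deny = 5000.0, 50.0
    p_star = cost_deny / (cost_default + cost_deny)
    print(f"break-even default probability ~ {p_star:.4f}")  # -> 0.0099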

  28. Data Treatment • Divided the observations into 5 groups • Used one group for training • Anything smaller would have had problems from insufficient default cases • Used the remaining 80% of the data for detailed testing • Regression performed better than the C5 model • Even though C5 used the cost information and regression did not

  29. Summary • Regression is a basic classical model • Many forms • Logistic regression very useful in data mining • Outcomes often binary • Can also be used on categorical data • Can be used for discriminant analysis • To classify
