Linear Methods for Regression
Lecture Notes for CMPUT 466/551
Nilanjan Ray
Assumption: Linear Regression Function

Model assumption: the output Y is linear in the inputs X = (X_1, X_2, X_3, …, X_p). Predict the output by:

$$\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j = X^T \hat\beta,$$

where, in vector notation, the constant 1 is included in X and $\hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)^T$. This is also known as multiple regression when p > 1.
Least Squares Solution

Residual sum of squares:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$$

In matrix-vector notation:

$$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta),$$

where $\mathbf{y} - \mathbf{X}\beta$ is the residual. Vector differentiation, setting the gradient to zero:

$$\frac{\partial\,\mathrm{RSS}}{\partial \beta} = -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) = 0$$

Solution:

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y},$$

known as the least squares solution. For a new input $x_0$ (with the 1 included), the regression output is $\hat y_0 = x_0^T \hat\beta$.
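To make the closed form concrete, here is a minimal NumPy sketch on synthetic data (the sizes, coefficients, and noise level below are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples, p inputs, known coefficients plus Gaussian noise
N, p = 100, 3
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])  # 1 included in X
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.3 * rng.standard_normal(N)

# Least squares solution: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Regression output for a new input x0 (leading 1 included)
x0 = np.array([1.0, 0.2, -0.4, 1.1])
y0_hat = x0 @ beta_hat
print(beta_hat, y0_hat)
```

In practice `np.linalg.lstsq` is the numerically safer route; `solve` is used here only because it mirrors the normal-equations formula above.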
Bias-Variance Decomposition

Model: $Y = X^T\beta + \varepsilon$, where the errors $\varepsilon$ have zero expectation, the same variance $\sigma^2$, and are uncorrelated.

Linear estimator: $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$.

Bias: $E[\hat\beta] = \beta$, so least squares is an unbiased estimator! (Ex. Show the last step.)

Variance: $\mathrm{Var}(\hat\beta) = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$.

Decomposition of EPE (expected prediction error), averaged over the training inputs: variance = $\sigma^2 (p/N)$, squared bias = 0, irreducible error = $\sigma^2$.
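One way to show the steps asked for above, using $E[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2\mathbf{I}$:

```latex
\begin{aligned}
\hat\beta &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
           = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\beta + \varepsilon)
           = \beta + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\varepsilon \\
E[\hat\beta] &= \beta + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E[\varepsilon] = \beta \\
\mathrm{Var}(\hat\beta) &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \,\mathrm{Var}(\varepsilon)\,
                           \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}
                         = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}
\end{aligned}
```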
Gauss-Markov Theorem

Gauss-Markov Theorem: the least squares estimate has the minimum variance among all linear unbiased estimators.

Interpretation: the estimator found by least squares is linear in $\mathbf{y}$:

$$\hat f(x_0) = x_0^T \hat\beta = x_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}.$$

We have noticed that this estimator is unbiased, i.e., $E[x_0^T\hat\beta] = x_0^T\beta$. If we find any other estimator $g(x_0)$ of $f(x_0) = x_0^T\beta$ that is also linear in $\mathbf{y}$, i.e., $g(x_0) = c^T\mathbf{y}$, and unbiased, i.e., $E[c^T\mathbf{y}] = x_0^T\beta$, then

$$\mathrm{Var}(x_0^T\hat\beta) \le \mathrm{Var}(c^T\mathbf{y}).$$

Question: is least squares the best estimator for the given linear additive model?
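A standard proof sketch, not spelled out in the notes: for any other linear unbiased estimator $g(x_0) = c^T\mathbf{y}$, unbiasedness for every $\beta$ forces $\mathbf{X}^T c = x_0$; decomposing $c = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0 + d$ with $\mathbf{X}^T d = 0$ gives

```latex
\mathrm{Var}(c^T\mathbf{y}) = \sigma^2 c^T c
  = \sigma^2\left(x_0^T(\mathbf{X}^T\mathbf{X})^{-1}x_0 + d^T d\right)
  \ge \sigma^2\, x_0^T(\mathbf{X}^T\mathbf{X})^{-1}x_0
  = \mathrm{Var}(x_0^T\hat\beta),
```

so any extra component $d$ only adds variance.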
Subset Selection
• The LS solution often has large variance (recall that the variance term grows with the number of inputs p, i.e., with model complexity)
• If we decrease the number of input variables p, we decrease the variance; however, we then sacrifice the zero bias
• If this trade-off decreases the test error, the solution can be accepted
• This reasoning leads to subset selection, i.e., selecting a subset of the p inputs for the regression computation
• Subset selection has another advantage: easy and focused interpretation of the influence of the input variables on the output
Subset Selection…

Can we determine which $\beta_j$'s are insignificant? Yes, we can, by statistical hypothesis testing! However, we need a model assumption: the noise $\varepsilon$ is zero-mean Gaussian with standard deviation $\sigma$, i.e., $\varepsilon \sim N(0, \sigma^2)$.
Subset Selection: Statistical Significance Test

The linear model with additive Gaussian noise has the following properties (Ex. Show this.):

$$\hat\beta \sim N\!\left(\beta,\ \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\right), \qquad (N-p-1)\,\hat\sigma^2 \sim \sigma^2\,\chi^2_{N-p-1}.$$

So we can form a standardized coefficient or Z-score for each coefficient:

$$z_j = \frac{\hat\beta_j}{\hat\sigma\sqrt{v_j}},$$

where $\hat\sigma^2 = \frac{1}{N-p-1}\sum_{i=1}^{N}(y_i - \hat y_i)^2$ and $v_j$ is the jth diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$.

The hypothesis testing principle says that a large Z-score value should retain the coefficient, while a small value should discard it. How large/small depends on the significance level.
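A small sketch of how the Z-scores could be computed with NumPy (the helper name and its arguments are mine, not part of the notes):

```python
import numpy as np

def z_scores(X, y):
    """Z-scores z_j = beta_hat_j / (sigma_hat * sqrt(v_j)) for each coefficient.

    X is the N x (p+1) design matrix with the leading column of 1s included,
    so N - X.shape[1] equals the N - p - 1 degrees of freedom above.
    """
    N, p1 = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat2 = resid @ resid / (N - p1)   # unbiased estimate of sigma^2
    v = np.diag(XtX_inv)                    # v_j: jth diagonal of (X^T X)^{-1}
    return beta_hat / np.sqrt(sigma_hat2 * v)
```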
Case Study: Prostate Cancer

Output = log prostate-specific antigen
Input = (log cancer volume, log prostate weight, age, log of benign prostatic hyperplasia amount, seminal vesicle invasion, log of capsular penetration, Gleason score, % of Gleason scores 4 or 5)

Goal: (1) predict the output given a novel input; (2) interpret the influence of the inputs on the output.
Case Study…

[Scatterplot matrix of the prostate cancer variables]

From the scatter plot it is hard to interpret which inputs influence the output most; we also want to find out how the inputs jointly influence the output.
Subset Selection on Prostate Cancer Data

Z-scores with magnitude greater than 2 indicate variables that are significant at the 5% significance level.
Coefficient Shrinkage: Ridge Regression

Method:

$$\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 \right\},$$

where $\lambda \ge 0$ is a non-negative penalty. In matrix form the solution is

$$\hat\beta^{\mathrm{ridge}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}.$$

One computational advantage is that the matrix $\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}$ is always invertible for $\lambda > 0$. If the L2-norm penalty is replaced by an L1-norm penalty, the corresponding regression is called the LASSO (see [HTF]).
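A minimal sketch of the ridge solution in NumPy (assuming the inputs are already centered/standardized so the intercept need not be penalized; the function name is mine):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge solution (X^T X + lam * I)^{-1} X^T y for penalty lam >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```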
Ridge Regression…

[Plot: ridge coefficient profiles, plotted against decreasing $\lambda$]

One way to determine $\lambda$ is cross-validation; we'll learn about it later.
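A rough K-fold cross-validation sketch for choosing $\lambda$, reusing the hypothetical `ridge` helper from the previous sketch (the fold count and the candidate grid are illustrative, not from the notes):

```python
import numpy as np

def cv_choose_lambda(X, y, lambdas, K=5, seed=0):
    """Pick the penalty with the smallest K-fold cross-validated squared error."""
    N = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(N), K)
    cv_errors = []
    for lam in lambdas:
        err = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(N), test)
            beta = ridge(X[train], y[train], lam)   # ridge() defined above
            err += np.sum((y[test] - X[test] @ beta) ** 2)
        cv_errors.append(err / N)
    return lambdas[int(np.argmin(cv_errors))]

# Example usage on standardized data:
# lam_best = cv_choose_lambda(X, y, lambdas=np.logspace(-3, 3, 13))
```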