## Simple Linear Regression


(With thanks to my students in AMS 572 – Data Analysis.)

## 1. Introduction

Example: Brad Pitt 1.83 m, Angelina Jolie 1.70 m; George Bush 1.81 m, Laura Bush ?; David Beckham 1.83 m, Victoria Beckham 1.68 m.

Goal: to predict the height of the wife in a couple from the husband's height.

- Response (outcome, or dependent) variable $Y$: height of the wife
- Predictor (explanatory, or independent) variable $X$: height of the husband

**Regression analysis**

- Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.
- With one predictor variable we use **simple linear regression**; with two or more predictor variables, **multiple linear regression**.
- When it is not clear which variable represents the response and which the predictor, **correlation analysis** is used to study the strength of the relationship.

**History**

- The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
- The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
- This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.

**A probabilistic model**

We denote the $n$ observed values of the predictor variable by $x_1, x_2, \ldots, x_n$ and the corresponding observed values of the response variable by $Y_1, Y_2, \ldots, Y_n$.

**Notation of the simple linear regression model**

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \ldots, n,$$

where the observed value of the random variable $Y_i$ depends on $x_i$, and $\epsilon_i$ is a random error with mean 0. The true regression line is $\mu_i = E(Y_i) = \beta_0 + \beta_1 x_i$, with unknown intercept $\beta_0$ and unknown slope $\beta_1$.

**Four basic assumptions (for statistical inference)**

1. **Linearity**: $E(Y_i)$ is a linear function of the predictor variable.
2. **Constant variance**: the $Y_i$ have a common variance $\sigma^2$, the same for all values of $x$.
3. **Normality**: the errors are normally distributed.
4. **Independence**: the errors are independent.

**Comments**

1. "Linear" means linear in the parameters $\beta_0$ and $\beta_1$, not in $x$. Example: $Y = \beta_0 + \beta_1 \log x$ is still linear, with $x^* = \log x$.
2. The predictor variable need not be set at predetermined fixed values; it may be random along with $Y$.
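The probabilistic model above can be simulated to see the four assumptions in action. A minimal sketch, with purely illustrative parameter values (not the textbook's):

```python
import numpy as np

# Simulate Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2) i.i.d.
# The parameter values below are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 360.0, -7.0, 20.0

x = np.linspace(0, 32, 200)
eps = rng.normal(0.0, sigma, size=x.size)  # normal, independent, common variance
y = beta0 + beta1 * x + eps                # mean of Y is linear in x

errors = y - (beta0 + beta1 * x)           # recover the simulated errors
```

With a large sample, the recovered errors have mean near 0 and standard deviation near $\sigma$, reflecting assumptions 2 and 3.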
In that case the model can be considered a conditional model, working with the conditional expectation of $Y$ given $X = x$, $E(Y \mid X = x) = \beta_0 + \beta_1 x$. Example: children's height $X$ (given) and weight $Y$ (to predict).

## 2. Fitting the Simple Linear Regression Model

### 2.1 Least Squares (LS) Fit

**Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot).** From: *Statistics and Data Analysis*; Tamhane and Dunlop; Prentice Hall.

The "best" fitting straight line is the one minimizing

$$Q = \sum_{i=1}^{n} \left[y_i - (\beta_0 + \beta_1 x_i)\right]^2.$$

One way to find the LS estimates $\hat\beta_0$ and $\hat\beta_1$ is to take the partial derivatives of $Q$ with respect to $\beta_0$ and $\beta_1$; setting these partial derivatives equal to zero and simplifying gives the normal equations. To simplify, we introduce

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),$$

and the solutions are

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.$$

The resulting equation $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ is known as the least squares regression line, which is an estimate of the true regression line.

**Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)**

For the tire tread wear data from Table 10.1 with $n = 9$, we calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, and $S_{xy}$; the slope and intercept estimates are $\hat\beta_1 = -7.281$ and $\hat\beta_0 = 360.64$. Therefore the equation of the LS line is

$$\hat{y} = 360.64 - 7.281x.$$

Conclusion: there is a loss of 7.281 mils in tire groove depth for every 1000 miles of driving. Given a particular value, say $x = 25$, we find $\hat{y} = 360.64 - 7.281(25) = 178.62$. This means that the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.

### 2.2 Goodness of Fit of the LS Line

Coefficient of determination and correlation. The residuals $e_i = y_i - \hat{y}_i$ are used to evaluate the goodness of fit of the LS line. We define

$$\mathrm{SST} = \sum (y_i - \bar{y})^2, \qquad \mathrm{SSR} = \sum (\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum (y_i - \hat{y}_i)^2,$$

the total, regression, and error sums of squares, which satisfy $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$. The ratio

$$r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$

is called the coefficient of determination.
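The closed-form LS estimates above are easy to compute directly. A minimal sketch in Python; the data below are made up to loosely resemble the tire example's shape and are NOT the values from Table 10.1:

```python
import numpy as np

def ls_fit(x, y):
    """Least squares fit via Sxx, Sxy; returns (b0, b1, r_squared)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    Sxx = ((x - xbar) ** 2).sum()
    Sxy = ((x - xbar) * (y - ybar)).sum()
    b1 = Sxy / Sxx                      # slope = Sxy / Sxx
    b0 = ybar - b1 * xbar               # intercept = ybar - b1 * xbar
    SSE = ((y - (b0 + b1 * x)) ** 2).sum()
    SST = ((y - ybar) ** 2).sum()
    return b0, b1, 1.0 - SSE / SST      # r^2 = 1 - SSE/SST

# Illustrative data only -- not the textbook's Table 10.1.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.3, 329.5, 291.0, 255.2, 229.5, 204.7, 179.0, 163.8, 150.3]
b0, b1, r2 = ls_fit(x, y)
```

A declining, roughly linear trend like this gives a negative slope and an $r^2$ close to 1, qualitatively as in the tire example.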
**Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)**

For the tire tread wear data, using the results from Example 10.2 we calculate SST, SSR, and SSE, and obtain $r^2 = 0.953$. The Pearson correlation is $r = -\sqrt{0.953} = -0.976$, where the sign of $r$ follows from the sign of $\hat\beta_1$. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.

**The Maximum Likelihood Estimators (MLE)**

Consider the linear model $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i$ is drawn from a normal population with mean 0 and standard deviation $\sigma$. The likelihood function for $Y$ is

$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right].$$

Thus the log-likelihood for the data is

$$\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2.$$

Solving $\partial \ln L/\partial \beta_0 = \partial \ln L/\partial \beta_1 = \partial \ln L/\partial \sigma^2 = 0$, we obtain the MLEs of the three unknown model parameters. The MLEs of $\beta_0$ and $\beta_1$ are the same as the LSEs, and both are unbiased. The MLE of the error variance, however, is biased:

$$\hat\sigma^2_{\mathrm{MLE}} = \frac{\mathrm{SSE}}{n}.$$

### 2.3 An Unbiased Estimator of σ²

An unbiased estimate of $\sigma^2$ is given by

$$s^2 = \frac{\mathrm{SSE}}{n-2}.$$

**Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of σ²).** Using the results from Example 10.3, we have SSE = 2351.3 and $n - 2 = 7$; therefore $s^2 = 2351.3/7 = 335.9$, which has 7 d.f. The estimate of $\sigma$ is $s = \sqrt{335.9} = 18.33$ mils.

## 3. Statistical Inference on β0 and β1

Under the normal error assumption, the point estimators are $\hat\beta_0$ and $\hat\beta_1$, with sampling distributions

$$\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right), \qquad \hat\beta_0 \sim N\left(\beta_0, \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]\right).$$

For the mathematical derivations, please refer to the Tamhane and Dunlop textbook, p. 331.

Pivotal quantities (P.Q.'s):

$$\frac{\hat\beta_1 - \beta_1}{s/\sqrt{S_{xx}}} \sim t_{n-2}, \qquad \frac{\hat\beta_0 - \beta_0}{s\sqrt{1/n + \bar{x}^2/S_{xx}}} \sim t_{n-2}.$$

Confidence intervals (CI's):

$$\hat\beta_1 \pm t_{n-2,\,\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}, \qquad \hat\beta_0 \pm t_{n-2,\,\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}.$$

Hypothesis tests such as $H_0\colon \beta_1 = \beta_1^0$ use the test statistic

$$t = \frac{\hat\beta_1 - \beta_1^0}{s/\sqrt{S_{xx}}};$$

at significance level $\alpha$ we reject $H_0$ in favor of the two-sided alternative if and only if (iff) $|t| > t_{n-2,\,\alpha/2}$. The test of $H_0\colon \beta_1 = 0$ is used to show whether there is a linear relationship between $x$ and $y$.

**Analysis of Variance (ANOVA)**

A mean square is a sum of squares divided by its d.f. The ANOVA table:

| Source of variation | SS | d.f. | MS | F |
|---|---|---|---|---|
| Regression | SSR | 1 | MSR = SSR/1 | MSR/MSE |
| Error | SSE | n − 2 | MSE = SSE/(n − 2) | |
| Total | SST | n − 1 | | |

Under $H_0\colon \beta_1 = 0$, $F = \mathrm{MSR}/\mathrm{MSE} \sim F_{1,\,n-2}$; this F-test is equivalent to the t-test of $H_0\colon \beta_1 = 0$, since $F = t^2$.
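The slope t statistic and the ANOVA F statistic can be computed directly from the formulas above, and the identity $F = t^2$ makes a handy numerical check. A minimal sketch, using arbitrary illustrative data (not the textbook's):

```python
import numpy as np

def slope_inference(x, y):
    """Return (t, F): the t statistic for H0: beta1 = 0, and ANOVA F = MSR/MSE."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xbar, ybar = x.mean(), y.mean()
    Sxx = ((x - xbar) ** 2).sum()
    b1 = ((x - xbar) * (y - ybar)).sum() / Sxx
    b0 = ybar - b1 * xbar
    fitted = b0 + b1 * x
    SSE = ((y - fitted) ** 2).sum()
    SSR = ((fitted - ybar) ** 2).sum()
    s2 = SSE / (n - 2)              # unbiased estimate of sigma^2
    t = b1 / np.sqrt(s2 / Sxx)      # pivotal quantity evaluated at beta1 = 0
    F = (SSR / 1) / s2              # MSR / MSE
    return t, F

# Arbitrary illustrative data (not from the textbook).
t, F = slope_inference([1, 2, 3, 4, 5, 6, 7, 8],
                       [2, 1, 4, 3, 6, 5, 8, 7])
```

The two statistics always satisfy $F = t^2$ in simple linear regression, so the F-test and the two-sided t-test of $H_0\colon \beta_1 = 0$ agree.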
## 4. Regression Diagnostics

### 4.1 Checking the Model Assumptions

- Checking for linearity
- Checking for constant variance
- Checking for normality
- Checking for independence

**Checking for Linearity**

For the tire data, $x_i$ = mileage and $Y_i$ = groove depth. The true line is $Y = \beta_0 + \beta_1 x$, the fitted value is $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 x_i$, and the residual is $e_i = Y_i - \hat{Y}_i$. [Figure: scatter plot with the fitted line and residuals marked.]

**Checking for Constant Variance**

[Figures: a residual plot whose spread changes with $x$, indicating Var(Y) is not constant, and a sample residual plot with a uniform horizontal band, as expected when Var(Y) is constant.]

**Checking for Independence**

This check does not apply to the simple linear regression model with independently sampled data; it applies to time series data.

### 4.2 Checking for Outliers & Influential Observations

- What is an outlier?
- Why checking for outliers is important
- Mathematical definition
- How to deal with them

**4.2-A. Introduction**

Recall the box-and-whiskers plot (Chapter 4 of T&D):

- A (mild) **outlier** is defined as any observation that lies outside of $[Q_1 - 1.5\,\mathrm{IQR},\; Q_3 + 1.5\,\mathrm{IQR}]$, where $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range.
- An (extreme) outlier lies outside of $[Q_1 - 3\,\mathrm{IQR},\; Q_3 + 3\,\mathrm{IQR}]$.
- Informally: an observation "far away" from the rest of the data.

**4.2-B. Why are outliers a problem?**

- They may indicate a sample peculiarity, a data-entry error, or another problem.
- The regression coefficients that minimize the sum of squares for error (SSE) are very sensitive to outliers, causing bias or distortion of the estimates.
- Any statistical test based on sample means and variances can be distorted in the presence of outliers, distorting p-values.
- The result: faulty conclusions. (Estimators not sensitive to outliers are said to be **robust**.)

**4.2-C. Mathematical Definition**

**Outlier.** The standardized residual is given by

$$e_i^* = \frac{e_i}{s\sqrt{1 - h_{ii}}},$$

where $h_{ii}$ is the leverage of observation $i$. If $|e_i^*| > 2$, the corresponding observation may be regarded as an outlier. Example: (Tire Tread Wear vs. Mileage).

**Studentized residual**: a type of standardized residual calculated with the current observation deleted from the analysis.

The LS fit can be excessively influenced by an observation that is not necessarily an outlier as defined above. [Figures: e.g. 1, fits with and without the point; e.g. 2, scatter plot and residual plot.]
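The standardized residuals follow directly from the leverages $h_{ii}$. A minimal sketch that flags $|e_i^*| > 2$, using made-up data with one planted outlier (none of these numbers come from the textbook):

```python
import numpy as np

def standardized_residuals(x, y):
    """e_i* = e_i / (s * sqrt(1 - h_ii)) for a simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xbar = x.mean()
    Sxx = ((x - xbar) ** 2).sum()
    b1 = ((x - xbar) * (y - y.mean())).sum() / Sxx
    b0 = y.mean() - b1 * xbar
    e = y - (b0 + b1 * x)                    # raw residuals
    s = np.sqrt((e ** 2).sum() / (n - 2))    # s = sqrt(SSE / (n - 2))
    h = 1.0 / n + (x - xbar) ** 2 / Sxx      # leverages h_ii
    return e / (s * np.sqrt(1.0 - h))

# Made-up data on a perfect line, with one planted outlier at index 5.
x = np.arange(10.0)
y = 2.0 * x
y[5] += 10.0
e_star = standardized_residuals(x, y)
flags = np.abs(e_star) > 2                   # candidate outliers by the |e*| > 2 rule
```

Only the planted point exceeds the $|e_i^*| > 2$ threshold here; the remaining residuals are small shares of the shift the outlier induces in the fitted line.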
**4.2-C. Mathematical Definition (continued)**

**Influential observation**: an observation with an extreme x-value, y-value, or both.

- For simple linear regression the leverage is $h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$. On average $h_{ii}$ is $(k+1)/n$, where $k$ is the number of predictors ($k = 1$ here); regard any $h_{ii} > 2(k+1)/n$ as high leverage.
- If $x_i$ deviates greatly from $\bar{x}$, then $h_{ii}$ is large.
- The standardized residual will be large for a high-leverage observation.
- Influence can be thought of as the product of leverage and outlierness.
- Example: an observation can be influential/high leverage but not an outlier.

**4.2-C. SAS code for the tire example**

```sas
data tire;
  input x y;
  datalines;
0 394.33
4 329.50
…
32 150.33
;
run;

proc reg data=tire;
  model y=x;
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
run;

proc print data=resid;
  where abs(r)>=2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;
```

**4.2-C. SAS output of the tire example**

[SAS output omitted.]

**4.2-D. How to deal with outliers & influential observations**

- Investigate (data errors? rare events? can they be corrected?)
- Ways to accommodate outliers:
  - nonparametric methods (robust to outliers)
  - data transformations
  - deletion (or report the model results both with and without the outliers or influential observations, to see how much they change)

### 4.3 Data Transformations

Reasons:

- to achieve linearity
- to achieve homogeneity of variance
- to achieve normality or symmetry about the regression equation

**Types of Transformation**

- **Linearizing transformation**: a transformation of the response variable, the predictor variable, or both, which produces an approximately linear relationship between the variables.
- **Variance-stabilizing transformation**: applied when the constant-variance assumption is violated.

**Linearizing Transformation**

- Use a mathematical operation, e.g. square root, power, log, exponential, etc.
- Only one variable needs to be transformed in simple linear regression. Which one, the predictor or the response, and why?
For example, taking a log transformation of $Y = a\,e^{-bx}$ gives $\log Y = \log a - bx$, which is linear in $x$.

**Variance-Stabilizing Transformation**

The delta method (a two-term Taylor-series approximation) gives

$$\mathrm{Var}(h(Y)) \approx [h'(\mu)]^2\, g^2(\mu), \qquad \text{where } \mathrm{Var}(Y) = g^2(\mu),\; E(Y) = \mu.$$

- Set $[h'(\mu)]^2 g^2(\mu) = 1$, so $h'(\mu) = \dfrac{1}{g(\mu)}$ and $h(y) = \displaystyle\int \frac{dy}{g(y)}$.
- Example: $\mathrm{Var}(Y) = c^2\mu^2$ with $c > 0$, so $g(y) = cy$ and

$$h(y) = \int \frac{dy}{cy} = \frac{1}{c}\log y.$$

Therefore it is the logarithmic transformation that stabilizes the variance.

## 5. Correlation Analysis

- The Pearson product-moment correlation is a measurement of how closely two variables share a linear relationship.
- It is useful when it is not possible to determine which variable is the predictor and which is the response. (Health vs. wealth: which is the predictor, and which the response?)

**Statistical Inference on the Correlation Coefficient ρ**

- We can derive a test on the correlation coefficient in the same way as the other tests derived in class.
- Assumption: $(X, Y)$ follows the bivariate normal distribution.
- Start with the point estimator: the sample correlation coefficient $r$ estimates the population correlation coefficient $\rho$.
- The distribution of $r$ itself is quite complicated, but under $H_0\colon \rho = 0$ the test statistic is fully known:

$$T_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}.$$

**Bivariate Normal Distribution**

pdf:

$$f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\}$$

Properties: $\mu_1, \mu_2$ are the means of $X, Y$; $\sigma_1^2, \sigma_2^2$ their variances; $\rho$ the correlation coefficient between $X$ and $Y$.

**Derivation of T0**

The t statistic for testing $H_0\colon \beta_1 = 0$ is $t = \hat\beta_1\big/(s/\sqrt{S_{xx}})$. Substituting $\hat\beta_1 = r\sqrt{S_{yy}/S_{xx}}$ and $s^2 = (1-r^2)S_{yy}/(n-2)$, after a few steps we get

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = T_0.$$

Therefore the t-test of $H_0\colon \beta_1 = 0$ and the t-test of $H_0\colon \rho = 0$ are indeed equivalent.

**Exact Statistical Inference on ρ**

Example: a researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students, and the correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?
- $H_0\colon \rho = 0$ vs. $H_a\colon \rho \neq 0$.
- Test statistic: $t_0 = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.7\sqrt{13}}{\sqrt{1-0.49}} = 3.534$.
- Reject $H_0$ iff $|t_0| > t_{n-2,\,\alpha/2}$. For $\alpha = .01$: $t_0 = 3.534 > t_{13,\,.005} = 3.012$, so we reject $H_0$: the correlation is statistically significant at the .01 level.
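The example above can be checked numerically. A minimal sketch of the $T_0$ statistic; the critical value $t_{13,\,.005} = 3.012$ is taken from a t table, as on the slide:

```python
import math

def corr_t(r, n):
    """T0 = r * sqrt(n - 2) / sqrt(1 - r^2), ~ t_{n-2} under H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

t0 = corr_t(0.7, 15)        # the slide's example: r = 0.7, n = 15 -> about 3.534
reject = abs(t0) > 3.012    # t_{13, .005} from a t table (alpha = .01, two-sided)
```

Since $t_0 \approx 3.534$ exceeds the two-sided critical value, the test rejects $H_0\colon \rho = 0$ at the .01 level, matching the conclusion above.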