Regression: (1) Simple Linear Regression

Hal Whitehead

BIOL4062 / 5062

Regression
  • Purposes of regression
  • Simple linear regression
    • Formula
    • Assumptions
    • If assumptions hold, what can we do?
    • Testing assumptions
    • When assumptions do not hold
Regression

One Dependent Variable Y

Independent Variables X1,X2,X3,...

Purposes of Regression

1. Relationship between Y and X's

2. Quantitative prediction of Y

3. Relationship between Y and X controlling for C

4. Which of X's are most important?

5. Best mathematical model

6. Compare regression relationships: Y1 on X, Y2 on X

7. Assess interactive effects of X's

  • Simple regression: one X
  • Multiple regression: two or more X's
Simple linear regression

Y = β0 + β1X + Error
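As a sketch of what fitting this model means in practice: the least-squares estimates of β0 and β1 have a closed form. A minimal pure-Python version, with made-up data (not from the lecture):

```python
# Minimal ordinary least squares for Y = b0 + b1*X + error.
# The data below are invented for illustration only.

def fit_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept (the fitted line passes through the means)
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x
b0, b1 = fit_ols(x, y)
```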

Assumptions of simple linear regression

1. Existence

2. Independence

3. Linearity

4. Homoscedasticity

5. Normality

6. X measured without error

Assumptions of simple linear regression

1. For any fixed value of X, Y is a random variable with a certain probability distribution having finite mean and variance

(Existence)

[Figure: probability distribution of Y at a fixed value of X]

Assumptions of simple linear regression

2. The Y values are statistically independent of one another

(Independence)

Assumptions of simple linear regression

3. The mean value of Y given X is a straight line function of X

(Linearity)

[Figure: means of the Y distributions lie on a straight line in X]

Assumptions of simple linear regression

4. The variance of Y is the same for all X

(Homoscedasticity)

[Figure: Y distributions with the same variance at each X]

Assumptions of simple linear regression

5. For any fixed value of X, Y has a normal distribution

(Normality)

[Figure: normal distributions of Y at each X]

Assumptions of simple linear regression

6. There are no measurement errors in X

(X measured without error)

Assumptions of simple linear regression

1. Existence

2. Independence

3. Linearity

4. Homoscedasticity

5. Normality

6. X measured without error

If assumptions hold, what can we do?

1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty

2. Describe quality of fit (variation of data around straight line) by estimate of σ² or r²

3. Tests of slope and intercept

4. Prediction and prediction bands

5. ANOVA Table

Parameters estimated using least-squares
  • Age-specific pregnancy rates of female sperm whales (from Best et al. 1984 Rep. int. Whal. Commn. Spec. Issue)

Find the line which minimizes the sum of squared residuals

1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty
  • Age-specific pregnancy rates of female sperm whales (from Best et al. 1984 Rep. int. Whal. Commn. Spec. Issue)
  • β0 = 0.230 (SE 0.028); 95% c.i.: 0.164, 0.296
  • β1 = -0.0035 (SE 0.0009); 95% c.i.: -0.0056, -0.0013
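A sketch of where such standard errors and confidence intervals come from. Pure Python; the data are illustrative (not the sperm-whale values), and the critical value 2.571 is the two-sided 95% t quantile for 5 degrees of freedom, looked up by hand since the standard library has no t distribution:

```python
import math

def ols_inference(x, y, t_crit):
    """OLS fit with standard errors and 95% c.i. for intercept and slope.
    t_crit: two-sided 95% t quantile for n-2 df (supplied by the caller)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                                 # estimate of sigma^2
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return {
        "b0": (b0, se_b0, (b0 - t_crit * se_b0, b0 + t_crit * se_b0)),
        "b1": (b1, se_b1, (b1 - t_crit * se_b1, b1 + t_crit * se_b1)),
    }

# Illustrative data: 7 points, so n-2 = 5 df
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.2, 3.8, 6.1, 8.3, 9.7, 12.2, 13.9]
est = ols_inference(x, y, t_crit=2.571)
```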

2. Describe quality of fit by estimate of σ² or r²

σ² = 0.0195

r² = 0.679

r² (adjusted) = 0.633

(Proportion of variance accounted for by regression)
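A minimal sketch of how r² and its adjusted version are computed, as one minus the residual fraction of the total sum of squares (pure Python, toy data):

```python
def r_squared(x, y):
    """r^2 = 1 - SSE/SST, plus the adjusted version, for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    sst = sum((yi - my) ** 2 for yi in y)                          # total SS
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)   # penalizes the fitted slope
    return r2, r2_adj

r2, r2_adj = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```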

3. Tests of slope and intercept

a) Slope = 0 {Equivalent to r=0}

b) Slope = Predetermined constant

c) Intercept = 0

d) Intercept = Predetermined constant

e) Compare slopes

f) Compare intercepts {Assume same slope}

(tests use t-distribution)
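These tests all share one shape: t = (estimate − hypothesized value) / SE, referred to a t distribution with n−2 degrees of freedom. A pure-Python sketch for the slope tests (a) and (b), with illustrative data; only the t statistic is computed, since the standard library has no t distribution for the P-value:

```python
import math

def slope_t_stat(x, y, b1_null=0.0):
    """t statistic for H0: slope = b1_null in simple linear regression.
    Compare against a t distribution with n-2 df."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return (b1 - b1_null) / se_b1

# Toy data with a clear positive trend
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1]
t0 = slope_t_stat(x, y)   # test against slope = 0
```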

3a) Slope = 0 {Equivalent to r = 0}

Does pregnancy rate change with age?

H0: β1 = 0

H1: β1 ≠ 0

P = 0.006

Does pregnancy rate decline with age?

H0: β1 = 0

H1: β1 < 0

P = 0.003

3b) Slope = Predetermined constant

β1 = 2.868 (SE 0.058)

95% c.i.: 2.752; 2.984

Does shape change with length?

H0: β1 = 3

H1: β1 ≠ 3

P < 0.05

weight = length³

Weights and Lengths of Cetacean Species

Whitehead & Mann In Cetacean Societies 2000

3c) Intercept = 0

β0 = 0.436 (SE 0.080)

95% c.i.: 0.276; 0.596

Is birth length proportional to length?

H0: β0 = 0

H1: β0 ≠ 0

P < 0.001

3e) Compare slopes

β1 (m) = 2.528 (SE 0.409)

β1 (o) = 2.962 (SE 0.094)

Does shape change differently with length for odontocetes and mysticetes?

H0: β1 (m) = β1 (o)

H1: β1 (m) ≠ β1 (o)

P = 0.146

Weights and Lengths of Cetacean Species

Whitehead & Mann 2000

3f) Compare intercepts {Assume same slope}

β0 (m) = 2.528 (SE 0.409)

β0 (o) = 2.962 (SE 0.094)

Are odontocetes and mysticetes equally fat?

H0: β0 (m) = β0 (o)

H1: β0 (m) ≠ β0 (o)

P = 0.781

[Figure: Log(Weight) vs Log(Length) for cetacean species, by ORDER (m = mysticetes, o = odontocetes)]

4. Prediction and prediction bands

95% Confidence Bands for Regression Line

95% Prediction Bands

95% Prediction Bands

From: http://www.tufts.edu/~gdallal/slr.htm

5. ANOVA Table

Analysis of Variance

Source       Sum-of-Squares   df   Mean-Square   F-ratio    P
Regression   286.27            1   286.27        2475.07    0.00
Residual     5.32             46   0.12
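The ANOVA table partitions the total sum of squares into a regression part (1 df) and a residual part (n−2 df); the F-ratio is their mean-square ratio. A sketch in pure Python with toy data (not the values in the table above):

```python
def regression_anova(x, y):
    """Return (SSR, SSE, F) for simple linear regression, where
    SST = SSR + SSE and F = (SSR/1) / (SSE/(n-2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    ssr = sum((b0 + b1 * xi - my) ** 2 for xi in x)                 # regression SS, df = 1
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # residual SS, df = n-2
    f = (ssr / 1) / (sse / (n - 2))
    return ssr, sse, f

ssr, sse, f = regression_anova([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
```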

If assumptions hold, what can we do?

1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty

2. Describe quality of fit (variation of data around straight line) by estimate of σ² or r²

3. Tests of slope and intercept

4. Prediction and prediction bands

5. ANOVA Table

Testing assumptions: diagnostics
  • Use residuals to look at assumptions of regression:

e(i) = Y(i) - (β0 + β1X(i))    (observed minus expected)

Residuals
  • Residual: e(i) = Y(i) - (β0 + β1X(i))
  • Standardized residuals: e(i)/S

{S is the standard deviation of the residuals

with adjusted degrees of freedom}

  • Studentized residuals: e(i) / [S √(1 - h(i))]

{h(i) is the "leverage value" of observation i:

h(i) = 1/n + (X(i) - ΣX(i)/n)² / [(n-1)S(X)²]}

  • Jackknifed residuals: e(i) / [S(-i) √(1 - h(i))]

{The residual standard deviation S(-i) is calculated separately with each observation i deleted}
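A sketch of these diagnostics in pure Python (the jackknifed version is omitted for brevity). One useful check on the leverage formula: in simple regression the leverages always sum to 2, one per fitted parameter:

```python
import math

def residual_diagnostics(x, y):
    """Raw, standardized, and studentized residuals, plus leverages,
    for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]      # raw residuals
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))      # residual SD, n-2 df
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]         # leverage values
    standardized = [ei / s for ei in e]
    studentized = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]
    return e, standardized, studentized, h

e, std, stu, h = residual_diagnostics([1, 2, 3, 4, 5], [1.0, 2.3, 2.9, 4.1, 5.2])
```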

Use Residuals to:

a) look for outliers which we may wish to remove

b) examine normality

c) check for linearity

d) check for homoscedasticity

e) check for some kinds of non-independence

should outliers be removed
Yes

if “outlier” was probably not produced by the process being studied

measurement error

different species

...

No

if “outlier” was probably produced by the process being studied

extreme specimen

Should outliers be removed?
b) Using residuals to examine normality
  • Lilliefors test for normality:

P=0.62

  • Lilliefors test for normality (excluding Bowhead whale):

P=0.68

e) Use residuals to check for some kinds of non-independence
  • Durbin-Watson D Statistic: 1.48
    • low values (<2) indicate autocorrelation
  • First Order Autocorrelation: 0.26

Days spent following sperm whales
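Both statistics come straight from the residual sequence: Durbin-Watson is the sum of squared successive differences over the sum of squared residuals, and first-order autocorrelation is the lag-1 cross-product over the same denominator. A sketch with a made-up residual sequence (not the sperm-whale data):

```python
def durbin_watson(e):
    """D = sum of squared successive differences / sum of squared residuals.
    D near 2 suggests no first-order autocorrelation; D < 2 suggests
    positive autocorrelation."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

def lag1_autocorrelation(e):
    """First-order autocorrelation of the residuals (mean assumed ~ 0)."""
    num = sum(e[i] * e[i - 1] for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

# Made-up, positively autocorrelated residuals (slow drift from + to -)
smooth = [0.5, 0.4, 0.45, 0.3, -0.1, -0.3, -0.4, -0.35]
```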

Use Residuals to:

a) look for outliers which we may wish to remove

b) examine normality

c) check for linearity

d) check for homoscedasticity

e) check for some kinds of non-independence

Assumptions of simple linear regression

1. Existence

2. Independence

3. Linearity

4. Homoscedasticity

5. Normality

6. X measured without error

When assumptions do not hold:

1. Existence:

Forget it!

When assumptions do not hold:

2. Independence:

  • collect data differently
  • reduce the size of the data set
  • add additional terms to the regression model
    • (e.g. autocorrelation term, species effect)

More a problem for testing than prediction

When assumptions do not hold:

3. Linearity:

  • Transform either X or Y or both variables. e.g.:

Log(Y) = β0 + β1 Log(X) + E

  • Polynomial regression:

Y = β0 + β1X + β2X² + ... + E

  • Non-linear regression. e.g.:

Y = c + EXP(β0 + β1X) + E

  • Piecewise linear regression:

Y = β0 + β1X·[X>XK] + E

where [X>XK] = 0 if X ≤ XK and [X>XK] = 1 if X > XK.
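The piecewise model above is still linear in its parameters once the indicator is formed, so it can be fit by ordinary least squares on the transformed predictor X·[X>XK]. A sketch assuming the knot XK is known in advance (made-up data):

```python
def fit_piecewise(x, y, xk):
    """Fit Y = b0 + b1 * X * [X > XK] by OLS, where [X > XK] is 0/1
    and the knot xk is assumed known."""
    z = [xi if xi > xk else 0.0 for xi in x]   # transformed predictor X*[X>XK]
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b1 = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    b0 = my - b1 * mz
    return b0, b1

# Made-up data: flat at 1.0 below the knot, rising linearly above it
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 1.0, 1.0, 6.0, 7.0, 8.0, 9.0]
b0, b1 = fit_piecewise(x, y, xk=4)
```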

[Figure: example fits for the transformed, polynomial, non-linear, and piecewise regression models]
When assumptions do not hold:

4. Homoscedasticity:

  • Transformations of the Y variable
  • Weighted regressions (if we know that some observations are more accurate than others)
When assumptions do not hold:

5. Normality:

  • Transformations of the Y variable
  • Non-normal error structures (e.g. Poisson)

Small departures from normality are not especially important, unless doing a test

When assumptions do not hold:

6. X measured without error:

  • Major axis regression
  • Reduced major axis, or geometric mean, regression
Major axis regression:
  • Minimize sum of squares of perpendicular distances from observations to regression line
  • Only if variables are in same units

{First principal component of covariance matrix}

Reduced major axis regression:
  • Each of the two variables is transformed to have a mean of zero and a standard deviation of 1
  • Then, minimize sum of squares of perpendicular distances from observations to regression line
  • Its slope cannot be sensibly tested against zero

{first principal component using the correlation matrix}
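Both alternatives can be sketched in a few lines. The reduced major axis slope works out to sign(r)·s(Y)/s(X); the major axis slope follows from the largest eigenvalue of the 2×2 covariance matrix, as the slide notes. Pure Python, made-up data:

```python
import math

def rma_slope(x, y):
    """Reduced major axis (geometric mean) slope: sign(r) * sd(Y)/sd(X)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return math.copysign(math.sqrt(syy / sxx), sxy)

def major_axis_slope(x, y):
    """Slope of the first principal component of the covariance matrix
    (only sensible when X and Y are in the same units; assumes sxy != 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return (lam - sxx) / sxy   # slope of the corresponding eigenvector
```

For points lying exactly on a line, OLS, major axis, and reduced major axis all recover the same slope; they differ once there is scatter in both variables.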

Regression
  • Extremely useful technique!
  • Check assumptions using residuals
  • Can be extended in several ways
    • multiple regression
    • non-linear regression
    • non-normal errors
    • piecewise regression
    • ...