
CHAPTER 7: Linear Correlation & Regression Methods

7.1 - Motivation

7.2 - Correlation / Simple Linear Regression

7.3 - Extensions of Simple Linear Regression


Testing for association between two POPULATION variables X and Y …

  • Categorical variables → Chi-squared Test

Examples:

X = Disease status (D+, D–), Y = Exposure status (E+, E–)

X = # children in household (0, 1-2, 3-4, 5+), Y = Income level (Low, Middle, High)

  • Numerical variables → ???????

PARAMETERS

  • Means: μ_X, μ_Y

  • Variances: σ_X², σ_Y²

  • Covariance: σ_XY = Cov(X, Y)
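For the categorical case, a minimal R sketch of the chi-squared test (the 2×2 counts below are invented purely for illustration; only the chisq.test() call itself is standard):

> tab = matrix(c(30, 15, 20, 35), nrow = 2,
+              dimnames = list(Disease = c("D+", "D-"),
+                              Exposure = c("E+", "E-")))
> chisq.test(tab)    # Pearson chi-squared test of independence (Yates-corrected for 2x2 by default)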


Parameter Estimation via SAMPLE DATA …

  • Numerical variables

PARAMETERS (population)

  • Means: μ_X, μ_Y

  • Variances: σ_X², σ_Y²

  • Covariance: σ_XY (can be +, –, or 0)

STATISTICS (sample of n data points)

  • Means: x̄, ȳ

  • Variances: s_x², s_y²

  • Covariance: s_xy = Σ(xᵢ – x̄)(yᵢ – ȳ) / (n – 1) (can be +, –, or 0)

[Scatterplot of the n data points (X, Y); data source: JAMA. 2003;290:1486-1493]

Does this suggest a linear trend between X and Y? If so, how do we measure it?
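To make the covariance formula concrete, a minimal R sketch on invented toy vectors u and v (not the chapter's data), checked against the built-in:

> u = c(1, 2, 3, 4); v = c(2, 1, 4, 3)            # toy data
> sum((u - mean(u)) * (v - mean(v))) / (length(u) - 1)
1
> cov(u, v)                                       # built-in agrees
1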


Testing for LINEAR association between two population variables X and Y…

  • Numerical variables

PARAMETERS

  • Means: μ_X, μ_Y

  • Variances: σ_X², σ_Y²

  • Covariance: σ_XY

  • Linear Correlation Coefficient: ρ = σ_XY / (σ_X σ_Y)

Always between –1 and +1


Parameter Estimation via SAMPLE DATA …

  • Numerical variables

STATISTICS (estimating the corresponding PARAMETERS)

  • Linear Correlation Coefficient: r = s_xy / (s_x s_y)

Always between –1 and +1


Parameter Estimation via SAMPLE DATA …

Example in R (reformatted for brevity):

> pop = seq(0, 20, 0.1)
> x = sort(sample(pop, 10)); x
1.1 1.8 2.1 3.7 4.0 7.3 9.1 11.9 12.4 17.1
> y = sample(pop, 10); y
13.1 18.3 17.6 19.1 19.3 3.2 5.6 13.6 8.0 3.0
> c(mean(x), mean(y))
7.05 12.08
> var(x)
29.48944
> var(y)
43.76178
> cov(x, y)
-25.86667
> cor(x, y)
-0.7200451
> plot(x, y, pch = 19)    # scatterplot of the n = 10 data points
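As a check that cor() matches the definition r = s_xy / (s_x s_y), using the values above:

> cov(x, y) / (sd(x) * sd(y))
-0.7200451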


Parameter Estimation via SAMPLE DATA …

  • Numerical variables

  • Linear Correlation Coefficient: r = s_xy / (s_x s_y)

Always between –1 and +1: r measures the strength of linear association. Values near –1 indicate strong negative linear correlation, values near +1 strong positive linear correlation, and values near 0 little or no linear correlation.

In our example:

> cor(x, y)
-0.7200451

a moderately strong negative linear correlation.


Testing for linear association between two numerical population variables X and Y…

  • Linear Correlation Coefficient ρ

Now that we have r, we can conduct HYPOTHESIS TESTING on ρ:

H₀: ρ = 0 (no linear association) vs. Hₐ: ρ ≠ 0

Test statistic for the p-value: t = r √(n – 2) / √(1 – r²), on n – 2 degrees of freedom.

Here t = (–0.7200451) √8 / √(1 – 0.5185) = –2.935 on 8 df:

> 2 * pt(-2.935, 8)    # ≈ .0189

p-value = .0189 < .05, so reject H₀: there is a statistically significant linear association.
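The same test in a single call, as a check (cor.test() is the standard stats-package route to the identical t and p-value):

> cor.test(x, y)    # Pearson's product-moment correlation: t = -2.935, df = 8, p-value = 0.01886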


Parameter Estimation via SAMPLE DATA …

If such an association between X and Y exists, then for an intercept β₀ and slope β₁ we can write

Y = β₀ + β₁ X + ε    “Response = Model + Error”

Recall that r = –0.7200451 measures the strength of this linear association. Find estimates β̂₀ and β̂₁ for the “best” line ŷ = β̂₀ + β̂₁ x. But “best” in what sense??? In the sense of the residuals, as follows.


SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

“Least Squares Regression Line”: choose the estimates β̂₀ and β̂₁ that minimize the residual sum of squares

SSErr = Σᵢ (yᵢ – ŷᵢ)²

i.e., the total squared vertical distance from the observed responses to the line. Setting the two partial derivatives of SSErr to zero gives

β̂₁ = s_xy / s_x²   and   β̂₀ = ȳ – β̂₁ x̄

Check: β̂₁ = –25.86667 / 29.48944 = –0.8772, and β̂₀ = 12.08 – (–0.8772)(7.05) = 18.264, matching the R output below.
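The same check in R, reusing x and y from the running example:

> cov(x, y) / var(x)                         # slope = s_xy / s_x^2, ≈ -0.8772
> mean(y) - (cov(x, y) / var(x)) * mean(x)   # intercept = ybar - slope * xbar, ≈ 18.2639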


SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

Terminology, per data point i:

  • predictor: xᵢ

  • observed response: yᵢ

  • fitted response: ŷᵢ = β̂₀ + β̂₁ xᵢ

  • residuals: eᵢ = yᵢ – ŷᵢ

The least squares line is the one that minimizes Σ eᵢ² over all candidate lines.


Testing for linear association between two numerical population variables X and Y…

  • Linear Regression Coefficients β₀ and β₁

“Response = Model + Error”: Y = β₀ + β₁ X + ε

Now that we have β̂₀ and β̂₁, we can conduct HYPOTHESIS TESTING on β₀ and β₁; the key test is H₀: β₁ = 0 (zero slope, i.e., no linear association).

Test statistic for the p-value: t = β̂₁ / SE(β̂₁), on n – 2 degrees of freedom.

Here t = –0.8772 / 0.2989 = –2.935 on 8 df, so p-value = .0189.

Same t-score as H₀: ρ = 0!
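A numeric cross-check of that standard error, a sketch using the residual standard error s = 4.869 from the output below and Sxx = (n – 1)s_x²:

> 4.869 / sqrt(9 * var(x))    # SE(slope) = s / sqrt(Sxx), ≈ 0.2989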


Fitting the least squares line in R:

> plot(x, y, pch = 19)

> lsreg = lm(y ~ x) # or lsfit(x,y)

> abline(lsreg)

> summary(lsreg)

Call:

lm(formula = y ~ x)

Residuals:

Min 1Q Median 3Q Max

-8.6607 -3.2154 0.8954 3.4649 5.7742

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 18.2639 2.6097 6.999 0.000113 ***

x -0.8772 0.2989 -2.935 0.018857 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.869 on 8 degrees of freedom

Multiple R-squared: 0.5185, Adjusted R-squared: 0.4583

F-statistic: 8.614 on 1 and 8 DF, p-value: 0.01886

BUT WHY HAVE TWO METHODS FOR THE SAME PROBLEM???

Because this second method generalizes…


Testing for linear association between a population response variable Y and multiple predictor variables X₁, X₂, X₃, … etc.

Multilinear Regression:

Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + ε    “Response = Model + Error”

The βⱼXⱼ terms are the “main effects.” For now, assume the “additive model,” i.e., main effects only.

Multilinear Regression

[Figure: 3-D scatterplot. Each observation has predictors (x₁ᵢ, x₂ᵢ) in the X₁-X₂ plane; the true response yᵢ sits above or below the fitted response ŷᵢ on the regression plane, and the vertical gap between them is the residual.]

Least Squares calculation of the regression coefficients is computer-intensive; the closed-form formulas require Linear Algebra (matrices)!

Once calculated, how do we then test the null hypothesis? ANOVA
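In matrix form, the least squares solution is β̂ = (XᵀX)⁻¹Xᵀy. A minimal R sketch with the running example's x and y (one predictor, so it merely reproduces the simple-regression coefficients; extra predictors would just add columns to X):

> X = cbind(1, x)                  # design matrix: intercept column, then predictor(s)
> solve(t(X) %*% X, t(X) %*% y)    # solves (X'X) b = X'y; ≈ 18.2639 and -0.8772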


ANOVA Table

Source       df      SS      MS = SS/df    F = MSReg/MSErr    p-value
Regression   k – 1   SSReg   MSReg         F                  Pr(F ≥ F_obs)
Error        n – k   SSErr   MSErr
Total        n – 1   SSTot

In our example, k = 2 regression coefficients (β₀ and β₁) and n = 10 data points, so the df are 1, 8, and 9. What are SSReg, SSErr, and SSTot?


Parameter Estimation via SAMPLE DATA …

[Scatterplot of the n data points; data source: JAMA. 2003;290:1486-1493]

SSTot = Σ(yᵢ – ȳ)² is a measure of the total amount of variability in the observed responses (i.e., before any model-fitting).

SSReg = Σ(ŷᵢ – ȳ)² is a measure of the total amount of variability in the fitted responses (i.e., after model-fitting).

SSErr = Σ(yᵢ – ŷᵢ)² is a measure of the total amount of variability in the resulting residuals (i.e., after model-fitting).




SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

In our example:

SSReg = Σ(ŷᵢ – ȳ)² = 204.2
SSErr = Σ(yᵢ – ŷᵢ)² = 189.656 ← the minimum possible over all lines, by construction
SSTot = Σ(yᵢ – ȳ)² = (n – 1) s_y² = 9 × (43.76178) = 393.856

and indeed SSTot = SSReg + SSErr: 204.2 + 189.656 = 393.856.
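A quick R verification of the decomposition, using fitted() and resid() on the lsreg object fitted above:

> yhat = fitted(lsreg)
> sum((yhat - mean(y))^2)    # SSReg, ≈ 204.2
> sum(resid(lsreg)^2)        # SSErr, ≈ 189.66
> sum((y - mean(y))^2)       # SSTot, ≈ 393.86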


ANOVA Table

In our example, k = 2 regression coefficients and n = 10 data points:

Source       df    SS        MS        F       p-value
Regression    1    204.201   204.201   8.614   .0189
Error         8    189.656    23.707
Total         9    393.856

F = MSReg / MSErr = 204.201 / 23.707 = 8.614, with p-value = .0189: the same as before! (For simple linear regression, F = t², and indeed 8.614 = (–2.935)².)


The same ANOVA table in R:

> summary(aov(lsreg))

Df Sum Sq Mean Sq F value Pr(>F)

x 1 204.20 204.201 8.6135 0.01886 *

Residuals 8 189.66 23.707


Coefficient of Determination

r² = SSReg / SSTot = 204.201 / 393.856 = 0.5185

The least squares regression line accounts for 51.85% of the total variability in the observed response, with 48.15% remaining unexplained.

Moreover, r² is literally the square of the linear correlation coefficient:

> cor(x, y)
-0.7200451

and (–0.7200451)² = 0.5185, matching “Multiple R-squared: 0.5185” in the summary(lsreg) output above.
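A one-line check in R:

> cor(x, y)^2    # ≈ 0.5185, the Multiple R-squared reported by summary(lsreg)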




Summary of Linear Correlation and Simple Linear Regression

Given: n data points (xᵢ, yᵢ), with sample means x̄ and ȳ, variances s_x² and s_y², and covariance s_xy. [Scatterplot; data source: JAMA. 2003;290:1486-1493]

  • Linear Correlation Coefficient: r = s_xy / (s_x s_y), with –1 ≤ r ≤ +1; it measures the strength of linear association.

  • Least Squares Regression Line: ŷ = β̂₀ + β̂₁ x, with β̂₁ = s_xy / s_x² and β̂₀ = ȳ – β̂₁ x̄; it minimizes SSErr = Σ(yᵢ – ŷᵢ)² = SSTot – SSReg (ANOVA).

  • Coefficient of Determination: r² = SSReg / SSTot, the proportion of total variability modeled by the regression line's variability.

All point estimates can be upgraded to CIs for hypothesis testing, etc.


Testing for linear association between a population response variable Y and multiple predictor variables X₁, X₂, X₃, … etc.

Multilinear Regression: Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ε    “Response = Model + Error”, with “main effects” only.

R code example: lsreg = lm(y ~ x1 + x2 + x3)
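A self-contained hedged sketch of such a fit (the simulated x1, x2, x3, and y2 below are illustrative assumptions, not the chapter's data):

> set.seed(1)
> x1 = rnorm(20); x2 = rnorm(20); x3 = rnorm(20)
> y2 = 1 + 2*x1 - x2 + rnorm(20)     # true model uses x1 and x2 only
> summary(lm(y2 ~ x1 + x2 + x3))     # one t-test per main effect; x3 should be non-significant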


Testing for linear association between a population response variable Y and multiple predictor variables X₁, X₂, X₃, … etc.

Multilinear Regression: “Response = Model + Error”, with “main effects” plus quadratic terms, etc. (“polynomial regression”):

Y = β₀ + β₁X + β₂X² + β₃X³ + ε

R code example: lsreg = lm(y ~ x + I(x^2) + I(x^3))

(Note: inside an R formula, ^ denotes factor crossing, so lm(y ~ x + x^2 + x^3) would silently collapse to lm(y ~ x); the powers must be wrapped in I(), or use poly(x, 3).)


Testing for linear association between a population response variable Y and multiple predictor variables X₁, X₂, X₃, … etc.

Multilinear Regression: “Response = Model + Error”, with “main effects”, polynomial terms, and/or “interactions”:

Y = β₀ + β₁X₁ + β₂X₂ + β₁₂X₁X₂ + ε

R code examples:

lsreg = lm(y ~ x1 + x2 + x1:x2)
lsreg = lm(y ~ x1*x2)    # equivalent shorthand: both main effects plus their interaction


Recall the scatterplot of the running example. Suppose these are actually two subgroups, requiring two distinct linear regressions!

Multiple Linear Regression with interaction, using an indicator (“dummy”) variable. Example in R (reformatted for brevity):

> I = c(1,1,1,1,1,0,0,0,0,0)   # indicator: first five points vs. last five
> lsreg = lm(y ~ x*I)
> summary(lsreg)

Coefficients:
            Estimate
(Intercept)  6.56463
x            0.00998
I            6.80422
x:I          1.60858

For I = 0: ŷ = 6.56463 + 0.00998 x
For I = 1: ŷ = (6.56463 + 6.80422) + (0.00998 + 1.60858) x = 13.36885 + 1.61856 x

(Note: naming the indicator I masks base R's I() function in this session; a name like grp would be safer.)
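A hedged follow-up sketch, drawing the two implied subgroup lines on the scatterplot (coefficient order taken from the output above):

> b = coef(lsreg)                     # (Intercept), x, I, x:I
> plot(x, y, pch = 19)
> abline(b[1], b[2])                  # I = 0 line
> abline(b[1] + b[3], b[2] + b[4])    # I = 1 line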


Logistic Regression, Transformations