By **howie**


### Overview of our study of the multiple linear regression model

Regression models with more than one slope parameter

Are brain size and body size predictive of intelligence?

- Sample of n = 38 college students
- Response (y): intelligence based on PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale.
- Potential predictor (x1): Brain size based on MRI scans (given as count/10,000).
- Potential predictor (x2): Height in inches.
- Potential predictor (x3): Weight in pounds.

Scatter matrix plot

- Illustrates the marginal relationships between each pair of variables without regard to the other variables.
- The challenge is to understand how the response y relates to all three predictors simultaneously.

A multiple linear regression model with three quantitative predictors

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

where …
- $y_i$ is intelligence (PIQ) of student i
- $x_{i1}$ is brain size (MRI) of student i
- $x_{i2}$ is height (Height) of student i
- $x_{i3}$ is weight (Weight) of student i

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.
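To make the role of the intercept and the three slope parameters concrete, here is a minimal sketch of how such a model can be fitted by least squares. The function name `fit_ols` and the small data set are made up for illustration (they are not the PIQ data), and the response is generated without error so that the fit should recover the chosen coefficients.

```python
def fit_ols(rows, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p+1) matrix.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


# Hypothetical predictor values (x1, x2, x3); NOT the brain/height/weight data.
rows = [(1, 2, 3), (2, 1, 0), (3, 5, 1), (4, 0, 2), (5, 3, 4), (6, 1, 1)]
true_beta = [2.0, 0.5, -1.0, 0.25]  # made-up intercept and slopes
y = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], r)) for r in rows]

estimates = fit_ols(rows, y)
print([round(b, 6) for b in estimates])
```

With real data the errors are not zero, so the estimates only approximate the underlying parameters; statistical software also reports standard errors, which this sketch does not compute.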

Some research questions

- Which predictors – brain size, height, or weight – explain some variation in PIQ?
- What is the effect of brain size on PIQ, after taking into account height and weight?
- What is the PIQ of an individual with a given brain size, height, and weight?

The regression equation is

PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

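| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 111.35 | 62.97 | 1.77 | 0.086 |
| Brain | 2.0604 | 0.5634 | 3.66 | 0.001 |
| Height | -2.732 | 1.229 | -2.22 | 0.033 |
| Weight | 0.0006 | 0.1971 | 0.00 | 0.998 |

S = 19.79, R-Sq = 29.5%, R-Sq(adj) = 23.3%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 5572.7 | 1857.6 | 4.74 | 0.007 |
| Residual Error | 34 | 13321.8 | 391.8 | | |
| Total | 37 | 18894.6 | | | |

| Source | DF | Seq SS |
|---|---|---|
| Brain | 1 | 2697.1 |
| Height | 1 | 2875.6 |
| Weight | 1 | 0.0 |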
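The summary statistics in this output can be recomputed from the analysis of variance table alone. The sketch below does that arithmetic for the PIQ model; the variable names are illustrative, and the inputs are the rounded values printed above, so the results agree with the output only to the displayed precision.

```python
# Values copied from the Analysis of Variance table for the PIQ model.
n = 38                    # number of students
p = 3                     # number of slope parameters (Brain, Height, Weight)
ss_regression = 5572.7
ss_error = 13321.8
ss_total = 18894.6

ms_regression = ss_regression / p          # mean square for regression
ms_error = ss_error / (n - p - 1)          # mean square error, 34 error df
f_stat = ms_regression / ms_error          # overall F statistic
r_sq = ss_regression / ss_total            # proportion of variation explained
r_sq_adj = 1 - ms_error / (ss_total / (n - 1))  # adjusts for number of predictors
s = ms_error ** 0.5                        # estimate of the error standard deviation

print(f"F = {f_stat:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"S = {s:.2f}")
```

The sequential sums of squares for Brain, Height, and Weight add up to the regression sum of squares (2697.1 + 2875.6 + 0.0 = 5572.7).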

Baby bird breathing habits in burrows?

- Experiment with n = 120 nestling bank swallows
- Response (y): % increase in “minute ventilation”, Vent, i.e., total volume of air breathed per minute
- Potential predictor (x1): percentage of oxygen, O2, in the air the baby birds breathe
- Potential predictor (x2): percentage of carbon dioxide, CO2, in the air the baby birds breathe

A first order model with two quantitative predictors

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where …
- $y_i$ is percentage increase in minute ventilation of nestling i
- $x_{i1}$ is percentage of oxygen
- $x_{i2}$ is percentage of carbon dioxide

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Some research questions

- Is oxygen related to minute ventilation, after taking into account carbon dioxide?
- Is carbon dioxide related to minute ventilation, after taking into account oxygen?
- What is the mean minute ventilation of all nestling bank swallows whose breathing air is comprised of 15% oxygen and 5% carbon dioxide?

The regression equation is

Vent = 86 - 5.33 O2 + 31.1 CO2

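| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 85.9 | 106.0 | 0.81 | 0.419 |
| O2 | -5.330 | 6.425 | -0.83 | 0.408 |
| CO2 | 31.103 | 4.789 | 6.50 | 0.000 |

S = 157.4, R-Sq = 26.8%, R-Sq(adj) = 25.6%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 1061819 | 530909 | 21.44 | 0.000 |
| Residual Error | 117 | 2897566 | 24766 | | |
| Total | 119 | 3959385 | | | |

| Source | DF | Seq SS |
|---|---|---|
| O2 | 1 | 17045 |
| CO2 | 1 | 1044773 |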
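The third research question asks for the mean minute ventilation at 15% oxygen and 5% carbon dioxide. A point estimate follows directly from the fitted equation, as in the sketch below. The function name `predicted_vent` is illustrative, and the result is only a point estimate; a confidence interval would need information that this output does not include.

```python
# Coefficients copied from the fitted model Vent = 85.9 - 5.330 O2 + 31.103 CO2.
B0, B_O2, B_CO2 = 85.9, -5.330, 31.103

def predicted_vent(o2_pct, co2_pct):
    """Estimated mean % increase in minute ventilation at the given gas percentages."""
    return B0 + B_O2 * o2_pct + B_CO2 * co2_pct

estimate = predicted_vent(15, 5)
print(f"Estimated mean Vent at 15% O2 and 5% CO2: {estimate:.1f}")
```

Working it by hand gives 85.9 - 79.95 + 155.515 = 161.465, so the estimated mean increase in minute ventilation is about 161%.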

Is baby’s birth weight related to smoking during pregnancy?

- Sample of n = 32 births
- Response (y): birth weight in grams of baby
- Potential predictor (x1): length of gestation in weeks
- Potential predictor (x2): smoking status of mother (yes or no)

A first order model with one binary predictor

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

where …
- $y_i$ is birth weight of baby i
- $x_{i1}$ is length of gestation of baby i
- $x_{i2} = 1$ if the mother smokes, and $x_{i2} = 0$ if not

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Estimated first order model with one binary predictor

The regression equation is

Weight = - 2390 + 143 Gest - 245 Smoking

Some research questions

- Is baby’s birth weight related to smoking during pregnancy?
- How is birth weight related to gestation, after taking into account smoking status?

The regression equation is

Weight = - 2390 + 143 Gest - 245 Smoking

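| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -2389.6 | 349.2 | -6.84 | 0.000 |
| Gest | 143.100 | 9.128 | 15.68 | 0.000 |
| Smoking | -244.54 | 41.98 | -5.83 | 0.000 |

S = 115.5, R-Sq = 89.6%, R-Sq(adj) = 88.9%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 3348720 | 1674360 | 125.45 | 0.000 |
| Residual Error | 29 | 387070 | 13347 | | |
| Total | 31 | 3735789 | | | |

| Source | DF | Seq SS |
|---|---|---|
| Gest | 1 | 2895838 |
| Smoking | 1 | 452881 |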
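With a binary predictor and no interaction term, the fitted model describes two parallel lines: one for non-smoking mothers and one for smoking mothers, separated by the Smoking coefficient. The sketch below illustrates this. The function name `predicted_weight` and the 40-week gestation are chosen purely for illustration and do not come from the slides.

```python
# Coefficients copied from the fitted model Weight = -2389.6 + 143.100 Gest - 244.54 Smoking.
B0, B_GEST, B_SMOKE = -2389.6, 143.100, -244.54

def predicted_weight(gest_weeks, smokes):
    """Estimated mean birth weight (grams); smokes is True or False."""
    return B0 + B_GEST * gest_weeks + B_SMOKE * (1 if smokes else 0)

gest = 40  # hypothetical gestation length, for illustration only
nonsmoker = predicted_weight(gest, False)
smoker = predicted_weight(gest, True)

print(f"Non-smoker: {nonsmoker:.1f} g")
print(f"Smoker:     {smoker:.1f} g")
print(f"Difference: {smoker - nonsmoker:.2f} g")  # same at every gestation length
```

Because the lines are parallel, the estimated difference of about 245 grams between the two groups is the same for any gestation length.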

Compare three treatments (A, B, C) for severe depression

- Random sample of n = 36 severely depressed individuals.
- y = measure of treatment effectiveness
- x1 = age (in years)
- x2 = 1 if patient received A and 0, if not
- x3 = 1 if patient received B and 0, if not

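A second order model with one quantitative predictor, a three-group qualitative variable, and interactions

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \epsilon_i$$

where …
- $y_i$ is treatment effectiveness for patient i
- $x_{i1}$ is age of patient i
- $x_{i2} = 1$ if treatment A, and $x_{i2} = 0$ if not
- $x_{i3} = 1$ if treatment B, and $x_{i3} = 0$ if not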

The estimated regression function

Regression equation is

y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Potential research questions

- Does the effectiveness of the treatment depend on age?
- Is one treatment superior to the other treatments for all ages?
- What is the effect of age on the effectiveness of the treatment?

Regression equation is

y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 6.211 | 3.350 | 1.85 | 0.074 |
| age | 1.03339 | 0.07233 | 14.29 | 0.000 |
| x2 | 41.304 | 5.085 | 8.12 | 0.000 |
| x3 | 22.707 | 5.091 | 4.46 | 0.000 |
| agex2 | -0.7029 | 0.1090 | -6.45 | 0.000 |
| agex3 | -0.5097 | 0.1104 | -4.62 | 0.000 |

S = 3.925, R-Sq = 91.4%, R-Sq(adj) = 90.0%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 5 | 4932.85 | 986.57 | 64.04 | 0.000 |
| Residual Error | 30 | 462.15 | 15.40 | | |
| Total | 35 | 5395.00 | | | |

| Source | DF | Seq SS |
|---|---|---|
| age | 1 | 3424.43 |
| x2 | 1 | 803.80 |
| x3 | 1 | 1.19 |
| agex2 | 1 | 375.00 |
| agex3 | 1 | 328.42 |
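The interaction terms give each treatment its own intercept and its own slope in age. Setting the indicator variables to 0 or 1 in the fitted equation yields one line per treatment, with treatment C as the baseline (x2 = x3 = 0). The sketch below works this out; the helper names `treatment_line` and `crossing_age` are illustrative, and the age range of the sample is not given in the slides, so a crossing point may or may not fall inside the observed data.

```python
# Coefficients copied from the fitted depression-treatment model.
B0, B_AGE = 6.211, 1.03339
B_X2, B_X3 = 41.304, 22.707              # intercept shifts for treatments A and B
B_AGE_X2, B_AGE_X3 = -0.7029, -0.5097    # slope shifts for treatments A and B

def treatment_line(treatment):
    """Return (intercept, slope in age) of the fitted line for 'A', 'B', or 'C'."""
    x2 = 1 if treatment == "A" else 0
    x3 = 1 if treatment == "B" else 0
    intercept = B0 + B_X2 * x2 + B_X3 * x3
    slope = B_AGE + B_AGE_X2 * x2 + B_AGE_X3 * x3
    return intercept, slope

def crossing_age(t1, t2):
    """Age at which the fitted lines of two treatments intersect."""
    a1, s1 = treatment_line(t1)
    a2, s2 = treatment_line(t2)
    return (a2 - a1) / (s1 - s2)

for t in "ABC":
    intercept, slope = treatment_line(t)
    print(f"Treatment {t}: y = {intercept:.3f} + {slope:.4f} * age")

print(f"Lines for A and C cross near age {crossing_age('A', 'C'):.1f}")
```

The three slopes differ, so the estimated effect of age depends on the treatment. The fitted line for A lies above the line for C only up to roughly age 59, which suggests that neither of those two treatments is estimated to be better at every age.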

How is the length of a bluegill fish related to its age?

- In 1981, n = 78 bluegills randomly sampled from Lake Mary in Minnesota.
- y = length (in mm)
- x1 = age (in years)

A second order polynomial model with one quantitative predictor

$$y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \epsilon_i$$

where …
- $y_i$ is length of bluegill (fish) i (in mm)
- $x_i$ is age of bluegill (fish) i (in years)

and … the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

Potential research questions

- How is the length of a bluegill fish related to its age?
- What is the length of a randomly selected five-year-old bluegill fish?

The regression equation is

length = 148 + 19.8 c_age - 4.72 c_agesq

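| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 147.604 | 1.472 | 100.26 | 0.000 |
| c_age | 19.811 | 1.431 | 13.85 | 0.000 |
| c_agesq | -4.7187 | 0.9440 | -5.00 | 0.000 |

S = 10.91, R-Sq = 80.1%, R-Sq(adj) = 79.6%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 35938 | 17969 | 151.07 | 0.000 |
| Residual Error | 75 | 8921 | 119 | | |
| Total | 77 | 44859 | | | |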

...

Predicted Values for New Observations

| New | Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|---|
| 1 | 165.90 | 2.77 | (160.39, 171.42) | (143.49, 188.32) |

Values of Predictors for New Observations

| New | c_age | c_agesq |
|---|---|---|
| 1 | 1.37 | 1.88 |
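The fit and both intervals for the new observation can be approximately reproduced from the printed coefficients, S, and SE Fit. The sketch below makes two assumptions that are not stated in the slides: that `c_age` and `c_agesq` are the centered age and its square, used exactly as listed, and that the t multiplier for 75 error degrees of freedom is 1.992 (taken from a t table). Because the inputs are rounded, the results match the output only to within a few hundredths.

```python
import math

# Coefficients and predictor values copied from the bluegill output.
B0, B_AGE, B_AGESQ = 147.604, 19.811, -4.7187
c_age, c_agesq = 1.37, 1.88

s = 10.91        # estimated error standard deviation (S)
se_fit = 2.77    # standard error of the fitted value (SE Fit)
t_crit = 1.992   # ASSUMPTION: 97.5th percentile of t with 75 df, from a t table

fit = B0 + B_AGE * c_age + B_AGESQ * c_agesq

# Confidence interval for the MEAN length at this age.
ci_half = t_crit * se_fit
ci = (fit - ci_half, fit + ci_half)

# Prediction interval for ONE new fish: adds the error variance s**2.
pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_half, fit + pi_half)

print(f"Fit    = {fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is much wider than the confidence interval because it accounts for the variation of individual fish around the mean length, not just the uncertainty in the estimated mean.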

The good news!

- Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model:
- same assumptions, same model checking
- (adjusted) R2
- t-tests and t-intervals for one slope
- prediction (confidence) intervals for (mean) response

New things we need to learn!

- The above research scenarios (models) and a few more
- The “general linear test” which helps to answer many research questions
- F-tests for more than one slope
- Interactions between two or more predictor variables
- Identifying influential data points

New things we need to learn!

- Detection of correlated predictors (“multicollinearity”) using “variance inflation factors”, and the limitations that multicollinearity causes
- Selection of variables from a large set of candidates for inclusion in a model (“stepwise regression” and “best subsets regression”)
