Overview of our study of the multiple linear regression model

Regression models with more than one slope parameter

Example 1

Is brain and body size predictive of intelligence?
  • Sample of n = 38 college students
  • Response (y): intelligence based on PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale.
  • Potential predictor (x1): Brain size based on MRI scans (given as count/10,000).
  • Potential predictor (x2): Height in inches.
  • Potential predictor (x3): Weight in pounds.
Scatter matrix plot
  • Illustrates the marginal relationships between each pair of variables without regard to the other variables.
  • The challenge is to understand how the response y relates to all three predictors simultaneously.

Example 1

A multiple linear regression model with three quantitative predictors

yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi

  • where …
  • yi is intelligence (PIQ) of student i
  • xi1 is brain size (MRI) of student i
  • xi2 is height (Height) of student i
  • xi3 is weight (Weight) of student i
  • and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
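A minimal sketch of how this model could be fit in Python with statsmodels, assuming the 38 student records are available in a pandas DataFrame with (hypothetical) columns PIQ, Brain, Height, and Weight:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file holding the n = 38 student records with columns
    # PIQ, Brain, Height, and Weight.
    df = pd.read_csv("iq_brain.csv")

    # Multiple linear regression of PIQ on brain size, height, and weight.
    model = smf.ols("PIQ ~ Brain + Height + Weight", data=df).fit()

    # Coefficient table, t-tests, R-squared, and the overall F-test,
    # analogous to the Minitab output on the slide that follows.
    print(model.summary())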

Example 1

Some research questions
  • Which predictors – brain size, height, or weight – explain some variation in PIQ?
  • What is the effect of brain size on PIQ, after taking into account height and weight?
  • What is the PIQ of an individual with a given brain size, height, and weight?

Example 1

The regression equation is
PIQ = 111 + 2.06 Brain - 2.73 Height + 0.001 Weight

Predictor      Coef   SE Coef      T      P
Constant     111.35     62.97   1.77  0.086
Brain        2.0604    0.5634   3.66  0.001
Height       -2.732     1.229  -2.22  0.033
Weight       0.0006    0.1971   0.00  0.998

S = 19.79   R-Sq = 29.5%   R-Sq(adj) = 23.3%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3   5572.7  1857.6  4.74  0.007
Residual Error  34  13321.8   391.8
Total           37  18894.6

Source   DF  Seq SS
Brain     1  2697.1
Height    1  2875.6
Weight    1     0.0

Example 2

Baby bird breathing habits in burrows?
  • Experiment with n = 120 nestling bank swallows
  • Response (y): % increase in “minute ventilation”, Vent, i.e., total volume of air breathed per minute
  • Potential predictor (x1): percentage of oxygen, O2, in the air the baby birds breathe
  • Potential predictor (x2): percentage of carbon dioxide, CO2, in the air the baby birds breathe

Example 2

A first order model with two quantitative predictors

yi = β0 + β1xi1 + β2xi2 + εi

  • where …
  • yi is percentage increase in minute ventilation
  • xi1 is percentage of oxygen
  • xi2 is percentage of carbon dioxide
  • and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

Example 2

Some research questions
  • Is oxygen related to minute ventilation, after taking into account carbon dioxide?
  • Is carbon dioxide related to minute ventilation, after taking into account oxygen?
  • What is the mean minute ventilation of all nestling bank swallows whose breathing air is comprised of 15% oxygen and 5% carbon dioxide?

Example 2

The regression equation is
Vent = 86 - 5.33 O2 + 31.1 CO2

Predictor     Coef  SE Coef      T      P
Constant      85.9    106.0   0.81  0.419
O2          -5.330    6.425  -0.83  0.408
CO2         31.103    4.789   6.50  0.000

S = 157.4   R-Sq = 26.8%   R-Sq(adj) = 25.6%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        2  1061819  530909  21.44  0.000
Residual Error  117  2897566   24766
Total           119  3959385

Source  DF   Seq SS
O2       1    17045
CO2      1  1044773
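Plugging 15% oxygen and 5% carbon dioxide into the fitted equation gives an estimated mean minute ventilation of about 85.9 − 5.330(15) + 31.103(5) ≈ 161. A hedged sketch of the same estimate, with a 95% confidence interval, assuming a pandas DataFrame with (hypothetical) columns Vent, O2, and CO2:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file holding the n = 120 nestling records.
    df = pd.read_csv("babybirds.csv")

    # First-order model with two quantitative predictors.
    model = smf.ols("Vent ~ O2 + CO2", data=df).fit()

    # Estimated mean response (with 95% CI and PI) at 15% O2 and 5% CO2.
    new = pd.DataFrame({"O2": [15], "CO2": [5]})
    print(model.get_prediction(new).summary_frame(alpha=0.05))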

Example 3

Is baby’s birth weight related to smoking during pregnancy?
  • Sample of n = 32 births
  • Response (y): birth weight in grams of baby
  • Potential predictor (x1): smoking status of mother (yes or no)
  • Potential predictor (x2): length of gestation in weeks

Example 3

A first order model with one binary predictor

yi = β0 + β1xi1 + β2xi2 + εi

  • where …
  • yi is birth weight of baby i
  • xi1 is length of gestation of baby i
  • xi2 = 1, if mother smokes and xi2 = 0, if not
  • and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
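A brief sketch of the 0/1 dummy coding and fit, assuming a pandas DataFrame with (hypothetical) columns Weight, Gest, and a yes/no Smoking column:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file holding the n = 32 birth records.
    df = pd.read_csv("birthsmokers.csv")

    # Binary predictor: 1 if the mother smokes, 0 if not.
    df["Smoking"] = (df["Smoking"] == "yes").astype(int)

    # First-order model with one quantitative and one binary predictor.
    model = smf.ols("Weight ~ Gest + Smoking", data=df).fit()
    print(model.params)

Because Smoking enters only additively, the model describes two parallel lines (one per smoking status) that differ in intercept but share the slope on gestation.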

Example 3

Estimated first order model with one binary predictor

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Example 3

Some research questions
  • Is baby’s birth weight related to smoking during pregnancy?
  • How is birth weight related to gestation, after taking into account smoking status?

Example 3

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor      Coef  SE Coef      T      P
Constant    -2389.6    349.2  -6.84  0.000
Gest        143.100    9.128  15.68  0.000
Smoking     -244.54    41.98  -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  3348720  1674360  125.45  0.000
Residual Error  29   387070    13347
Total           31  3735789

Source   DF   Seq SS
Gest      1  2895838
Smoking   1   452881

Example 4

Compare three treatments (A, B, C) for severe depression
  • Random sample of n = 36 severely depressed individuals.
  • y = measure of treatment effectiveness
  • x1 = age (in years)
  • x2 = 1 if patient received A and 0, if not
  • x3 = 1 if patient received B and 0, if not

Example 4

A second order model with one quantitative predictor, a three-group qualitative variable, and interactions

yi = β0 + β1xi1 + β2xi2 + β3xi3 + β12xi1xi2 + β13xi1xi3 + εi

  • where …
  • yi is treatment effectiveness for patient i
  • xi1 is age of patient i
  • xi2 = 1, if treatment A and xi2 = 0, if not
  • xi3 = 1, if treatment B and xi3 = 0, if not
  • and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
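A sketch of how the indicator variables and age-by-treatment interactions could be specified, assuming a pandas DataFrame with (hypothetical) columns y (effectiveness), age, and trt taking the values A, B, or C:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file holding the n = 36 patient records.
    df = pd.read_csv("depression.csv")

    # Indicator variables for treatments A and B; treatment C is the baseline.
    df["x2"] = (df["trt"] == "A").astype(int)
    df["x3"] = (df["trt"] == "B").astype(int)

    # Age, the two indicators, and the two age-by-treatment interaction terms.
    model = smf.ols("y ~ age + x2 + x3 + age:x2 + age:x3", data=df).fit()
    print(model.params)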

Example 4

The estimated regression function

The regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3
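Substituting the indicator values gives one fitted line per treatment:

  • Treatment A (x2 = 1, x3 = 0): y = (6.21 + 41.3) + (1.03 − 0.703) age ≈ 47.5 + 0.33 age
  • Treatment B (x2 = 0, x3 = 1): y = (6.21 + 22.7) + (1.03 − 0.510) age ≈ 28.9 + 0.52 age
  • Treatment C (x2 = 0, x3 = 0): y = 6.21 + 1.03 age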

Example 4

Potential research questions
  • Does the effectiveness of the treatment depend on age?
  • Is one treatment superior to the others for all ages?
  • What is the effect of age on the effectiveness of the treatment?

Example 4

The regression equation is
y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3

Predictor      Coef  SE Coef      T      P
Constant      6.211    3.350   1.85  0.074
age         1.03339  0.07233  14.29  0.000
x2           41.304    5.085   8.12  0.000
x3           22.707    5.091   4.46  0.000
agex2       -0.7029   0.1090  -6.45  0.000
agex3       -0.5097   0.1104  -4.62  0.000

S = 3.925   R-Sq = 91.4%   R-Sq(adj) = 90.0%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       5  4932.85  986.57  64.04  0.000
Residual Error  30   462.15   15.40
Total           35  5395.00

Source  DF   Seq SS
age      1  3424.43
x2       1   803.80
x3       1     1.19
agex2    1   375.00
agex3    1   328.42

Example 5

How is the length of a bluegill fish related to its age?
  • In 1981, n = 78 bluegills randomly sampled from Lake Mary in Minnesota.
  • y = length (in mm)
  • x1 = age (in years)

Example 5

A second order polynomial model with one quantitative predictor

yi = β0 + β1xi + β2xi² + εi

  • where …
  • yi is length of bluegill (fish) i (in mm)
  • xi is age of bluegill (fish) i (in years)
  • and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
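A sketch of the quadratic fit with a centered age term (matching the c_age and c_agesq labels in the output that follows), assuming a pandas DataFrame with (hypothetical) columns length and age:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file holding the n = 78 bluegill records.
    df = pd.read_csv("bluegills.csv")

    # Center age so the linear and squared terms are less strongly correlated.
    df["c_age"] = df["age"] - df["age"].mean()

    # Second-order polynomial model: length on c_age and c_age squared.
    model = smf.ols("length ~ c_age + I(c_age ** 2)", data=df).fit()
    print(model.summary())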

Example 5

Potential research questions
  • How is the length of a bluegill fish related to its age?
  • What is the length of a randomly selected five-year-old bluegill fish?

Example 5

The regression equation is
length = 148 + 19.8 c_age - 4.72 c_agesq

Predictor      Coef  SE Coef       T      P
Constant    147.604    1.472  100.26  0.000
c_age        19.811    1.431   13.85  0.000
c_agesq     -4.7187   0.9440   -5.00  0.000

S = 10.91   R-Sq = 80.1%   R-Sq(adj) = 79.6%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       2  35938  17969  151.07  0.000
Residual Error  75   8921    119
Total           77  44859

...

Predicted Values for New Observations
New     Fit  SE Fit          95.0% CI          95.0% PI
  1  165.90    2.77  (160.39, 171.42)  (143.49, 188.32)

Values of Predictors for New Observations
New  c_age  c_agesq
  1   1.37     1.88

The good news!
  • Everything you learned about the simple linear regression model extends, with at most minor modification, to the multiple linear regression model:
    • same assumptions, same model checking
    • (adjusted) R2
    • t-tests and t-intervals for one slope
    • prediction (confidence) intervals for (mean) response
New things we need to learn!
  • The above research scenarios (models) and a few more
  • The “general linear test”, which helps to answer many research questions (a sketch follows this list)
  • F-tests for more than one slope
  • Interactions between two or more predictor variables
  • Identifying influential data points
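A sketch of the general linear test as a comparison of nested models, reusing the (hypothetical) Example 1 DataFrame df from the earlier sketch:

    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Reduced model (brain size only) versus the full three-predictor model.
    reduced = smf.ols("PIQ ~ Brain", data=df).fit()
    full = smf.ols("PIQ ~ Brain + Height + Weight", data=df).fit()

    # F-test of whether Height and Weight together add explanatory power
    # beyond Brain, i.e., a test involving more than one slope at once.
    print(anova_lm(reduced, full))
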
New things we need to learn!
  • Detection of correlated predictors (“multicollinearity”), for example via variance inflation factors, and the limitations they cause (a sketch follows this list)
  • Selection of variables from a large set of candidates for inclusion in a model (“stepwise regression” and “best subsets regression”)
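A sketch of how variance inflation factors could be computed for the Example 1 predictors, again assuming the (hypothetical) DataFrame df:

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Design matrix: intercept column plus the three candidate predictors.
    X = sm.add_constant(df[["Brain", "Height", "Weight"]])

    # One VIF per predictor (skip the constant); values much larger than
    # about 10 are a common warning sign of multicollinearity.
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, variance_inflation_factor(X.values, i))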