MBA Statistics 5165100, COURSE #4: Simple and multiple linear regression. What should be the sales of ice cream? Example: before building a movie theater, one must estimate the daily number of people entering the building. How can we estimate it?
Simple and multiple linear regression
What should be the sales of ice cream?
OBS  Total value  Land value  # of acres  # of square feet (first floor)  Outdoor condition  Heating type
1    199657       63247       1.63        1726                            Good               NatGas
2    78482        38091       0.495       1184                            Good               NatGas
3    119962       37665       0.375       1014                            Good               Electric
4    116492       54062       0.981       1260                            Average            Electric
5    131263       61546       1.14        1314                            Average            NatGas
...
78   253480       57948       0.862       1720                            Good               Electric
79   257037       57489       0.95        2004                            Excellnt           Electric

OBS  # of rooms  # of bedrooms  # of completed bathrooms  # of non-completed bathrooms  # of fireplaces  Garage
1    8           4              2                         1                             2                Garage
2    6           2              1                         0                             0                NoGarage
3    7           3              2                         0                             1                Garage
4    6           3              2                         0                             1                Garage
5    8           4              2                         1                             2                NoGarage
...
78   10          5              5                         1                             1                Garage
79   9           4              2                         2                             2                Garage
[Figure: three scatter plots of Y against X (X from 4 to 14). First panel: scattered points, r = 0.035. Second panel: points on an increasing straight line, r = 1. Third panel: points on a decreasing straight line, r = -1.]
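As a quick numerical check of what r = 1 and r = -1 mean, the sketch below computes Pearson's r for two exact straight lines. The lines y = 2x + 3 and y = -2.5x + 2 are assumptions read off the axis ticks of the plots (the slides do not state them), and `pearson_r` is a helper written only for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = list(range(4, 15))                   # X = 4, 5, ..., 14 as on the plots
increasing = [2 * x + 3 for x in xs]      # assumed line, Y from 11 to 31
decreasing = [-2.5 * x + 2 for x in xs]   # assumed line, Y from -8 to -33

r_up = pearson_r(xs, increasing)
r_down = pearson_r(xs, decreasing)
print(round(r_up, 6), round(r_down, 6))
```

Any exact straight line with a nonzero slope gives r = 1 or r = -1; the size of the slope does not matter, only its sign.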
Descriptive statistics
Variable   N    Mean     Median   Std. Deviation   Minimum   Maximum
Total      79   187253   156761   84401            74365     453744
Land       79   65899    59861    22987            35353     131224
Acre       79   1.579    1.040    1.324            0.290     5.880
Sq.Feet    79   1678     1628     635              672       3501
Rooms      79   8.519    8.000    2.401            5         18
Bedrooms   79   3.987    4.000    1.266            2         8
C.Bathro   79   2.241    2.000    1.283            1         7
Bathro     79   0.7215   1.000    0.715            0         3
Firepl.    79   1.975    2.000    1.368            0         7
Pearson Correlation Coefficients
           Total   Land    Acre    Sq.Feet   Rooms   Bedroom   C.Bathro   Bathro
Land       0.815
Acre       0.608   0.918
Sq.Feet    0.767   0.516   0.301
Rooms      0.626   0.518   0.373   0.563
Bedrooms   0.582   0.497   0.382   0.431     0.791
C.Bathro   0.626   0.506   0.376   0.457     0.479   0.586
Bathro     0.436   0.236   0.074   0.354     0.489   0.166     0.172
Firepl.    0.548   0.497   0.391   0.365     0.394   0.400     0.486      0.386
r = 0.816 in all cases below
[Figure: four scatter plots. Y1, Y2 and Y3 are plotted against X (X from 4 to 14); Y4 is plotted against X taking only the values 8 and 19.]
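The four plots appear to correspond to Anscombe's quartet, a classic data set built so that very different patterns share the same correlation. Assuming that this is the data behind the slide (the slides do not name it), a minimal sketch that recomputes r for each panel could look like this; `pearson_r` is a helper written only for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet (assumed to be the data plotted on the slide)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "Y1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Y2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Y3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Y4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(name, round(r, 3))
```

The point of the exercise is that r alone does not describe the shape of a relationship, so the data should always be plotted before a regression is fitted.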
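$Y = \beta_0 + \beta_1 X + \varepsilon$
$R^2 = 1 - \frac{n-2}{n-1}\left(\frac{S_e}{S_y}\right)^2$,
where $S_e$ is the standard deviation of the errors and $S_y$ is the standard deviation of Y.
$R^2_{\text{adjusted}} = 1 - \left(\frac{S_e}{S_y}\right)^2$.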
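To see the two formulas at work, the sketch below plugs in numbers that appear elsewhere in this deck: S = 54556 from the simple regression of Total on Sq.Feet, and the standard deviation of Total, 84401, from the descriptive statistics. The variable names are illustrative only.

```python
n = 79        # number of houses
s_e = 54556   # S: standard deviation of the errors (Total on Sq.Feet)
s_y = 84401   # standard deviation of Total (descriptive statistics)

ratio_sq = (s_e / s_y) ** 2
r2 = 1 - (n - 2) / (n - 1) * ratio_sq   # R-squared
r2_adjusted = 1 - ratio_sq              # adjusted R-squared

print(f"R2 = {r2:.1%}, adjusted R2 = {r2_adjusted:.1%}")
```

Both values agree with the R-Sq = 58.8% and R-Sq(adj) = 58.2% reported for that model.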
MODEL 1.
Regression Analysis
The regression equation is
Total = 16209 + 102 Sq.Feet
Predictor      Coef   StDev      T      P
Constant      16209   17447   0.93  0.356
Sq.Feet     101.939   9.734  10.47  0.000
S = 54556   R-Sq = 58.8%   R-Sq(adj) = 58.2%
Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  3.26460E+11  3.26460E+11  109.68  0.000
Residual Error  77  2.29181E+11   2976374177
Total           78  5.55641E+11
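The quantities in this output are linked by simple arithmetic. A minimal sketch, using only the numbers printed above for the regression of Total on Sq.Feet, recomputes F, S, R-Sq and the slope's T value (small differences come from rounding in the printed output):

```python
from math import sqrt

ss_regression = 3.26460e11
ss_error = 2.29181e11
ss_total = 5.55641e11
df_regression, df_error = 1, 77

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error        # residual mean square
f_stat = ms_regression / ms_error     # F = MS(regression) / MS(error)
s = sqrt(ms_error)                    # S is the square root of MS(error)
r_sq = ss_regression / ss_total       # R-Sq = SS(regression) / SS(total)
t_slope = 101.939 / 9.734             # T = Coef / StDev

print(round(f_stat, 2), round(s), round(100 * r_sq, 1), round(t_slope, 2))
```

With a single predictor, the square of the slope's T value equals F, which is why both tests give the same P value.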
MODEL 2.
R-Sq indicates the percentage of the variability of Total that the model explains.
The regression equation is : Total = -347 + 22021 Rooms
Predictor    Coef  StDev      T      P
Constant     -347  27621  -0.01  0.990
Rooms       22021   3122   7.05  0.000
S = 66210   R-Sq = 39.3%   R-Sq(adj) = 38.5%
Analysis of Variance
Source          DF           SS           MS      F      P
Regression       1  2.18090E+11  2.18090E+11  49.75  0.000
Residual Error  77  3.37551E+11   4383775699
Total           78  5.55641E+11
MODEL 3.
The regression equation is : Total = 32428 + 38829 Bedrooms
Predictor    Coef  StDev     T      P
Constant    32428  25826  1.26  0.213
Bedrooms    38829   6177  6.29  0.000
S = 69056   R-Sq = 33.9%   R-Sq(adj) = 33.1%
Analysis of Variance
Source          DF           SS           MS      F      P
Regression       1  1.88445E+11  1.88445E+11  39.52  0.000
Residual Error  77  3.67196E+11   4768775127
Total           78  5.55641E+11
Best of the three simple models: Model 1 (Sq.Feet), because it has the largest value of $R^2$ (58.8%).
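In a simple regression, R-Sq is the square of the correlation between the predictor and the response, so the ranking of the three models can be read directly from the Pearson correlation table. A small sketch using the correlations with Total listed there:

```python
# Correlations with Total, taken from the Pearson correlation table
r_with_total = {"Sq.Feet": 0.767, "Rooms": 0.626, "Bedrooms": 0.582}

# R-Sq (in percent) of each simple regression is the squared correlation
r_sq_percent = {name: 100 * r ** 2 for name, r in r_with_total.items()}
best = max(r_sq_percent, key=r_sq_percent.get)

for name, value in r_sq_percent.items():
    print(f"{name}: {value:.1f}%")
print("largest R-Sq:", best)
```

The squared correlations reproduce the reported 58.8%, 39.3% and 33.9% up to rounding of the three-decimal correlations.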
Confidence interval for the mean of Y: [156 418, 181 817]
Prediction interval for a new value of Y: [59 742, 278 492]
(as calculated by CIregression.xls)
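The two intervals can be approximately reproduced from the summary numbers in this deck. The sketch below rests on several assumptions that the slides do not state: the intervals are 95% intervals for the regression of Total on Sq.Feet, the house has 1500 square feet (the midpoint of both intervals, about 169 117, is the fitted value at that size), and the t quantile for 77 degrees of freedom is taken as 1.9913 from a t table. How CIregression.xls computes them internally is not known; the formulas used here are the standard ones.

```python
from math import sqrt

n, s = 79, 54556            # sample size and S of the Sq.Feet model
b0, b1 = 16209, 101.939     # coefficients of the Sq.Feet model
x_mean, x_sd = 1678, 635    # mean and standard deviation of Sq.Feet
t_crit = 1.9913             # assumed: 97.5% t quantile with 77 df
x0 = 1500                   # assumed Sq.Feet value (not stated on the slide)

y_hat = b0 + b1 * x0                            # point prediction
sxx = (n - 1) * x_sd ** 2                       # sum of squares of X around its mean
leverage = 1 / n + (x0 - x_mean) ** 2 / sxx
half_mean = t_crit * s * sqrt(leverage)         # half-width, mean of Y
half_new = t_crit * s * sqrt(1 + leverage)      # half-width, new value of Y

ci_mean = (y_hat - half_mean, y_hat + half_mean)
pi_new = (y_hat - half_new, y_hat + half_new)
print([round(v) for v in ci_mean], [round(v) for v in pi_new])
```

The prediction interval is much wider because it adds the variability of a single house around the regression line to the uncertainty about the line itself.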
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
MODEL 1.
The regression equation is
Total = -89131 + 3.05 Land - 20730 Acre + 43.3 Sq.Feet - 4352 Rooms
        + 10049 Bedroom + 7606 C.Bathro + 18725 Bathro + 882 Firepl.
Predictor     Coef   StDev      T      P
Constant    -89131   18302  -4.87  0.000
Land        3.0518  0.5260   5.80  0.000
Acre        -20730    7907  -2.62  0.011
Sq.Feet     43.336   7.670   5.65  0.000
Rooms        -4352    3036  -1.43  0.156
Bedroom      10049    5307   1.89  0.062
C.Bathro      7606    3610   2.11  0.039
Bathro       18725    6585   2.84  0.006
Firepl.        882    3184   0.28  0.783
S = 29704   R-Sq = 88.9%   R-Sq(adj) = 87.6%
Analysis of Variance
Source          DF           SS           MS      F      P
Regression       8  4.93877E+11  61734659810  69.97  0.000
Residual Error  70  61763515565    882335937
Total           78  5.55641E+11
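A least-squares fit with a constant term always passes through the point of means, which gives a handy check on the eight-predictor equation. The sketch below evaluates it at the sample means from the descriptive statistics. The negative signs used for the constant and for the Acre and Rooms coefficients are an assumption; with them the fitted value lands on the mean of Total (187253).

```python
intercept = -89131   # assumed negative
coefficients = {
    "Land": 3.0518, "Acre": -20730, "Sq.Feet": 43.336, "Rooms": -4352,
    "Bedroom": 10049, "C.Bathro": 7606, "Bathro": 18725, "Firepl.": 882,
}
# Sample means from the descriptive statistics table
means = {
    "Land": 65899, "Acre": 1.579, "Sq.Feet": 1678, "Rooms": 8.519,
    "Bedroom": 3.987, "C.Bathro": 2.241, "Bathro": 0.7215, "Firepl.": 1.975,
}

fitted_at_means = intercept + sum(coefficients[k] * means[k] for k in coefficients)
print(round(fitted_at_means))
```

The same function of the coefficients can be used to predict the value of any other house by replacing `means` with that house's characteristics.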
MODEL 2
Regression Analysis
The regression equation is
Total = -97512 + 3.11 Land - 21880 Acre + 40.2 Sq.Feet
        + 4411 Bedroom + 8466 C.Bathro + 14328 Bathro
Predictor     Coef   StDev      T      P
Constant    -97512   17466  -5.58  0.000
Land        3.1103  0.5236   5.94  0.000
Acre        -21880    7884  -2.78  0.007
Sq.Feet     40.195   7.384   5.44  0.000
Bedroom       4411    3469   1.27  0.208
C.Bathro      8466    3488   2.43  0.018
Bathro       14328    5266   2.72  0.008
S = 29763   R-Sq = 88.5%   R-Sq(adj) = 87.6%
Analysis of Variance
Source          DF           SS           MS      F      P
Regression       6  4.91859E+11  81976430646  92.54  0.000
Residual Error  72  63782210167    885864030
Total           78  5.55641E+11
MODEL 3
Regression Analysis
The regression equation is
Total = -90408 + 3.20 Land - 22534 Acre + 41.1 Sq.Feet
        + 10234 C.Bathro + 14183 Bathro
Predictor     Coef   StDev      T      P
Constant    -90408   16618  -5.44  0.000
Land        3.2045  0.5205   6.16  0.000
Acre        -22534    7901  -2.85  0.006
Sq.Feet     41.060   7.383   5.56  0.000
C.Bathro     10234    3213   3.19  0.002
Bathro       14183    5287   2.68  0.009
S = 29889   R-Sq = 88.3%   R-Sq(adj) = 87.5%
Analysis of Variance
Source          DF           SS           MS       F      P
Regression       5  4.90426E+11  98085283380  109.80  0.000
Residual Error  73  65214377146    893347632
Total           78  5.55641E+11
MODEL 4
The regression equation is
Total = -55533 + 1.82 Land + 49.8 Sq.Feet + 11696 C.Bathro
        + 18430 Bathro
Predictor     Coef   StDev      T      P
Constant    -55533   11783  -4.71  0.000
Land        1.8159  0.1929   9.42  0.000
Sq.Feet     49.833   7.028   7.09  0.000
C.Bathro     11696    3321   3.52  0.001
Bathro       18430    5312   3.47  0.001
S = 31297   R-Sq = 87.0%   R-Sq(adj) = 86.3%
Analysis of Variance
Source          DF           SS           MS       F      P
Regression       4  4.83160E+11  1.20790E+11  123.32  0.000
Residual Error  74  72481137708    979474834
Total           78  5.55641E+11
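When models with different numbers of predictors are compared, R-Sq can only go up as variables are added, whereas R-Sq(adj) penalises the extra terms. The sketch below recomputes both for the four multiple regression models from the sums of squares in their ANOVA tables; the function names are illustrative only.

```python
n = 79
ss_total = 5.55641e11

# model name: (number of predictors, residual sum of squares)
models = {
    "Model 1": (8, 61763515565),
    "Model 2": (6, 63782210167),
    "Model 3": (5, 65214377146),
    "Model 4": (4, 72481137708),
}

def r_squared(sse):
    return 1 - sse / ss_total

def adjusted_r_squared(p, sse):
    # ratio of the residual variance to the variance of Y
    return 1 - (sse / (n - p - 1)) / (ss_total / (n - 1))

results = {
    name: (100 * r_squared(sse), 100 * adjusted_r_squared(p, sse))
    for name, (p, sse) in models.items()
}
for name, (r2, adj) in results.items():
    print(f"{name}: R-Sq = {r2:.1f}%  R-Sq(adj) = {adj:.1f}%")
```

Dropping Rooms and Firepl. (Model 1 to Model 2) lowers R-Sq slightly but leaves R-Sq(adj) essentially unchanged, which suggests those two variables add little.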
There are several techniques:
Best Subsets Regression : Response is Total
Candidate predictors: Land, Acre, Sq.Feet, Rooms, Bedroom, C.Bathro, Bathro, Firepl.
Vars  R-Sq  R-Sq(adj)  Cp  S  (one X per predictor included in the subset)
1 66.4 65.9 136.8 49262 X
1 58.8 58.2 184.7 54556 X
1 39.3 38.5 307.6 66210 X
2 82.7 82.2 35.9 35564 X X
2 78.8 78.3 60.3 39343 X X
2 74.4 73.7 88.1 43244 X X
3 85.6 85.0 19.5 32637 X X X
3 84.8 84.2 24.5 33521 X X X
3 84.8 84.2 24.9 33591 X X X
4 87.1 86.4 12.2 31115 X X X X
4 87.0 86.3 13.1 31297 X X X X
4 86.6 85.9 15.2 31682 X X X X
5 88.3 87.5 6.9 29889 X X X X X
5 87.6 86.7 11.2 30744 X X X X X
5 87.4 86.5 12.4 30979 X X X X X
6 88.5 87.6 7.3 29763 X X X X X X
6 88.3 87.3 8.6 30030 X X X X X X
6 88.3 87.3 8.9 30096 X X X X X X
7 88.9 87.8 7.1 29510 X X X X X X X
7 88.6 87.4 9.1 29924 X X X X X X X
7 88.3 87.2 10.6 30240 X X X X X X X
8 88.9 87.6 9.0 29704 X X X X X X X X
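The Cp column is consistent with the usual definition of Mallows' Cp, Cp = SSE_p / MSE_full - (n - 2(p + 1)), where p is the number of predictors in the subset. Assuming that definition (the slides do not give it), the sketch below recomputes Cp for the four subsets whose residual sums of squares appear in the regression outputs of this deck; subsets are matched to table rows by their S values.

```python
n = 79
mse_full = 882335937   # residual mean square of the full 8-predictor model

def mallows_cp(sse, p):
    """Mallows' Cp for a subset with p predictors and residual sum of squares sse."""
    return sse / mse_full - (n - 2 * (p + 1))

# number of predictors: residual sum of squares (from the ANOVA tables)
subsets = {
    4: 72481137708,   # S = 31297
    5: 65214377146,   # S = 29889
    6: 63782210167,   # S = 29763
    8: 61763515565,   # S = 29704, full model
}
cp_values = {p: mallows_cp(sse, p) for p, sse in subsets.items()}
for p, cp in cp_values.items():
    print(p, round(cp, 1))
```

A subset is usually considered adequate when Cp is small and close to p + 1; for the full model Cp equals p + 1 by construction.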
Best Subsets Regression : Response is Total
Candidate predictors: Land, Sq.Feet, Rooms, Bedroom, C.Bathro, Bathro, Firepl.
Vars  R-Sq  R-Sq(adj)  Cp  S  (one X per predictor included in the subset)
1 66.4 65.9 120.6 49262 X
1 58.8 58.2 164.9 54556 X
1 39.3 38.5 278.3 66210 X
2 82.7 82.2 27.6 35564 X X
2 72.7 71.9 86.0 44704 X X
2 72.5 71.8 86.8 44813 X X
3 84.8 84.2 17.2 33521 X X X
3 84.8 84.2 17.6 33591 X X X
3 84.0 83.3 22.3 34467 X X X
4 87.0 86.3 6.9 31297 X X X X
4 86.1 85.3 12.1 32352 X X X X
4 85.3 84.5 16.5 33226 X X X X
5 87.3 86.4 6.9 31100 X X X X X
5 87.0 86.1 8.5 31439 X X X X X
5 87.0 86.1 8.9 31509 X X X X X
6 87.8 86.8 6.1 30707 X X X X X X
6 87.3 86.3 8.7 31264 X X X X X X
6 87.0 85.9 10.5 31656 X X X X X X
7 87.8 86.6 8.0 30908 X X X X X X X
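For one specific combination of values of X1, X2, …, Xp, two intervals can be computed: a confidence interval for the mean of Y, and a prediction interval for a new value of Y.
Confidence interval for the mean of Y: [170 842, 187 306]
Prediction interval for a new value of Y: [116 173, 241 974]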
Sex: 1 if male, 0 otherwise
Garage: 1 if garage, 0 if not.
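A qualitative variable has to be turned into such a 0/1 indicator before it can enter the regression. A minimal sketch of the coding, using the Garage labels of the observations shown in the data listing (1 to 5, 78 and 79):

```python
# Garage column for observations 1-5, 78 and 79 of the data listing
garage_labels = ["Garage", "NoGarage", "Garage", "Garage",
                 "NoGarage", "Garage", "Garage"]

# 1 if the house has a garage, 0 if not
garage_dummy = [1 if label == "Garage" else 0 for label in garage_labels]
print(garage_dummy)
```

With this coding, the coefficient of the indicator is the difference in the mean of Y between houses with and without a garage, all other variables being held equal.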
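$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
Question: Interpret $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$.
How do we know whether women have a smaller salary?
H0: $\beta_{garage} \le 0$ vs H1: $\beta_{garage} > 0$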
Total = -72080 + 1.83 Land + 47.2 Sq.Feet
        + 11535 C.Bathro + 18899 Bathro + 22372 Garage
Predictor     Coef   StDev      T      P
Constant    -72080   14175  -5.08  0.000
Land        1.8342  0.1892   9.69  0.000
Sq.Feet     47.175   7.013   6.73  0.000
C.Bathro     11535    3256   3.54  0.001
Bathro       18899    5211   3.63  0.001
Garage       22372   11116   2.01  0.058
S = 30671   R-Sq = 87.6%   R-Sq(adj) = 86.8%
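The P value printed next to each coefficient is normally a two-sided one, so a one-sided test such as H1: the garage coefficient is greater than 0 needs a small adjustment. The sketch below assumes that the printed P of 0.058 is two-sided and that the Garage coefficient is positive; neither is stated explicitly in the slides.

```python
coef, st_dev = 22372, 11116   # Garage row of the output (sign assumed positive)
two_sided_p = 0.058           # printed P, assumed to be two-sided

t_stat = coef / st_dev

# For H1: coefficient > 0, halve the two-sided P when the estimate is positive;
# otherwise the one-sided P is 1 minus half of it.
one_sided_p = two_sided_p / 2 if t_stat > 0 else 1 - two_sided_p / 2
reject_at_5_percent = one_sided_p < 0.05

print(round(t_stat, 2), one_sided_p, reject_at_5_percent)
```

Under these assumptions the one-sided test rejects H0 at the 5% level even though the two-sided P value is slightly above 0.05.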
H0: $\beta_{garage} \ge 0$ vs H1: $\beta_{garage} < 0$
In that case, the right choice for H1 is:
H0: $\beta_1 = \beta_2 = \dots = \beta_{k-1} = 0$
H0: 1 = 2 = ... = k