
Chap 2. SIMPLE LINEAR REGRESSION MODEL


Presentation Transcript


  1. Chap 2. SIMPLE LINEAR REGRESSION MODEL by Bambang Juanda

  2. Definition of Model
  • Problem formulation → model.
  • Model: an abstraction of reality expressed as a mathematical equation.
  • Econometric model: a statistical model that includes an error term:
    Y = f(X1, X2, ..., Xp) + error (2.1)
    actual data = estimate + residual
    data = systematic term + non-systematic term
    estimated Y = f(X1, X2, ..., Xp) (2.2)

  3. Sources of the Error Term
  • Measurement error and the use of proxies for the dependent variable Y and the explanatory variables X1, X2, ..., Xp.
  • An incorrect assumption about the functional form.
  • Omitted variables.
  • Unpredictable (purely random) effects.

  4. Simple Linear Regression Model
  • Relationship between two variables through a function that is linear in the parameters.
  • Population regression model: Yi = β0 + β1 Xi + εi, where Yi is the response (dependent) variable, Xi is the explanatory (independent) variable, β0 is the intercept, β1 is the slope, and εi is the random error.
  • Sample regression model: Ŷi = b0 + b1 Xi.

  5. Population Regression Model
  • Observed value: Yi = β0 + β1 Xi + εi, where εi is the random error for observation i.
  • Conditional mean (population regression line): μY|Xi = β0 + β1 Xi.
  [Figure: scatter of observed Y values against X with the population regression line.]

  6. Simple Linear Regression Equation (Example)
  Square footage and annual sales ($000) for a sample of 7 grocery stores:

  Store | Square Footage | Annual Sales ($000)
    1   |     1,726      |       3,681
    2   |     1,542      |       3,395
    3   |     2,816      |       6,653
    4   |     5,555      |       9,543
    5   |     1,292      |       3,318
    6   |     2,208      |       5,563
    7   |     1,313      |       3,760

  7. Scatter Diagram

  8. Sample Linear Regression Model
  Ŷi = b0 + b1 Xi
  • Ŷi = estimated Y for the i-th observation
  • Xi = value of X for the i-th observation
  • b0 = estimated intercept (estimate of β0); the average Y when X = 0
  • b1 = estimated slope (estimate of β1); the average difference in Y when X differs by 1 unit

  9. Ordinary Least Squares (OLS) Estimation Method
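
OLS chooses b0 and b1 to minimize the sum of squared residuals Σ(Yi - Ŷi)². The transcript does not carry the slide's formulas, so the following is a minimal sketch of the standard closed-form solution, b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and b0 = Ȳ - b1 X̄, applied to the store data of slide 6 (plain Python, no external libraries).

```python
# Minimal sketch of the closed-form OLS estimates for the slide 6 data.
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]   # square footage
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # annual sales ($000)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx            # slope: Sxy / Sxx
b0 = y_bar - b1 * x_bar     # intercept: y_bar - b1 * x_bar

print(f"Yhat = {b0:.3f} + {b1:.4f} X")   # roughly Yhat = 1636.415 + 1.4866 X
```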

  10. The “Best” Straight Line Equation (regression output)

  Predictor   Coef     SE Coef   T      P
  Constant    1636.4   451.5     3.62   0.015
  X           1.4866   0.1650    9.01   0.000

  S = 611.752   R-Sq = 94.2%   R-Sq(adj) = 93.0%

  Analysis of Variance
  Source           DF   SS         MS         F       P
  Regression       1    30380456   30380456   81.18   0.000
  Residual Error   5    1871200    374240
  Total            6    32251656
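
A comparable summary can be reproduced in software. The sketch below assumes the statsmodels package is available; its output layout differs from the Minitab-style table above, but the coefficients, t statistics, R-squared and F should match.

```python
# Sketch of reproducing the regression summary with statsmodels (assumed dependency).
import numpy as np
import statsmodels.api as sm

x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)

X = sm.add_constant(x)        # adds the intercept column
fit = sm.OLS(y, X).fit()
print(fit.summary())          # coefficients, standard errors, t, p, R-squared, F
```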

  11. The “Best” Straight Line Equation
  Ŷi = 1636.415 + 1.487 Xi

  12. Interpretation of Coefficients
  Ŷi = 1636.415 + 1.487 Xi
  • Loose interpretation of the slope value 1.487: for an increase of 1 unit in X, the estimated Y increases by 1.487 units.
  • More precise interpretation: the average difference in annual sales between stores whose floor areas differ by 1 square foot is estimated at $1,487 per year (Y is measured in $000).
  • Implication of the estimated slope (under certain assumptions): when a store's size increases by 1 square foot, the model predicts that expected sales will increase by $1,487 per year.

  13. Assumptions of the Linear Regression Model
  • Normality of the errors
  • Homoscedasticity (constant variance) of the errors
  • Independence of the errors

  14. Variance of the Error around the Regression Line
  [Figure: error densities f(e) with equal variance at X1 and X2 around the regression line.]

  15. Estimated Standard Error
  • Property of the OLS estimator: bi ~ N(βi; σ²bi), i.e., the sampling distribution of bi is normal and centered at βi.
  • Sbi, the estimated standard error of bi, estimates the spread of this sampling distribution.
  [Figure: sampling distribution of bi centered at βi.]

  16. Inference on the Slope: t-test
  • t-test for the population slope: is there a linear relationship between X and Y?
  • Statistical hypotheses:
    H0: β1 = 0 (X cannot explain Y)
    H1: β1 ≠ 0 (X can explain Y)
  • Test statistic: t = b1 / Sb1, where Sb1 is the estimated standard error of b1, with df = n - 2.
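
The sketch below carries out this t-test using the estimates already reported in the slide 10 output (b1 = 1.4866, Sb1 = 0.1650, n = 7); scipy is assumed to be available for the t distribution.

```python
# Minimal sketch of the slope t-test from the reported estimates.
from scipy import stats

b1, s_b1, n = 1.4866, 0.1650, 7
df = n - 2

t_stat = b1 / s_b1                          # tests H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
t_crit = stats.t.ppf(0.975, df)             # alpha = 0.05, two-sided

print(t_stat, t_crit, p_value)              # about 9.01, 2.5706, 0.0003
```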

  17. Inference on the Slope: Example of the t-test
  • H0: β1 = 0, H1: β1 ≠ 0, α = .05, df = 7 - 2 = 5
  • Critical values: ±2.5706 (.025 in each tail)
  • t-test statistic: t = 1.4866 / 0.1650 = 9.01
  • Decision: reject H0 (9.01 falls in the rejection region).
  • Conclusion: there is a linear relationship; the bigger the store, the larger its sales.

  18. Confidence Interval for the Slope
  • b1 ± t(n-2) Sb1
  • From the Excel output for the grocery-store problem: we estimate with 95% confidence that the population slope lies between 1.062 and 1.911. (This confidence interval excludes 0.)
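
The interval quoted above can be checked directly from b1 ± t(n-2) Sb1 with the reported estimates; the short sketch below assumes scipy for the t quantile.

```python
# Minimal sketch of the 95% confidence interval for the slope.
from scipy import stats

b1, s_b1, n = 1.4866, 0.1650, 7
t_crit = stats.t.ppf(0.975, n - 2)   # 2.5706 for df = 5

lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(lower, upper)                  # roughly 1.062 and 1.911
```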

  19. Level of Significance (α) and Rejection Region
  • Sampling distribution of the estimator: b1 ~ N(β1; σ²b1).
  • H0: β1 ≥ k vs H1: β1 < k — rejection region of size α in the lower tail, below the critical value.
  • H0: β1 ≤ k vs H1: β1 > k — rejection region of size α in the upper tail, above the critical value.
  • H0: β1 = k vs H1: β1 ≠ k — rejection regions of size α/2 in each tail.

  20. Assumptions of the Linear Regression Model
  • The εi are normally, independently and identically distributed for i = 1, ..., n:
    (i) εi ~ N(0; σ²)
      - Independence: Cov(εt, εs) = E(εt εs) = 0 for t ≠ s.
      - Homoscedasticity: Var(εi) = E(εi²) = σ².
    (ii) X is a fixed (non-random) variable.
  • Under these assumptions, the OLS estimates of βi are Best Linear Unbiased Estimators and are normally distributed.
  • The estimated average Y for a given Xi is normally distributed: μ̂Y|Xi ~ N(β0 + β1 Xi; σ²μ̂).
  • The predicted individual Y for a given Xi equals the estimated average, and is also normally distributed but with a larger variance: Ŷ|Xi ~ N(β0 + β1 Xi; σ²Ŷ).

  21. Interval Estimates of Forecast Values
  • Confidence interval for μY|X, the average Y at a given Xi:
    Ŷi ± t(n-2) · s · sqrt( 1/n + (Xi - X̄)² / Σ(Xi - X̄)² )
  • The width of the interval varies with the distance of Xi from the sample mean X̄.
  • s is the estimated standard error of the regression; the t value comes from the t table with df = n - 2.

  22. Interval Estimates of Forecast Values
  • Prediction interval for an individual Yi at a given Xi:
    Ŷi ± t(n-2) · s · sqrt( 1 + 1/n + (Xi - X̄)² / Σ(Xi - X̄)² )
  • The added 1 under the square root makes this interval wider than the confidence interval for the average Y, μY|X.
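
The sketch below computes both intervals from the store data: the confidence interval for the mean response and the wider prediction interval for an individual Y. The evaluation point X0 = 2000 is a hypothetical store size chosen only for illustration; scipy is assumed for the t quantile.

```python
# Minimal sketch of the mean-response CI and the individual prediction interval.
import math
from scipy import stats

x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x0 = 2000.0                                  # hypothetical store size
y0_hat = b0 + b1 * x0
t_crit = stats.t.ppf(0.975, n - 2)

se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)       # average Y
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)   # individual Y

print(y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)     # narrower CI
print(y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)     # wider PI
```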

  23. Interval Estimates of Forecast Values for Different X Values
  [Figure: fitted line Ŷi = b0 + b1 Xi with the confidence band for the average Y and the wider prediction band for an individual Yi; both bands are narrowest near X̄ and widen as Xi moves away from X̄.]

  24. ANOVA: Analysis of Variance
  Can the variance of Y be explained by (the variable X in) the model?
    Yi = b0 + b1 Xi + ei
    Yi = (Ȳ - b1 X̄) + b1 Xi + ei              (since b0 = Ȳ - b1 X̄)
    (Yi - Ȳ) = b1 (Xi - X̄) + ei
    (Yi - Ȳ)² = { b1 (Xi - X̄) + ei }²
    Σ(Yi - Ȳ)² = Σ{ b1 (Xi - X̄) + ei }²
    Σ(Yi - Ȳ)² = b1² Σ(Xi - X̄)² + Σei²        (the cross-product term vanishes because Σ(Xi - X̄) ei = 0 under OLS)
    TSS = RSS + ESS
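
The decomposition can be verified numerically for the grocery-store fit; the sketch below is plain Python with the slide 6 data and should reproduce the Total SS of about 32,251,656 reported in the output.

```python
# Minimal numerical check that TSS = RSS + ESS for the grocery-store fit.
x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]
tss = sum((yi - y_bar) ** 2 for yi in y)                   # total SS
rss = sum((yh - y_bar) ** 2 for yh in y_hat)               # regression SS
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))      # error SS

print(round(tss), round(rss + ess))   # both about 32,251,656
```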

  25. Measures of Variation: Sums of Squares
  • TSS = Σ(Yi - Ȳ)²   (total sum of squares)
  • RSS = Σ(Ŷi - Ȳ)²   (regression / explained sum of squares)
  • ESS = Σ(Yi - Ŷi)²  (error / residual sum of squares)
  [Figure: the fitted line Ŷi = b0 + b1 Xi with the three deviations shown at a point Xi.]

  26. ANOVA Table for the Simple Linear Regression Model

  Source       DF      SS                   MS
  Regression   1       RSS = Σ(Ŷi - Ȳ)²     MSR = RSS / 1
  Error        n - 2   ESS = Σ(Yi - Ŷi)²    MSE = ESS / (n - 2)
  Total        n - 1   TSS = Σ(Yi - Ȳ)²

  27. Inference on the Model: F-test
  Can the model explain the variance of Y?
  • Statistical hypotheses:
    H0: β1 = 0 (the model cannot explain the variance of Y)
    H1: β1 ≠ 0 (the model can explain the variance of Y)
  • Test statistic: F = MSR/MSE ~ F(p, n-1-p), where p is the number of independent variables; at α = 0.05 the critical value is F(1, 5) = 6.61.

  Analysis of Variance
  Source           DF   SS         MS         F       P
  Regression       1    30380456   30380456   81.18   0.000
  Residual Error   5    1871200    374240
  Total            6    32251656
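
The F statistic and its p-value follow directly from the sums of squares in the ANOVA table above; the sketch below assumes scipy for the F distribution.

```python
# Minimal sketch of the model F-test from the ANOVA sums of squares.
from scipy import stats

rss, ess = 30_380_456, 1_871_200   # regression SS and error SS
df_reg, df_err = 1, 5              # p = 1 predictor, n - 1 - p = 5

msr, mse = rss / df_reg, ess / df_err
f_stat = msr / mse                            # about 81.18
f_crit = stats.f.ppf(0.95, df_reg, df_err)    # about 6.61 at alpha = 0.05
p_value = stats.f.sf(f_stat, df_reg, df_err)

print(f_stat, f_crit, p_value)
```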

  28. Residual Analysis for Linearity
  [Figure: residuals (e) plotted against X — a curved pattern indicates a non-linear relationship; a random scatter around zero indicates linearity.]

  29. Residual Analysis for Homoscedasticity
  [Figure: standardized residuals (SR) plotted against X — a fanning spread indicates heteroscedasticity; a constant spread indicates homoscedasticity.]

  30. Residual Analysis for Independence of the Errors
  [Figure: standardized residuals (SR) plotted against X — a systematic pattern indicates dependence; a random scatter indicates independence.]
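
The plots on slides 28-30 can be drawn from the fitted model; the sketch below assumes matplotlib and uses a simple standardization (residuals divided by s) rather than the leverage-adjusted version.

```python
# Sketch of a standardized-residual plot for the grocery-store fit.
import math
import matplotlib.pyplot as plt

x = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
std_residuals = [e / s for e in residuals]     # residuals scaled by s

plt.scatter(x, std_residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("X (square footage)")
plt.ylabel("Standardized residual (SR)")
plt.show()
```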

  31. Residual Analysis: Computer Output
