# Lesson 11: Regressions Part II

Lesson 11: Regressions Part II. Does watching television rot your mind? Zavodny, Madeline (2006): “Does watching television rot your mind? Estimates of the effect on test scores,” Economics of Education Review, 25(5): 565–573.

## PowerPoint Slideshow about ' Lesson 11:' - nathaniel-rich

Presentation Transcript

### Lesson 11:

Regressions Part II

• Zavodny, Madeline (2006): “Does watching television rot your mind? Estimates of the effect on test scores,” Economics of Education Review, 25(5): 565–573.

• Television is one of the most omnipresent features of Americans’ lives. The average American adult watches about 15 h of television per week, accounting for almost one-half of free time.

• The substantial amount of time that most individuals spend watching television makes it important to examine its effects on society, including human capital accumulation and academic achievement.

• This analysis uses three data sets to examine the relationship between television viewing and test scores: the National Longitudinal Survey of Youth 1979 (NLSY), the HSB survey and the NELS. Each survey includes test scores and a question about the number of hours of television watched by young adults.

Test score of individual i at time t

**p<0.01; *p<0.05; †p<0.1

• Relationship Between Variables Is a Linear Function

Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + e

where Y is the dependent (response) variable, X1, …, Xk are the independent (explanatory) variables, b0 is the Y intercept, b1, …, bk are the slopes, and e is the random error.

• It is assumed that rate of return on a stock (R) is linearly related to the rate of return on some factor and the rate of return on the overall market (Rm).

Rit = b0 + boi Rot + b1 Rmt + e

where Rit is the rate of return on a particular oil company stock i at time t, Rot is the rate of return on crude oil price on date t, and Rmt is the rate of return on some major stock index.

Estimation by Method of Moments: Number of Moment Conditions Needed

Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + e

• Assumption #1

• E(e) = 0 implies E(y) – b0 – b1 E(x1) – b2 E(x2) – … – bk E(xk) = 0

• Assumption #2

• E(ex1) = 0 implies E[(y – b0 – b1x1 – … – bkxk)x1] = 0

• Since Cov(e, x1) = E(ex1) – E(e)E(x1) = E(ex1), the assumption really implies that e and x1 are uncorrelated.

• Assumption #3: E(ex2) =0

• Assumption #4: E(ex3) =0

• Assumption #k+1: E(exk) =0

k+1 parameters to estimate. Need k+1 moment conditions.

Estimation of b0, b1, b2,…, bk Method of moments

• Two approaches:

• Solve for b0, b1, b2, …, bk from the k+1 moment conditions, in terms of covariances, variances and means. Plug in the sample analogs of these covariances, variances and means to produce the sample estimates of b0, b1, b2, …, bk.

• Write down the sample analogs of the k+1 moment conditions and solve them directly for b0, b1, b2, …, bk.
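The second approach can be sketched in a few lines. This is a minimal illustration, not part of the original lesson: the data are synthetic, and the names `X`, `b_mm` and `sample_moments` are hypothetical. It stacks the k+1 sample moment conditions into one linear system and solves it.

```python
import numpy as np

# Synthetic data (illustrative only): y = 1 + 2*x1 - 0.5*x2 + e
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# The column of ones is the "regressor" attached to b0.
X = np.column_stack([np.ones(n), x1, x2])

# Sample analog of the k+1 moment conditions:
#   mean(e) = 0 and mean(e * xj) = 0 for j = 1..k,
# i.e. X'(y - Xb)/n = 0, a linear system of k+1 equations in b.
b_mm = np.linalg.solve(X.T @ X / n, X.T @ y / n)

residuals = y - X @ b_mm
sample_moments = X.T @ residuals / n  # numerically zero at the solution

print("estimates:", b_mm)
print("sample moments:", sample_moments)
```

With k = 2 regressors there are three conditions and three unknowns, so the system has a unique solution as long as the regressors are not perfectly collinear.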

Estimation of b0, b1, b2,…, bk Maximum Likelihood

• Assume the ei are independent and identically distributed, following a normal distribution with zero mean and variance σ². Denote the normal density of e by

• f(e) = f(y – b0 – b1x1 – b2x2 – … – bkxk)

• Choose b0, b1, b2, …, bk to maximize the joint likelihood:

• L(b0, b1, b2, …, bk) = f(e1)*f(e2)*…*f(en)

To estimate b0 and b1 using ML (Computer)

• We do not know b0, b1, b2, …, bk, nor do we know ei. In fact, our objective is to estimate b0, b1, b2, …, bk.

• The procedure of ML:

• Assume a candidate combination of values for b0, b1, b2, …, bk. Compute the implied ei = yi – b0 – b1x1i – b2x2i – … – bkxki and f(ei) = f(yi – b0 – b1x1i – b2x2i – … – bkxki)

• Compute the joint likelihood conditional on the assumed values of b0, b1, b2, …, bk:

• L(b0, b1, b2, …, bk) = f(e1)*f(e2)*…*f(en)

• Assume many more combinations of b0, b1, b2, …, bk, and repeat the above two steps, using a computer program (such as Excel).

• Choose the b0, b1, b2, …, bk that yield the largest joint likelihood.
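The try-many-combinations procedure can be sketched as a grid search. This is an illustration under stated assumptions, not the lesson's own program: the data are synthetic, the error standard deviation is assumed known and equal to 1, and the function name `log_likelihood` is hypothetical. The sketch sums log densities instead of multiplying densities, which picks the same maximizer while avoiding numerical underflow.

```python
import math
import random

random.seed(1)
# Synthetic data (illustrative only): y = 2 + 3x + e, e ~ N(0, 1)
x = [i / 10 for i in range(50)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

def log_likelihood(b0, b1, sigma=1.0):
    """Sum of log normal densities of the implied errors e_i = y_i - b0 - b1*x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        e = yi - b0 - b1 * xi
        total += -0.5 * math.log(2 * math.pi * sigma**2) - e**2 / (2 * sigma**2)
    return total

# Try many candidate combinations and keep the one with the largest likelihood.
grid = [i / 20 for i in range(0, 101)]  # 0.00, 0.05, ..., 5.00
best = max(((b0, b1) for b0 in grid for b1 in grid),
           key=lambda b: log_likelihood(*b))

print("best (b0, b1) on the grid:", best)
```

The answer is only as fine as the grid; a finer grid or a numerical optimizer would refine it.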

To estimate b0 and b1 using ML (Calculus)

• The procedure of ML:

• Assume a candidate combination of values for b0, b1, b2, …, bk. Compute the implied ei = yi – b0 – b1x1i – b2x2i – … – bkxki and f(ei) = f(yi – b0 – b1x1i – b2x2i – … – bkxki)

• Compute the joint likelihood conditional on the assumed values of b0, b1, b2, …, bk:

• L(b0, b1, b2, …, bk) = f(e1)*f(e2)*…*f(en)

• Choose b0, b1, b2, …, bk to maximize the likelihood function L(b0, b1, b2, …, bk), using calculus.

• Take the first derivative of L(b0, b1, b2, …, bk) with respect to b0 and set it to zero.

• Take the first derivative of L(b0, b1, b2, …, bk) with respect to each bj (j = 1, …, k) and set it to zero.

• Solve for b0, b1, b2, …, bk using the k+1 equations.
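As a sketch of where these k+1 equations lead, assume the normal density with variance σ² stated earlier and work with the logarithm of L (a monotonic transformation, so the maximizer is unchanged). Writing e_i for the implied error of observation i, the first-order conditions are:

$$
\ln L(b_0,\dots,b_k) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - b_0 - b_1x_{1i} - \dots - b_kx_{ki}\right)^2
$$

$$
\frac{\partial \ln L}{\partial b_0} = \frac{1}{\sigma^2}\sum_{i=1}^{n} e_i = 0, \qquad
\frac{\partial \ln L}{\partial b_j} = \frac{1}{\sigma^2}\sum_{i=1}^{n} e_i x_{ji} = 0, \quad j = 1,\dots,k
$$

Under these assumptions the conditions are the sample analogs of E(e) = 0 and E(exj) = 0, so ML with normal errors gives the same coefficient estimates as the method of moments.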

Estimation: Ordinary Least Squares

• For each value of X, there is a group of Y values, and these Y values are normally distributed.

Yi ~ N(E(Y|X1, X2, …, Xk), σi²), i = 1, 2, …, n

• The means of these normal distributions of Y values all lie on the straight line of regression.

E(Y|X1, X2, …, Xk) = β0 + β1X1 + β2X2 + … + βkXk

• The standard deviations of these normal distributions are equal.

σi² = σ², i = 1, 2, …, n

i.e., homoskedasticity

Choosing the line that fits best: Ordinary Least Squares (OLS) Principle

• Straight lines can be described generally by yi = b0 + b1x1i + b2x2i + … + bkxki, i = 1, …, n

• Finding the best line, the one with the smallest sum of squared differences, is the same as

Min S(b0, b1, …, bk) = Σ[yi – (b0 + b1x1i + b2x2i + … + bkxki)]²

• It can be shown that the minimization yields the same sample moment conditions as discussed earlier in the method of moments.

• Best: smallest variance

• Linear: linear combination of yi

• Unbiased: the expected value of each estimate equals the true parameter, e.g. E(b0) = β0 and E(b1) = β1

• Estimator

yi = b0 + b1x1i + b2x2i + …+ bkxki + ei

Prediction: y* = b0 + b1x1 + b2x2 + …+ bkxk

• Slope (bj)

• Estimated Y changes by bj for each 1 unit increase in Xj, holding other variables constant

y* + Δy = b0 + b1x1 + … + bj(xj + 1) + … + bkxk, so Δy = bj

More generally,

y* + Δy = b0 + b1x1 + … + bj(xj + Δxj) + … + bkxk, so Δy = bjΔxj

Δy/Δxj = bj

• Y-Intercept (b0 )

• Estimated value of Y when X1 = X2 = … = Xk = 0

• You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).

You’ve collected the following data:

| Resp (y) | Size (x1) | Circ (x2) |
|---|---|---|
| 1 | 1 | 2 |
| 4 | 8 | 8 |
| 1 | 3 | 1 |
| 3 | 5 | 7 |
| 2 | 6 | 4 |
| 4 | 10 | 6 |

Parameter Estimation: Computer Output

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Param=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 0.0640 | 0.2599 | 0.246 | 0.8214 |
| ADSIZE | 1 | 0.2049 | 0.0588 | 3.485 | 0.0399 |
| CIRC | 1 | 0.2805 | 0.0686 | 4.089 | 0.0264 |

• Slope (b1): the number of responses to an ad is expected to increase by .2049 hundred (20.49 responses) for each 1 sq. in. increase in ad size, holding circulation constant.

• Slope (b2): the number of responses to an ad is expected to increase by .2805 hundred (28.05 responses) for each 1 unit (1,000) increase in circulation, holding ad size constant.
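The estimates and standard errors in the computer output can be recomputed from the six data rows. The sketch below assumes the flattened data read row by row as (Resp, Size, Circ) and uses the textbook OLS formulas; the variable names are illustrative, not taken from the original output.

```python
import numpy as np

# Ad-response data: responses (00), ad size (sq. in.), circulation (000)
resp = np.array([1, 4, 1, 3, 2, 4], dtype=float)
size = np.array([1, 8, 3, 5, 6, 10], dtype=float)
circ = np.array([2, 8, 1, 7, 4, 6], dtype=float)

X = np.column_stack([np.ones_like(resp), size, circ])
n, p = X.shape  # p = k + 1 estimated parameters

# OLS estimates from the normal equations X'X b = X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ resp

# Standard errors: sqrt of diag(s_e^2 * (X'X)^-1), with s_e^2 = SSE / (n - k - 1)
resid = resp - X @ b
s2 = resid @ resid / (n - p)
se = np.sqrt(s2 * np.diag(XtX_inv))
t_stats = b / se

for name, bj, sj, tj in zip(["INTERCEP", "ADSIZE", "CIRC"], b, se, t_stats):
    print(f"{name:8s} {bj:8.4f} {sj:8.4f} {tj:7.3f}")
```

Each t statistic is simply the estimate divided by its standard error, with n – k – 1 = 3 degrees of freedom here.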

• Assumptions:

• Observed Y values are normally distributed around each estimated value of Y*

• Constant variance

• se measures the dispersion of the points around the regression line

• If se = 0, equation is a “perfect” estimator

• se may be used to compute confidence intervals of the estimated value

• Tests if there is a linear relationship between Xj & Y after other variables are controlled for.

• Involves population slope βj

• Hypotheses

• H0: βj = 0 (Xj should not appear in the linear relationship)

• H1: βj ≠ 0

• Theoretical basis is sampling distribution of slopes

• Let j be a population regression slope and bj its least squares estimate based on n data points. Then, if the standard regression assumptions hold and it can also be assumed that the errors i are normally distributed, the random variable

t= (bj – bj) / Sbj

is distributed as Student’s t with (n – k - 1) degrees of freedom. In addition the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes, n.

• If the regression errors i , are normally distributed and the standard regression assumptions hold, a 100(1 - )% confidence interval for the population regression slope j is given by

bj - t(n-k-1),a/2 Sbj < bj < bj + t(n-k-1),a/2 Sbj
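As a worked instance of the test and the interval, the sketch below plugs in the ADSIZE estimate and standard error from the ad-response output. The critical value 3.182 is an assumption read from a standard t table (3 degrees of freedom, α/2 = 0.025), not something computed here.

```python
# Estimate and standard error for ad size, copied from the ad-response output.
b_adsize = 0.2049
se_adsize = 0.0588
n, k = 6, 2       # observations and explanatory variables
t_crit = 3.182    # assumed table value: t with n-k-1 = 3 df, alpha = 0.05 two-sided

# Test statistic for H0: beta_j = 0
t_stat = (b_adsize - 0.0) / se_adsize

# 95% confidence interval: b_j +/- t_crit * s_bj
lower = b_adsize - t_crit * se_adsize
upper = b_adsize + t_crit * se_adsize

print(f"t = {t_stat:.3f}, reject H0: {abs(t_stat) > t_crit}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The interval excludes zero exactly when the two-sided test rejects H0 at the same α, which is why the two bullets above lead to the same conclusion.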

• Rejecting H0: βj = 0 and concluding that the relationship between xj and y is significant does not enable us to conclude that a cause-and-effect relationship is present between xj and y.

• Causation requires:

• Association

• Accurate time sequence

• No other explanation for the correlation

Correlation ≠ Causation

• Just because we are able to reject H0: βj = 0 and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.

• Linear relationships are a very small subset of the possible relationships among variables.

• A test of linear versus nonlinear relationship requires another batch of analysis.

Evaluating the Model

• Are the assumptions valid?

• Assumption #1: Linearity

• Assumption #2: The correct set of variables is included.

• Assumption #3: The explanatory variables are uncorrelated with error term.

• Assumption #4: The error term has a constant variance.

• Assumption #5: The errors are independent of each other.

yi = b0 + b1x1i+ b2x2i + … + bkxki + ei

• Total Sum of Squares (SST)

• Measures variation of observed Yi around the mean, Ȳ

• Explained Variation (SSR)

• Variation due to relationship between X & Y

• Unexplained Variation (SSE)

• Variation due to other factors

• SST=SSR+SSE

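SST = Σ(yi – ȳ)² = Σ(yi* – ȳ)² + Σ(yi – yi*)² + 2Σ(yi* – ȳ)(yi – yi*)

The last term equals 0, as imposed in the estimation by E(e) = 0 and E(ex) = 0. The first term is SSR and the second term is SSE.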

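[Figure: scatter plot of Y against X with the fitted line yi* = b0 + b1xi. For an observation (Xi, Yi) it marks the Total Sum of Squares (Yi – Ȳ)² (SST), the Explained Sum of Squares (Yi* – Ȳ)² (SSR), and the Unexplained Sum of Squares (Yi – Yi*)² (SSE).]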

• R² (= r², the coefficient of determination) measures the proportion of the variation in y that is explained by the variation in x.

• R² takes on any value between zero and one.

• R² = 1: Perfect match between the line and the data points.

• R² = 0: There is no linear relationship between x and y.

• (Unadjusted) R² increases with the number of variables included.

• Thus, using R² as a measure, we will always conclude that a model with more variables is better.

• However, adding a new variable is costly. An additional variable may add to the uncertainty of estimating y.

• Thus, we would like to have a measure that will penalize the addition of variables.

For a fixed R², adjusted R² decreases with k.

For a fixed k, adjusted R² increases with R².
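Both properties can be checked numerically. The transcript does not show the formula, so the sketch below assumes the standard definition, adjusted R² = 1 – (1 – R²)(n – 1)/(n – k – 1); the function name `adjusted_r2` and the chosen values of n, k and R² are illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Fix R2 = 0.80 and n = 30: the adjusted value falls as k grows.
by_k = [adjusted_r2(0.80, 30, k) for k in (1, 2, 5, 10)]

# Fix k = 3 and n = 30: the adjusted value rises with R2.
by_r2 = [adjusted_r2(r2, 30, 3) for r2 in (0.2, 0.5, 0.8)]

print(by_k)
print(by_r2)
```

The penalty comes from the factor (n – 1)/(n – k – 1), which grows with k, so an extra variable must raise R² enough to offset it.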

• Cabolis, Christos, Sofronis Clerides, Ioannis Ioannou and Daniel Senft (2007): “A textbook example of international price discrimination,” Economics Letters, 95(1): 91-95.

• International price comparisons have a long history in economics. Macroeconomists have used them extensively to test for purchasing power parity and the law of one price. International trade economists have been interested in international price differences as evidence of trade barriers while industrial organization economists have studied issues of market structure. The popular and business press have also shown a keen interest and frequently report intercity price comparisons for standardized products such as the Big Mac or a Starbucks cappuccino.

• The paper documents the existence of very large differences in the prices of textbooks across countries.

• Our data were collected from the Internet sites of Amazon.com in two distinct phases. In May 2002 we collected information on prices and characteristics of 268 books that were on sale on both the US and UK websites of Amazon, Inc. This data set includes both textbooks and general audience books and we refer to it as our “broad sample”. In December 2002, we collected additional data on economics textbooks; this is our “econ sample”. In this phase, we broadened our sample by including Canada in the search and collected more detailed information about each book.

• We tested for price differences by running a simple hedonic regression of price on book characteristics and on dummy variables that aim to capture differences across countries and book types.

Estimates from the broad sample; dependent variable: ln(p)

Notes: Coefficients that are statistically different from zero at 5% and 1% are marked with “*” and “**” respectively.

Estimates from the economics sample; dependent variable: ln(p)

Notes: Coefficients that are statistically different from zero at 5% and 1% are marked with “*” and “**” respectively.

Key Argument:

• If the value of y does not change linearly with the value of x, then using the mean value of y is the best predictor for the actual value of y. This implies y = ȳ is preferable.

• If the value of y does change linearly with the value of x, then using the regression model gives a better prediction for the value of y than using the mean of y. This implies y=y* is preferable.

• The Global F-test

H0: β1 = β2 = … = βk = 0 (no linear relationship)

H1: at least one βi≠ 0 (at least one independent variable affects Y)

Under the null SSR is either zero or very small!!

Test Statistic:

F = (SSR/k) / (SSE/(n – k – 1))

F is distributed with k numerator degrees of freedom and n – k – 1 denominator degrees of freedom. Reject H0 if F > Fk,n–k–1,α.

[Variation in y] = SSR + SSE. Large F results from a large SSR. Then, much of the variation in y is explained by the regression model. The null hypothesis should be rejected; thus, the model is valid.

F-Test for Overall Significance (continued)

• H0: β1 = β2 = 0

• H1: β1 and β2 not both zero

• α = .05, df1 = 2, df2 = 12

• Critical Value: F.05 = 3.885

• Test Statistic: the F statistic from the regression output, with 2 and 12 degrees of freedom, together with its p-value

• Decision: Since the F test statistic is in the rejection region (p-value < .05), reject H0

• Conclusion: There is evidence that at least one independent variable affects Y

[Figure: F distribution starting at 0, with the "Do not reject H0" region below F.05 = 3.885 and the "Reject H0" region of area α = .05 above it.]

• Consider a multiple regression model involving variables xj and zj , and the null hypothesis that the z variable coefficients are all zero:

yi = b0 + b1 x1i + …+ bk xki + a1 z1i + … + ar zri + ei

H0: a1 = a2 = … = ar = 0

H1: at least one of aj≠0 (j=1,…,r)

Under the null SSR due to Z is either zero or very small!!

• Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model

• First run a regression for the complete model and obtain SSE

• Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r).

• Compute the F statistic and apply the decision rule for a significance level α

Note: SSE/(n – k – 1) = se²

• A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in \$00, size of family (Size), and whether the family has children in college (College).

Example 1 continued

Note the following regarding the regression equation.

• The variable college is called a dummy or indicator variable. It can take only one of two possible outcomes. That is a child is a college student or not.

• Other examples of dummy variables include

• gender,

• the part is acceptable or unacceptable,

• the voter will or will not vote for the incumbent governor.

• We usually code one value of the dummy variable as “1” and the other “0.”

EXAMPLE 1 continued

• Use a computer software package, such as Excel, to develop a correlation matrix.

• From the analysis provided by Excel, write out the regression equation:

Y*= 954 +1.09X1 + 748X2 + 565X3

• What food expenditure would you estimate for a family of 4, with no college students, and an income of \$50,000 (which is input as 500)?

EXAMPLE 1 continued

The regression equation is

Food = 954 + 1.09 Income + 748 Size + 565 Student

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 954 | 1581 | 0.60 | 0.563 |
| Income | 1.092 | 3.153 | 0.35 | 0.738 |
| Size | 748.4 | 303.0 | 2.47 | 0.039 |
| Student | 564.5 | 495.1 | 1.14 | 0.287 |

S = 572.7, R-Sq = 80.4%, R-Sq(adj) = 73.1%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 3 | 10762903 | 3587634 | 10.94 | 0.003 |
| Residual Error | 8 | 2623764 | 327970 | | |
| Total | 11 | 13386667 | | | |
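The summary figures in this output all follow from the three sums of squares. The sketch below copies SS values and degrees of freedom from the Analysis of Variance table and recomputes F, R-Sq, R-Sq(adj) and S; the variable names are illustrative.

```python
# Sums of squares copied from the Analysis of Variance table.
ssr, sse, sst = 10762903.0, 2623764.0, 13386667.0
n, k = 12, 3  # total DF = 11 implies 12 families; 3 explanatory variables

msr = ssr / k              # mean square, regression
mse = sse / (n - k - 1)    # mean square, residual error (s_e squared)
f_stat = msr / mse
r2 = ssr / sst
adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
s_e = mse ** 0.5

print(f"F = {f_stat:.2f}, R-Sq = {r2:.1%}, R-Sq(adj) = {adj_r2:.1%}, S = {s_e:.1f}")
```

Note that SSR + SSE reproduces the Total line, which is the SST = SSR + SSE identity from earlier in the lesson.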

EXAMPLE 1 continued

From the regression output we note:

• The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

• Each additional \$100 of income per year is estimated to increase the amount spent on food by \$1.09 per year, because income is entered in hundreds of dollars and its coefficient is 1.09.

• An additional family member will increase the amount spent per year on food by \$748.

• A family with a college student will spend \$565 more per year on food than those without a college student.

EXAMPLE 1 continued

The estimated food expenditure for a family of 4 with an income of 500 (that is, \$50,000) and no college student is \$4,491.

Y* = 954 + 1.09(500) + 748(4) + 565 (0)

= 4491

EXAMPLE 1 continued

• Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero.

• H0 is rejected if F > 4.07.

• From the computer output, the computed value of F is 10.94.

• Decision: H0 is rejected. Not all the regression coefficients are zero.

EXAMPLE 1 continued

• Conduct an individual test to determine which coefficients are not zero. For the independent variable family size, the hypotheses are H0: β2 = 0 and H1: β2 ≠ 0.

• From the computer output, the only significant variable is SIZE (family size) using the p-values. The other variables can be omitted from the model.

• Thus, using the 5% level of significance, reject H0 if the p-value<.05

• A correlation matrix is used to show all possible simple correlation coefficients among the variables.

• See which xj are most correlated with y, and which xj are strongly correlated with each other.

• High correlation between X variables

• Multicollinearity makes it difficult to separate effect of x1 on y from the effect of x2 on y. Leads to unstable coefficients depending on X variables in model

• Always exists – a matter of degree

• Example: using both age & height as explanatory variables in same model

• Examine correlation matrix

• Warning sign: correlations between pairs of X variables are larger than their correlations with the Y variable

• Few remedies

• Obtain new sample data

• Eliminate one correlated X variable

EXAMPLE 1 continued

• The correlation matrix is as follows:

| | Food | Income | Size |
|---|---|---|---|
| Income | 0.587 | | |
| Size | 0.876 | 0.609 | |
| Student | 0.773 | 0.491 | 0.743 |

• The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.

• Most of the correlations among the independent variables are between –.70 and .70 and should not cause problems. The exception is Size and Student (0.743), which is slightly above .70.

EXAMPLE 1 continued

• We rerun the analysis using only the significant independent variable, family size.

• The new regression equation is:

Y* = 340 + 1031X2

• The coefficient of determination is 76.8 percent. We dropped two independent variables, and the R-square term was reduced by only 3.6 percentage points.

Example 1 continued

Regression Analysis: Food versus Size

The regression equation is

Food = 340 + 1031 Size

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 339.7 | 940.7 | 0.36 | 0.726 |
| Size | 1031.0 | 179.4 | 5.75 | 0.000 |

S = 557.7, R-Sq = 76.8%, R-Sq(adj) = 74.4%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 10275977 | 10275977 | 33.03 | 0.000 |
| Residual Error | 10 | 3110690 | 311069 | | |
| Total | 11 | 13386667 | | | |
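The two outputs for Example 1 supply everything needed for the test on a subset of coefficients described earlier: the three-variable regression is the complete model and the Size-only regression is the restricted model. The sketch below assumes the usual form F = [(SSE(r) – SSE)/r] / se², with se² taken from the complete model, and a 5% critical value F(2, 8) = 4.46 read from a standard F table; neither the formula nor the critical value appears in the transcript.

```python
# Error sums of squares copied from the two Example 1 outputs.
sse_complete = 2623764.0    # Food on Income, Size, Student
sse_restricted = 3110690.0  # Food on Size only
n = 12                      # observations (total DF = 11)
k_complete = 3              # explanatory variables in the complete model
r = 2                       # variables dropped: Income and Student

# Error variance of the complete model: SSE / (n - k - 1)
s2_e = sse_complete / (n - k_complete - 1)
f_stat = ((sse_restricted - sse_complete) / r) / s2_e

f_crit = 4.46  # assumed table value for F(2, 8) at the 5% level
print(f"F = {f_stat:.3f}, reject H0: {f_stat > f_crit}")
```

Under these assumptions F is well below the critical value, so the data do not reject the hypothesis that the Income and Student coefficients are both zero, consistent with dropping them.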

• Purposes

• Evaluate violations of assumptions, including the assumption of linearity.

• Graphical Analysis of Residuals

• Plot residuals versus Xi values

• Difference between actual Yi & predicted Yi*

• Studentized residuals:

• Allows consideration for the magnitude of the residuals

• When the requirement of a constant variance (homoscedasticity) is violated, we have heteroscedasticity.

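[Figure: Using Standardized Residuals (e/se). Two plots of standardized residuals (SR) against X. Heteroscedasticity: the spread of the residuals changes with X, for example Var(ei|xi) > Var(ej|xj) for xi > xj. Homoscedasticity (OK): the spread is constant.]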

[Figure: Using Standardized Residuals (e/se). Two plots of standardized residuals (SR) against X, one labelled Not Independent and one labelled Independent (OK).]

A systematic pattern in the residuals over time indicates that autocorrelation exists.

[Figure: two plots of residuals against time, centered on 0. In the first, note the runs of positive residuals, replaced by runs of negative residuals. In the second, note the oscillating behavior of the residuals around zero.]

• Used when data is collected over time to detect autocorrelation (Residuals in one time period are related to residuals in another period)

• Measures violation of the independence assumption

The statistic should be close to 2.

If not, examine the model for autocorrelation.

Intuition: If x and y are independent, Var(x-y)= Var(x) + Var(y)
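The transcript has lost the formula for this statistic. Assuming it is the Durbin-Watson statistic, d = Σ(et – et–1)² / Σet², the sketch below computes it for two made-up residual series that mimic the two patterns described for residuals plotted over time. The function name and both series are illustrative.

```python
def durbin_watson(residuals):
    """d = sum over t=2..n of (e_t - e_{t-1})^2, divided by sum over t=1..n of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residual series, for illustration only.
runs = [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1]           # long runs of one sign
oscillating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1]  # sign flips every period

print(durbin_watson(runs))         # well below 2: positive autocorrelation
print(durbin_watson(oscillating))  # well above 2: negative autocorrelation
```

This matches the intuition line: if successive residuals are independent with equal variance, Var(et – et–1) = 2Var(et), so the ratio is close to 2.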

• An outlier is an observation that is unusually small or large.

• Several possibilities need to be investigated when an outlier is observed:

• There was an error in recording the value.

• The point does not belong in the sample.

• The observation is valid.

• Identify outliers from the scatter diagram.

• It is customary to suspect an observation is an outlier if its |standardized residual| > 2

[Figure: scatter plots with fitted regression lines. One marks an outlier and another marks an influential observation; some outliers may be very influential. In the last plot, the outlier causes a shift in the regression line.]

• Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.

• The transformations can improve the linear relationship between the dependent variable and the independent variables.

• Many computer software systems allow us to make the transformations easily.

• The relationship between the dependent variable and an independent variable may not be linear

• Can review the scatter diagram to check for non-linear relationships

• The second independent variable is the square of the first variable

Model form: Yi = β0 + β1X1i + β2X1i² + εi

• where:

β0 = Y intercept

β1= regression coefficient for linear effect of X on Y

β2= regression coefficient for quadratic effect on Y

εi = random error in Y for observation i

[Figure: two panels plotting Y against X, each with its residual plot against X below. Linear fit does not give random residuals. Nonlinear fit gives random residuals.]

Quadratic models may be considered when the scatter diagram takes on one of the following shapes:

[Figure: four panels plotting Y against X1, one for each sign combination: β1 < 0, β2 > 0; β1 > 0, β2 > 0; β1 < 0, β2 < 0; β1 > 0, β2 < 0.]

β1 = the coefficient of the linear term

β2 = the coefficient of the squared term

• Compare the linear regression estimate with the quadratic regression estimate

• Hypotheses

• H0: β2 = 0 (The quadratic term does not improve the model)

• H1: β2 ≠ 0 (The quadratic term improves the model)

• The test statistic is

t = (b2 – β2) / sb2

where:

b2 = squared term slope coefficient

β2 = hypothesized slope (zero)

sb2 = standard error of the slope

• Compare Adjusted R² from the simple regression to Adjusted R² from the quadratic model

• If Adjusted R² from the quadratic model is larger than Adjusted R² from the simple model, then the quadratic model is a better model

• Purity increases as filter time increases:

• Simple regression results: y* = -11.283 + 5.985 Time

t statistic, F statistic, and R² are all high.

But the residuals are not random.

• Quadratic regression results: ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

The quadratic term is significant and improves the model: R² is higher and se is lower, and the residuals are now random.

Some highly nonlinear models may be transformed into a linear model: The Log Transformation

The Multiplicative Model:

• Original multiplicative model: Y = β0 X1^β1 X2^β2 ε

• Transformed multiplicative model: log(Y) = log(β0) + β1 log(X1) + β2 log(X2) + log(ε)

Interpretation of coefficients

For the multiplicative model:

• When both dependent and independent variables are logged:

• The coefficient of the independent variable X1 can be interpreted as

A 1 percent change in X1 leads to an estimated b1 percentage change in the average value of Y

• b1 is the elasticity of Y with respect to a change in X1

Note: log Y = b0 + b1 log X

b1 = Δlog Y / Δlog X = %ΔY / %ΔX

Δlog Y = log Y2 – log Y1 = log(Y2/Y1) = log(1 + (Y2 – Y1)/Y1) ≈ (Y2 – Y1)/Y1
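The elasticity reading can be checked with a small numerical experiment. The coefficients b0 = 0.5 and b1 = 1.8 and the starting value of X below are hypothetical, chosen only to illustrate the approximation.

```python
import math

b0, b1 = 0.5, 1.8  # hypothetical coefficients of log Y = b0 + b1 log X

def predict_y(x):
    """Back-transform the fitted log-log line to the level of Y."""
    return math.exp(b0 + b1 * math.log(x))

x = 40.0
pct_change_x = 1.0  # raise X by 1 percent
y_before = predict_y(x)
y_after = predict_y(x * (1 + pct_change_x / 100))
pct_change_y = 100 * (y_after - y_before) / y_before

print(f"%dY = {pct_change_y:.3f}, b1 * %dX = {b1 * pct_change_x:.3f}")
```

The two numbers agree closely but not exactly, because log(1 + z) ≈ z only holds for small z; for large percentage changes the approximation degrades.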

Dummy Variables

• A dummy variable is a categorical independent variable with two levels:

• yes or no, on or off, male or female

• recorded as 0 or 1

• Regression intercepts are different if the variable is significant

• Assumes equal slopes for other variables

• If more than two levels, the number of dummy variables needed is (number of levels - 1)

Dummy Variable Example

• Interested in: does the average income differ between males and females?

• Compute the average income for female.

• Compute the average income for male.

• Conduct a two sample test of equal mean.

• Alternative approach: regression.

• Y=income

• X1 = 1 if male; 0 if female.

Y= b0 + b1X1 + e

X1 = 0 implies Y = b0 + e

X1 = 1 implies Y = b0 + b1 + e

Test H0: b1=0.
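The equivalence between the two approaches can be seen directly: with a single dummy regressor, the fitted intercept is the female average and the fitted slope is the male average minus the female average. The income figures below are made up for illustration.

```python
import numpy as np

# Made-up incomes (in thousands), for illustration only.
income = np.array([42.0, 55.0, 61.0, 38.0, 47.0, 52.0, 66.0, 44.0])
male = np.array([0, 1, 1, 0, 0, 1, 1, 0], dtype=float)  # X1 = 1 if male, 0 if female

X = np.column_stack([np.ones_like(income), male])
b0, b1 = np.linalg.lstsq(X, income, rcond=None)[0]

mean_female = income[male == 0].mean()
mean_male = income[male == 1].mean()

print(b0, mean_female)              # intercept equals the female average
print(b1, mean_male - mean_female)  # slope equals the difference in averages
```

Testing H0: b1 = 0 in this regression therefore asks the same question as the two-sample test of equal means.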

Dummy Variable Example

Let:

y = Pie Sales

x1 = Price

x2 = Holiday (X2 = 1 if a holiday occurred during the week) (X2 = 0 if there was no holiday that week)

Dummy Variable Example

[Figure: y (sales) plotted against x1 (Price) with two parallel lines: Holiday (x2 = 1), with intercept b0 + b2, and No Holiday (x2 = 0), with intercept b0. Different intercept, same slope.]

If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales

Interpreting the Dummy Variable Coefficient

Example:

Sales: number of pies sold per week

Price: pie price in \$

Holiday:

1 If a holiday occurred during the week

0 If no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price

Interaction Between Explanatory Variables

• Hypothesizes interaction between pairs of x variables

• Response to one x variable may vary at different levels of another x variable

• Contains two-way cross product terms

Effect of Interaction

• Given: Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

• Without interaction term, effect of X1 on Y is measured by

• β1

• With interaction term, effect of X1 on Y is measured by

• β1 + β3 X2,

which changes as X2 changes

Interaction Example

Suppose x2 is a dummy variable and the estimated regression equation is ŷ = 1 + 2x1 + 3x2 + 4x1x2

x2 = 1: ŷ = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1

x2 = 0: ŷ = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1

[Figure: the two lines plotted for x1 from 0 to 1.5, with y from 0 to 12.]

Slopes are different if the effect of x1 on y depends on x2 value

Significance of Interaction Term

• The coefficient b3 is an estimate of the difference in the coefficient of x1 when x2 = 1 compared to when x2 = 0

• The t statistic for b3 can be used to test the hypothesis H0: β3 = 0 against H1: β3 ≠ 0

• If we reject the null hypothesis we conclude that there is a difference in the slope coefficient for the two subgroups
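The difference in slopes can be made concrete with the equation implied by the two expanded lines of the interaction example, ŷ = 1 + 2x1 + 3x2 + 4x1x2. The helper names `y_hat` and `slope_in_x1` are illustrative.

```python
def y_hat(x1: float, x2: int) -> float:
    """Estimated equation from the interaction example: 1 + 2*x1 + 3*x2 + 4*x1*x2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_in_x1(x2: int) -> float:
    """Change in y_hat for a one-unit increase in x1, holding x2 fixed."""
    return y_hat(1.0, x2) - y_hat(0.0, x2)

for x2 in (0, 1):
    print(f"x2 = {x2}: intercept = {y_hat(0.0, x2)}, slope = {slope_in_x1(x2)}")

# The gap between the two slopes is the interaction coefficient b3 = 4.
slope_gap = slope_in_x1(1) - slope_in_x1(0)
```

If b3 were not statistically different from zero, the two subgroups would share one slope and differ only in intercept, as in the plain dummy-variable model.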
