Warsaw Summer School 2019, OSU Study Abroad Program

Advanced Topics: Interaction, Logistic Regression

Interaction term
An interaction means that the effect of one variable depends on the value of another variable. Consider the effects of two variables, Pain and Gender, on Future Orientation, Y. Pain is a scale; Gender is coded Male = 1, Female = 0. If we think that Males react differently to Pain than Females do, so that Pain has a different effect on Y for Males, we would estimate the following model:
Y = b0 + b1*Pain + b3*Male*Pain + b2*Male

Example
Y = b0 + b1*Pain + b3*Male*Pain + b2*Male
The equation can be rewritten as:
Y = b0 + (b1 + b3*Male)*Pain + b2*Male
Now the effect of Pain for Females (Male = 0) is b1 + b3*0 = b1. The effect of Pain for Males (Male = 1) is b1 + b3*1 = b1 + b3.
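The rewritten equation can be checked numerically. The sketch below is only an illustration: the coefficient values and the helper names `predict_y` and `pain_effect` are assumptions, not estimates or code from the course. It shows that a one-unit increase in Pain changes Y by b1 for Females and by b1 + b3 for Males.

```python
# Illustrative coefficients only; these are assumptions, not estimates.
B0, B1, B2, B3 = 2.0, -0.5, 1.0, 0.3


def predict_y(pain: float, male: int) -> float:
    """Y = b0 + b1*Pain + b3*Male*Pain + b2*Male."""
    return B0 + B1 * pain + B3 * male * pain + B2 * male


def pain_effect(male: int) -> float:
    """Change in Y for a one-unit increase in Pain, holding gender fixed."""
    return predict_y(pain=4.0, male=male) - predict_y(pain=3.0, male=male)


effect_female = pain_effect(male=0)  # equals b1
effect_male = pain_effect(male=1)    # equals b1 + b3
print(f"Females: {effect_female:.2f}, Males: {effect_male:.2f}")
```

Because the model is linear in Pain, the same differences come out whichever two adjacent Pain values are compared.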

Interaction
• Note that Male*Pain is nonzero only when you actually have a Male with pain; for Females it is always 0.
• Thus, the coefficient on Male*Pain represents a unique effect of Pain on Males that is not there for Females. This is an interaction.

Example with dummies
Dependent variable: Y = exam scores
C = Coffee (yes = 1, no = 0)
R = Chocolate (yes = 1, no = 0)
Some take Coffee, some take Chocolate, and some take both. In a regression, the "both Coffee and Chocolate" variable, C*R, would be referred to as "the interaction of Coffee and Chocolate".
Regression: Y = 50 + 5C + 10R – 3C*R

Interpretation
• If either C or R is zero, C*R equals zero; if both Coffee and Chocolate are 1, then C*R equals one.
• That is exactly what we want for comparing the effect of the interaction.
• In a regression result, the simplest way to interpret the coefficient of a dummy variable is "what happens when you change the value from 0 to 1 and leave all the other variables the same."
• However, note that C*R = 1 implies that C = 1 and R = 1, so C*R cannot change from 0 to 1 while C and R stay the same.

Combination of dummies
There are four possible combinations for C, R, and C*R:
1. C = 0, R = 0, C*R = 0
2. C = 1, R = 0, C*R = 0
3. C = 0, R = 1, C*R = 0
4. C = 1, R = 1, C*R = 1
Interpretation of these situations
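One way to read the four situations is to plug each combination into the regression from the dummy example, Y = 50 + 5C + 10R – 3C*R. The sketch below does that; the function name `predicted_score` is a hypothetical helper introduced only for illustration.

```python
from itertools import product


def predicted_score(c: int, r: int) -> int:
    """Y = 50 + 5C + 10R - 3C*R, the dummy-variable regression."""
    return 50 + 5 * c + 10 * r - 3 * c * r


# Enumerate all four (C, R) combinations; C*R is 1 only when both are 1.
scores = {(c, r): predicted_score(c, r) for c, r in product((0, 1), repeat=2)}
for (c, r), y in scores.items():
    print(f"C={c} R={r} C*R={c * r} -> Y={y}")
```

Neither gives 50, Coffee only gives 55, Chocolate only gives 60, and both give 62 rather than 65, because the interaction term subtracts 3.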

Diminishing return
Y = 50 + 10C + 20R – 3C*R
Even though the coefficient of the interaction is negative, Coffee and Chocolate together might be a positive thing: those who take both Coffee and Chocolate score 27 points higher! What the -3 tells us is that there are diminishing returns to taking both. You might think that, since Coffee improves your score by 10 and Chocolate improves it by 20, taking both will improve it by 30. That is not right: the combined improvement is 10 + 20 – 3 = 27.
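The gap between the naive additive guess and the actual joint gain is exactly the interaction coefficient. A minimal sketch of that arithmetic, using the coefficients from this slide (the helper `gain` is hypothetical):

```python
# Coefficients from Y = 50 + 10C + 20R - 3C*R.
B_COFFEE, B_CHOCOLATE, B_BOTH = 10, 20, -3


def gain(c: int, r: int) -> int:
    """Improvement over the baseline score of 50 for a given (C, R)."""
    return B_COFFEE * c + B_CHOCOLATE * r + B_BOTH * c * r


additive_guess = gain(1, 0) + gain(0, 1)  # 10 + 20 = 30
joint_gain = gain(1, 1)                   # 10 + 20 - 3 = 27
shortfall = joint_gain - additive_guess   # -3, the interaction coefficient
print(additive_guess, joint_gain, shortfall)
```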

Interpreting Parameters with Interaction Terms
An interaction term is a term composed of the product of two characteristics. For example: Income explained by gender and education. Interaction term: Female*Education.
• Why are interaction terms used?
• Different slopes for men and women!

E.g., Income = a + b1F + b2educ + b3(F*educ)
The parameter on the interaction term, b3, tells us the difference between the female slope and the male slope of income on education.

Parameters
Suppose we estimate parameters using regression for the following two models:
• Income = a1 + g*educ for men
• Income = a2 + d*educ for women
And then we estimate the parameters of a third model on pooled data:
• Income = a + b1F + b2educ + b3(F*educ)
It turns out that:
a = a1
b1 = a2 – a1
b2 = g
b3 = d – g
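These identities can be checked algebraically. The sketch below does not run a regression; it assumes made-up values for the two group-specific lines, builds the pooled parameters from the identities, and confirms that the pooled equation reproduces each group's line. All numeric values and function names are illustrative assumptions.

```python
# Hypothetical group-specific lines (values are illustrative assumptions).
A1, G = 10.0, 2.0   # men:   Income = a1 + g*educ
A2, D = 8.0, 1.5    # women: Income = a2 + d*educ

# Pooled-model parameters implied by the identities.
A = A1
B1 = A2 - A1
B2 = G
B3 = D - G


def pooled_income(female: int, educ: float) -> float:
    """Income = a + b1*F + b2*educ + b3*(F*educ)."""
    return A + B1 * female + B2 * educ + B3 * female * educ


def group_income(female: int, educ: float) -> float:
    """Separate lines for women (female=1) and men (female=0)."""
    return A2 + D * educ if female else A1 + G * educ


# Largest disagreement between the two formulations over a grid of inputs.
max_gap = max(
    abs(pooled_income(f, e) - group_income(f, e))
    for f in (0, 1)
    for e in range(0, 21)
)
print(max_gap)
```

For men (F = 0) the pooled equation collapses to a + b2*educ, and for women (F = 1) to (a + b1) + (b2 + b3)*educ, which is why the two formulations agree.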

Logistic regression
Regression and dummy DV: I
• What we want to predict from a knowledge of relevant independent variables is not a precise numerical value of a dependent variable, but rather the probability (p) that it is 1 (event occurring) rather than 0 (event not occurring). This means that, while in linear regression the relationship between the dependent and the independent variables is linear, this assumption is not made in logistic regression. Instead, the logistic regression function is used.
• Why not use ordinary regression? The predicted values could become greater than one or less than zero. Such values are theoretically inadmissible for a probability.
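The problem with ordinary regression can be seen on a tiny example. The sketch below fits a least-squares line to a made-up 0/1 outcome (the data are an assumption chosen for illustration) and collects the fitted values that fall outside the interval from 0 to 1.

```python
# Toy data (an assumption for illustration): x is a predictor, y a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least squares slope and intercept for a single predictor.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Fitted "probabilities" from the linear model, and those that are inadmissible.
fitted = [intercept + slope * xi for xi in x]
out_of_range = [round(f, 3) for f in fitted if f < 0 or f > 1]
print(out_of_range)
```

With these data the fitted value at the smallest x is below zero and the fitted value at the largest x is above one, which is the inadmissibility described above.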

Regression and dummy DV: II
• One of the assumptions of regression is that the variance of Y is constant across values of X. This cannot be the case with a binary variable, because the variance is pq, where p is the probability of a 1 and q = 1 – p. When 50 percent of the people are 1s, the variance is .25, its maximum value. As we move to more extreme values, the variance decreases. When p = .10, the variance is .1*.9 = .09; as p approaches 1 or zero, the variance approaches zero.
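The dependence of the variance on p is easy to tabulate. A minimal sketch, with a hypothetical helper `binary_variance` and an arbitrary grid of probabilities:

```python
def binary_variance(p: float) -> float:
    """Variance of a 0/1 variable with P(Y = 1) = p, i.e. p*q with q = 1 - p."""
    return p * (1 - p)


# An arbitrary grid of probabilities, chosen only for illustration.
probabilities = [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]
variances = {p: binary_variance(p) for p in probabilities}

for p, v in variances.items():
    print(f"p = {p:.2f}  variance = {v:.4f}")
```

The variance peaks at p = .50 and shrinks symmetrically toward zero on both sides, so it cannot be constant when p changes with X.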

Regression and dummy DV: III
• The significance testing of the b weights rests upon the assumption that errors of prediction (Y-Y') are normally distributed. Because Y only takes the values 0 and 1, this assumption is pretty hard to justify, even approximately. Therefore, the tests of the regression weights are suspect if you use linear regression with a binary DV.

Odds and log odds
• Suppose we only know a person's education and we want to predict whether that person voted (1) or did not vote (0) in the last election. We can talk about the probability of voting, or we can talk about the odds of voting. Let's say that the probability of voting at a given education is .90. The odds are then defined as
• Odds = p / (1 – p), or Odds = p / q, where q = 1 – p
• (Odds can also be found by counting the number of people in each group and dividing one number by the other. Clearly, the probability is not the same as the odds.)

Odds and log odds
• In our example, the odds would be .90/.10, or 9 to one. The odds of not voting would be .10/.90, or 1/9, or .11. This asymmetry is unappealing, because the odds of voting should be the opposite of the odds of not voting.
• We can take care of this asymmetry through the natural logarithm, ln.
• The natural log of 9 is 2.197 (ln(.9/.1) = 2.197). The natural log of 1/9 is -2.197 (ln(.1/.9) = -2.197), so the log odds of voting is exactly opposite to the log odds of not voting.
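The voting example can be reproduced with Python's standard `math` module. In this sketch the function names `odds` and `log_odds` are hypothetical helpers; the probability .90 comes from the example.

```python
import math


def odds(p: float) -> float:
    """Odds = p / (1 - p)."""
    return p / (1 - p)


def log_odds(p: float) -> float:
    """Natural log of the odds."""
    return math.log(odds(p))


p_vote = 0.90
odds_vote = odds(p_vote)          # about 9
odds_not_vote = odds(1 - p_vote)  # about 1/9, i.e. roughly 0.11

# The log odds are equal in size and opposite in sign (about 2.197 and -2.197).
log_odds_vote = log_odds(p_vote)
log_odds_not_vote = log_odds(1 - p_vote)
print(round(log_odds_vote, 3), round(log_odds_not_vote, 3))
```

The odds themselves are asymmetric (9 versus 0.11), while their logarithms sum to zero up to floating-point rounding.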

Natural logarithm
• The natural logarithm is the logarithm to the base e, where e is a constant approximately equal to 2.718. The natural logarithm is generally written as ln(x), or sometimes, if the base e is implicit, as log(x).
• The natural logarithm of a number x (written as ln(x)) is the power to which e would have to be raised to equal x. For example, ln(7.389...) is 2, because e^2 = 7.389.... The natural log of e itself (ln(e)) is 1 because e^1 = e, while the natural logarithm of 1 (ln(1)) is 0, since e^0 = 1.

Ln
• Note that the natural log is zero when X is 1. When X is larger than one, the log curves up slowly. When X is less than one, the natural log is less than zero, and decreases rapidly as X approaches zero. When P = .50, the odds are .50/.50 or 1, and ln(1) = 0. If P is greater than .50, ln(P/(1-P)) is positive; if P is less than .50, ln(P/(1-P)) is negative. [A number taken to a negative power is one divided by that number taken to the positive power, e.g. e^-10 = 1/e^10. A logarithm is an exponent from a given base, for example ln(e^10) = 10.]
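These properties of the natural logarithm can be verified with the standard `math` module, where `math.log` is the natural log and `math.exp(k)` is e raised to the power k. The variable names below are chosen only for this sketch.

```python
import math

# Values of ln(x) for x below, at, and above 1.
ln_below_one = math.log(0.5)   # negative
ln_at_one = math.log(1.0)      # zero
ln_above_one = math.log(2.0)   # positive

# ln undoes exponentiation: ln(e^k) = k.
ln_e = math.log(math.e)
ln_e_squared = math.log(math.e ** 2)
ln_e_tenth_power = math.log(math.exp(10))

# A negative exponent is the reciprocal of the positive one: e^-10 = 1/e^10.
reciprocal_gap = abs(math.exp(-10) - 1 / math.exp(10))

print(ln_below_one, ln_at_one, ln_above_one)
```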

Logistic regression
ln(p / (1 – p)) = a + B1*X1 + B2*X2
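Solving the logistic regression equation for p gives p = 1 / (1 + e^-(a + B1*X1 + B2*X2)), so any value of the right-hand side maps to a probability strictly between 0 and 1. The sketch below illustrates this with made-up coefficients; the values of a, B1, B2 and the function names are assumptions, not estimates.

```python
import math

# Hypothetical coefficients a, B1, B2 (assumed for illustration only).
A, B1, B2 = -1.0, 0.8, -0.5


def linear_predictor(x1: float, x2: float) -> float:
    """Right-hand side of the model: the log odds, a + B1*X1 + B2*X2."""
    return A + B1 * x1 + B2 * x2


def probability(x1: float, x2: float) -> float:
    """Invert ln(p / (1 - p)) = z to get p = 1 / (1 + e^-z)."""
    z = linear_predictor(x1, x2)
    return 1 / (1 + math.exp(-z))


# Round trip: converting p back to log odds should recover the linear predictor.
p = probability(x1=3.0, x2=1.0)
recovered_log_odds = math.log(p / (1 - p))
print(round(p, 3), round(recovered_log_odds, 3))
```

Unlike the linear model, the predicted p never leaves the admissible range, however extreme X1 and X2 are.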
