Regression - PowerPoint PPT Presentation


PowerPoint Slideshow about ' Regression' - quon-hubbard


Class 21

Schedule for Remainder of Term

Nov. 21: Regression Part I

Nov. 26: Regression Part II

Dec. 03: Moderated Multiple Regression (MMR), Quiz 3; Stats Take-Home Exercise assigned

Dec. 05: Survey Questions I & II, but read only Schwartz and Schuman & Presser

Dec. 10: Review; Stats Take-Home Exercise due

Dec. 19: Final Exam, Room 302, 11:30 to 2:30

Caveat on Regression Sequence

Regression is a complex, rich topic; simple and multiple regression can be a course in itself.

We can cover only a useful introduction in 3 classes.

Will cover:

Simple Regression: basic concepts, elements, SPSS output

Multiple Regression: basic concepts, selection of methods, elements, assumptions, SPSS output

Moderated Multiple Regression

Will touch on: Diagnostic stats, outliers, influential cases, cross validation, regression plots, checking assumptions


ANOVA: Do the means of Group A, Group B and Group C differ?

Regression: Does Variable X influence Outcome Y?


Regression vs. ANOVA as Vehicles for Analyzing Data

ANOVA: Sturdy, straightforward, robust to violations, easy to understand inner workings, but limited range of tasks.

Regression: Amazingly versatile, agile, super powerful, loaded with nuanced bells & whistles, but very sensitive to violations of assumptions. A bit more art.

Functions of Regression

1. Establishing relations between variables

Do frustration and aggression co-occur?

2. Establishing prediction over time between variables (a first step toward causal claims)

Does frustration (at Time 1) predict aggression (at Time 2)?

3. Testing how multiple predictor variables relate to, or predict, an outcome variable.

Do frustration, and social class, and family stress predict aggression? [additive effects]

4. Test for moderating effects between predictors on outcomes.

Does frustration predict aggression, but mainly for people with low income? [interactive effect]

5. Forecasting/trend analyses

If incomes continue to decline in the future, aggression will increase by X amount.

The Palace Heist: A True-Regression Mystery

Sterling silver from the royal palace is missing. Why?

Facts gathered during investigation

A. General public given daily tours of palace

B. Reginald, the ADD butler, misplaces things

C. Prince Guido, the playboy heir, has gambling debts

Possible Explanations?

A. Public is stealing the silver

B. Reginald is misplacing the silver

C. Guido is pawning the silver

The Palace Heist: A True-Regression Mystery

Possible explanations:

A. Public is stealing silver

B. Reginald’s ADD leads to misplaced silver

C. Guido is pawning silver

Is it just one of these explanations, or a combination of them? E.g., Public theft, alone, OR public theft plus Guido’s gambling?

If it is multiple causes, are they equally important or is one more important than another?

E.g., Crowd size has a significant effect on lost silver, but is less important than Guido’s debts.

Moderation: Do circumstances interact?

E.g., Does more silver get lost when Reginald’s ADD is severe, but only when crowds are large?

Regression Can Test Each of These Possibilities, And Can Do So Simultaneously

/DEPENDENT missing.silver
/METHOD=ENTER crowds.size
/METHOD=ENTER guido.debts
/METHOD=ENTER crowds.reginald.

Why Do Bullies Harass Other Students?

Investigation shows that bullies are often:

A. Reprimanded by teachers for unruly behavior

B. Under a lot of family stress

Possible explanations for bullies’ behavior?

A. Frustrated by teachers’ reprimands—take it out on others.

B. Family stress leads to frustration—take it out on others.

Questions based on these possible explanations are:

Is it reprimands alone, or stress alone or reprimands + stress?

Are reprimands important, after considering stress (or vice versa)?

Do reprimands matter only if there is family stress?

Simple Regression

Features: Outcome, Intercept, Predictor, Error

Y = b0 + b1X + Error (residual)

Do bullies aggress more after being reprimanded?

Y = DV = Aggression

X = IV = Number of reprimands

b0 = Intercept = average of DV before other variables are considered.

b1 = slope of IV = influence of being reprimanded.

Elements of Regression Equation

Y = DV (aggression)

X = IV (reprimands)

b0 = intercept; the average value of the DV BEFORE accounting for the IV, i.e., the mean DV WHEN IV = 0

b1 = slope; the effect of the IV on the DV (effect of reprimands on aggression)

Coefficients = parameters; things that account for Y. b0 and b1 are coefficients.

Ɛ = error; changes in DV that are not due to coefficients.

Y = b0 + b1X + Ɛ

[Figure: regression plot]


Translating Regression Equation Into Expected Outcomes

Y = 2 + 1.0X + Ɛ means that bullies will aggress 2 times a day plus (1 * number of reprimands).

How many times will a bully aggress if he/she is reprimanded 3 times?

Y = 2 + 1.0 (3) = 5

Regression allows one to predict how an individual will behave (e.g., aggress) due to certain causes (e.g., reprimands).
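The worked example above can be written as a small prediction function. This is only a sketch: the function name `predict_aggression` and its default arguments are illustrative, taken from the slide's example equation (intercept 2, slope 1.0), and the error term is left out because it is unknown for any single prediction.

```python
def predict_aggression(reprimands, b0=2.0, b1=1.0):
    """Expected aggressive acts per day for a given number of reprimands.

    b0 (intercept) and b1 (slope) default to the slide's example values.
    """
    return b0 + b1 * reprimands


print(predict_aggression(3))  # 5.0
```

Changing `b0` or `b1` shows how the intercept shifts every prediction by the same amount, while the slope changes how much each additional reprimand matters.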

Quick Review of New Terms

Y = b0 + b1X + Ɛ

Y is the: Outcome, aka DV

b0 is the: Intercept; average score when IV = 0

b1 is the: Slope of the predictor, aka IV

Ɛ is the: Error, aka changes in DV not explained by IV

The coefficients include: Intercept and slope(s), b0 and b1

Does b0 = mean of the sample? NO! b0 is the expected score ONLY when the predictor (X) = 0.

If Y = 5.0 + 2.5X + Ɛ, what is Y when the predictor (X) = 2?

5.0 + (2.5 * 2) = 10

Regression In English*

The effect that days-per-week meditating has on SAT scores

Y = 1080 + 25X in English?

Students’ SAT is 1080 without meditation, and increases by 25 points for each additional day of weekly meditation.

The effect of Anxiety (in points) on threat detection Reaction Time (in ms)

Y = 878 - 15X in English?

Reaction time is 878 ms when anxiety score = 0, and decreases by 15 ms for each 1 pt increase on anxiety measure.

The effect of parents’ hours of out-loud reading on toddlers’ weekly word acquisition.

Y = 35 + 8X in English?

Toddlers acquire 35 words per week when parents never read out loud, and acquire 8 more words per week for every hour of out-loud reading.

* Fabricated outcomes

Positive, Negative, and Null Regression Slopes

[Figure: example regression slopes, including the null slope Y = 3 + 0X]

Regression Tests “Models”

Model: A predicted TYPE of relationship between one or more IVs (predictors) and a DV (outcome).

Relationships can take various shapes:

Linear: Calories consumed and weight gained.

Curvilinear: Stress and performance

J-shaped: Insult and response intensity

Catastrophic or exponential: Number of words learned and language ability.

Regression Tests How Well the Model “Fits” (Explains) the Obtained Data

Predicted Model: As reprimands increase, bullying will increase.

This is what kind of model?



Linear Regression asks: Do data describe a straight, sloped line?

Do they confirm a linear model?

Locating a "Best Fitting" Regression Line

[Figure: scatterplot of data points with the best-fitting regression line]

Line represents the "best fitting slope".

Disparate points represent residuals = deviations from slope.
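The best-fitting line and its residuals can be computed directly with the least-squares formulas. The sketch below is a plain-Python illustration, not SPSS's implementation; the function name `fit_line` and the five data points (reprimands and aggressive acts for five bullies) are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data, invented for illustration only.
reprimands = [1, 2, 3, 4, 5]
aggression = [3, 4, 6, 5, 7]

b0, b1 = fit_line(reprimands, aggression)
# Residual = actual score minus the score predicted by the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(reprimands, aggression)]
print(round(b0, 2), round(b1, 2))  # 2.3 0.9
```

No other straight line through these points gives a smaller sum of squared residuals, which is what "best fitting" means here.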

"Model fit" is based on method of least squares.


Error = Average Difference Between All Predicted Points (X88 - Ŷ88) and Actual Points (X88 - Y88)

[Figure: scatterplot with regression line, marking the actual point (X88 - Y88) and the predicted point (X88 - Ŷ88)]

Note "88" = Subject # 88


Regression Compares Slope to Mean

[Figure: scatterplot comparing the regression slope with a flat line at the DV mean]

Null Hyp: Mean score of aggression is best predictor, reprimands unimportant (b1 = 0)

Alt. Hyp: Reprimands explain aggression above and beyond the mean (b1 > 0)


[Figure: observed slope and null slope shown among random slopes originating at random means]

Is observed slope random or meaningful? That's the Regression question.


Total Sum of Squares (SST)

[Figure: deviations of each score from the DV mean]

Total Sum of Squares (SST) = Deviation of each score from DV mean (assuming zero slope), square these deviations, then sum them.


Residual Sum of Squares (SSR)

[Figure: residuals of each score from the regression line]

Residual Sum of Squares (SSR) = Each residual from regression line, squared, then sum all these squared residuals.


Elements of Regression

Total Sum of Squares (SST) = Deviation of each score from the DV mean, square these deviations, then sum them.

Residual Sum of Squares (SSR) = Deviation of each score from the regression line, squared, then sum all these squared residuals.

Model Sum of Squares (SSM) = SST – SSR = The amount that the regression slope explains the outcome above and beyond the simple mean.

R2 = SSM / SST = Proportion of variance in the DV explained by the predictor(s). Measures how much of the DV is predicted by the IV (or IVs).

R2 = (SST – SSR) / SST
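These sums of squares are easy to compute by hand for a small data set. The sketch below uses five hypothetical data points (invented for illustration) and their least-squares coefficients b0 = 2.3 and b1 = 0.9; the variable names are not from the slides.

```python
# Hypothetical data: reprimands (x) and aggressive acts (y) for five bullies.
x = [1, 2, 3, 4, 5]
y = [3, 4, 6, 5, 7]
b0, b1 = 2.3, 0.9  # least-squares intercept and slope for these data

mean_y = sum(y) / len(y)
predicted = [b0 + b1 * xi for xi in x]

# SST: squared deviations of each score from the DV mean.
sst = sum((yi - mean_y) ** 2 for yi in y)
# SSR: squared deviations of each score from the regression line.
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
# SSM: what the line explains beyond the mean.
ssm = sst - ssr
r2 = ssm / sst

print(round(sst, 2), round(ssr, 2), round(ssm, 2), round(r2, 2))  # 10.0 1.9 8.1 0.81
```

In this made-up example the line accounts for 81% of the variation in aggression, leaving 19% as residual.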

The Peculiar SSM

Wait a second! SSM gets bigger only when SSR is smaller!

How does that work?

1. Recall that SSR represents deviations from regression line. As line fits data better, deviations are smaller, and SSR is smaller. So, a smaller SSR is a good thing.

OK, fine. But SSM= SST – SSR , which suggests that it is SST that matters, no? I mean, if SSR is dinky, then all that's left is SST, right?

2. Exactly. Consider opposite case; the slope is no better than the mean at accounting for variation. Then SST = SSR, and SST - SSR = 0.

NOTE: R2 does not tell whether regression is significant (F does that). It is possible to have a small R2 and still have a significant model, if sample is large.

Assessing Overall Model: The Regression F Test

In ANOVA, F = Treatment / Error = MSB / MSW

In Regression, F = Model / Residuals = MSM / MSR

AKA slope line / random error around slope line

MSM = SSM / df (model)

MSR = SSR / df (residual)

df (model) = number of predictors (betas, not counting intercept)

df (residual) = number of observations (i.e., subjects) – number of estimates (i.e., all betas and intercept). If N = 20 with one predictor, then df = 20 – 2 = 18

F in Regression measures whether overall model does better than chance at predicting outcome.
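The F ratio follows mechanically from the sums of squares and degrees of freedom defined above. A minimal sketch, assuming the sums of squares are already known; the function name `regression_f` and the example values are hypothetical, and the p-value lookup (which SPSS reports as "Sig.") is not included.

```python
def regression_f(ssm, ssr, n, k):
    """Return (F, df_model, df_residual) for a regression.

    ssm: model sum of squares, ssr: residual sum of squares,
    n: number of observations, k: number of predictors.
    """
    df_model = k
    df_residual = n - k - 1  # observations minus all betas and the intercept
    msm = ssm / df_model
    msr = ssr / df_residual
    return msm / msr, df_model, df_residual


# Hypothetical values: SSM = 8.1, SSR = 1.9, 5 subjects, 1 predictor.
f_value, df_m, df_r = regression_f(8.1, 1.9, n=5, k=1)
print(round(f_value, 2), df_m, df_r)  # 12.79 1 3
```

With N = 20 and one predictor, the same function returns a residual df of 18, matching the slide's example.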

F Statistic in Regression

[Figure: SPSS output highlighting the Regression F]


Assessing Individual Predictors

Is the predictor slope significant, i.e. does IV predict outcome?

b1 = slope of sole predictor in simple regression.

If b1 = 0 then change in predictor has zero influence on outcome.

If b1 > 0, then it has some influence. How much greater than 0 must b1 be in order to have significant influence?

t stat tests significance of b1 slope.

t = (b observed – b expected) / Std. Error of b, where b expected is the null effect b (i.e., b = 0)

t = b observed / Std. Error of b

t df = n – predictors – 1 = n – 2


Note: Predictors = betas

t Statistic in Regression

[Figure: SPSS coefficients output highlighting the predictor t and the sig. of t]

B = slope; Std. Error = Std. Error of slope; t = B / Std. Error

Beta = Standardized B. Shows how many SDs outcome changes per each SD change in predictor.

Beta allows comparison between predictors, of predictor strength.
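The slope's standard error, its t, and the standardized Beta can all be derived from the raw data in simple regression. The sketch below uses the standard textbook formulas (standard error of the slope = square root of MSR divided by the sum of squared X deviations; Beta = B times SD of X over SD of Y). The function name and the five data points are hypothetical.

```python
import math


def slope_statistics(xs, ys):
    """Return (b1, se_b1, t, beta) for a simple regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares around the fitted line.
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope: sqrt(MSR / Sxx), with df = n - 2.
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    t = b1 / se_b1
    # Standardized slope: SD change in Y per SD change in X.
    beta = b1 * math.sqrt(sxx / syy)
    return b1, se_b1, t, beta


# Hypothetical data, invented for illustration only.
b1, se_b1, t, beta = slope_statistics([1, 2, 3, 4, 5], [3, 4, 6, 5, 7])
print(round(b1, 2), round(se_b1, 3), round(t, 2), round(beta, 2))
```

With a single predictor, t squared equals the overall F, and Beta equals the correlation between X and Y, which is a useful sanity check on printed output.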

Interpreting Simple Regression

Overall F Test: Our model of reprimands having an effect on aggression is confirmed.

t Test: Reprimands lead to more aggression. In fact, for every 1 reprimand there are .61 additional aggressive acts, or roughly 1 aggressive act for every 2 reprimands.

Key Indices of Regression

R = Degree to which entire model correlates with outcome

R2 = Proportion of variance model explains

F = How well model exceeds mean in predicting outcome

b = The influence of an individual predictor on the outcome

beta = b transformed into standardized units

t of b = Significance of b (b / std. error of b)

Multiple Regression (MR)

Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + ε

Multiple regression (MR) can incorporate any number of predictors in model.

With 2 predictors the result is a “regression plane”; after that it becomes increasingly difficult to visualize the result.

MR operates on same principles as simple regression.

Multiple R = correlation between observed Y and Y as predicted by total model (i.e., all predictors at once).
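To show that multiple regression really does work on the same least-squares principle, the sketch below estimates an intercept and two slopes by solving the normal equations in plain Python. It is an illustration only, not how SPSS computes its estimates. The helper names and the data are hypothetical; the outcome values were constructed as exactly 1 + 2*X1 + 0.5*X2 (no error) so that the recovered coefficients are known in advance.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(predictors, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 = intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical (X1, X2) pairs, e.g. family stress and reprimands.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
# Outcome built as 1 + 2*X1 + 0.5*X2 with no error, for illustration.
outcome = [4.0, 5.5, 9.0, 10.5, 14.0]

coefficients = fit_multiple(predictors, outcome)
print([round(c, 3) for c in coefficients])  # [1.0, 2.0, 0.5]
```

Real data contain error, so the fitted plane will not pass through every point; the residuals from the plane then play the same role as residuals from the line in simple regression.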

Two Variables Produce "Regression Plane"

[Figure: regression plane; one axis labeled Family Stress]

Multiple Regression Example

Is aggression predicted by teacher reprimands and family stresses?

Y = b0 + b1X1 + b2X2 + ε

Y = aggression

b0 = intercept (being a bully, by itself)

b1 = family stress

b2 = teacher reprimands

ε = error


Elements of Multiple Regression

Total Sum of Squares (SST) = Deviation of each score from DV mean, square these deviations, then sum them.

Residual Sum of Squares (SSR) = Each residual from total model (not simple line), squared, then sum all these squared residuals.

Model Sum of Squares (SSM) = SST – SSR = The amount that the total model explains result above and beyond the simple mean.

R2 = SSM / SST = Proportion of variance explained by the total model.

Adjusted R2 = R2, but adjusted for having multiple predictors

NOTE: Main diff. between these values in multiple regression and simple regression is use of total model rather than single slope. Math much more complicated, but conceptually the same.
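The slides do not give the adjustment formula. A commonly used version is Adjusted R2 = 1 – (1 – R2)(n – 1) / (n – k – 1), where n is the number of cases and k the number of predictors; the sketch below assumes that formula, and the function name and example values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Shrink R2 to account for the number of predictors.

    r2: unadjusted R squared, n: number of cases, k: number of predictors.
    Assumes the common formula 1 - (1 - R2)(n - 1) / (n - k - 1).
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R2 = .83 from 30 cases and 2 predictors.
print(round(adjusted_r2(0.83, n=30, k=2), 3))  # 0.817
```

The penalty grows as predictors are added or as the sample shrinks, which is one reason to avoid loading a model with predictors that lack a theoretical rationale.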

Methods of Regression

Hierarchical: 1. Predictors selected based on theory or past work 2. Predictors entered into analysis in order of importance, or by established influence. 3. New predictors are entered last, so that their unique contribution can be determined.

Forced Entry: All predictors forced into model simultaneously.

Stepwise: Program automatically searches for strongest predictor, then second strongest, etc. Predictor 1 is best at explaining the outcome, and accounts for, say, 40% of the variance. Predictor 2 is best at explaining the remaining 60%, etc. Controversial method.

In general, Hierarchical is most common and most accepted.

Avoid the “kitchen sink” approach: limit the number of predictors to as few as possible, and to those that make theoretical sense.

Sample Size in Regression

Simple rule: The more the better!

Field's Rule of Thumb: 15 cases per predictor.

Green’s Rule of Thumb:

Overall Model: 50 + 8k (k = #predictors)

Specific IV: 104 + k

Unsure which? Use the one requiring larger n
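The rules of thumb above reduce to simple arithmetic. A small sketch, with a hypothetical function name and dictionary keys, that returns each rule's minimum n for a given number of predictors k:

```python
def recommended_sample_sizes(k):
    """Minimum sample sizes for k predictors under each rule of thumb."""
    return {
        "field": 15 * k,                       # 15 cases per predictor
        "green_overall_model": 50 + 8 * k,     # testing the overall model
        "green_individual_predictor": 104 + k  # testing a specific IV
    }


sizes = recommended_sample_sizes(2)
print(sizes)
# Following "use the one requiring larger n" for Green's two rules:
print(max(sizes["green_overall_model"], sizes["green_individual_predictor"]))  # 106
```

For small numbers of predictors the individual-predictor rule is the larger of Green's two; the overall-model rule overtakes it once k reaches 8.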

Multiple Regression in SPSS

/DEPENDENT aggression
/METHOD=ENTER family.stress
/METHOD=ENTER reprimands.

“OUTS” refers to variables excluded in, e.g. Model 1

“NOORIGIN” means “do show the constant in outcome report”.

“CRITERIA” relates to Stepwise Regression only; refers to which IVs kept in at Step 1, Step 2, etc.

SPSS Regression Output: Model Effects

R = Power of regression (same as correlation)

R2 = Amount var. explained

Adj. R2 = Corrects for multiple predictors

R sq. change = Impact of each added model

Sig. F Change = Does new model explain signif. amount of added variance?

Requirements and Assumptions (these apply to Simple and Multiple Regression)

Variable Types: Predictors must be quantitative or categorical (2 values only, i.e. dichotomous); Outcomes must be interval.

Non-Zero Variance: Predictors have variation in value.

No Perfect multicollinearity: No perfect 1:1 (linear) relationship between 2 or more predictors.

Predictors uncorrelated to external variables: No hidden “third variable” confounds

Homoscedasticity: Variance at each level of predictor is constant.

Requirements and Assumptions (continued)

Independent Errors: Residuals for Sub. 1 are uncorrelated with residuals for Sub. 2

Normally Distributed Errors: Residuals are random and normally distributed, with a mean of zero (or close to zero).

Independence: All outcome values are independent from one another, i.e., each response comes from a subject who is uninfluenced by other subjects.

Linearity: The changes in outcome due to each predictor are described best by a straight line.

Regression Assumes Errors are Normally, Independently, and Identically Distributed at Every Level of the Predictor (X)




Assessing Homoscedasticity

Select: Plots

Enter: *ZRESID for Y and *ZPRED for X

Ideal Outcome: Equal distribution across chart

Extreme Cases

Cases that deviate greatly from the expected outcome (standardized residuals beyond ± 2.5) can warp regression.

First, identify outliers using Casewise Diagnostics option.

Then, correct outliers per outlier-correction options, which are:

1. Check for data entry error

2. Transform data

3. Recode as next highest/lowest plus/minus 1

4. Delete
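The identification step can be sketched in a few lines. The example assumes a standardized residual is the raw residual divided by the square root of the residual mean square (MSR), which is the usual definition; the function name and the residual values are hypothetical.

```python
import math


def flag_extreme_cases(residuals, n_predictors, cutoff=2.5):
    """Return the indices of cases whose standardized residual exceeds the cutoff."""
    n = len(residuals)
    # Residual mean square: SSR divided by the residual degrees of freedom.
    msr = sum(e ** 2 for e in residuals) / (n - n_predictors - 1)
    sd = math.sqrt(msr)
    return [i for i, e in enumerate(residuals) if abs(e / sd) > cutoff]


# Hypothetical residuals for 20 subjects: 18 small ones and 2 large ones.
residuals = [0.4, -0.4] * 9 + [6.0, -6.0]
print(flag_extreme_cases(residuals, n_predictors=1))  # [18, 19]
```

Flagged cases are candidates for the four correction options listed above, starting with a check for data entry errors.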

Casewise Diagnostics Print-out in SPSS

Possible problem case

Casewise Diagnostics for Problem Cases Only

In "Statistics" Option, select Casewise Diagnostics

Select "outliers outside:" and type in how many Std. Dev. you regard as critical. Default = 3

What If Assumption(s) are Violated?

What is the problem with violating assumptions?

Can't generalize obtained model from test sample to wider population.

Overall, not much can be done if assumptions are substantially violated (i.e., extreme heteroscedasticity, extreme autocorrelation, severe non-linearity).

Some options:

1. Heteroscedasticity: Transform raw data (sqr. root, etc.)

2. Non-linearity: Attempt logistic regression

A Word About Regression Assumptions and Diagnostics

Are these conditions complicated to understand? Yes

Are they laborious to check and correct? Yes

Do most researchers understand, monitor, and address these conditions? No

Even journal reviewers are often unschooled in diagnostics, or don’t take the time to check them. Journal space limits discourage authors from discussing diagnostics. Some have called for more attention to this inattention, but there has not been much action.

Should we do diagnostics? GIGO, and fundamental ethics.

Reporting Hierarchical Multiple Regression

Table 1: Effects of Family Stress and Teacher Reprimands on Bullying

                   B       SE B     β
Step 1
  Constant       -0.54     0.42
  Fam. Stress     0.74     0.11    .85 *
Step 2
  Constant        0.71     0.34
  Fam. Stress     0.57     0.10    .67 *
  Reprimands      0.33     0.10    .38 *

Note: R2 = .72 for Step 1, Δ R2 = .11 for Step 2 (p = .004); * p < .01
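The Step 2 row of the table can be read as a prediction equation, and the note gives the total variance explained. A brief sketch using the table's unstandardized B values; the function name and the input scores are hypothetical, and the table does not state the units of the family stress measure.

```python
def predict_bullying(family_stress, reprimands):
    """Predicted bullying from the Step 2 coefficients in Table 1."""
    return 0.71 + 0.57 * family_stress + 0.33 * reprimands


# Hypothetical case: family stress score of 2 and 3 reprimands.
predicted = predict_bullying(2, 3)
print(round(predicted, 2))  # 2.84

# Total R2 after Step 2 = Step 1 R2 plus the change in R2.
total_r2 = round(0.72 + 0.11, 2)
print(total_r2)  # 0.83
```

Because family stress and reprimands are on different scales, the B values here are for prediction; the β column is the one to use when comparing the strength of the two predictors.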