# Regression & Correlation - PowerPoint PPT Presentation

## PowerPoint Slideshow about 'Regression & Correlation' - luisa

### Regression & Correlation

Interval & Ratio Level Association

An Example
• Explaining variation in % of state’s 2000 population receiving food stamps
Dependent Variable
• State-to-state variation in % of state’s 2000 population receiving food stamps
Independent Variables
• Such population characteristics as
• Income
• Education
• Measures of need, e.g.
• Unemployment rate
• % living below poverty line
Independent Variables, cont.
• Other
• Teen pregnancies
• % covered by health insurance
Interval/Ratio Data
• Carries a lot of information
• It may be added & subtracted, and ratio data (which has a true zero point) may also be multiplied & divided
• It may (in theory) assume an infinite number of values (by going out to the right of the decimal)
• It is also called “continuous” data
Interval/Ratio Data, Continuous Data
• Can be found in surveys (income, years of education, etc.)
• Is more commonly found in data sets containing aggregate or ecological data (data which summarizes large numbers of individual cases)
Interval/Ratio Data: Analyzing It
• Can collapse (recode) it into categories
• Can use regression and correlation to analyze it directly
Regression: Explaining & Predicting
• Case scores on independent variable (X) and dependent variable (Y) can be plotted onto a graph, creating a scattergram, or a scatterplot
• A line (the regression line) can then be drawn through the points on the scattergram, in order to summarize them
Regression: Explaining & Predicting
• A regression equation describes a regression line
• Simple regression equations have the form:
• Y’ = a + bXi
Y’ = a + bXi
• Y’
• A predicted value of the dependent variable
• Xi
• A given value of the independent variable
• b
• The slope of the regression line
• The change in Y’ for each one-unit increase in X (the steepness of the regression line)
• a.k.a. the regression coefficient
Y’ = a + bXi
• a
• The “Y intercept”
• The point at which the regression line crosses the Y axis
• The value of Y when X is zero
Regression: Explaining & Predicting, cont.
• The best line is the one that produces the least amount of error in predicting the dependent variable, i.e. the smallest sum of squared differences between actual and predicted Y values (the least squares criterion)
• The computing formulas used to obtain slopes and intercepts are designed to satisfy this criterion
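The least squares criterion can be made concrete with the standard computing formulas: the slope is the sum of cross-products of X and Y deviations divided by the sum of squared X deviations, and the intercept places the line through the point (mean of X, mean of Y). A minimal Python sketch, assuming plain lists of paired scores; the function name `least_squares` is hypothetical, not from any particular package:

```python
# Minimal least-squares sketch for simple regression, Y' = a + bX.
# Assumption: xs and ys are plain lists of paired scores.

def least_squares(xs, ys):
    """Return (a, b) for the line that minimizes the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-products of deviations / sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation; the slope is undefined")
    b = sxy / sxx
    # Intercept: the least-squares line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


# Hypothetical scores lying exactly on Y = 1 + 2X
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 1.0 2.0
```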
Regression: Explaining & Predicting, cont.
• Regression equations allow us to predict values of the dependent variable from given values of the independent variable
• They show how the two variables are related (i.e. they explain the dependent variable’s behavior in terms of the independent variable)
Example: Food Stamps & Teenage Mothers
• Dependent variable (Y):
• % of state’s 2000 population receiving food stamps
• Independent variable (X):
• % of births to mothers under 20 in 1997
• Equation:
• Food stamps % = 1.238 + .396(% of births to mothers under 20)
Food Stamps & Teenage Mothers, cont.
• If 15% of a state’s births are to mothers under the age of 20, what percentage of that state’s population would you predict would be receiving Food Stamps?
• Food Stamps % = 1.238 + .396(15%)
• Food Stamps % = 1.238 + 5.94
• Food Stamps % = 7.178%
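The same arithmetic as a short Python sketch. The coefficients are the slide's; the function name `predict_food_stamps_pct` is made up for illustration:

```python
# Plug an X value into the slide's equation: Food Stamps % = 1.238 + .396(X)
A = 1.238  # intercept: predicted Food Stamps % when teen-birth % is zero
B = 0.396  # slope: change in Food Stamps % per 1-point change in teen-birth %


def predict_food_stamps_pct(teen_birth_pct):
    """Return Y' = a + bX for a given % of births to mothers under 20."""
    return A + B * teen_birth_pct


print(round(predict_food_stamps_pct(15), 3))  # → 7.178
```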
Food Stamps & Teenage Mothers, cont.
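• If the percentage of births to mothers under the age of 20 in that state were to decline by 3 percentage points (e.g. from 15% to 12%), what effect might that have on the percentage of the population receiving Food Stamps?
• Change in Food Stamps % = .396 × (−3) (the intercept, 1.238, cancels out)
• Change in Food Stamps % = −1.188
• Food Stamps % would decrease by about 1.19 percentage points (from 7.178% to 5.99%)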
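Because two predictions from the same equation share the intercept, it drops out when one is subtracted from the other, so the predicted change is simply the slope times the change in X. A small Python check, using the slide's coefficients and a hypothetical helper named `predicted_change`:

```python
# The intercept cancels in (a + b*X2) - (a + b*X1), leaving b * (X2 - X1).
A, B = 1.238, 0.396  # coefficients from the slide's equation


def predicted_change(delta_x, b=B):
    """Change in Y' for a change of delta_x in X: b * delta_x."""
    return b * delta_x


# A 3-point drop in teen-birth %, from 15% to 12%
before = A + B * 15
after = A + B * 12
print(round(after - before, 3), round(predicted_change(-3), 3))  # → -1.188 -1.188
```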
Food Stamps & Teenage Mothers, cont.
• Food stamps % = 1.238 + .396(% of births to mothers under 20)
• Food Stamps % and births to mothers under 20 are positively associated. As % of births to mothers under 20 decreases, percent of population receiving food stamps also decreases (the positive slope tells us that)
Explaining Food Stamps & Teenage Mothers
• The slope’s size (magnitude) indicates how much the percentage of the population receiving food stamps changes: a one-percentage-point change in births to mothers under 20 is associated with a change of roughly .396 percentage points in the percentage of the population receiving aid.
Slopes
• Are Key, But
• Their magnitude is affected both by the strength of association between the two variables and by the scale (units of measurement) of the independent variable
• They are not standardized
• Two slopes for variables measured on different scales cannot be easily compared
Slopes, but
• Sometimes we are interested in measuring strength of association, not in explaining &/or predicting
• To deal with this, we use the correlation coefficient
Correlation
• Is a summary association measure for interval/ratio data (used like Cramér’s V, Somers’ d, etc.)
• Is a standardized slope
• Is easily calculated
• Is routinely reported with regression equations
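One way to see that r is a standardized slope: in simple regression, r = b × (sX / sY), where sX and sY are the standard deviations of X and Y. The sketch below (hypothetical data and helper names) checks that identity and shows that rescaling X changes b but leaves r unchanged, which is why slopes are hard to compare while correlations are not:

```python
import math


def mean(v):
    return sum(v) / len(v)


def sample_sd(v):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))


def slope(xs, ys):
    """Least-squares slope b of Y on X."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def pearson_r(xs, ys):
    """Pearson's r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical X (in percent) and Y scores
xs = [10, 12, 15, 18, 20]
ys = [5.0, 6.1, 7.0, 8.4, 8.9]

b = slope(xs, ys)
r = pearson_r(xs, ys)
# Standardizing the slope reproduces r
print(abs(b * sample_sd(xs) / sample_sd(ys) - r) < 1e-9)  # → True

# Same X expressed as a proportion instead of a percent
xs_prop = [x / 100 for x in xs]
print(slope(xs_prop, ys) / b)      # about 100: the slope depends on X's scale
print(pearson_r(xs_prop, ys) - r)  # about 0: r does not
```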
Correlation
• Lots Of Names, One Statistic
• Pearson’s r
• Correlation coefficient
• Pearson’s Product Moment Correlation Coefficient
Correlation, Cont.
• Is often reported by itself, without bothering to first calculate slopes & intercepts
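• Ranges from -1.0 (perfect negative association) through 0.0 (no linear association) to +1.0 (perfect positive association)
• When squared (the coefficient of determination, r²), shows the amount (%) of variation in the dependent variable explained by the independent variable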
Correlation

r² shows the amount (%) of explained variation:

| r | r² |
|---|---|
| .30 | .09 |
| .50 | .25 |
| .608 | .37 |

Getting Correlations Without Scattergrams
• Many statistical software packages, and some spreadsheets, have a correlation function
• They will produce a correlation matrix, which shows the correlation of each selected variable with all other selected variables
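A correlation matrix is just Pearson's r computed for every pair of selected variables. The following dependency-free Python sketch assumes the data are held in a dict of equal-length lists; the variable names and numbers are invented for illustration and are not the presentation's data:

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of variables in `columns`."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Invented scores for three variables on five cases
data = {
    "y": [6.0, 7.5, 9.0, 8.0, 11.0],
    "x1": [10.0, 12.0, 15.0, 13.0, 18.0],
    "x2": [88.0, 85.0, 80.0, 84.0, 78.0],
}
matrix = correlation_matrix(data)
for a in data:
    print(a.ljust(3), " ".join(f"{matrix[a][b]:7.3f}" for b in data))
```

The diagonal is always 1.0 (each variable with itself), and the matrix is symmetric, so many packages print only one triangle of it.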
Standard Error of Estimate
• A “goodness of fit” measure
• Analogous to standard deviation
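• Defines a band above & below the regression line (±1 standard error) within which roughly 68% of actual cases fall, assuming the prediction errors are approximately normally distributed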
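For simple regression, the usual formula is the square root of the sum of squared prediction errors divided by n − 2 (two parameters, a and b, are estimated from the data). A minimal Python sketch with hypothetical data; the helper names are illustrative only:

```python
import math


def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line Y' = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def standard_error_of_estimate(xs, ys):
    """sqrt(SSE / (n - 2)): spread of actual Y values around the regression line."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three cases (n - 2 degrees of freedom)")
    a, b = least_squares(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


# Hypothetical data: the fitted line is Y' = .5 + .8X
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
see = standard_error_of_estimate(xs, ys)
print(round(see, 4))  # → 0.9487
```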
Multiple Regression & Partial Correlation
• Multivariate analysis for interval & ratio level data
• Involves the introduction of additional independent variables (controls) into a bivariate association
• Yields summary statistics that are comparable to those found in simple regression
Multiple Regression: Results
• Multiple regression equation
• Y’ = a + b1X1 + b2X2 + … + bnXn
• Each slope indicates the relationship between its corresponding independent variable and the dependent variable independent of the effect of all other independent variables in the equation
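Software fits these slopes with the same least squares criterion used in simple regression. As an illustration only, the sketch below solves the normal equations with hand-written Gaussian elimination; the data are synthetic (generated from Y = 1 + 2X1 − 3X2 with no noise), so the fit should recover those coefficients, and the helper names are hypothetical:

```python
def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_multiple(X_rows, ys):
    """Return [a, b1, ..., bk] minimizing squared error via (D'D) coef = D'y."""
    D = [[1.0] + list(row) for row in X_rows]  # design matrix with intercept column
    k = len(D[0])
    DtD = [[sum(d[i] * d[j] for d in D) for j in range(k)] for i in range(k)]
    Dty = [sum(d[i] * y for d, y in zip(D, ys)) for i in range(k)]
    return solve(DtD, Dty)


# Synthetic data built from Y = 1 + 2*X1 - 3*X2 (no noise)
X_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in X_rows]
a, b1, b2 = fit_multiple(X_rows, ys)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Statistical packages typically use more numerically stable methods (such as a QR decomposition) than forming the normal equations directly; this version is chosen only because it is short and easy to follow.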
Multiple Regression Equation
• Size of slopes is affected by
• Strength of association
• Scale of independent variable(s)
• Number of independent variables in the equation
Multiple Regression: Results, cont.
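• Coefficient of multiple determination: R² (the square of the multiple correlation coefficient, R)
• Shows the % of variation in dependent variable explained by all independent variables acting together
• Significance of the equation as a whole (reported as a probability)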
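R² can be computed from the actual and predicted values as 1 − (sum of squared errors / total sum of squares around the mean of Y). A short Python sketch; the numbers are hypothetical, and the predictions come from the least-squares line Y′ = .5 + .8X for the data X = 1, 2, 3, 4:

```python
def r_squared(ys, predicted):
    """1 - SSE/SST: share of the variation in Y around its mean that the predictions explain."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return 1 - sse / sst


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
predicted = [0.5 + 0.8 * x for x in xs]  # least-squares line for these hypothetical data
print(round(r_squared(ys, predicted), 2))  # → 0.64
```

With a single predictor this equals r²; here r = .8, so r² = .64. With several predictors the same formula gives R², using predictions from the multiple regression equation.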
Example: Food Stamps
• Criteria For Assessing Obtained Equation(s)
• Do a good job of explaining variation in dependent variable (i.e. maximize R²)
• Keep number of independent variables down to a reasonable minimum, a.k.a.
• Parsimony
• Elegance
• Efficiency
Example: Food Stamps
• Selecting Independent Variables
• Considerations:
• Variables that are strongly associated with the dependent variable (large correlation coefficients), or that theory says should be, are good starting points
Example: Food Stamps
• Selecting Independent Variables, cont.
• Avoid using several independent variables which measure the same concept (they are strongly correlated with each other or have important theoretical similarities)
• Try to use independent variables which make significant contributions to the final equation
• A “t” of about 2.0 or greater in absolute value indicates significance (roughly the .05 level in moderate to large samples)
• Remember, this will change as you add or delete variables
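For simple (one-predictor) regression, t is the slope divided by its standard error, and that standard error is the standard error of estimate divided by the square root of the sum of squared X deviations. A minimal Python sketch with hypothetical data and a hypothetical helper name; note that the 2.0 rule of thumb assumes a reasonably large sample, and with very few cases the critical value is larger:

```python
import math


def slope_t(xs, ys):
    """t = b / SE(b) for the slope of a simple (one-predictor) regression."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three cases")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(sse / (n - 2))  # standard error of estimate
    se_b = see / math.sqrt(sxx)     # standard error of the slope
    return b / se_b


t = slope_t([1, 2, 3, 4], [1, 3, 2, 4])
print(round(t, 2))  # → 1.89
```

Here t is about 1.89, below the 2.0 rule of thumb, so with only four cases this slope would not be judged significant. In multiple regression the standard error of each slope also depends on the other independent variables, which is why t changes as variables are added or deleted.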
Selecting Independent Variables, cont.
• A beta (a standardized slope)
• Shows the influence of its associated independent variable on the dependent variable, independent of the effects of all other independent variables in the equation
• Is expressed in standard deviation units
• Can drop independent variables with small betas (or add ones with large betas), then recompute. This is a form of stepwise regression
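A beta can be obtained from an unstandardized slope by rescaling it with standard deviations: beta = b × (sX / sY). The short Python sketch below uses hypothetical slopes and standard deviations (not the presentation's data) to show that the variable with the larger raw slope need not have the larger beta:

```python
def betas(slopes, sd_xs, sd_y):
    """Standardize slopes: beta_j = b_j * (s_xj / s_y), in standard deviation units."""
    if sd_y <= 0:
        raise ValueError("standard deviation of Y must be positive")
    return [b * sx / sd_y for b, sx in zip(slopes, sd_xs)]


# Hypothetical values: X1 varies a lot, X2 varies little
slopes = [0.5, -2.0]
sd_xs = [4.0, 0.5]
sd_y = 2.0
print(betas(slopes, sd_xs, sd_y))  # → [1.0, -0.5]
```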
Resulting Equation

% Food Stamps = 16.7 + .343(Teen Moms) - .157(% HS) - .103(Health Insurance)

R² = .443

Prob. = .000

Multiple Correlation Coefficient
• R (usually reported squared, as R²)
• R² shows the % of variation in dependent variable explained by all independent variables acting together
Partial Correlation Coefficient
• rxy.z (the correlation between x and y, controlling for z)
• Shows correlation between dependent variable & a single independent variable, controlling for the effect of a third (fourth, etc.) variable
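With one control variable, the partial correlation can be computed from the three bivariate correlations: rxy.z = (rxy − rxz·ryz) / √((1 − rxz²)(1 − ryz²)). A minimal Python sketch; the input correlations are hypothetical:

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z from three bivariate correlations."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y; r_xy.z is undefined")
    return (r_xy - r_xz * r_yz) / denom


r = partial_r(0.5, 0.5, 0.5)
print(round(r, 4), round(r ** 2, 4))  # → 0.3333 0.1111
```

When z is unrelated to both x and y (rxz = ryz = 0), the partial correlation equals rxy; in the example above, controlling for z reduces the correlation from .50 to about .33.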
Interpreting Partial Correlation

(rxy.z)² shows the amount (%) of variation in the dependent variable explained by the independent variable, independent of the controls:

| rxy.z | (rxy.z)² |
|---|---|
| .30 | .09 |
| .50 | .25 |
| .43 | .185 |