
Regression & Correlation

Interval & Ratio Level Association

An Example
  • Explaining variation in % of state’s 2000 population receiving food stamps
Dependent Variable
  • State-to-state variation in % of state’s 2000 population receiving food stamps
Independent Variables
  • Such population characteristics as
    • Income
    • Education
  • Measures of need, e.g.
    • Unemployment rate
    • % living below poverty line
Independent Variables, cont.
  • Other
    • Teen pregnancies
    • % covered by health insurance
Interval/Ratio Data
  • Carries a lot of information
  • It may be multiplied & divided
  • It may (in theory) assume an infinite number of values (by going out to the right of the decimal)
  • It is also called “continuous” data
Interval/Ratio Data, Continuous Data
  • Can be found in surveys (income, years of education, etc.)
  • Is more commonly found in data sets containing aggregate or ecological data (data which summarizes large numbers of individual cases)
Interval/Ratio Data: Analyzing It
  • Can collapse (recode) it into categories (see the sketch below)
  • Can use regression and correlation to analyze it directly
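As a concrete illustration of the first option, here is a minimal pandas sketch that recodes a continuous variable into categories; the variable and cut points are hypothetical.

```python
import pandas as pd

# Hypothetical values for a continuous variable:
# % of each state's population receiving food stamps
food_stamp_pct = pd.Series([3.1, 4.8, 6.2, 7.5, 9.0, 11.4])

# Collapse (recode) the interval/ratio values into ordered categories
categories = pd.cut(
    food_stamp_pct,
    bins=[0, 5, 8, 100],                # cut points chosen only for illustration
    labels=["low", "medium", "high"],
)
print(categories.value_counts())
```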
Regression: Explaining & Predicting
  • Case scores on independent variable (X) and dependent variable (Y) can be plotted onto a graph, creating a scattergram, or a scatterplot
  • A line (the regression line) can then be drawn through the points on the scattergram, in order to summarize them
Regression: Explaining & Predicting
  • A regression equation describes a regression line
  • Simple regression equations have the form:
  • Y’ = a + bXi
Y’ = a + bXi
  • Y’
    • A predicted value of the dependent variable
  • Xi
    • A given value of the independent variable
  • b
    • The slope of the regression line
    • The amount Y changes for each one-unit change in X (how steeply the line rises or falls)
    • a.k.a. the regression coefficient
Y’ = a + bXi
  • a
    • The “Y intercept”
    • The point at which the regression line crosses the Y axis
    • The value of Y when X is zero
Regression: Explaining & Predicting, cont.
  • The line which produces the least amount of error, i.e. the smallest sum of squared errors in predicting the dependent variable, is the best line (the least squares criterion)
  • The computing formulas used to obtain slopes and intercepts are designed to satisfy this criterion
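A minimal numpy sketch of that criterion, using hypothetical state-level values in place of the real data set: the slope and intercept computed this way minimize the sum of squared prediction errors.

```python
import numpy as np

# Hypothetical state-level data: X = % of births to teen mothers, Y = % receiving food stamps
x = np.array([8.0, 10.5, 12.0, 13.5, 15.0, 17.0])
y = np.array([4.2, 5.0, 6.1, 6.4, 7.3, 8.1])

# Least-squares estimates: b = cov(X, Y) / var(X), a = mean(Y) - b * mean(X)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()

y_hat = a + b * x                      # predicted values along the regression line
sse = np.sum((y - y_hat) ** 2)         # the quantity the least squares criterion minimizes
print(f"Y' = {a:.3f} + {b:.3f}X  (sum of squared errors = {sse:.3f})")
```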
Regression: Explaining & Predicting, cont.
  • They allow us to predict values of the dependent variable from given values of the independent variable
  • They show how the two variables are related (i.e. they explain the dependent variable’s behavior in terms of the independent variable)
Example: Food Stamps & Teenage Mothers
  • Dependent variable (Y):
    • % of state’s 2000 population receiving food stamps
  • Independent variable (X):
    • % of births to mothers under 20 in 1997
  • Equation:
    • Food stamps % = 1.238 + .396(% of births to mothers under 20)
Food Stamps & Teenage Mothers, cont.
  • If 15% of a state’s births are to mothers under the age of 20, what percentage of that state’s population would you predict would be receiving Food Stamps?
  • Food Stamps % = 1.238 + .396(15%)
  • Food Stamps % = 1.238 + 5.94
  • Food Stamps % = 7.178%
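A small sketch that reproduces the slide's arithmetic; the helper function name is hypothetical.

```python
# Fitted equation from the slide:
#   Food Stamps % = 1.238 + .396(% of births to mothers under 20)
def predict_food_stamps(teen_birth_pct):
    """Predicted % of a state's population receiving food stamps (hypothetical helper)."""
    return 1.238 + 0.396 * teen_birth_pct

print(predict_food_stamps(15))       # 7.178
# A change in X affects the prediction only through the slope: a 3-point drop
# in teen births implies about 0.396 * 3 = 1.19 points fewer on food stamps.
print(predict_food_stamps(15) - predict_food_stamps(12))   # 1.188
```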
Food Stamps & Teenage Mothers, cont.
  • If the percentage of births to mothers under the age of 20 in that state were to decline by 3 percentage points, what effect might that have on the percentage of the population receiving Food Stamps?
  • The predicted change depends only on the slope: change in Food Stamps % = .396(-3)
  • Change in Food Stamps % = -1.188
  • Food Stamps % = Decrease of about 1.19 percentage points (e.g., from 7.178% at 15% teen births down to about 5.99% at 12%)
Food Stamps & Teenage Mothers, cont.
  • Food stamps % = 1.238 + .396(% of births to mothers under 20)
  • Food Stamps % and births to mothers under 20 are positively associated. As % of births to mothers under 20 decreases, percent of population receiving food stamps also decreases (the positive slope tells us that)
Explaining Food Stamps & Teenage Mothers
  • How much the percent of the population receiving food stamps changes is indicated by the slope’s size (magnitude). A one-percentage-point change in births to mothers under 20 results in a change of roughly .396 percentage points in the percent of the population receiving aid.
Slopes
  • Are Key, But
  • Their magnitude is affected by both the strength of association between the two variables, and by the magnitude of the independent variable
  • They are not standardized
  • Two slopes may not be easily compared
Slopes, but
  • Sometimes we are interested in measuring strength of association, not in explaining &/or predicting
  • To deal with this, we use the correlation coefficient
Correlation
  • Is a summary association measure for interval/ratio data (used like Cramér’s V, Somers’ d, etc.)
  • Is a standardized slope
  • Is easily calculated
  • Is routinely reported with regression equations
Correlation
  • Lots Of Names, One Statistic
    • Pearson’s r
    • Correlation coefficient
    • Pearson’s Product Moment Correlation Coefficient
Correlation, Cont.
  • Is often reported by itself, without bothering to first calculate slopes & intercepts
  • Ranges from -1.0 to 0.0 to 1.0
  • When squared (the coefficient of determination), shows the amount (%) of variation explained
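A minimal sketch of computing Pearson's r and r² with numpy, using the same hypothetical values as the earlier regression sketch.

```python
import numpy as np

# Hypothetical state-level data (same toy numbers as the earlier sketch)
teen_births = np.array([8.0, 10.5, 12.0, 13.5, 15.0, 17.0])
food_stamps = np.array([4.2, 5.0, 6.1, 6.4, 7.3, 8.1])

r = np.corrcoef(teen_births, food_stamps)[0, 1]   # Pearson's r
r_squared = r ** 2                                # coefficient of determination

# r_squared is the share of variation in the dependent variable
# explained by the independent variable.
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```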
Correlation

r² shows the amount (%) of explained variation:
  • r = .30 → r² = .09
  • r = .50 → r² = .25
  • r = .608 → r² = .37

Getting Correlations Without Scattergrams
  • There is a correlation function in many statistical software packages, and some spreadsheets
  • They will produce a correlation matrix, which shows the correlation of each selected variable with all other selected variables
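For example, a minimal pandas sketch (with hypothetical state-level values) that produces such a matrix:

```python
import pandas as pd

# Hypothetical state-level data frame
df = pd.DataFrame({
    "food_stamp_pct": [4.2, 5.0, 6.1, 6.4, 7.3, 8.1],
    "teen_birth_pct": [8.0, 10.5, 12.0, 13.5, 15.0, 17.0],
    "hs_grad_pct":    [88.0, 86.5, 84.0, 83.0, 81.5, 80.0],
})

# Correlation matrix: each selected variable correlated with every other variable
print(df.corr())
```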
Standard Error of Estimate
  • A “goodness of fit” measure
  • Analogous to standard deviation
  • A range above & below the regression line within which roughly 68.2% of all actual cases fall
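A sketch of the usual computation, assuming the predicted values are already in hand; the function name is hypothetical.

```python
import numpy as np

def standard_error_of_estimate(y, y_hat, k=1):
    """Square root of the residual sum of squares over (n - k - 1),
    where k is the number of independent variables (k = 1 for simple regression)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    return np.sqrt(np.sum((y - y_hat) ** 2) / (n - k - 1))

# Roughly 68% of observed Y values are expected to fall within one standard
# error above or below the regression line (assuming roughly normal residuals).
```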
Multiple Regression & Partial Correlation
  • Multivariate analysis for interval & ratio level data
  • Involves the introduction of additional independent variables (controls) into a bivariate association
  • Yields summary statistics that are comparable to those found in simple regression
Multiple Regression: Results
  • Multiple regression equation
    • Y’ = a + b1X1 + b2X2 + … + bnXn
    • Each slope indicates the relationship between its corresponding independent variable and the dependent variable, independent of the effect of all other independent variables in the equation
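A minimal sketch of estimating such an equation with statsmodels; the variable names and values are hypothetical stand-ins for the slide's state data set.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical state-level data
teen_moms  = np.array([8.0, 10.5, 12.0, 13.5, 15.0, 17.0])   # % births to mothers under 20
hs_grads   = np.array([88.0, 86.5, 84.0, 83.0, 81.5, 80.0])  # % high school graduates
food_stamp = np.array([4.2, 5.0, 6.1, 6.4, 7.3, 8.1])        # % receiving food stamps

X = sm.add_constant(np.column_stack([teen_moms, hs_grads]))  # adds the intercept column
results = sm.OLS(food_stamp, X).fit()

print(results.params)     # a, b1, b2
print(results.rsquared)   # R²: variation explained by all predictors together
print(results.tvalues)    # t statistics for judging each slope's significance
```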
Multiple Regression Equation
  • Size of slopes is affected by
    • Strength of association
    • Scale of independent variable(s)
    • Number of independent variables in the equation
Multiple Regression: Results, cont.
  • Multiple correlation coefficient: R²
    • Shows the % of variation in dependent variable explained by all independent variables acting together
  • Significance
Example: Food Stamps
  • Criteria For Assessing Obtained Equation(s)
    • Do a good job of explaining variation in dependent variable (i.e. maximize R²)
  • Keep number of independent variables down to a reasonable minimum, a.k.a.
    • Parsimony
    • Elegance
    • Efficiency
Example: Food Stamps
  • Selecting Independent Variables
    • Start with a set of interesting variables, then winnow down
  • Considerations:
    • Variables that are strongly associated with the dependent variable (large correlation coefficients), or that should be in theory, are good starting points
Example: Food Stamps
  • Selecting Independent Variables, cont.
    • Avoid using several independent variables that measure the same concept (i.e. that are strongly correlated with each other or have important theoretical similarities)
    • Try to use independent variables which make significant contributions to the final equation
      • A “t” of 2.0 or greater (in absolute value) indicates significance
      • Remember, this will change as you add or delete variables
Selecting Independent Variables, cont.
  • A beta (a standardized slope)
    • Shows the influence of its associated independent variable on the dependent variable, independent of the effects of all other independent variables in the equation
    • Is expressed in standard deviation units
  • Can drop independent variables with small betas (or add ones with large betas), then recompute. This is a form of stepwise regression
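A sketch of the usual conversion from an unstandardized slope to a beta weight; the function name is hypothetical.

```python
import numpy as np

def beta_weight(b, x, y):
    """Standardized slope: beta = b * (std. dev. of X / std. dev. of Y),
    so the result is expressed in standard deviation units."""
    return b * (np.std(x, ddof=1) / np.std(y, ddof=1))

# Example with the earlier hypothetical values and the slide's slope of .396:
# beta_weight(0.396, teen_births, food_stamps)
```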
Resulting Equation

% Food Stamps = 16.7 + .343(Teen Moms) - .157(% HS) - .103(Health Insurance)

R² = .443
Prob. = .000

Multiple Correlation Coefficient
  • R²
  • Shows the % of variation in dependent variable explained by all independent variables acting together
Partial Correlation Coefficient
  • rxy.z
  • Shows correlation between dependent variable & a single independent variable, controlling for the effect of a third (fourth, etc.) variable
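A sketch of the standard first-order formula, which computes rxy.z from the three pairwise correlations; the function name and example values are hypothetical.

```python
import numpy as np

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z: the X-Y correlation
    with the linear effect of the control variable Z removed."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(partial_corr(0.60, 0.40, 0.50))   # illustrative values only
```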
Interpreting Partial Correlation

rxy.z² shows the amount (%) of variation explained by the independent variable, independent of the controls:
  • rxy.z = .30 → rxy.z² = .09
  • rxy.z = .50 → rxy.z² = .25
  • rxy.z = .43 → rxy.z² = .185