# ASSOCIATION: CONTINGENCY, CORRELATION, AND REGRESSION


1. 3.1 The Association between Two Categorical Variables

2. Response and Explanatory Variables • Response variable (dependent, y): the outcome variable • Explanatory variable (independent, x): defines the groups being compared • Examples (response / explanatory): • Grade on test / amount of study time • Yield of corn / amount of rainfall

3. Association • Association: a particular value of one variable is more likely to occur with certain values of the other variable • Data analysis with two variables: • Tell whether there is an association, and • Describe that association

4. Contingency Table • Displays two categorical variables • The rows list the categories of one variable; the columns list the other • Entries in the table are frequencies www1.pictures.fp.zimbio.com
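The slide's table itself did not survive, so the following is only a minimal sketch of how a contingency table is built: each observation is a (row category, column category) pair, and each cell counts how many pairs fall in it. The helper name `contingency_table` and the observations below are hypothetical, chosen only for illustration.

```python
from collections import Counter

def contingency_table(pairs):
    """Hypothetical helper: cross-tabulate (row, column) category pairs into nested dicts of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical observations (food type, pesticide status); NOT the data from the slides
obs = [("organic", "present"), ("organic", "absent"), ("organic", "absent"),
       ("conventional", "present"), ("conventional", "present"), ("conventional", "absent")]
table = contingency_table(obs)
print(table)  # rows = food type, columns = pesticide status, entries = frequencies
```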

5. Contingency Table • What is the response (outcome) variable? The explanatory variable? • What proportion of organic foods contain pesticides? Of conventionally grown foods? • What proportion of all sampled foods contain pesticides?

6. Proportions & Conditional Proportions

7. Proportions & Conditional Proportions • Side-by-side bar charts display the conditional proportions and make them easy to compare www.vitalchoice.com

8. Proportions & Conditional Proportions • If there were no association, the conditional proportions would be the same in each group • Since there is an association, the conditional proportions differ
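The questions on slide 5 and the comparison on slide 8 come down to dividing cell counts by row totals (conditional proportions) or by the grand total (overall proportion). A minimal sketch follows; the counts are hypothetical, not the slide's data, and the function names are illustrative only.

```python
def conditional_proportions(table, outcome):
    """For each explanatory category (row), the proportion of that row with the given outcome."""
    return {row: counts[outcome] / sum(counts.values()) for row, counts in table.items()}

def overall_proportion(table, outcome):
    """Proportion of ALL observations with the given outcome, ignoring the row variable."""
    total = sum(sum(counts.values()) for counts in table.values())
    return sum(counts[outcome] for counts in table.values()) / total

# Hypothetical counts, NOT the data from the slides
table = {
    "organic":      {"present": 20, "absent": 80},
    "conventional": {"present": 60, "absent": 40},
}
cond = conditional_proportions(table, "present")
print(cond)                                  # {'organic': 0.2, 'conventional': 0.6}
print(overall_proportion(table, "present"))  # 0.4
# The conditional proportions differ, which is what an association looks like here.
```

If the rows were proportional (say 10 of 50 and 30 of 150), both conditional proportions would equal 0.2, the "no association" case.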

9. 3.2 The Association between Two Quantitative Variables

10. Internet Usage & GDP Data Set www.knitwareblog.com

11. Scatterplot Graph of two quantitative variables: • Horizontal Axis: Explanatory, x • Vertical Axis: Response, y

12. Interpreting Scatterplots • The overall pattern includes trend, direction, and strength of the relationship • Trend: linear, curved, clusters, no pattern • Direction: positive, negative, no direction • Strength: how closely the points fit the trend • Also look for outliers from the overall trend

13. Used-car Dealership What association would we expect between the age of the car and mileage? • Positive • Negative • No association

14. Linear Correlation, r Measures the strength and direction of the linear association between x and y

15. Correlation Coefficient: Measuring Strength & Direction of a Linear Relationship • Positive r => positive association • Negative r => negative association • r close to +1 or -1 indicates a strong linear association • r close to 0 indicates a weak linear association
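The slides' formula for r did not survive extraction. As a sketch, assuming the standard definition $r = \frac{1}{n-1}\sum z_x z_y$ with sample standard deviations, r can be computed as the average product of z-scores. The last call illustrates why "r close to 0" means only a weak *linear* association: y is an exact function of x there, yet r is 0.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson correlation r = (1/(n-1)) * sum(z_x * z_y), using sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))        # about +1: perfect positive linear pattern
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))        # about -1: perfect negative linear pattern
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # about 0: y = x^2 is curved, not linear
```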

16. 3.3 Can We Predict the Outcome of a Variable?

17. Regression Line • Predicts y, given x: $\hat{y} = a + bx$ • The y-intercept and slope are a and b • Only an estimate; actual data vary • Describes the relationship between x and the estimated means of y farm4.static.flickr.com

18. Residuals www.chem.utoronto.ca • Prediction errors: the vertical distance between a data point and the regression line • A large residual indicates an unusual observation • Each residual is: $y - \hat{y}$ • The sum of the residuals is always zero (for the least squares line) • Goal: minimize the distances from the data to the regression line

19. Least Squares Method • Residual sum of squares: $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$ • The least squares regression line minimizes the sum of squared vertical distances between the points and their predictions msenux.redwoods.edu
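A minimal sketch of the least squares line, using the usual closed form $b = \sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$ and $a = \bar{y} - b\bar{x}$. It also checks two claims from slides 18 and 19: the residuals sum to zero, and nudging either coefficient increases the residual sum of squares. The data values and function names are illustrative assumptions, not from the slides.

```python
from statistics import mean

def least_squares(x, y):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx          # the line passes through (x-bar, y-bar)
    return a, b

def residual_sum_of_squares(a, b, x, y):
    """Sum of (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]          # illustrative values, not data from the slides
a, b = least_squares(x, y)   # a is about 2.2, b is about 0.6
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b)
print(sum(residuals))        # zero up to floating-point rounding
# Nudging either coefficient away from the least squares values increases the sum of squares
best = residual_sum_of_squares(a, b, x, y)
print(best < residual_sum_of_squares(a + 0.1, b, x, y),
      best < residual_sum_of_squares(a, b - 0.1, x, y))
```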

20. Regression Analysis Identify response and explanatory variables • Response variable is y • Explanatory variable is x

21. Anthropologists Predict Height Using Remains? • Regression Equation: • $\hat{y}$ is predicted height (cm) and x is the length of the femur, or thighbone (cm) • Predict height for a femur length of 50 cm www.geektoysgamesandgadgets.com

22. Interpreting the y-Intercept and Slope • y-intercept: the predicted y-value when x = 0 • Helps plot the line • Slope: the change in predicted y for a 1-unit increase in x • Each 1 cm increase in femur length means a 2.4 cm increase in predicted height
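The femur example's intercept did not survive extraction, so this sketch keeps the intercept `a` as a parameter and tries a few arbitrary placeholder values (they are NOT the slide's intercept). It shows the slope interpretation from slide 22: whatever the intercept, moving from a 50 cm to a 51 cm femur raises the predicted height by b = 2.4 cm, and the intercept is the prediction at x = 0.

```python
def predict(a, b, x):
    """Predicted y on the regression line y-hat = a + b*x."""
    return a + b * x

b = 2.4  # slope from the femur example: cm of predicted height per cm of femur length
for a in (0.0, 50.0, 100.0):  # arbitrary placeholder intercepts, NOT the slide's value
    change = predict(a, b, 51) - predict(a, b, 50)
    print(a, round(change, 6))  # 2.4 for every intercept
    print(predict(a, b, 0) == a)  # the intercept is the predicted y when x = 0
```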

23. Slope Values: Positive, Negative, Zero

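24. Slope and Correlation • Correlation, r: • Describes strength • No units • Same if x and y are swapped • Slope, b: • Doesn’t tell strength • Has units (units of y per unit of x) • Changes if x and y are swapped (it is not simply inverted unless r = ±1) • The two are related by $b = r(s_y/s_x)$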
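A small check of the contrasts on slide 24, assuming the standard relation $b = r(s_y/s_x)$: swapping x and y leaves r unchanged, but the new slope is $r(s_x/s_y)$, which is not $1/b$ in general; the product of the two slopes equals $r^2$. The data values and function names are illustrative.

```python
from statistics import mean, stdev

def slope(x, y):
    """Least squares slope when y is regressed on x."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)

def correlation(x, y):
    """Correlation recovered from the slope via b = r * s_y / s_x."""
    return slope(x, y) * stdev(x) / stdev(y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                                   # illustrative values
print(correlation(x, y), correlation(y, x))           # equal (about 0.775): r is symmetric
print(slope(x, y), slope(y, x))                       # about 0.6 and 1.0; 1/0.6 would be about 1.67
print(slope(x, y) * slope(y, x), correlation(x, y) ** 2)  # both about 0.6 = r^2
```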

25. Squared Correlation, $r^2$ • Proportional reduction in error, $r^2$ • The proportion of the variation in the y-values explained by the linear relationship of y to x • A correlation of r = 0.9 means $r^2 = 0.81$: • 81% of the variation in y is explained by x
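One way to read "proportional reduction in error": compare the squared error of predicting every y by $\bar{y}$ (SST) with the squared error of predicting by the regression line (SSE). The sketch below computes $(SST - SSE)/SST$ and, for the illustrative data used here, it matches $r^2$. Data values and the function name are assumptions for illustration.

```python
from statistics import mean

def r_squared(x, y):
    """Proportional reduction in error: (SST - SSE) / SST."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    sst = sum((yi - my) ** 2 for yi in y)                        # error using y-bar as the prediction
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # error using the regression line
    return (sst - sse) / sst

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]     # illustrative values
print(r_squared(x, y))  # about 0.6: the line removes 60% of the squared error
print(0.9 ** 2)         # the slide's case: r = 0.9 gives r^2 of about 0.81
```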

26. 3.4 What Are Some Cautions in Analyzing Associations?

27. Extrapolation • Extrapolation: predicting y for x-values outside the range of the data • The farther from the observed range of x, the riskier the prediction • No guarantee that the trend holds outside that range Neil Weiss, Elementary Statistics, 7th Edition
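A sketch of a guard against silent extrapolation: alongside the prediction, report how far the new x lies outside the observed x-range (0 means no extrapolation). The helper `predict_with_range_check` and the coefficients a = 2.2, b = 0.6 are hypothetical, for illustration only; the distance is a warning sign, not a measure of how wrong the prediction will be.

```python
def predict_with_range_check(a, b, x_new, x_observed):
    """Hypothetical helper: predict with y-hat = a + b*x and report how far x_new
    lies outside the observed x-range (0 means no extrapolation)."""
    lo, hi = min(x_observed), max(x_observed)
    y_hat = a + b * x_new
    distance_outside = max(lo - x_new, x_new - hi, 0.0)
    return y_hat, distance_outside

x_observed = [1, 2, 3, 4, 5]
print(predict_with_range_check(2.2, 0.6, 3, x_observed))   # inside the range: distance 0.0
print(predict_with_range_check(2.2, 0.6, 50, x_observed))  # far outside: distance 45
```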

28. Outliers and Influential Points • A regression outlier lies far away from the trend of the rest of the data • A point is influential if both: • Its x-value is low or high compared to the rest of the data • It is a regression outlier www2.selu.edu
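To see why both conditions matter, this sketch fits a slope to made-up points lying exactly on y = 2x, then adds one extra point with an extreme x-value. When that point also falls far from the trend (a regression outlier), the slope changes sign; when it lies on the trend, the slope barely moves. All data here are invented for illustration.

```python
from statistics import mean

def slope(x, y):
    """Least squares slope when y is regressed on x."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]              # made-up points exactly on y = 2x
print(slope(x, y))                # 2.0

# Extreme x AND far from the trend: influential, the slope flips sign
print(slope(x + [20], y + [0]))

# Extreme x but ON the trend: not a regression outlier, the slope stays about 2
print(slope(x + [20], y + [40]))
```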

29. Correlation Does Not Imply Causation • A strong correlation between x and y means: • A strong linear association between the variables • It does not mean that x causes y • Example: 95.6% of cancer patients have eaten pickles; do pickles cause cancer?

30. Lurking Variables & Confounding • Ice cream sales & drowning: both driven by temperature • Reading level & shoe size: both driven by age • Confounding: two explanatory variables are both associated with the response variable and with each other • Lurking variable: a variable not measured in the study that may confound the association

31. Simpson’s Paradox Example • Simpson’s Paradox: the direction of an association between two variables reverses after a third variable is included • Probability of death for smokers = 139/582 ≈ 24% • Probability of death for nonsmokers = 230/732 ≈ 31%
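The overall death proportions on this slide can be reproduced directly from the counts it gives; before age is taken into account, smokers appear to have the lower death rate.

```python
deaths_smokers, n_smokers = 139, 582
deaths_nonsmokers, n_nonsmokers = 230, 732

p_smoker = deaths_smokers / n_smokers           # about 0.239
p_nonsmoker = deaths_nonsmokers / n_nonsmokers  # about 0.314
print(f"smokers: {p_smoker:.0%}, nonsmokers: {p_nonsmoker:.0%}")  # smokers: 24%, nonsmokers: 31%
```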

32. Simpson’s Paradox Example Break out Data by Age

33. Simpson’s Paradox Example • The associations look quite different after adjusting for the third variable (age)
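The age breakdown from slide 32 did not survive, so this sketch uses invented counts to show the mechanism: group A has the higher death rate within every stratum, yet the lower rate when strata are pooled, because most of group A sits in the low-risk stratum. The groups, strata, and counts are all hypothetical.

```python
# Hypothetical (deaths, group size) per stratum; NOT the study's age-group data
strata = {
    "younger": {"group_A": (12, 200), "group_B": (1, 20)},
    "older":   {"group_A": (9, 20),   "group_B": (40, 100)},
}

def rate(deaths, total):
    return deaths / total

def pooled_rate(strata, group):
    """Death rate for a group after ignoring (pooling over) the third variable."""
    deaths = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return deaths / total

for name, s in strata.items():
    # Within each stratum, group_A has the HIGHER rate
    print(name, {g: round(rate(*dt), 3) for g, dt in s.items()})
# Pooled over strata, group_A has the LOWER rate: the association reverses
print("pooled", {g: round(pooled_rate(strata, g), 3) for g in ("group_A", "group_B")})
```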