## ASSOCIATION: CONTINGENCY, CORRELATION, AND REGRESSION

Chapter 3

**Response and Explanatory Variables**
• Response variable (dependent, y): outcome variable
• Explanatory variable (independent, x): defines groups
• Response / Explanatory:
  • Grade on test / Amount of study time
  • Yield of corn / Amount of rainfall

**Association**
Association: when a value for one variable is more likely with certain values of the other variable
Data analysis with two variables:
• Tell whether there is an association
• Describe that association

**Contingency Table**
• Displays two categorical variables
• The rows list the categories of one variable; the columns list the categories of the other
• Entries in the table are frequencies
Image source: www1.pictures.fp.zimbio.com

**Contingency Table**
• What is the response (outcome) variable? The explanatory variable?
• What proportion of organic foods contain pesticides? Of conventionally grown foods?
• What proportion of all sampled foods contain pesticides?

**Proportions & Conditional Proportions**
Side-by-side bar charts show conditional proportions and allow for easy comparison
Image source: www.vitalchoice.com

**Proportions & Conditional Proportions**
• If there were no association, the conditional proportions would be the same
• Since there is an association here, the conditional proportions are different

**Internet Usage & GDP Data Set**
Image source: www.knitwareblog.com

**Scatterplot**
Graph of two quantitative variables:
• Horizontal axis: explanatory variable, x
• Vertical axis: response variable, y

**Interpreting Scatterplots**
• The overall pattern includes the trend, direction, and strength of the relationship
  • Trend: linear, curved, clusters, no pattern
  • Direction: positive, negative, no direction
  • Strength: how closely the points fit the trend
• Also look for outliers from the overall trend

**Used-car Dealership**
What association would we expect between the age of the car and mileage?
• Positive
• Negative
• No association

**Linear Correlation, r**
Measures the strength and direction of the linear association between x and y

**Correlation Coefficient: Measuring Strength & Direction of a Linear Relationship**
• Positive r => positive association
• Negative r => negative association
• r close to +1 or -1 indicates a strong linear association
• r close to 0 indicates a weak linear association

**Regression Line**
• Predicts y, given x: $\hat{y} = a + bx$
• The y-intercept and slope are a and b
• Only an estimate; actual data vary
• Describes the relationship between x and the estimated means of y
Image source: farm4.static.flickr.com

**Residuals**
• Prediction errors: vertical distance between a data point and the regression line
• A large residual indicates an unusual observation
• Each residual is: $y - \hat{y}$
• For the least squares line, the sum of the residuals is always zero
• Goal: minimize the distance from the data to the regression line
Image source: www.chem.utoronto.ca

**Least Squares Method**
• Residual sum of squares: $\sum (y - \hat{y})^2$
• The least squares regression line minimizes this sum, that is, the squared vertical distances between the points and their predictions
Image source: msenux.redwoods.edu

**Regression Analysis**
Identify the response and explanatory variables:
• Response variable is y
• Explanatory variable is x

**Anthropologists Predict Height Using Remains?**
• Regression equation of the form $\hat{y} = a + bx$, where $\hat{y}$ is the predicted height and x is the length of the femur (thighbone), both in cm
• Predict the height for a femur length of 50 cm
Image source: www.geektoysgamesandgadgets.com (Bones)

**Interpreting the y-Intercept and Slope**
• y-intercept: y-value when x = 0
  • Helps plot the line
• Slope: change in y for a 1-unit increase in x
  • A 1 cm increase in femur length means a 2.4 cm increase in predicted height

**Slope and Correlation**
• Correlation, r:
  • Describes strength
  • No units
  • Same if x and y are swapped
• Slope, b:
  • Doesn’t tell strength
  • Has units
  • Changes if x and y are swapped (it is the reciprocal only when r = ±1)

**Squared Correlation, r²**
• Proportional reduction in error, r²
• Proportion of the variation in y-values explained by the linear relationship of y to x
• A correlation, r, of 0.9 means r² = 0.81: 81% of the variation in y is explained by x

**Extrapolation**
Extrapolation: predicting y for x-values outside the range of the data
• Riskier the farther the x-value is from the observed range of x
• No guarantee the trend holds
Image source: Neil Weiss, Elementary Statistics, 7th Edition

**Outliers and Influential Points**
• A regression outlier lies far from the trend of the rest of the data
• An observation is influential if both:
  • Its x-value is low or high compared to the rest of the data
  • It is a regression outlier
Image source: www2.selu.edu

**Correlation Does Not Imply Causation**
A strong correlation between x and y means:
• There is a strong linear association between the variables
• It does not mean that x causes y
Example: 95.6% of cancer patients have eaten pickles, so do pickles cause cancer?

**Lurking Variables & Confounding**
• Ice cream sales & drowning => temperature
• Reading level & shoe size => age
• Confounding: two explanatory variables are both associated with the response variable and with each other
• Lurking variable: not measured in the study but may confound

**Simpson’s Paradox Example**
Simpson’s Paradox: the direction of an association between two variables reverses after a third variable is included
• Probability of death of smoker = 139/582 = 24%
• Probability of death of nonsmoker = 230/732 = 31%

**Simpson’s Paradox Example**
Break out the data by age

**Simpson’s Paradox Example**
Associations look quite different after adjusting for a third variable
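
The Simpson’s Paradox slides can be made concrete with a short Python sketch. The overall counts (139 of 582 smokers, 230 of 732 nonsmokers) are the ones quoted on the slide. The age-split counts are hypothetical, invented only to show how a reversal can arise, and are not the study's data. The helper names `proportion` and `pooled_rates` are illustrative as well.

```python
def proportion(deaths, total):
    """Proportion of a group that died."""
    return deaths / total

# Overall counts quoted on the slide: status -> (deaths, group size).
overall = {"smoker": (139, 582), "nonsmoker": (230, 732)}
overall_rates = {status: proportion(d, n) for status, (d, n) in overall.items()}
for status, rate in overall_rates.items():
    print(f"overall {status}: {rate:.0%}")

# HYPOTHETICAL age-split counts, invented for illustration only
# (NOT the study's data): age group -> status -> (deaths, group size).
by_age = {
    "younger": {"smoker": (20, 400), "nonsmoker": (4, 100)},
    "older": {"smoker": (50, 100), "nonsmoker": (180, 400)},
}

def pooled_rates(strata):
    """Ignore the third variable by summing counts across all strata."""
    totals = {}
    for groups in strata.values():
        for status, (d, n) in groups.items():
            deaths_so_far, size_so_far = totals.get(status, (0, 0))
            totals[status] = (deaths_so_far + d, size_so_far + n)
    return {status: proportion(d, n) for status, (d, n) in totals.items()}

# Death rates within each age group (conditional on age).
within = {
    age: {status: proportion(d, n) for status, (d, n) in groups.items()}
    for age, groups in by_age.items()
}
# Death rates after collapsing over age.
pooled = pooled_rates(by_age)

for age, rates in within.items():
    print(age, {status: round(rate, 3) for status, rate in rates.items()})
print("pooled", {status: round(rate, 3) for status, rate in pooled.items()})
```

In the hypothetical table, smokers have the higher death rate within each age group, yet the lower rate once the age groups are pooled, because most of the hypothetical smokers are in the younger, lower-risk group.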
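
Returning to the regression slides: the least squares line, the residuals, and r² can be worked through on a small data set. The sketch below is a minimal Python illustration, assuming made-up x and y values (not data from the presentation) and an illustrative helper named `least_squares`. It uses the standard formulas b = S_xy / S_xx, a = ȳ - b·x̄, and r = S_xy / sqrt(S_xx · S_yy).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def least_squares(x, y):
    """Return (a, b, r): intercept and slope of y-hat = a + b*x, and correlation r."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    r = sxy / sqrt(sxx * syy)
    return a, b, r

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r = least_squares(x, y)

# Residual = observed y minus predicted y (vertical distance to the line).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)               # residual sum of squares
total_ss = sum((yi - mean(y)) ** 2 for yi in y)    # variation in y about its mean

# Swapping the roles of x and y: r is unchanged, the slope is not.
a_swapped, b_swapped, r_swapped = least_squares(y, x)

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.2f}, r^2 = {r ** 2:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"proportional reduction in error = {1 - sse / total_ss:.2f}")
print(f"slope with x and y swapped = {b_swapped:.2f}, 1/b = {1 / b:.2f}")
```

For these made-up values the fitted line is ŷ = 2.2 + 0.6x with r² = 0.6. The residuals sum to zero up to rounding, the proportional reduction in error equals r², and the slope with x and y swapped (1.0) is not the reciprocal of the original slope.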