Chapter 5: Regression
Objectives of Regression
• To describe the change in Y per unit X
• To predict the average level of Y at a given level of X
“Returning Birds” Example
Plot the data first to see whether the relationship can be described by a straight line (important!).
Illustrative data from Exercise 4.4:
• Y = number of adult birds joining the colony
• X = percent of birds returning from the prior year
If the data can be described by a straight line
• Describe the relationship with the equation Y = (intercept) + (slope)(X)
• This may also be written Y = (slope)(X) + (intercept)
• Intercept: where the line crosses the Y axis (the value of Y when X = 0)
• Slope: the “steepness” of the line, i.e., the change in Y per unit increase in X
Linear Regression
• Algebraic line: every point falls exactly on the line, y = intercept + (slope)(X)
• Statistical line: the scatter cloud suggests a linear trend, “predicted y” = intercept + (slope)(X)
Regression Equation
ŷ = a + bx, where
• ŷ (“y-hat”) is the predicted value of Y
• a is the intercept
• b is the slope
• x is a value of X
• Determine a and b for the “best fitting” line
The TI calculators reverse a and b!
What Line Fits Best?
If we try to draw the line by eye, different people will draw different lines. We need a method for drawing the “best” line; this method is called “least squares.”
The “Least Squares” Regression Line
Each point has a residual:
Residual = observed y − predicted y = vertical distance of the point from the prediction line
The least squares line minimizes the sum of the squared residuals.
Calculating Least Squares Regression Coefficients
• Formula (next slide)
• Technology
  • TI-30XIIS
  • Two Variable Applet
  • Other
Formulas
• b = slope coefficient: $b = r\,\dfrac{s_y}{s_x}$
• a = intercept coefficient: $a = \bar{y} - b\,\bar{x}$
where $s_x$ and $s_y$ are the standard deviations of the two variables, $r$ is their correlation, and $\bar{x}$ and $\bar{y}$ are their means.
Technology: Calculator
BEWARE! TI calculators label the slope and intercept backwards!
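Regression Line
• For the “bird data”:
  • a = 31.9343
  • b = −0.3040
• The linear regression equation is ŷ = 31.9343 − 0.3040x
The slope (−0.3040) represents the average change in Y per unit X: each additional percentage point of returning birds is associated with about 0.304 fewer new adult birds, on average.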
Use of Regression for Prediction
Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?
Answer: ŷ = a + bx = 31.9343 − (0.3040)(60) = 13.69
Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.
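The prediction step is just arithmetic on the fitted line. This minimal sketch plugs the slide's bird-data coefficients into ŷ = a + bx. The constant names and the `predict` helper are illustrative, not from the slides.

```python
# Coefficients reported on the slides for the "bird data".
A_BIRDS = 31.9343   # intercept a
B_BIRDS = -0.3040   # slope b (negative: more returning birds, fewer new ones)

def predict(a, b, x):
    """Predicted value y-hat = a + b*x from a fitted regression line."""
    return a + b * x

y_hat = predict(A_BIRDS, B_BIRDS, 60)
print(round(y_hat, 2))  # 13.69
```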
Prediction via Regression Line
[Figure: number of new birds vs. percent returning]
When X = 60, the regression model predicts ŷ = 13.69.
Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
Regression Calculation: Case Study
Life Expectancy and GDP (Europe)
Regression Calculation by Hand (Life Expectancy Study)
Calculations: ŷ = 68.716 + 0.420x
BPS/3e Two Variable Applet
Applet: Data Entry
Applet: Calculations
Applet: Scatterplot
Applet: least squares line
Interpretation: Life Expectancy Case Study
• Model: ŷ = 68.716 + (0.420)X
• Slope: for each one-unit increase in per capita GDP, predicted life expectancy increases by 0.420 years
• Prediction example: What is the life expectancy in a country with a GDP of 20.0?
ANSWER: ŷ = 68.716 + (0.420)(20.0) = 77.12 years
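A short sketch that reproduces the slide's prediction and also checks the slope interpretation. On a straight line, raising X by one unit changes ŷ by exactly b. The coefficients come from the slide. The helper name is illustrative.

```python
A_LIFE = 68.716   # intercept from the slide
B_LIFE = 0.420    # slope from the slide

def predict(a, b, x):
    """Predicted value y-hat = a + b*x."""
    return a + b * x

# Prediction from the slide: GDP = 20.0
print(round(predict(A_LIFE, B_LIFE, 20.0), 2))   # 77.12

# Slope interpretation: a one-unit increase in GDP changes the prediction by b.
change = predict(A_LIFE, B_LIFE, 21.0) - predict(A_LIFE, B_LIFE, 20.0)
print(round(change, 3))  # 0.42
```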
Coefficient of Determination (R²) (Fact 4 on p. 111)
• The coefficient of determination, R² (the square of the correlation r), quantifies the fraction of the variation in Y that is “mathematically explained” by X
Examples:
• r = 1: R² = 1: the regression line explains all (100%) of the variation in Y
• r = 0.7: R² = 0.49: the regression line explains almost half (49%) of the variation in Y
We are NOT going to cover the analysis of residual plots (pp. 113-116).
Outliers and Influential Points
• An outlier is an observation that lies far from the regression line
• Outliers in the y direction have large residuals
• Outliers in the x direction are often influential
• An influential point is one whose removal would markedly change the regression and correlation values
Outliers: Case Study (Gesell Adaptive Score and Age at First Word)
[Figure: two scatterplot panels]
• From all the data: r² = 41%
• After removing child 18: r² = 11%
Cautions About Correlation and Regression
• They describe only linear relationships
• They are influenced by outliers
• They should not be used to predict beyond the observed range of X (do not extrapolate)
• Beware of lurking variables (variables other than X and Y)
• Association does not always equal causation!
Do not extrapolate (Sarah’s height)
• Sarah’s height is plotted against her age
• Can you predict her height at age 42 months?
• Can you predict her height at age 30 years (360 months)?
Do not extrapolate (Sarah’s height)
• Regression equation: ŷ = 71.95 + 0.383(X)
• At age 42 months: ŷ = 71.95 + 0.383(42) = 88 (reasonable)
• At age 360 months: ŷ = 71.95 + 0.383(360) = 209.8 (if height is in centimeters, as the reasonable 42-month value implies, that is nearly 7 feet tall: not a believable prediction)
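A simple guard against extrapolation is to compare each X with the range of X values the line was fitted on. The coefficients below come from the slide. The observed age range of 36 to 60 months is an ASSUMPTION for illustration, because the slides do not state it. `predict_checked` is a hypothetical helper.

```python
A_SARAH = 71.95   # intercept from the slide
B_SARAH = 0.383   # slope from the slide

# ASSUMED observed age range in months; the slides do not give it.
AGE_MIN, AGE_MAX = 36, 60

def predict_checked(a, b, x, x_min, x_max):
    """Return (y_hat, inside_range); inside_range is False when x is an extrapolation."""
    y_hat = a + b * x
    return y_hat, x_min <= x <= x_max

for age in (42, 360):
    y_hat, ok = predict_checked(A_SARAH, B_SARAH, age, AGE_MIN, AGE_MAX)
    status = "within data range" if ok else "EXTRAPOLATION: do not trust"
    print(f"age {age:>3} months: y-hat = {y_hat:.1f} ({status})")
```

The arithmetic is the same for both ages. Only the range check reveals that the 360-month value goes far beyond what the data can support.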
Caution: Correlation does not always mean causation
Even very strong correlations may not correspond to a causal relationship between x and y (beware of the lurking variable!)