Chapter 11: Simple Linear Regression

1 / 57

# Chapter 11: Simple Linear Regression - PowerPoint PPT Presentation

Chapter 11: Simple Linear Regression. Where We’ve Been. Presented methods for estimating and testing population parameters for a single sample Extended those methods to allow for a comparison of population parameters for multiple samples. Where We’re Going.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Chapter 11: Simple Linear Regression' - hong

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Chapter 11: Simple Linear Regression

Where We’ve Been
• Presented methods for estimating and testing population parameters for a single sample
• Extended those methods to allow for a comparison of population parameters for multiple samples

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

Where We’re Going
• Introduce the straight-line linear regression model as a means of relating one quantitative variable to another quantitative variable
• Introduce the correlation coefficient as a means of relating one quantitative variable to another quantitative variable
• Assess how well the simple linear regression model fits the sample data
• Use the simple linear regression model to predict the value of one variable given the value of another variable

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models

The relationship between home runs and runs in

baseball seems at first glance to be deterministic …

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models

But if you consider how many runners are on base when the home run is hit, or even how often the batter misses a base and is called out, the rigid model becomes more variable.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models

General Form of Probabilistic Models

y = Deterministic component + Random error

where y is the variable of interest, and the mean value of the random error is assumed to be 0: E(y) = Deterministic component.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models
• The goal of regression analysis is to find the straight line that comes closest to all of the points in the scatter plot simultaneously.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models
• A First-Order Probabilistic Model

y = 0 + 1x + 

where y = dependent variable

x = independent variable

0 + 1x = E(y) = deterministic component

 = random error component

0 = y – intercept

1 = slope of the line

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.1: Probabilistic Models

0, the y – intercept, and 1, theslope of the line, are population parameters, and invariably unknown. Regression analysis is designed to estimate these parameters.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

Step 1

Hypothesize the deterministic component of the probabilistic model

E(y) = 0 + 1x

Step 2

Use sample data to estimate the unknown parameters in the model

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

Values on the line are the predicted values

of total offerings given the average offering.

The distances between the scattered dots and

the line are the errors of prediction.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

Values on the line are the predicted values

of total offerings given the average offering.

The line’s estimated parameters are the values that

minimize the sum of the squared errors of prediction,

and the method of finding those values is called the

method of least squares.

The distances between the scattered dots and

the line are the errors of prediction.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach
• Model:
• Estimates:
• Deviation:
• SSE:

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach
• The least squares line is the line that has the following two properties:
• The sum of the errors (SE) equals 0.
• The sum of squared errors (SSE) is smaller than that for any other straight-line model.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

Can home runs be used to predict errors?

Is there a relationship

between the number of

home runs a team hits and

the quality of its fielding?

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach

These results suggest that teams which hit more home runs are (slightly) better fielders (maybe not what we expected). There are, however, only five observations in the sample. It is important to take a closer look at the assumptions we made and the results we got.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.3: Model Assumptions

Assumptions

1. The mean of the probability distribution of  is 0.

2. The variance,  2, of the probability distribution of  is constant.

3. The probability distribution of  is normal.

4. The values of  associated with any two values of y are independent.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.3: Model Assumptions
• The variance,  2, is used in every test statistic and confidence interval used to evaluate the model.
• Invariably,  2 is unknown and must be estimated.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.3: Model Assumptions

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1

Note: There may be many different

patterns in the scatter plot when

there is no linear relationship.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1

A critical step in the evaluation of the model is to test whether 1 = 0

. .

. . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . .

. . . .

y y y

x x x

Positive Relationship No Relationship Negative Relationship

1> 0 1 = 0 1 < 0

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1

H0 : 1 = 0

Ha : 1 ≠ 0

. .

. . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . .

. . . .

y y y

x x x

Positive Relationship No Relationship Negative Relationship

1> 0 1 = 0 1 < 0

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1
• The four assumptions described above produce a normal sampling distribution for the slope estimate:

called the estimated standard error of the least squares slope estimate.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1
• Since the t-value does not lead to rejection of the null hypothesis, we can conclude that
• A different set of data may yield different results.
• There is a more complicated relationship.
• There is no relationship (non-rejection does not lead to this conclusion automatically).

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1
• Interpreting p-Values for 
• Software packages report two-tailed p-values.
• To conduct one-tailed tests of hypotheses, the reported p-values must be adjusted:

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1
• A Confidence Interval on 1

where the estimated standard error is

and t/2 is based on (n – 2) degrees of freedom

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.4: Assessing the Utility of the Model: Making Inferences about the Slope 1
• In the home runs and errors example, the estimated 1was -.0521, and the estimated standard error was .569. With 3 degrees of freedom, t = 3.182.
• The confidence interval is, therefore,

which includes 0, so there may be no relationship between the two variables.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.5: The Coefficients of Correlation and Determination
• The coefficient of correlation,r, is a measure of the strength of the linear relationship between two variables. It is computed as follows:

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.5: The Coefficients of Correlation and Determination

Positive linear relationship No linear relationship Negative linear relationship

. . .

. . . . . . . . .

. . . . . . . . . . .

. . . . . . . . . . . . . .

. . . .

y y y

x x x

r → +1 r  0 r → -1

Values of requal to +1 or -1 require each point in the scatter plot to lie on a single straight line.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.5: The Coefficients of Correlation and Determination
• In the example about homeruns and errors, SSxy= -143.8 and SSxx= 2509.
• SSyy is computed as

so

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.5: The Coefficients of Correlation and Determination
• An r value that close to zero suggests there may not be a linear relationship between the variables, which is consistent with our earlier look at the null hypothesis and the confidence interval on  1.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.5: The Coefficients of Correlation and Determination
• The coefficient of determination, r2, represents the proportion of the total sample variability around the mean of y that is explained by the linear relationship between x and y.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.5: The Coefficients of Correlation and Determination

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction
• Based on our model results, a team that hits 140 home runs is expected to make 99.9 errors:

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction
• A 95% Prediction Interval for an Individual Team’s Errors

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction

Prediction intervals for individual new values of y are wider than confidence intervals on the mean of y because of the extra source of error.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.6: Using the Model for Estimation and Prediction
• Estimating y beyond the range of values associated with the observed values of x can lead to large prediction errors.
• Beyond the range of observed x values, the relationship may look very different.

Estimated relationship

True relationship

Xi Xj

Range of observed

values of x

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Step 1
• How does the proximity of a fire house (x) affect the damages (y) from a fire?
• y = f(x)
• y = 0 +1x + 

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Step 2
• The data (found in Table 11.7) produce the following estimates (in thousands of dollars):
• The estimated damages equal \$10,280 + \$4910 for each mile from the fire station, or

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Step 3
• The estimate of the standard deviation, , of  is

s = 2.31635

• Most of the observed fire damages will be within 2s 4.64 thousand dollars of the predicted value

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Step 4
• Test that the true slope is 0
• SAS automatically performs a two-tailed test, with a reported p-value < .0001. The one-tailed p-value is < .00005, which provides strong evidence to reject the null.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Step 4
• A 95% confidence interval on 1 from the SAS output is 4.071 ≤ 1 ≤ 5.768.
• The coefficient of determination, r 2, is .9235.
• The coefficient of correlation, r, is

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Suppose the distance from the nearest station is 3.5 miles. We can estimate the damage with the model estimates.

We’re 95% sure the damage for a fire 3.5 miles from the nearest station will be between \$22,324 and \$32,667.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.7: A Complete Example
• Suppose the distance from the nearest station is 3.5 miles. We can estimate the damage with the model estimates.

We’re 95% sure the damage for a fire 3.5 miles from the nearest station will be between \$22,324 and \$32,667.

Since the x-values in our sample range from .7 to 6.1, predictions about y for x-values beyond this range will be unreliable.

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression