Lecture 20




Lecture 20

  • Simple linear regression (18.6, 18.9)

  • Homework 5 is posted and is due next Tuesday at 3 p.m. (Note correction on question 4(e)).

  • Regular office hours: Tuesday, 9-10, 12-1.

  • Extra office hours: Today (after class), Monday, 10-11.

  • Midterm 2: Wednesday, April 2nd, 6-8 p.m.



Point Prediction

  • Example 18.7

    • Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer (Example 18.2).

  • It is predicted that a car with 40,000 miles on the odometer would sell for $14,575.

  • How close is this prediction to the real price?


Interval Estimates

  • Two intervals can be used to discover how closely the predicted value will match the true value of y.

    • Prediction interval – predicts y for a given value of x,

    • Confidence interval – estimates the average y for a given x.


Interval Estimates, Example

  • Example 18.7 - continued

    • Provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer.

    • Two types of predictions are required:

      • A prediction for a specific car

      • An estimate for the average price per car


Interval Estimates, Example

  • Solution

    • A prediction interval provides the price estimate for a single car:

ŷ ± t.025,98 · sε · √(1 + 1/n + (xg − x̄)² / ((n − 1)sx²)),  where t.025,98 ≈ 1.984

Interval Estimates, Example

  • Solution – continued

    • A confidence interval provides the estimate of the mean price per car for a Ford Taurus with 40,000 miles reading on the odometer.

      • The 95% confidence interval: ŷ ± t.025,98 · sε · √(1/n + (xg − x̄)² / ((n − 1)sx²))
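As a hedged sketch, both intervals can be computed from the summary statistics of the fit. Every number below is a hypothetical stand-in, not an actual Example 18.7 value; only the structure of the two formulas is from the slides.

```python
import math

# Sketch of the two interval formulas. All summary statistics here are
# hypothetical stand-ins, NOT the actual Example 18.7 values.
n = 100          # sample size, so df = n - 2 = 98
t_crit = 1.984   # approximately t.025,98
s_e = 0.3        # standard error of estimate (hypothetical)
x_bar = 36.0     # sample mean of x, odometer in 1000s (hypothetical)
s_x2 = 43.5     # sample variance of x (hypothetical)
y_hat = 14.575   # point prediction, in $1000s
x_g = 40.0       # given odometer reading

# Prediction interval (single car): the extra "1 +" term accounts for the
# variation of one observation around the regression line.
half_pi = t_crit * s_e * math.sqrt(1 + 1/n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2))
# Confidence interval (mean price of all such cars): no "1 +" term.
half_ci = t_crit * s_e * math.sqrt(1/n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2))

print(y_hat - half_pi, y_hat + half_pi)  # prediction interval
print(y_hat - half_ci, y_hat + half_ci)  # confidence interval
```

The prediction interval is always wider than the confidence interval at the same xg, because it must also cover the scatter of an individual y around the mean.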



The effect of the given xg on the length of the interval

  • As xg moves away from the sample mean x̄, the interval becomes longer. That is, the shortest interval is found at xg = x̄.


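The claim above can be checked numerically. A sketch with hypothetical summary statistics (only the shape of the width-versus-xg relationship matters):

```python
import math

# Half-width of the prediction interval as a function of xg.
# All numbers are hypothetical; only the shape of the curve matters.
n, t_crit, s_e, x_bar, s_x2 = 100, 1.984, 0.3, 36.0, 43.5

def half_width(x_g):
    return t_crit * s_e * math.sqrt(1 + 1/n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2))

widths = {x_g: half_width(x_g) for x_g in (26.0, 31.0, 36.0, 41.0, 46.0)}
print(min(widths, key=widths.get))  # 36.0: the interval is shortest at xg = x_bar
```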



Caveat about Prediction

  • Remember that predicting y based on x from a regression is only reliable if x falls inside the range of the data observed.

  • Extrapolation is dangerous.
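One way to honor this caveat in code is a guard that refuses to predict outside the observed x range. A minimal sketch; the data and fitted coefficients below are hypothetical, not the course's values:

```python
# Minimal extrapolation guard. The data and fitted coefficients are
# hypothetical, not the actual Example 18.2 values.
x_observed = [19.1, 24.2, 29.5, 36.0, 40.1, 44.8]  # odometer readings in 1000s (made up)
b0, b1 = 17.25, -0.0669                            # hypothetical intercept and slope

def predict(x):
    # Refuse to extrapolate: the fitted line is only trusted inside the data range.
    if not min(x_observed) <= x <= max(x_observed):
        raise ValueError(f"x={x} lies outside the observed range; extrapolation is unreliable")
    return b0 + b1 * x

print(predict(40.0))  # inside the observed range: fine
```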



Predicting Height Based on Age



18.9 Regression Diagnostics - I

  • The four conditions required for the validity of the simple linear regression analysis are:

    • the mean of the error variable, conditional on x, is zero for each x.

    • the error variable is normally distributed.

    • the error variance is constant for all values of x.

    • the errors are independent of each other.

  • How can we diagnose violations of these conditions?



Residual Analysis

  • Examining the residuals helps detect violation of the required conditions

  • A residual plot is a scatterplot of the regression residuals against another variable, usually the independent variable or time.

  • If the simple linear regression model holds, there should be no pattern in the residual plots.

  • Don’t read too much into these plots. You’re looking for gross departures from a random scatter.
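The residuals themselves come from the least-squares fit, which can be computed by hand. A sketch with made-up data; note that the residuals always sum to essentially zero by construction, so only their pattern (not their total) is informative:

```python
# Fit a least-squares line by hand and compute the residuals.
# The data are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residual = observed y minus fitted y; plot these against x (or time)
# and look for gross departures from a random scatter.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(sum(residuals), 9))  # essentially zero, by the normal equations
```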



Residual plot for utopia.jmp

  • Utopia.jmp is a simulation from a simple linear regression model (all assumptions hold).



Residual Plot for Example 18.2



Detecting Curvature

  • If the residual plot has a curved pattern, this indicates that the regression function is not a straight line.

  • Transformations that deal with the problem of a curved, rather than straight-line, regression function are discussed later in the lecture.


[Figure: scatterplot of points around a fitted line; the spread increases with ŷ.]

Heteroscedasticity

  • When the requirement of a constant variance is violated we have a condition of heteroscedasticity.

  • Diagnose heteroscedasticity by plotting the residual against the predicted y.

[Figure: plot of the residuals against the predicted value ŷ.]
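A crude numeric companion to the residual plot (this is an illustration, not a method from the textbook) is to split the residuals by the size of the fitted value and compare spreads. The residuals below are simulated so that the spread grows with the fitted value:

```python
import statistics

# Crude heteroscedasticity check: compare the residual spread for small
# fitted values with the spread for large ones. Residuals are simulated
# so that the spread grows with the fitted value.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
residuals = [0.1, -0.1, 0.2, -0.3, 0.8, -0.9, 1.5, -1.6]

half = len(fitted) // 2
low_sd = statistics.stdev(residuals[:half])    # spread at small fitted values
high_sd = statistics.stdev(residuals[half:])   # spread at large fitted values
print(high_sd > 2 * low_sd)  # True: the spread increases with the fitted value
```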



Homoscedasticity

  • When the requirement of a constant variance is not violated we have a condition of homoscedasticity.

  • Example 18.2 - continued



Residual plot for cleaning.jmp



Non Independence of Error Variables

  • Data collected over time constitute a time series.

  • If the errors are independent, the residuals examined over time should show no pattern.

  • When a pattern is detected, the errors are said to be serially correlated (or autocorrelated).

  • Serial correlation can be detected by graphing the residuals against time.



Non Independence of Error Variables

Patterns in the appearance of the residuals over time indicate that autocorrelation exists.

[Figure, left panel: residuals plotted against time. Note the runs of positive residuals followed by runs of negative residuals: positive serial correlation.]

[Figure, right panel: residuals plotted against time. Note the oscillating behavior of the residuals around zero: negative serial correlation.]
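Serial correlation can also be quantified rather than eyeballed; one standard summary is the Durbin-Watson statistic (values near 2 suggest independence, near 0 positive serial correlation, near 4 negative). The two residual series below are stylized to match the two patterns just described, not course data:

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# The two residual series are stylized illustrations of the two patterns.
runs = [1.0, 0.8, 0.9, 0.7, -0.8, -1.0, -0.9, -0.7]         # runs of one sign
alternating = [1.0, -0.9, 0.8, -1.0, 0.9, -0.8, 1.0, -0.9]  # oscillates around 0

def durbin_watson(e):
    return sum((a - b) ** 2 for a, b in zip(e[1:], e[:-1])) / sum(x * x for x in e)

print(durbin_watson(runs))         # well below 2: positive serial correlation
print(durbin_watson(alternating))  # well above 2: negative serial correlation
```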


Checking Normality

  • To check the normality of the error variable, draw a histogram of the residuals.

  • Violation of normality only has a serious effect on confidence intervals and tests if the sample size is small (less than 30) and there is either strong skewness or outliers.
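Alongside the histogram, a quick numeric check is the sample skewness of the residuals: values near 0 are consistent with symmetry, while large magnitudes flag the strong skewness the slide warns about. A sketch with made-up residuals:

```python
import statistics

# Sample skewness of the residuals (adjusted Fisher-Pearson form).
# The residuals below are made up for illustration.
residuals = [-1.2, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.7, 1.0, 0.1]

n = len(residuals)
m = statistics.mean(residuals)
s = statistics.stdev(residuals)
skew = (n / ((n - 1) * (n - 2))) * sum(((r - m) / s) ** 3 for r in residuals)
print(round(skew, 3))  # modest magnitude: no strong skewness in this sample
```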



Outliers

  • An outlier is an observation that is unusually small or large.

  • Three types of outliers in scatterplots:

    • Outlier in x direction

    • Outlier in y direction

    • Outlier in overall direction of scatterplot (residual has large magnitude)

  • Several possibilities need to be investigated when an outlier is observed:

    • There was an error in recording the value.

    • The point does not belong in the sample.

    • The observation is valid.

  • Identify outliers from the scatterplot.



Leverage and Influential Points

  • An observation has high leverage if it is an outlier in the x direction.

  • An observation is influential if removing it would markedly change the least squares line.

  • Observations that have high leverage are influential if they do not fall very close to the least squares line for the other points.
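Leverage has a closed form in simple regression, hᵢ = 1/n + (xᵢ − x̄)² / Σⱼ(xⱼ − x̄)², so an outlier in the x direction can be identified numerically. A sketch with made-up data:

```python
# Leverage of each observation in simple linear regression:
# h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
# Data are made up; the last x is an outlier in the x direction.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]

n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

print(max(leverage) == leverage[-1])  # True: the x-outlier has the highest leverage
```

In simple regression the leverages always sum to 2 (one per estimated coefficient), so a single point claiming a large share of that total dominates the fit.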


[Figures: scatterplots contrasting an outlier with an influential observation. Some outliers may be very influential: the outlier causes a shift in the regression line.]



Regression of Brain Weight on Body Weight for 96 Mammals



Transformations

  • Suppose that the residual plot indicates curvature in the regression function. What do we do?

  • One possibility: Transform x or transform y.

  • Tukey’s Bulging Rule (see Handout).
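The payoff of a transformation can be seen by comparing R² before and after. A sketch with simulated data (y is built as 3√x, so the transformed fit is exact by construction; this is not the display.jmp data):

```python
import math

# Compare a straight-line fit of y on x with a fit of y on sqrt(x)
# when the true relationship is curved. Data are simulated: y = 3*sqrt(x).
xs = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
ys = [3 * math.sqrt(x) for x in xs]

def r_squared(us, vs):
    # Squared correlation = R^2 of the simple regression of vs on us.
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return suv ** 2 / (suu * svv)

print(r_squared(xs, ys))                           # below 1: the line misses the curve
print(r_squared([math.sqrt(x) for x in xs], ys))   # 1.0: the transformed fit is exact
```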



Transformation for display.jmp

  • Y=Sales, X=Display Feet

  • Y=Sales, X=Square Root of Display Feet/Log of Display Feet



Predictions with Transformations

  • Linear Fit

  • Sales = -46.28718 + 154.90188 √(DisplayFeet)

  • For 5 display feet, the predicted average sales are -46.28718 + 154.90188 √5 ≈ 300.08.
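The slide's fitted equation can be evaluated at 5 display feet directly:

```python
import math

# Evaluate the fitted transformed model at 5 display feet:
# Sales = -46.28718 + 154.90188 * sqrt(DisplayFeet).
sales_at_5 = -46.28718 + 154.90188 * math.sqrt(5)
print(round(sales_at_5, 2))  # 300.08
```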



18.6 Finance Application: Market Model

  • One of the most important applications of linear regression is the market model.

  • It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market (Rm):

    R = β0 + β1Rm + ε

    where R is the rate of return on a particular stock, Rm is the rate of return on some major stock index, and the beta coefficient (β1) measures how sensitive the stock's rate of return is to changes in the level of the overall market.
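As a sketch, beta can be estimated as the least-squares slope of stock returns on index returns. The monthly returns below are invented for illustration:

```python
# Market-model beta as the least-squares slope: beta = S_xy / S_xx,
# with x = market returns and y = stock returns. Returns are invented.
rm = [0.010, -0.020, 0.030, 0.015, -0.010, 0.020]   # index returns
r  = [0.015, -0.030, 0.045, 0.020, -0.018, 0.031]   # stock returns

n = len(rm)
rm_bar = sum(rm) / n
r_bar = sum(r) / n
s_xy = sum((a - rm_bar) * (b - r_bar) for a, b in zip(rm, r))
s_xx = sum((a - rm_bar) ** 2 for a in rm)
beta = s_xy / s_xx

print(beta > 1)  # True: this hypothetical stock swings more than the market
```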



Example 18.6

