slide1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Simple Linear Regression PowerPoint Presentation
Download Presentation
Simple Linear Regression

Loading in 2 Seconds...

play fullscreen
1 / 40

Simple Linear Regression - PowerPoint PPT Presentation


  • 63 Views
  • Uploaded on

Simple Linear Regression. Least squares line Interpreting coefficients Prediction Cautions The formal model. Section 2.6, 9.1, 9.2. Professor Kari Lock Morgan Duke University. Exam 2 Grades. In Class:. Lab:. Total:. Comments on In-Class Exam.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Simple Linear Regression' - sheng


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Simple Linear Regression

  • Least squares line
  • Interpreting coefficients
  • Prediction
  • Cautions
  • The formal model
  • Section 2.6, 9.1, 9.2
  • Professor Kari Lock Morgan
  • Duke University
slide2

Exam 2 Grades

In Class:

Lab:

Total:

slide3

Comments on In-Class Exam

  • Test whether this data provides evidence that Melanoma is found significantly more often on the left side of the body: one categorical variable -> single proportion
  • 2011 Hollywood movies: If the sample is the same as the population, then no need for inference!
  • Standard deviation of a bootstrap distribution is the standard error
slide4

Comments on Lab Exam

  • Most common reason for points off: applying the wrong method
  • The first step should ALWAYS be asking yourself: What is/are the variable(s)? Are they categorical or quantitative?
  • Always plot/visualize your data. Outliers can strongly affect the results; you should either explain why they are left in, or else remove them
slide5

Simulation Methods

  • For any one or two variables, resample( ) gives a confidence interval
  • For any two variables, reallocate( ) tests for an association between the variables
  • No conditions to check!
  • Automatically deals with missing data!
  • Only two commands to remember!
  • No distributions to remember!
slide7

Crickets and Temperature

  • Can you estimate the temperature on a summer evening, just by listening to crickets chirp?

Response Variable, y

We will fit a model to predict temperature based on cricket chirp rate

Explanatory

Variable, x

slide8

Linear Model

  • A linear modelpredicts a response variable, y, using a linear function of explanatory variables
  • Simple linear regressionpredicts on response variable, y, as a linear function of one explanatory variable, x
  • We will create a model that predicts temperature as a linear function of cricket chirp rate
slide9

Regression Line

  • Goal: Find astraight line that best fits the data in a scatterplot
slide10

Predicted and Actual Values

  • The actual response value, y, is the response value observed for a particular data point
  • The predicted response value, , is the response value that would be predicted for a given x value, based on a model
  • In linear regression, the predicted values fall on the regression line directly above each x value
  • The best fitting line is that which makes the predicted values closest to the actual values
slide12

Residual

  • The residual for each data point is
  • or the vertical distance from the point to the line

Want to make all the residuals as small as possible.

How would you measure this?

slide13

Least Squares Regression

  • Least squares regressionchooses the regression line that minimizes the sum of the squared residuals
slide15

Equation of the Line

  • The estimated regression line is
  • Slope: increase in predicted y for every unit increase in x
  • Intercept: predicted y value when x = 0

Intercept

Slope

slide16

Regression in R

  • > lm(Temperature~Chirps)
  • Call:
  • lm(formula = Temperature ~ Chirps)
  • Coefficients:
  • (Intercept) Chirps
  • 37.6786 0.2307
slide17

Regression Model

  • Which is a correct interpretation?
    • The average temperature is 37.69
    • For every extra 0.23 chirps per minute, the predicted temperate increases by 1 degree
    • Predicted temperature increases by 0.23 degrees for each extra chirp per minute
    • For every extra 0.23 chirps per minute, the predicted temperature increases by 37.69
slide18

Units

  • It is helpful to think about units when interpreting a regression equation

y units

y units

x units

degrees

degrees/

chirps per min

chirps per minute

degrees

slide19

Prediction

  • The regression equation can be used to predict y for a given value of x
  • If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is
slide21

Prediction

  • If the crickets are chirping about 180 times per minute, your best guess at the temperature is
    • 60
    • 70
    • 80
slide22

Exam Scores

  • Calculate your residual.
slide23

Prediction

  • The intercept tells us that the predicted temperature when the crickets are not chirping at all is 37.69. Do you think this is a good prediction?
  • (a) Yes
  • (b) No
slide24

Regression Caution 1

  • Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!)
  • If none of the x values are anywhere near 0, then the intercept is meaningless!
slide25

Duke Rank and Duke Shirts

  • Are the rank of Duke among schools applied to and the number of Duke shirts owned
  • positively associated
  • negatively associated
  • not associated
  • other
slide26

Duke Rank and Duke Shirts

  • Are the rank of Duke among schools applied to and the number of Duke shirts owned
  • positively associated
  • negatively associated
  • not associated
  • other
slide27

Regression Caution 2

  • Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear
  • ALWAYS PLOT YOUR DATA!
  • The regression line/equation should only be used if the association is approximately linear
slide28

Regression Caution 3

  • Outliers (especially outliers in both variables) can be very influential on the regression line
  • ALWAYS PLOT YOUR DATA!
  • http://illuminations.nctm.org/LessonDetail.aspx?ID=L455
slide29

Life Expectancy and Birth Rate

Which of the following interpretations is correct?

A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy

Increasing life expectancy by 1 year will cause the birth rate to decrease by 0.89

Both

Neither

Coefficients:

(Intercept) LifeExpectancy 83.4090 -0.8895

slide30

Regression Caution 4

  • Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease
  • Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable)
slide31

Explanatory and Response

  • Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response
slide32

r = 0

  • Challenge: If the correlation between x and y is 0, what would the regression line be?
slide33

Simple Linear Model

  • The population/true simple linear model is
  • 0 and 1, are unknown parameters
  • Can use familiar inference method!

Intercept

Slope

Random error

slide34

Inference for the Slope

  • Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas:
  • Population Parameter: 1, Sample Statistic:
  • Use t-distribution with n – 2 degrees of freedom
slide35

Inference for Slope

Give a 95% confidence interval for the true slope.

Is the slope significantly different from 0?

(a) Yes

(b) No

slide36

Confidence Interval

> qt(.975,5) [1] 2.570582

We are 95% confident that the true slope, regressing temperature on cricket chirp rate, is between 0.194 and 0.266 degrees per chirp per minute.

slide37

Hypothesis Test

> 2*pt(16.21,5,lower.tail=FALSE)

[1] 1.628701e-05

There is strong evidence that the slope is significantly different from 0, and that there is an association between cricket chirp rate and temperature.

slide38

Small Samples

  • The t-distribution is only appropriate for large samples (definitely not n = 7)!
  • We should have done inference for the slope using simulation methods...
slide39

If results are very significant, it doesn’t really matter if you get the exact p-value… you come to the same conclusion!

slide40

Project 2

  • Details here
  • Group project on regression (modeling)
  • If you want to change groups, email me TODAY OR TOMORROW. If other people in your lab section want to change, I’ll move people around.
  • Need a data set with a quantitative response variable and multiple explanatory variables; explanatory variables must have at least one categorical and at least one quantitative
  • Proposal due next Wednesday