
Lecture 22: Thurs., April 1


  1. Lecture 22: Thurs., April 1 • Outliers and influential points for simple linear regression • Multiple linear regression • Basic model • Interpreting the coefficients

  2. Outliers and Influential Observations • An outlier is an observation that lies outside the overall pattern of the other observations. A point can be an outlier in the x direction, the y direction, or in the direction of the scatterplot. For regression, the outliers of concern are those in the x direction and in the direction of the scatterplot. A point that is an outlier in the direction of the scatterplot will have a large residual. • An observation is influential if removing it markedly changes the least squares regression line. A point that is an outlier in the x direction will often be influential. • The least squares method is not resistant to outliers. Follow the outlier examination strategy in Display 3.6 for dealing with outliers in the x direction and outliers in the direction of the scatterplot.

  3. Outliers Example • Does the age at which a child begins to talk predict a later score on a test of mental ability? • gesell.JMP contains data on each child's age at first word (x) and Gesell Adaptive score (y), an ability test taken much later. • Child 18 is an outlier in the x direction and potentially influential. Child 19 is an outlier in the direction of the scatterplot. • To assess whether a point is influential, fit the least squares line with and without the point (excluding the row to fit the line without the point) and see how much of a difference it makes. • Child 18 is influential.
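The refit-with-and-without strategy on this slide can be sketched in a few lines of NumPy. The numbers below are made-up stand-ins for the gesell.JMP data, not the actual values; only the qualitative pattern matters — one point with an extreme x value drags the fitted slope.

```python
import numpy as np

# Hypothetical data (NOT the actual gesell.JMP values): age at first
# word in months (x) vs. a later ability score (y), plus one
# high-leverage point with an extreme x value in the last position.
x = np.array([9, 10, 11, 11, 12, 13, 14, 15, 15, 17, 42.0])
y = np.array([96, 83, 102, 100, 105, 86, 90, 85, 102, 121, 57.0])

def ls_slope_intercept(x, y):
    """Least squares fit of y = b0 + b1*x."""
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

b0_all, b1_all = ls_slope_intercept(x, y)           # with the point
b0_drop, b1_drop = ls_slope_intercept(x[:-1], y[:-1])  # row excluded

print(f"slope with point:    {b1_all:.2f}")
print(f"slope without point: {b1_drop:.2f}")
```

With these illustrative numbers the slope flips sign when the extreme-x point is excluded, which is exactly the kind of "markedly changes the line" behavior that marks an influential observation.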

  4. Will You Take Mercury With Your Fish? • Too much mercury in one’s body results in memory loss, depression, irritability and anxiety – the “mad hatter” syndrome. • Rivers and oceans contain small amounts of mercury which can accumulate in fish over their lifetimes. • Concentration of mercury in fish tissue can be obtained at considerable expense by catching fish and sending samples to a lab for analysis. • It is important to understand the relationship between mercury concentration and measurable characteristics of a fish such as length and weight in order to develop safety guidelines about how much fish to eat.

  5. Data Set • mercury.JMP contains data from a study of largemouth bass in the Wacamaw and Lumber rivers in North Carolina. At several stations along each river, a group of fish were caught, weighed, and measured. In addition, a filet from each fish caught was sent to the lab so that the tissue concentration of mercury could be determined for each fish. • We want to predict Y = mercury concentration in fish tissue (measured in parts per million) based on X1 = length (centimeters) and X2 = weight (grams).

  6. Multiple Regression Model • Multiple regression seeks to estimate the mean of Y given multiple explanatory variables X1,…,Xp, denoted by μ{Y|X1,…,Xp}. • Assumptions of the ideal multiple linear regression model: • μ{Y|X1,…,Xp} = β0 + β1X1 + … + βpXp (linearity) • SD(Y|X1,…,Xp) = σ (constant variance) • The distribution of Y for each subpopulation X1,…,Xp is normal. • The selection of an observation from any of the subpopulations is independent of the selection of any other observation.

  7. Multiple Regression Model: Another Representation • Data: We observe (yi, xi1,…,xip), i = 1,…,n. • Ideal multiple regression model: yi = β0 + β1xi1 + … + βpxip + ei, where • ei has a normal distribution with mean = 0, SD = σ • e1,…,en are independent • ei = "error" = error from predicting yi by its subpopulation mean μ{Y|xi1,…,xip}
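A small simulation makes the error representation concrete. The values of β0, β1, β2, and σ below are chosen for illustration (they are not estimates from the mercury data); the point is that the errors ei = yi − μ{Y|xi1,…,xip} come out with mean 0 and SD σ, as the ideal model assumes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
beta0, beta1, beta2, sigma = 1.0, 0.5, -0.2, 2.0  # illustrative values

x1 = rng.uniform(20, 60, n)      # e.g. length in cm
x2 = rng.uniform(100, 2000, n)   # e.g. weight in g
e = rng.normal(0.0, sigma, n)    # independent errors, mean 0, SD sigma
y = beta0 + beta1 * x1 + beta2 * x2 + e

# The error is the deviation of Y from its subpopulation mean:
mu = beta0 + beta1 * x1 + beta2 * x2
print(round(np.mean(y - mu), 2), round(np.std(y - mu), 2))
```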

  8. Estimation of Multiple Linear Regression Model • The coefficients β0, β1,…,βp are estimated by choosing b0, b1,…,bp to make the sum of squared prediction errors as small as possible, i.e., choose b0,…,bp to minimize Σi=1..n [yi − (b0 + b1xi1 + … + bpxip)]² • Predicted value of y given x1,…,xp: ŷ = b0 + b1x1 + … + bpxp • σ = SD(Y|X1,…,Xp), estimated by σ̂ = root mean square error
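A minimal sketch of the least squares computation, using NumPy's lstsq, which minimizes exactly the sum of squared prediction errors described above. The data are simulated with made-up coefficients (true values 1.0, 0.5, −0.2 and σ = 2.0), not the mercury data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(20, 60, n)
x2 = rng.uniform(100, 2000, n)
y = 1.0 + 0.5 * x1 - 0.2 * x2 + rng.normal(0, 2.0, n)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])

# lstsq chooses b = (b0, b1, b2) to minimize sum((y - X @ b)**2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)

yhat = X @ b  # predicted values
# Root mean square error with p = 2 explanatory variables.
rmse = np.sqrt(np.sum((y - yhat) ** 2) / (n - 2 - 1))
print(b.round(2), round(rmse, 2))
```

The fitted slopes land near the true 0.5 and −0.2, and the root mean square error estimates σ = 2.0.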

  9. Multiple Linear Regression in JMP • Analyze, Fit Model • Put response variable in Y • Click on explanatory variables and then click Add under Construct Model Effects • Click Run Model.

  10. Multiple Regression for Mercury Data

  11. Residuals and Root Mean Square Error from Multiple Regression • Residual for observation i = yi − ŷi = yi − (b0 + b1xi1 + … + bpxip) • Root mean square error = √[ Σi=1..n (yi − ŷi)² / (n − p − 1) ] • As with simple linear regression, under the ideal multiple linear regression model: • Approximately 68% of predictions of a future Y based on x1,…,xp will be off by at most σ̂ (one root mean square error) • Approximately 95% of predictions of a future Y based on x1,…,xp will be off by at most 2σ̂
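The 68%/95% rules can be checked empirically on simulated data (made-up coefficients, normal errors — not the mercury data): with the ideal model holding, about 68% of residuals fall within one root mean square error and about 95% within two.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
x1 = rng.uniform(20, 60, n)
x2 = rng.uniform(100, 2000, n)
y = 1.0 + 0.5 * x1 - 0.2 * x2 + rng.normal(0, 2.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
res = y - X @ b                               # residual_i = y_i - yhat_i
rmse = np.sqrt(np.sum(res ** 2) / (n - 2 - 1))  # p = 2

within1 = np.mean(np.abs(res) <= rmse)        # roughly 68%
within2 = np.mean(np.abs(res) <= 2 * rmse)    # roughly 95%
print(round(within1, 2), round(within2, 2))
```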

  12. Interpreting the Coefficients • β1 = increase in the mean of Y associated with a one-unit (1 cm) increase in length, holding weight fixed • β2 = increase in the mean of Y associated with a one-unit (1 gram) increase in weight, holding length fixed • The interpretation of multiple regression coefficients depends on what other explanatory variables are in the model. • See handout.
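The "holding the other variable fixed" interpretation can be verified directly from a fitted model. The data and coefficient values below are simulated for illustration (not the mercury.JMP estimates): the difference between predictions at two lengths 1 cm apart, at the same weight, equals the fitted slope b1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
length = rng.uniform(20, 60, n)       # cm
weight = rng.uniform(100, 2000, n)    # g
# Simulated mercury concentration with hypothetical coefficients.
mercury = 0.1 + 0.02 * length + 0.0005 * weight + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), length, weight])
b0, b1, b2 = np.linalg.lstsq(X, mercury, rcond=None)[0]

# Predictions at 40 cm and 41 cm, weight held fixed at 1000 g.
pred_a = b0 + b1 * 40 + b2 * 1000
pred_b = b0 + b1 * 41 + b2 * 1000
print(round(pred_b - pred_a, 4))  # equals b1 (up to floating-point rounding)
```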
