
Lecture 16 – Thurs., March 4


randilyn


  1. Lecture 16 – Thurs., March 4 • Chi-squared test for M&M experiment • Simple linear regression (Chapter 7.2) • Next class after spring break: Inference for simple linear regression (Chapters 7.3-7.4)

  2. Chi-squared test for M&M experiment • Data in MandM.JMP. • According to the M&M’s web site, the color distribution in peanut butter M&M’s is 20% brown, 20% yellow, 20% red, 20% blue, 10% green and 10% orange. Test whether the observed color counts are consistent with this claimed distribution.
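
As a rough illustration of the goodness-of-fit calculation behind this test, the sketch below computes the chi-squared statistic in plain Python. The claimed proportions come from the slide; the observed counts are hypothetical (they are not the class data in MandM.JMP), and the names `claimed`, `observed`, and `chi_squared_statistic` are made up for this example.

```python
# Claimed color proportions from the slide (peanut butter M&M's).
claimed = {"brown": 0.20, "yellow": 0.20, "red": 0.20,
           "blue": 0.20, "green": 0.10, "orange": 0.10}

# HYPOTHETICAL observed counts, invented for illustration only;
# these are NOT the class data stored in MandM.JMP.
observed = {"brown": 22, "yellow": 18, "red": 25,
            "blue": 15, "green": 12, "orange": 8}


def chi_squared_statistic(observed, claimed):
    """Sum over categories of (observed - expected)^2 / expected."""
    n = sum(observed.values())
    stat = 0.0
    for color, p in claimed.items():
        expected = n * p  # expected count if the claimed distribution holds
        stat += (observed[color] - expected) ** 2 / expected
    return stat


stat = chi_squared_statistic(observed, claimed)
df = len(claimed) - 1        # 6 categories, so 5 degrees of freedom
CRITICAL_5PCT_DF5 = 11.07    # upper 5% point of chi-squared with 5 df

print(f"chi-squared = {stat:.2f} on {df} df")
if stat > CRITICAL_5PCT_DF5:
    print("reject the claimed distribution at the 5% level")
else:
    print("do not reject the claimed distribution at the 5% level")
```

With these made-up counts the statistic works out to 3.7, well below the 5% critical value, so the claimed distribution would not be rejected. In JMP the same comparison is done through the software rather than by hand.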

  3. Regression Analysis: Motivating Example • Heart catheterization is performed on children with congenital heart defects. A Teflon tube (catheter) is passed into a major vein or artery and pushed into the heart to obtain information about the heart’s physiology and functional ability. • It would be desirable to accurately predict the needed length of catheter (Y) based on the height (X) of the child [Weindling, 1977]. • In a small study of 12 children, the exact catheter length required was determined by using a fluoroscope to check that the tip of the catheter had reached the pulmonary artery.

  4. Predicting Y based on X • For each X=height, there is a population of children with height X. • What is a good prediction of Y=catheter length required if we know that a child’s height is X (e.g., 48)? • A good prediction is the mean catheter length required for the population of children with height X, $\mu\{Y \mid X\}$ (e.g., the mean catheter length required for the population of children with height 48, $\mu\{Y \mid X = 48\}$).

  5. Regression Analysis • The goal of regression analysis is to estimate the mean of Y for the population with characteristic X, $\mu\{Y \mid X\}$ [sometimes called the mean of Y given X or the conditional mean of Y given X]. • Simple regression: There is only one characteristic X. • Multiple regression: There are several characteristics [e.g., child’s height and weight]. • The Y variable that we want to predict is called the response variable. The X variables that we use to make the prediction are called the explanatory variables or predictor variables.

  6. Simple Linear Regression Model • Simple linear regression model: The mean of Y given X is a straight line, $\mu\{Y \mid X\} = \beta_0 + \beta_1 X$ (this is called the regression line). • $\beta_0$ = Intercept. The mean of Y given X=0. • $\beta_1$ = Slope. The amount by which the mean of Y given X increases for each one-unit increase in X. • Example: Suppose $\beta_1 = 0.6$ for catheter data. For each additional inch of height, the mean catheter length required increases by 0.6 cm.

  7. Estimating the coefficients • We want to make the predictions of Y based on X as good as possible. The best prediction of Y based on X is the estimated mean, $\hat{\mu}\{Y \mid X\} = \hat{\beta}_0 + \hat{\beta}_1 X$. • Least Squares Method: Choose coefficients to minimize the sum of squared prediction errors. • Fitted value for observation i is its estimated mean given X: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$. • Residual for observation i is the prediction error of using $\hat{Y}_i$ to predict $Y_i$: $res_i = Y_i - \hat{Y}_i$. • Least squares method: Find estimates $\hat{\beta}_0, \hat{\beta}_1$ that minimize the sum of squared residuals, $\sum_i res_i^2$; solution on page 182.
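
The least squares idea on this slide can be sketched in a few lines of Python using the standard closed-form solution (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X). The heights and lengths below are hypothetical values chosen for illustration, not the 12-child catheter data, and `least_squares_fit` is a name invented for this sketch.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# HYPOTHETICAL data for illustration (not the study data):
heights_in = [30, 36, 42, 48, 54]   # X, inches
lengths_cm = [30, 33, 38, 40, 45]   # Y, cm

intercept, slope = least_squares_fit(heights_in, lengths_cm)

# Fitted values and residuals as defined on the slide.
fitted = [intercept + slope * x for x in heights_in]
residuals = [y - f for y, f in zip(lengths_cm, fitted)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print("sum of squared residuals =", round(sum(r ** 2 for r in residuals), 3))
```

A useful sanity check on any least squares fit with an intercept is that the residuals sum to zero, up to floating-point rounding.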

  8. Regression Analysis in JMP • Use Analyze, Fit Y by X. Put the response variable in Y and the explanatory variable in X (make sure X is continuous). • Click Fit Line under the red triangle next to Bivariate Fit of Y by X.

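  9. Predicting Y based on X • Least squares regression line: $\hat{\mu}\{Y \mid X\} = \hat{\beta}_0 + \hat{\beta}_1 X$, with the estimates read from the JMP output. • The estimated mean catheter length required for children who are 4 feet (48 inches) tall is $\hat{\mu}\{Y \mid X = 48\} = \hat{\beta}_0 + 48\hat{\beta}_1$. • A good prediction of the catheter length required for a child who is 4 feet tall is this estimated mean, in cm.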

  10. Ideal Simple Linear Regression Model • Assumptions of ideal simple linear regression model • There is a normally distributed subpopulation of responses for each value of the explanatory variable. • The means of the subpopulations fall on a straight-line function of the explanatory variable. • The subpopulation standard deviations are all equal (to $\sigma$). • The selection of an observation from any of the subpopulations is independent of the selection of any other observation.

  11. The standard deviation $\sigma$ • $\sigma$ is the standard deviation in each subpopulation. • $\sigma$ measures how accurate the predictions of y based on x from the regression will be. • If the simple linear regression model holds, then approximately • 68% of the observations will fall within $\sigma$ of the regression line • 95% of the observations will fall within $2\sigma$ of the regression line
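
The 68% and 95% figures can be checked by simulation. The sketch below draws observations from an ideal simple linear regression model and counts how many fall within one and two standard deviations of the true line. The intercept, slope, standard deviation, and range of X are all hypothetical values picked for illustration, and `simulate_coverage` is a name invented here.

```python
import random


def simulate_coverage(beta0, beta1, sigma, n=100_000, seed=0):
    """Fraction of simulated observations within sigma and 2*sigma of the line."""
    rng = random.Random(seed)
    within_1 = 0
    within_2 = 0
    for _ in range(n):
        x = rng.uniform(20, 65)         # hypothetical range of heights (inches)
        mean_y = beta0 + beta1 * x      # subpopulation mean lies on the line
        y = rng.gauss(mean_y, sigma)    # normal subpopulation, equal SD for all x
        deviation = abs(y - mean_y)
        within_1 += deviation <= sigma
        within_2 += deviation <= 2 * sigma
    return within_1 / n, within_2 / n


# HYPOTHETICAL parameter values, chosen only for illustration.
frac_1, frac_2 = simulate_coverage(beta0=12.0, beta1=0.6, sigma=4.0)
print(f"within 1 sigma: {frac_1:.3f}, within 2 sigma: {frac_2:.3f}")
```

The two fractions should come out close to 0.68 and 0.95 whatever values are chosen for the coefficients, because the rule depends only on the normality and equal-SD assumptions.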

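  12. Estimating $\sigma$ • Residuals, $res_i = Y_i - \hat{Y}_i$, are an estimate of the deviation of $Y_i$ from its estimated mean given $X_i$. • Residuals provide the basis for an estimate of $\sigma$: $\hat{\sigma} = \sqrt{\sum_i res_i^2 / (n-2)}$. • Degrees of freedom for $\hat{\sigma}$ for simple linear regression = n-2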
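
As a minimal sketch of this estimate, the function below fits the least squares line and then divides the sum of squared residuals by n-2 before taking the square root. The data are hypothetical (not the catheter study), and `residual_sd` is a name made up for this example.

```python
import math


def residual_sd(x, y):
    """Return (sigma_hat, df) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual = observed response minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    df = n - 2  # two coefficients (intercept and slope) were estimated
    sigma_hat = math.sqrt(sum(r ** 2 for r in residuals) / df)
    return sigma_hat, df


# HYPOTHETICAL data for illustration (not the study data):
heights_in = [30, 36, 42, 48, 54]
lengths_cm = [30, 33, 38, 40, 45]

sigma_hat, df = residual_sd(heights_in, lengths_cm)
print(f"sigma_hat = {sigma_hat:.3f} on {df} degrees of freedom")
```

Dividing by n-2 rather than n accounts for the two coefficients estimated from the same data.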

  13. JMP commands • $\hat{\sigma}$ is found under Summary of Fit and is labeled “Root Mean Square Error”. • To look at a plot of residuals versus X, click Plot Residuals under the red triangle next to Linear Fit after fitting the line. • To save the residuals or fitted values (predicted values), click Save Residuals or Save Predicteds under the red triangle next to Linear Fit after fitting the line.

  14. Accuracy of predictions • If the simple linear regression model holds, then approximately • 68% of the observations will fall within $\hat{\sigma}$ of the regression line • 95% of the observations will fall within $2\hat{\sigma}$ of the regression line • For catheter data, $\hat{\sigma} = 4.01$. Approximately 68% of the time the predicted catheter length given height will be off by at most 4.01 cm; approximately 95% of the time it will be off by at most 2 × 4.01 = 8.02 cm.

  15. Interpolation and Extrapolation • The simple linear regression model makes it possible to draw inference about any mean response, $\mu\{Y \mid X\}$. • Interpolation: Drawing inference about the mean response for X within the range of observed X; a strong advantage of the regression model is the ability to interpolate (e.g., predict the mean catheter length required for a child who is 42.0 inches tall, a height not observed in the sample). • Extrapolation: Drawing inference about the mean response for X outside the range of observed X; dangerous. The straight-line model may hold approximately over the region of observed X but not for all X. • Extrapolation in catheter data:
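
One simple guard against accidental extrapolation is to compare a new X with the range of X values used to fit the line before predicting. The sketch below does only that check. The observed heights are hypothetical, and `prediction_kind` is a helper name invented for this example.

```python
def prediction_kind(x_new, observed_x):
    """Label a prediction at x_new as interpolation or extrapolation."""
    lowest, highest = min(observed_x), max(observed_x)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


# HYPOTHETICAL observed heights in inches (not the study data).
observed_heights = [22.5, 30.0, 37.0, 45.5, 58.0, 63.5]

for height in (42.0, 70.0, 10.0):
    print(height, "->", prediction_kind(height, observed_heights))
```

A height of 42.0 inches does not appear in the hypothetical sample but lies inside its range, so it counts as interpolation; 70.0 and 10.0 lie outside and would be flagged.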

  16. Difficulties of extrapolation • Mark Twain: “In the space of one hundred and seventy-six years, the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the old Oolitic Silurian period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale return of conjecture out of such a trifling investment of fact.”
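
Twain's arithmetic is a straight-line extrapolation, and it can be reproduced directly from the numbers in the quotation. The sketch below uses only those figures; the variable names are invented for this example.

```python
# Figures taken from the quotation.
shortening_miles = 242
span_years = 176

# Average shortening per year: "a trifle over one mile and a third".
rate = shortening_miles / span_years
print(rate)  # 1.375 miles per year

# Extrapolating a million years into the past with the same straight line:
extra_miles_long_ago = rate * 1_000_000
print(f"{extra_miles_long_ago:,.0f} extra miles a million years ago")  # 1,375,000

# Extrapolating 742 years into the future:
future_shortening = rate * 742
print(f"{future_shortening:.2f} miles lost over the next 742 years")  # 1020.25
```

The rate is a reasonable summary of the 176 observed years. The absurd answers come from applying it far outside that range.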
