
The greatest blessing in life is in giving and not taking.

Statistical Package Usage. Topic: Simple Linear Regression. By Prof. Kelly Fan, Cal State Univ, East Bay. Overview: correlation analysis; linear regression model; goodness of fit of the model; model assumption checking.


Presentation Transcript


  1. The greatest blessing in life is in giving and not taking.

  2. Statistical Package Usage. Topic: Simple Linear Regression. By Prof. Kelly Fan, Cal State Univ, East Bay

  3. Overview • Correlation analysis • Linear regression model • Goodness of fit of the model • Model assumption checking • How to handle outliers

  4. Example: Weight vs. Height

  5. Correlation between X and Y • X and Y might be related to each other in many ways: linearly or along a curve. • The correlation analysis presented here measures only linear association.

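  6. (Pearson) Correlation Coefficient of X and Y • A measure of the strength of the “LINEAR” association between X and Y • Sx: the standard deviation of the data values in X; Sy: the standard deviation of the data values in Y. The correlation coefficient of X and Y is: $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{S_x}\right)\left(\frac{y_i-\bar{y}}{S_y}\right)$, where n is the number of (x_i, y_i) pairs and $\bar{x}$, $\bar{y}$ are the sample means.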
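The correlation coefficient on this slide is the sum of products of standardized X and Y values, divided by n − 1. A minimal Python sketch of that formula, assuming plain lists of numbers; the height and weight values are hypothetical toy data, not the data set behind these slides:

```python
import math


def pearson_r(x, y):
    """Pearson r: sum of products of standardized x and y values, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Sample standard deviations S_x and S_y (divisor n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical toy data (inches, pounds)
height = [60, 62, 65, 68, 70, 72]
weight = [110, 120, 135, 150, 160, 175]
print(round(pearson_r(height, weight), 3))
```

The result does not depend on which variable is called X: swapping the arguments gives the same r.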

  7. Correlation Coefficient of X and Y • −1 ≤ r ≤ 1 • The magnitude of r measures the strength of the linear association of X and Y • The sign of r indicates the direction of the association: “−” → negative association; “+” → positive association

  8. Examples of Different Levels of Correlation [Figure panels: r = .71, medium linearity; r = .98, strong linearity]

  9. Examples of Different Levels of Correlation [Figure panels: r = .00, curved (nonlinear) pattern; r = −.09, nearly uncorrelated]
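The r = .00 panel is a reminder that r measures only linear association. A small Python check with made-up points lying exactly on the curve y = x², x symmetric around 0: y is completely determined by x, yet r is 0. This form of r, Sxy/√(Sxx·Syy), is algebraically the same as the standardized-products formula (the n − 1 factors cancel).

```python
import math


def pearson_r(x, y):
    """Pearson r from centered cross-products: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
curved = [xi ** 2 for xi in x]        # perfect, but curved, relationship
straight = [2 * xi + 1 for xi in x]   # perfect linear relationship
print(pearson_r(x, curved))    # 0.0
print(pearson_r(x, straight))  # 1.0
```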

  10. Example: Weight vs. Height

  11. Graphical Summary of Two Quantitative Variables: Scatterplot of the response variable against the explanatory variable • What is the overall (average) pattern? • What is the direction of the pattern? • How much do the data points vary from the overall (average) pattern? • Are there any potential outliers?

  12. Summary for Height and Weight: Some Simple Conclusions [Figure: Scatterplot (Weight vs. Height)] • Weight is roughly linearly related to height. • Weight increases as height increases. • The data points lie more or less around the line. • No potential outliers.

  13. Regression Equation • The regression line models the relationship between X and Y on average. • The mathematical equation of a regression line is called the regression equation, $\hat{Y} = b_0 + b_1 X$.
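The slides do not show how the line is computed. The standard criterion for simple linear regression is ordinary least squares, which gives slope b1 = Sxy/Sxx and intercept b0 = ȳ − b1·x̄. A minimal sketch under that assumption, with hypothetical toy data (not the slide's data, so it will not reproduce −587.1 and 11.1):

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical toy data (inches, pounds)
height = [60, 62, 65, 68, 70, 72]
weight = [110, 120, 135, 150, 160, 175]
b0, b1 = fit_line(height, weight)
print(f"WT_hat = {b0:.1f} + {b1:.2f} * HT")
```

Because b0 = ȳ − b1·x̄, the fitted line always passes through the point of means (x̄, ȳ).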

  14. The Usage of the Regression Equation • Predict the value of Y for a given X value. E.g., what is the predicted weight of a student who is 60” tall?

  15. Predicted Value (Fitted Value) • $\hat{Y}$ is called “predicted Y,” pronounced “y hat”; it estimates the average Y value for a specified X value. E.g., the predicted weight for a given height: $\widehat{WT} = -587.1 + 11.1\,HT$

  16. The Limitation of the Regression Equation • The regression equation should not be used to predict Y for X values (far) beyond the range in which the data were observed. E.g., the predicted WT for a given HT: given an HT of 50”, the regression equation gives a WT of −587.1 + 11.1 × 50 = −32.1 pounds!
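A sketch of prediction with the slides' equation, WT = −587.1 + 11.1 × HT, plus an optional guard against extrapolation. The observed height range (60, 76) used below is hypothetical; the slides do not state the range of the data.

```python
def predict_weight(height, b0=-587.1, b1=11.1, observed_range=None):
    """Predicted weight (lb) for a height (in) from the line WT_hat = b0 + b1 * HT.

    observed_range: optional (min, max) of the heights actually observed.
    Heights outside it are extrapolations, so a ValueError is raised.
    """
    if observed_range is not None:
        low, high = observed_range
        if not low <= height <= high:
            raise ValueError(f"height {height} is outside the observed range {observed_range}")
    return b0 + b1 * height


print(round(predict_weight(50), 1))  # -32.1, the impossible weight from the slide

try:
    predict_weight(50, observed_range=(60, 76))  # (60, 76) is a hypothetical range
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than silently returning a number makes the extrapolation visible to whoever calls the function.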

  17. The Unpredicted Part • The value $Y - \hat{Y}$ is the part the regression equation (model) cannot predict; it is called the “residual.”

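  18. [Figure: a residual marked (with a brace) as the vertical distance between a data point and the regression line]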
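Residuals are simply observed Y minus fitted Y. A short sketch on three made-up points, assuming an ordinary least-squares fit; it also shows that least-squares residuals sum to zero.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(x, y):
    """Residuals e_i = y_i - y_hat_i: the part of each Y the fitted line does not predict."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


e = residuals([1, 2, 3], [1, 3, 2])
print(e)       # [-0.5, 1.0, -0.5]
print(sum(e))  # 0.0: least-squares residuals always sum to zero
```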

  19. Goodness of Fit • R² is the proportion of the variance of Y explained (accounted for) by the model used to fit the data. • When there is only one X (simple linear regression), R² = r².
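A quick numerical check of R² = r² for simple linear regression, computing R² as 1 − SSE/SST (the share of the total variation in Y that the line explains) and r from the same sums; the three data points are made up.

```python
import math


def r_squared_and_r(x, y):
    """Return (R^2, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)          # total sum of squares, SST
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained part
    r2 = 1 - sse / syy                               # share of SST explained
    r = sxy / math.sqrt(sxx * syy)
    return r2, r


r2, r = r_squared_and_r([1, 2, 3], [1, 3, 2])
print(r2, r ** 2)  # 0.25 0.25
```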

  20. SAS Output

  21. Confidence Interval of Mean Y

  22. Prediction Interval
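The intervals named on slides 21 and 22 follow the standard simple-regression formulas: with s = √(SSE/(n − 2)), the confidence interval for the mean of Y at x0 is ŷ0 ± t·s·√(1/n + (x0 − x̄)²/Sxx), and the prediction interval for a single new Y adds 1 under the square root, so it is always wider. A stdlib-only sketch: the caller supplies the t critical value with n − 2 degrees of freedom (from a t table, or e.g. scipy.stats.t.ppf(0.975, n - 2) for 95% intervals), and the data are hypothetical.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """(CI for mean Y, PI for a new Y) at x0 in a simple linear regression.

    t_crit: t quantile with n - 2 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y0_hat = b0 + b1 * x0
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))             # residual standard error
    lev = 1 / n + (x0 - x_bar) ** 2 / sxx    # grows as x0 moves away from x_bar
    half_ci = t_crit * s * math.sqrt(lev)
    half_pi = t_crit * s * math.sqrt(1 + lev)
    return (y0_hat - half_ci, y0_hat + half_ci), (y0_hat - half_pi, y0_hat + half_pi)


# Hypothetical toy data; n = 6, so t with 4 df: t_{0.975, 4} is about 2.776
height = [60, 62, 65, 68, 70, 72]
weight = [110, 120, 135, 150, 160, 175]
ci, pi = intervals_at(height, weight, 66, t_crit=2.776)
print("95% CI for mean weight:", ci)
print("95% PI for one student:", pi)
```

Both intervals widen as x0 moves away from x̄, which is another reason to distrust predictions far outside the observed range.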

  23. Model Checking • The scatterplot shows a linear pattern. • Normality: the residuals must follow a normal distribution. • Equal variances: the variance of the residuals must be constant across the different values of X.

  24. Checking Normality

  25. Tests for Normality
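A common graphical check of normality is a normal probability (Q-Q) plot of the residuals, which should look roughly straight. The sketch below computes the correlation of the Q-Q points as a rough numeric summary. It is an informal heuristic, not one of the formal normality tests the slide refers to, and the Blom plotting positions (i − 0.375)/(n + 0.25) are an assumption; the residual values are made up.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected standard-normal order statistics (Blom positions)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def qq_correlation(residuals):
    """Correlation between sorted residuals and normal scores (normal Q-Q plot).

    Values near 1 mean the Q-Q points lie close to a straight line.
    """
    e = sorted(residuals)
    n = len(e)
    q = normal_scores(n)
    q_bar, e_bar = sum(q) / n, sum(e) / n
    sqe = sum((a - q_bar) * (b - e_bar) for a, b in zip(q, e))
    sqq = sum((a - q_bar) ** 2 for a in q)
    see = sum((b - e_bar) ** 2 for b in e)
    return sqe / math.sqrt(sqq * see)


skewed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # one large value: far from normal
print(round(qq_correlation(skewed), 3))
```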

  26. Checking Equal Variances

  27. Identify Outliers Using Residual Plots • Use “studentized” residuals! • Cases whose studentized residuals have an absolute value of 3 or more are outliers.
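A sketch of the cutoff rule, assuming internally studentized residuals e_i/(s·√(1 − h_ii)) with leverage h_ii = 1/n + (x_i − x̄)²/Sxx (the slides do not say which studentized variant they use). The data are synthetic: a near-perfect line with one planted outlier at index 10.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    e = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))   # residual standard error
    h = [1 / n + (a - x_bar) ** 2 / sxx for a in x]     # leverage of each case
    return [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


def flag_outliers(x, y, cutoff=3.0):
    """Indices of cases whose studentized residual has absolute value >= cutoff."""
    return [i for i, r in enumerate(studentized_residuals(x, y)) if abs(r) >= cutoff]


x = list(range(20))
y = [2 * xi + 0.1 * (-1) ** xi for xi in x]   # line plus tiny alternating noise
y[10] += 20                                   # plant one outlier
print(flag_outliers(x, y))  # [10]
```

Internally studentized residuals can never exceed √(n − 2) in absolute value, so the cutoff of 3 can only be reached when n is at least 11.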

  28. Transformation • If the residuals are non-normal or have unequal variances, transform Y. • If the data show a nonlinear pattern, add X² to the model.
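Adding X² keeps the model linear in its coefficients, so it can still be fit by least squares, now with three coefficients. A stdlib-only sketch that solves the 3 × 3 normal equations by Gaussian elimination (chosen only to stay dependency-free; a statistical package or a QR-based solver is preferable in practice), checked on made-up data lying exactly on y = 1 + 2x + 3x²:

```python
def fit_quadratic(x, y):
    """Least-squares coefficients [b0, b1, b2] for y = b0 + b1*x + b2*x^2."""
    cols = [[1.0] * len(x), [float(v) for v in x], [float(v) ** 2 for v in x]]
    # Normal equations (X'X) b = X'y for the design columns 1, x, x^2
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(3)] for i in range(3)]
    rhs = [sum(p * yi for p, yi in zip(cols[i], y)) for i in range(3)]
    # Forward elimination with partial pivoting
    for k in range(3):
        piv = max(range(k, 3), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        rhs[k], rhs[piv] = rhs[piv], rhs[k]
        for r in range(k + 1, 3):
            f = a[r][k] / a[k][k]
            for c in range(k, 3):
                a[r][c] -= f * a[k][c]
            rhs[r] -= f * rhs[k]
    # Back substitution
    b = [0.0, 0.0, 0.0]
    for k in range(2, -1, -1):
        b[k] = (rhs[k] - sum(a[k][c] * b[c] for c in range(k + 1, 3))) / a[k][k]
    return b


x = [0, 1, 2, 3, 4]
y = [1 + 2 * v + 3 * v ** 2 for v in x]   # exact quadratic, made-up data
print([round(c, 6) for c in fit_quadratic(x, y)])  # [1.0, 2.0, 3.0]
```

With raw heights around 60 to 75 inches, X and X² are almost perfectly correlated and the normal equations become ill-conditioned; centering X (using X − x̄ and its square) before fitting is a common remedy.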

  29. The Influence of Outliers: Type I Outlier • The slope becomes bigger (the line is pulled toward the outliers). • The r value becomes smaller (less linear).

  30. The Influence of Outliers: Type II Outlier • A clear slope appears (the line is pulled toward the outliers). • The |r| value becomes larger (more linear: 0.159 → 0.935).

  31. How to Handle Outliers • Use the Spearman correlation coefficient (the r of the ranks of X and the ranks of Y). • For Type I outliers: • Try to find the special factor behind the outliers. • Report two models, with and without the outliers. • For Type II outliers: • Report the model without the outliers and mention that the outliers exist. • Whenever possible, collect more data in the range of X where the outliers appear.
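The slide defines the Spearman coefficient as Pearson's r applied to the ranks of X and of Y. A sketch, assuming tied values receive the average of their ranks (the usual convention); the made-up data increase steadily but end with one extreme Y value, which lowers Pearson's r but not Spearman's.

```python
import math


def average_ranks(v):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson r from centered cross-products: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson r of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]   # increasing, but the last value is extreme
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```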

  32. SAS Output
