
Regression




  1. Regression. Shibin Liu, SAS Beijing R&D

  2. Agenda • 0. Lesson overview • 1. Exploratory Data Analysis • 2. Simple Linear Regression • 3. Multiple Regression • 4. Model Building and Interpretation • 5. Summary

  3. Agenda • 0. Lesson overview • 1. Exploratory Data Analysis • 2. Simple Linear Regression • 3. Multiple Regression • 4. Model Building and Interpretation • 5. Summary

  4. Lesson overview: Response variable, predictor variable, and ANOVA

  5. Lesson overview: Continuous response and continuous predictor: correlation analysis and linear regression

  6. Lesson overview: Continuous response, continuous predictor. Correlation analysis: • Measure linear association • Examine the relationship • Screen for outliers • Interpret the correlation

  7. Lesson overview: Continuous response, continuous predictor. Linear regression: • Define the linear association • Determine the equation for the line • Explain or predict variability

  8. Lesson overview: What do you want to examine?
     • Descriptive statistics: the location, spread, and shape of the data's distribution. Summary statistics or graphics?
         - Summary statistics: SUMMARY STATISTICS (descriptive statistics)
         - Both: DISTRIBUTION ANALYSIS (descriptive statistics, histogram, normal probability plots)
     • Inferential statistics, the relationship between variables. Which kind of variables?
         - Categorical response variable: ONE-WAY FREQUENCIES & TABLE ANALYSIS (frequency tables, chi-square test); LOGISTIC REGRESSION
         - Continuous only: CORRELATIONS; LINEAR REGRESSION
     • Inferential statistics, the difference between groups on one or more variables. How many groups?
         - Two: TTEST
         - Two or more: LINEAR MODELS (analysis of variance)

  9. Agenda • 0. Lesson overview • 1. Exploratory Data Analysis • 2. Simple Linear Regression • 3. Multiple Regression • 4. Model Building and Interpretation • 5. Summary

  10. Exploratory Data Analysis: Introduction. Height and weight are both continuous variables. Before fitting a linear regression, exploratory data analysis uses a scatter plot and correlation analysis to examine how they are related.

  11. Exploratory Data Analysis: Objective • Examine the relationship between continuous variables using a scatter plot • Quantify the degree of association between two continuous variables using correlation statistics • Avoid potential misuses of the correlation coefficient • Obtain Pearson correlation coefficients

  12. Exploratory Data Analysis: Using Scatter Plots to Describe Relationships between Continuous Variables. As part of exploratory data analysis, a scatter plot (together with correlation analysis) shows the relationship, trend, range, and outliers in the data, and helps communicate analysis results. X: predictor variable; Y: response variable; each point's coordinates are its values of X and Y.

  13. Exploratory Data Analysis: Using Scatter Plots to Describe Relationships between Continuous Variables. Which model terms does the pattern suggest? A curved pattern may call for a squared (quadratic) term, X².

  14. Exploratory Data Analysis: Using Correlation to Measure Relationships between Continuous Variables. Correlation analysis measures linear association, which can be negative, zero, or positive.

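  15. Exploratory Data Analysis: Using Correlation to Measure Relationships between Continuous Variables. Pearson correlation coefficient. For a population: $\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$. For a sample: $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$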
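The sample formula for r translates directly into code. This is a minimal Python sketch, not how SAS computes it internally; the function name `pearson_r` and the height/weight values are hypothetical, made up only for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), made up for illustration
height = [160, 165, 170, 175, 180]
weight = [55, 60, 66, 70, 77]
print(round(pearson_r(height, weight), 3))
```

Points that lie exactly on a rising line give r = 1, and on a falling line r = -1.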

  16. Exploratory Data Analysis: Using Correlation to Measure Relationships between Continuous Variables. The Pearson correlation coefficient r ranges from -1 to +1: values near -1 indicate a strong negative linear relationship, values near 0 indicate no linear relationship, and values near +1 indicate a strong positive linear relationship.

  17. Exploratory Data Analysis: Hypothesis Testing for a Correlation. Correlation coefficient test: H0: $\rho = 0$; Ha: $\rho \neq 0$. • A p-value does not measure the magnitude of the association. • Sample size affects the p-value. • Rejecting the null hypothesis only means that you can be confident that the true population correlation is not 0. A small p-value can occur (as with many statistics) simply because the sample size is very large. Even a correlation coefficient of 0.01 can be statistically significant with a large enough sample size. Therefore, it is important to also look at the value of r itself to see whether it is meaningfully large.
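The effect of sample size can be seen from the usual test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which has a t distribution with n - 2 degrees of freedom under H0. The sketch below is an assumption-laden illustration: because the Python standard library has no t-distribution CDF, it approximates the two-sided p-value with the standard normal distribution, which is only reasonable for large n. The helper name `correlation_t_test` is hypothetical.

```python
import math

def correlation_t_test(r, n):
    """t statistic for H0: rho = 0, plus an approximate two-sided p-value."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    # Approximation (assumption): treat t as standard normal, fine only for large n.
    # erfc(|t| / sqrt(2)) equals the two-sided normal tail probability 2 * (1 - Phi(|t|)).
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, p_approx

# The same tiny correlation, r = 0.01, at two sample sizes
for n in (100, 1_000_000):
    t, p = correlation_t_test(0.01, n)
    print(f"n={n:>9}: t={t:.2f}, approx p={p:.3g}")
```

With n = 100 the p-value is large, while with a million observations r = 0.01 becomes "significant" even though it describes almost no linear association.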

  18. Exploratory Data Analysis: Hypothesis Testing for a Correlation (figure: r on a scale from -1 to +1, with example values r = 0.81 and r = 0.72)

  19. Exploratory Data Analysis: Avoiding Common Errors in Interpreting Correlations: Cause and Effect. Correlation does not imply causation. Besides causality, could other reasons account for a strong correlation between two variables?

  20. Exploratory Data Analysis: Avoiding Common Errors in Interpreting Correlations: Cause and Effect. Correlation does not imply causation. For example, weight and height are strongly correlated, but a strong correlation between two variables does not mean that a change in one variable causes the other to change, or vice versa.

  21. Exploratory Data Analysis: Avoiding Common Errors in Interpreting Correlations: Cause and Effect. Correlation does not imply causation.

  22. Exploratory Data Analysis: Avoiding Common Errors in Interpreting Correlations: Cause and Effect. Correlation does not imply causation.

  23. Exploratory Data Analysis: Avoiding Common Errors in Interpreting Correlations: Cause and Effect. Are SAT scores tied to college entrance or not? X: the percentage of students in each state who take the SAT exam; Y: SAT scores.

  24. Exploratory Data Analysis: Avoiding Common Errors: Types of Relationships. For a curvilinear (parabolic or quadratic) relationship, the Pearson correlation coefficient r can be close to 0, because r measures only linear association.
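A small illustration of why r can miss a curvilinear pattern: below, Y is an exact quadratic function of X, yet r is 0 because the X values are symmetric around their mean. The data and the helper `pearson_r` are hypothetical, written for this sketch.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]  # perfect quadratic (parabolic) relationship
print(pearson_r(x, y))     # → 0.0
```

A scatter plot of these points would show the parabola immediately, which is why the slides pair correlation with plotting.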

  25. Exploratory Data Analysis: Avoiding Common Errors: Outliers. Two data sets, Data one (r = 0.02) and Data two (r = 0.82), show how strongly outliers can affect the correlation coefficient.
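The sketch below reproduces the idea with made-up numbers (not the slide's Data one and Data two): five points with essentially no linear association, then the same points plus one extreme observation. It also follows the advice for valid outliers: compute the coefficient with and without the point and report both. `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with almost no linear association
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 5, 1]
r_without = pearson_r(x, y)
# The same data plus one extreme point at (20, 20)
r_with = pearson_r(x + [20], y + [20])
print(round(r_without, 2), round(r_with, 2))  # → -0.1 0.96
```

A single point turns a near-zero correlation into a strong positive one.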

  26. Exploratory Data Analysis: Avoiding Common Errors: Outliers. What should you do with an outlier? First ask why it is an outlier. If it is a valid value, compute two correlation coefficients (with and without the outlier) and report both. If it is an error, collect the data again or replicate the data.

  27. Exploratory Data Analysis: Scenario: Exploring Data Using Correlation and Scatter Plots. Which variables in the Fitness data are related to oxygen consumption?

  28. Exploratory Data Analysis: Exploring Data with Correlations and Scatter Plots

  29. Exploratory Data Analysis: Exploring Data with Correlations and Scatter Plots. What's the Pearson correlation coefficient of Oxygen_Consumption with Run_Time? What's the p-value for the correlation of Oxygen_Consumption with Performance?

  30. Exploratory Data Analysis: Exploring Data with Correlations and Scatter Plots

  31. Exploratory Data Analysis: Examining Correlations between Predictor Variables

  32. Exploratory Data Analysis: Examining Correlations between Predictor Variables. What are the two highest Pearson correlation coefficients?
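Finding the most strongly correlated predictor pairs amounts to computing r for every pair and ranking them. In this sketch the columns `P1` to `P4` and their values are hypothetical, not the Fitness data. Ranking by absolute value is a design choice made here because a strong negative correlation between predictors matters as much as a strong positive one; sort by the signed r instead if you only want the largest positive values.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor columns (made-up values)
predictors = {
    "P1": [1, 2, 3, 4, 5, 6],
    "P2": [2, 4, 5, 8, 10, 12],
    "P3": [6, 5, 4, 3, 1, 2],
    "P4": [3, 1, 4, 1, 5, 9],
}

# r for every distinct pair of predictors
pairs = [
    (a, b, pearson_r(predictors[a], predictors[b]))
    for a, b in combinations(predictors, 2)
]
# Strongest associations first, regardless of sign
pairs.sort(key=lambda item: abs(item[2]), reverse=True)
for a, b, r in pairs[:2]:
    print(f"{a} vs {b}: r = {r:.3f}")
```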

  33. Exploratory Data Analysis: Question 1. The correlation between tuition and rate of graduation at U.S. colleges is 0.55. What does this mean? a. The way to increase graduation rates at your college is to raise tuition. b. Increasing graduation rates is expensive, causing tuition to rise. c. Students who are richer tend to graduate more often than poorer students. d. None of the above. Answer: d

  34. Agenda • 0. Lesson overview • 1. Exploratory Data Analysis • 2. Simple Linear Regression • 3. Multiple Regression • 4. Model Building and Interpretation • 5. Summary

  35. Simple Linear Regression: Introduction

  36. Simple Linear Regression: Introduction. Linear relationships among Variable A, Variable B, Variable C, and Variable D (figure: correlations on a scale from -1 to +1)

  37. Simple Linear Regression: Introduction (figure: data sets with the same r and with different r)

  38. Simple Linear Regression: Introduction. Simple linear regression fits a regression line in which Y is the variable of primary interest and X explains variability in Y.

  39. Simple Linear Regression: Objective • Explain the concepts of simple linear regression • Fit a simple linear regression using the Linear Regression task • Produce predicted values and confidence intervals

  40. Simple Linear Regression: Scenario: Performing Simple Linear Regression. Use simple linear regression on the Fitness data to model Oxygen_Consumption as a linear function of Run_Time.

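  41. Simple Linear Regression: The Simple Linear Regression Model. $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error term, the variation of Y around the line.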

  42. Simple Linear Regression: The Simple Linear Regression Model. Question 2. What does epsilon (ε) represent? a. The intercept parameter b. The predictor variable c. The variation of X around the line d. The variation of Y around the line Answer: d

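  43. Simple Linear Regression: How SAS Performs Linear Regression. Method of least squares: choose the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$. The least-squares estimators are Best Linear Unbiased Estimators (BLUE): • They are unbiased estimators. • They have minimum variance (among linear unbiased estimators).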
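For one predictor, the least-squares estimates have a closed form: $\hat{\beta}_1 = \sum(x_i-\bar{x})(y_i-\bar{y}) / \sum(x_i-\bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$. The Python sketch below applies these textbook formulas; it is not SAS's implementation, and the helper names and the data are hypothetical. The last line checks the "minimize" part: moving the slope away from the estimate increases the sum of squared errors.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared errors (residuals) for a candidate line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 7, 8]
b0, b1 = least_squares_fit(x, y)
print(round(b0, 3), round(b1, 3))  # → 1.8 1.2
# Changing the slope away from the least-squares value increases SSE
print(sse(x, y, b0, b1) < sse(x, y, b0, b1 + 0.1))  # → True
```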

  44. Simple Linear Regression: Measuring How Well a Model Fits the Data. Compare the regression model with a baseline model.

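  45. Simple Linear Regression: Comparing the Regression Model to a Baseline Model. Baseline model: $\hat{Y} = \bar{Y}$ (predict the mean of Y for every observation). A better model explains more of the variability in Y than the baseline model does.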
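The comparison with the baseline model can be made concrete by splitting the variability: the total sum of squares (SST) is the error of the baseline model, the error sum of squares (SSE) is what the regression line leaves unexplained, and their difference (SSM) is what the line explains. $R^2 = SSM/SST$ is the proportion explained. This is a minimal sketch using the standard definitions; the helper name and data are hypothetical.

```python
def r_squared(x, y):
    """Proportion of the baseline model's variability explained by the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Baseline model: predict y_bar for every observation
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # Regression model: predict b0 + b1 * x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssm = sst - sse  # variability explained beyond the baseline
    return ssm / sst

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 7, 8]
print(round(r_squared(x, y), 3))  # → 0.837
```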

  46. Simple Linear Regression: Hypothesis Testing for Linear Regression. H0: $\beta_1 = 0$ (the regression model fits no better than the baseline model); Ha: $\beta_1 \neq 0$.
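For simple linear regression, the test of H0: $\beta_1 = 0$ can be written as an F test, $F = MSM/MSE$ with 1 and n - 2 degrees of freedom, or as a t test on the slope, $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$ with n - 2 degrees of freedom; with a single predictor, $F = t^2$. The sketch below computes both statistics from the standard formulas. It stops short of p-values because the Python standard library has no F or t distribution functions; the helper name and data are hypothetical.

```python
import math

def slope_tests(x, y):
    """Return (F, t) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    msm = (sst - sse) / 1      # model mean square, 1 degree of freedom
    mse = sse / (n - 2)        # error mean square, n - 2 degrees of freedom
    f_stat = msm / mse
    se_b1 = math.sqrt(mse / sxx)  # standard error of the slope
    t_stat = b1 / se_b1
    return f_stat, t_stat

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 7, 8]
f_stat, t_stat = slope_tests(x, y)
print(round(f_stat, 2), round(t_stat, 2))  # → 15.43 3.93
```

Large values of F (or of |t|) are evidence against H0; statistical software reports the matching p-value.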

  47. Simple Linear Regression: Assumptions of Simple Linear Regression. Linear regression assumptions: 1. The mean of Y is linearly related to X. 2. Errors are normally distributed. 3. Errors have equal variances. 4. Errors are independent.

  48. Simple Linear Regression: Performing Simple Linear Regression. Task > Regression > Linear Regression

  49. Simple Linear Regression: Performing Simple Linear Regression. Task > Regression > Linear Regression

  50. Simple Linear Regression: Performing Simple Linear Regression. Question 3. In the model Y = X (Y regressed on X), if the parameter estimate (slope) of X is 0, which of the following is the best guess (predicted value) for Y when X equals 13? a. 13 b. The mean of Y c. A random number d. The mean of X e. 0 Answer: b
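Why the answer is the mean of Y: when the slope estimate is 0, the intercept estimate $\bar{y} - \hat{\beta}_1\bar{x}$ reduces to $\bar{y}$, so the fitted line is the baseline model and every prediction equals the mean of Y. The sketch below uses hypothetical data chosen so that the least-squares slope comes out exactly 0; `least_squares_fit` is a hypothetical helper.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data with no linear trend in X
x = [1, 2, 3, 4]
y = [5, 3, 3, 5]
b0, b1 = least_squares_fit(x, y)
prediction = b0 + b1 * 13          # predicted Y when X = 13
print(b1, prediction, sum(y) / len(y))  # → 0.0 4.0 4.0
```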
