
Presentation and Data http:// www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP


Presentation Transcript


  1. Presentation and Data • http://www.lisa.stat.vt.edu • Short Courses • Regression Analysis Using JMP • Download Data to Desktop

  2. Regression Analysis Using JMP Mark Seiss, Dept. of Statistics

  3. Presentation Outline Simple Linear Regression Multiple Linear Regression Regression with Binary and Count Response Variables

  4. Presentation Outline • Questions/Comments • Individual Goals/Interests

  5. Simple Linear Regression • Definition • Correlation • Model and Estimation • Coefficient of Determination (R2) • Assumptions • Example

  6. Simple Linear Regression (SLR) is used to study the relationship between a variable of interest and another variable. Both variables must be continuous Variable of interest known as Response or Dependent Variable Other variable known as Explanatory or Independent Variable Objectives Determine the significance of the explanatory variable in explaining the variability in the response (not necessarily causation). Predict values of the response variable for given values of the explanatory variable. Simple Linear Regression

  7. Scatterplots are used to graphically examine the relationship between two quantitative variables. Linear or Non-linear Positive or Negative Simple Linear Regression

  8. Simple Linear Regression No Relationship Non-Linear Relationship Positive Linear Relationship Negative Linear Relationship

  9. Correlation Measures the strength of the linear relationship between two quantitative variables. Pearson Correlation Coefficient Assumption of normality Calculation: r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)2 Σ(yi − ȳ)2] Spearman’s Rho and Kendall’s Tau are used for non-normal quantitative variables. Simple Linear Regression

  10. Properties of Pearson Correlation Coefficient -1 ≤ r ≤ 1 Positive values of r: as one variable increases, the other increases Negative values of r: as one variable increases, the other decreases Values close to 0 indicate no linear relationship between the two variables Values close to +1 or -1 indicate strong linear relationships Important note: Correlation does not imply causation Simple Linear Regression

  11. Pearson Correlation Coefficient: General Guidelines 0 ≤ |r| < 0.2 : Very Weak linear relationship 0.2 ≤ |r| < 0.4 : Weak linear relationship 0.4 ≤ |r| < 0.6 : Moderate linear relationship 0.6 ≤ |r| < 0.8 : Strong linear relationship 0.8 ≤ |r| < 1.0 : Very Strong linear relationship Simple Linear Regression
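
The course demonstrates these calculations in JMP; purely as a language-neutral illustration, here is a minimal Python sketch of the Pearson r formula on made-up data (all values are hypothetical):

```python
import math

# Pearson r: sum of cross-deviations divided by the product of the
# deviation norms of x and y.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a positive linear trend.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)  # lands in the "strong" band of the guidelines above
```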

  12. The Simple Linear Regression Model Basic Model: response = deterministic + stochastic Deterministic: model of the linear relationship between X and Y Stochastic: variation, uncertainty, and miscellaneous factors Model: yi = β0 + β1xi + εi yi= value of the response variable for the ith observation xi= value of the explanatory variable for the ith observation β0= y-intercept β1= slope εi= random error, iid Normal(0,σ2) Simple Linear Regression

  13. Least Squares Estimation Estimates: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)2, b0 = ȳ − b1x̄ Predicted Values: ŷi = b0 + b1xi Residuals: ei = yi − ŷi Simple Linear Regression
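
A minimal sketch of these least-squares quantities in Python, using the standard closed-form estimates for simple linear regression (the data are made up for illustration):

```python
# Closed-form least-squares estimates for y = b0 + b1*x:
# b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
x = [1, 2, 3, 4, 5]          # hypothetical explanatory values
y = [2, 4, 5, 4, 5]          # hypothetical responses
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
b1 = sxy / sxx               # estimated slope
b0 = ybar - b1 * xbar        # estimated intercept
y_hat = [b0 + b1 * a for a in x]                 # predicted values
residuals = [b - h for b, h in zip(y, y_hat)]    # observed minus predicted
```

Least-squares residuals always sum to zero when an intercept is in the model, which makes a handy sanity check on any fit.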

  14. Interpretation of Parameters β0: Value of Y when X=0 β1: Change in the value of Y with an increase of 1 unit of X (also known as the slope of the line) Hypothesis Testing β0 – Test whether the true y-intercept is different from 0 Null Hypothesis: β0=0 Alternative Hypothesis: β0≠0 β1 – Test whether the slope is different from 0 Null Hypothesis: β1=0 Alternative Hypothesis: β1≠0 Simple Linear Regression

  15. Analysis of Variance (ANOVA) for Simple Linear Regression Simple Linear Regression

  16. Simple Linear Regression

  17. Coefficient of Determination (R2) Percent of variation in the response variable (Y) that is explained by the least squares regression line 0 ≤ R2 ≤ 1 Calculation: R2 = SSR/SST = 1 − SSE/SST Simple Linear Regression
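
The R2 calculation can be sketched directly from the sums of squares; in simple linear regression it also equals the squared Pearson correlation. The data below are hypothetical:

```python
# R^2 = 1 - SSE/SST: share of the variation in y explained by the line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * a for a in x]
sse = sum((b - h) ** 2 for b, h in zip(y, y_hat))  # error sum of squares
sst = sum((b - ybar) ** 2 for b in y)              # total sum of squares
r2 = 1 - sse / sst
```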

  18. Assumptions of Simple Linear Regression 1. Independence Residuals are independent of each other Related to the method in which the data were collected or time related data Tested by plotting time collected vs. residuals Parametric test: Durbin-Watson Test 2. Constant Variance Variance of the residuals is constant Tested by plotting predicted values vs. residuals Parametric test: Brown-Forsythe Test Simple Linear Regression
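
The Durbin-Watson statistic mentioned above is simple to compute by hand: the sum of squared successive residual differences divided by the residual sum of squares, with values near 2 suggesting no first-order autocorrelation. A sketch on hypothetical residuals:

```python
def durbin_watson(resid):
    # Sum of squared successive differences over the residual sum of squares.
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

resid = [-0.8, 0.6, 1.0, -0.6, -0.2]  # hypothetical residuals from an SLR fit
dw = durbin_watson(resid)             # the statistic lies between 0 and 4
```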

  19. Assumptions of Simple Linear Regression 3. Normality Residuals are normally distributed Tested by evaluating histograms and normal quantile plots of residuals Parametric test: Shapiro-Wilk test Simple Linear Regression

  20. Constant Variance: Plot of Fitted Values vs. Residuals Simple Linear Regression Good Residual Plot: No Pattern Bad Residual Plot: Variability Increasing

  21. Normality: Histogram and Q-Q Plot of Residuals Simple Linear Regression Normal Assumption Appropriate Normal Assumption Not Appropriate

  22. Some Remedies Non-Constant Variance: Weighted Least Squares Non-normality: Box-Cox Transformation Dependence: Auto-Regressive Models Simple Linear Regression

  23. Example Dataset: Chirps of Ground Crickets Pierce (1949) measured the frequency (the number of wing vibrations per second) of chirps made by a ground cricket at various ground temperatures. Filename: chirp.jmp Simple Linear Regression

  24. Questions/Comments about Simple Linear Regression Simple Linear Regression

  25. Multiple Linear Regression • Definition • Categorical Explanatory Variables • Model and Estimation • Adjusted Coefficient of Determination • Assumptions • Model Selection • Example

  26. Explanatory Variables Two Types: Continuous and Categorical Continuous Predictor Variables Examples – Time, Grade Point Average, Test Score, etc. Coded with one parameter – β#x# Categorical Predictor Variables Examples – Sex, Political Affiliation, Marital Status, etc. Actual value assigned to Category not important Ex) Sex - Male/Female, M/F, 1/2, 0/1, etc. Coded Differently than continuous variables Multiple Linear Regression

  27. Categorical Explanatory Variables Consider a categorical explanatory variable with L categories One category selected as the reference category Assignment of the reference category is arbitrary Variable represented by L-1 dummy variables Model Identifiability Effect Coding (Used in JMP) xk = 1 if explanatory variable is equal to category k 0 otherwise xk = -1 for all k if explanatory variable equals the reference category Multiple Linear Regression
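
Effect coding can be sketched as follows, assuming the last listed level is taken as the reference (JMP chooses its own reference level; the category names here are hypothetical):

```python
def effect_code(value, levels):
    # Return L-1 columns: 1 on the matching column, -1 on every column
    # for the reference (last) level, 0 otherwise.
    ref = levels[-1]
    if value == ref:
        return [-1] * (len(levels) - 1)
    return [1 if value == lev else 0 for lev in levels[:-1]]

levels = ["A", "B", "C"]  # hypothetical categories; "C" is the reference
codes = {v: effect_code(v, levels) for v in levels}
```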

  28. Similar to simple linear regression, except now there is more than one explanatory variable, which may be quantitative and/or qualitative. Model: yi = β0 + β1x1i + β2x2i + … + βpxpi + εi yi= value of the response variable for the ith observation x#i= value of explanatory variable # for the ith observation β0= y-intercept β#= parameter corresponding to explanatory variable # εi= random error, iid Normal(0,σ2) Multiple Linear Regression

  29. Least Squares Estimation Predicted Values Residuals Multiple Linear Regression

  30. Interpretation of Parameters β0: Value of Y when all X#=0 β#: Change in the value of Y with an increase of 1 unit of X# in the presence of the other explanatory variables Hypothesis Testing β0 – Test whether the true y-intercept is different from 0 Null Hypothesis: β0=0 Alternative Hypothesis: β0≠0 β# – Test whether the change in Y with an increase of 1 unit in X# is different from 0 in the presence of the other explanatory variables Null Hypothesis: β#=0 Alternative Hypothesis: β#≠0 Multiple Linear Regression

  31. Adjusted Coefficient of Determination (R2) Percent of variation in the response variable (Y) that is explained by the least squares regression line with explanatory variables x1, x2,…,xp Calculation: Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p − 1) The R2 value will increase as explanatory variables are added to the model The adjusted R2 introduces a penalty for the number of explanatory variables Multiple Linear Regression
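
The penalty is easy to see numerically. A sketch of the adjusted R2 formula, with illustrative values for n, p, and R2:

```python
def adjusted_r2(r2, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    # where n = observations and p = explanatory variables.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_few = adjusted_r2(0.60, n=52, p=2)    # mild penalty with few predictors
adj_many = adjusted_r2(0.60, n=52, p=20)  # heavier penalty with many predictors
```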

  32. Other Model Evaluation Statistics Akaike Information Criterion (AIC or AICc) Schwarz Information Criterion (SIC) Bayesian Information Criterion (BIC) Mallows’ Cp Prediction Sum of Squares (PRESS) Multiple Linear Regression

  33. Model Selection 2 Goals: Complex enough to fit the data well Simple to interpret, does not overfit the data Study the effect of each explanatory variable on the response Y Continuous Variable – Graph Y versus X Categorical Variable - Boxplot of Y for categories of X Multiple Linear Regression

  34. Model Selection cont. Multicollinearity Correlation among explanatory variables inflates the variance of the parameter estimates Makes individual variables appear less significant Occurs when several related explanatory variables are used in the model Multiple Linear Regression

  35. Algorithmic Model Selection Backward Selection: Start with all explanatory variables in the model and remove those that are insignificant Forward Selection: Start with no explanatory variables in the model and add best explanatory variables one at a time Stepwise Selection: Start with two forward selection steps then alternate backward and forward selection steps until no variables to add or remove Multiple Linear Regression

  36. Example Dataset: Discrimination in Salaries A researcher was interested in whether there was discrimination in the salaries of tenure track professors at a small college. The researcher collected six variables on 52 professors. Filename: Salary.xls Reference: S. Weisberg (1985). Applied Linear Regression, Second Edition. New York: John Wiley and Sons. Page 194. Multiple Linear Regression

  37. Other Multiple Linear Regression Issues Outliers Interaction Terms Higher Order Terms Multiple Linear Regression

  38. Questions/Comments about Multiple Linear Regression Multiple Linear Regression

  39. Regression with Non-Normal Response • Logistic Regression with Binary Response • Poisson Regression with Count Response

  40. Consider a binary response variable. Variable with two outcomes One outcome represented by a 1 and the other represented by a 0 Examples: Does the person have a disease? Yes or No Who is the person voting for? McCain or Obama Outcome of a baseball game? Win or loss Logistic Regression

  41. Consider the linear probability model πi = β0 + β1xi where yi = response for observation i xi = quantitative explanatory variable Predicted values represent the probability of Y=1 given X Issue: Predicted probabilities for some subjects fall outside of the [0,1] range. Logistic Regression
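
The [0,1] problem is easy to demonstrate: with a straight line, extreme x values push the fitted probability out of range. A sketch with illustrative coefficients:

```python
# Linear probability model: p_hat = b0 + b1*x, which is not bounded by [0, 1].
b0, b1 = -0.25, 0.15          # illustrative coefficients
probs = [b0 + b1 * x for x in range(11)]
out_of_range = [p for p in probs if p < 0 or p > 1]  # nonempty here
```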

  42. Consider the logistic regression model logit(πi) = log(πi / (1 − πi)) = β0 + β1xi Predicted values from the regression equation fall between 0 and 1 Logistic Regression
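
A sketch of why the logistic form stays inside (0, 1): the fitted probability is exp(η)/(1 + exp(η)) for the linear predictor η = β0 + β1x. The coefficients below are illustrative:

```python
import math

def logistic_prob(x, b0=-3.0, b1=1.0):
    # pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), always in (0, 1).
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))

p_low, p_high = logistic_prob(0), logistic_prob(6)  # small and large probability
```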

  43. Interpretation of Coefficient β – Odds Ratio The odds ratio is a statistic that measures the odds of an event compared to the odds of another event. Say the probability of Event 1 is π1 and the probability of Event 2 is π2. Then the odds ratio of Event 1 to Event 2 is: (π1/(1 − π1)) / (π2/(1 − π2)) Values of the odds ratio range from 0 to infinity Values between 0 and 1 indicate the odds of Event 2 are greater Values between 1 and infinity indicate the odds of Event 1 are greater A value equal to 1 indicates the events are equally likely Logistic Regression
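
The odds-ratio arithmetic in a few lines (the probabilities are illustrative):

```python
def odds_ratio(p1, p2):
    # Odds of Event 1 over odds of Event 2.
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

or_equal = odds_ratio(0.5, 0.5)  # equally likely events give an OR of 1
or_first = odds_ratio(0.8, 0.5)  # odds favor Event 1, so the OR exceeds 1
```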

  44. Example Dataset: A researcher is interested in how GRE exam scores, GPA, and the prestige of a student's undergraduate institution affect admission into graduate school. Filename: Admittance.csv Important Note: JMP models the probability of the 0 category Logistic Regression

  45. Consider a count response variable. Response variable is the number of occurrences in a given time frame. Outcomes equal to 0, 1, 2, …. Examples: Number of penalties during a football game. Number of customers shopping at a store on a given day. Number of car accidents at an intersection. Poisson Regression

  46. Consider the model μi = β0 + β1xi where yi= response for observation i xi = quantitative explanatory variable for observation i Issue: Predicted values range from -∞ to +∞ Poisson Regression

  47. Consider the Poisson log-linear model log(μi) = β0 + β1xi Predicted response values fall between 0 and +∞ In the case of a single predictor, an increase of one unit in x multiplies μ by exp(β1) Poisson Regression
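
The multiplicative interpretation can be sketched directly: since log(μ) = β0 + β1x, raising x by one unit multiplies μ by exp(β1). The coefficients below are illustrative:

```python
import math

def poisson_mean(x, b0=0.5, b1=0.2):
    # log(mu) = b0 + b1*x, so mu = exp(b0 + b1*x) is always positive.
    return math.exp(b0 + b1 * x)

ratio = poisson_mean(3) / poisson_mean(2)  # equals exp(b1)
```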

  48. Example Data Set: Researchers are interested in the number of awards earned by students at a high school. Other variables measured as possible explanatory variables include type of program in which the student was enrolled (vocational, general, or academic), and the final score on their math final exam. Filename: Awards.csv Poisson Regression

  49. Attendee Questions • If time permits
