

  1. Lab 14 Curvilinear analysis and detailed example of categorical and continuous variables analysis

2. Curvilinear Regression
• Linear regression assumes that a straight line properly represents the relationship between each IV and the DV.
• This is not always the case. For example, the relationship between job satisfaction and job tenure (length of time in a job) has been found to be curvilinear: employees with low and high tenure have high satisfaction, while employees with moderate tenure have the lowest satisfaction.
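A curvilinear relation like this is captured by adding power terms to the regression equation. In the simplest (quadratic) case the model is

Y′ = b0 + b1X + b2X²

where the sign of b2 determines whether the curve opens upward (b2 > 0, as in the U-shaped satisfaction-tenure example) or downward (b2 < 0).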

3. Example of a curvilinear relationship: job satisfaction and tenure

4. How to test this with SAS
• In polynomial regression we conduct a sequence of tests. We start by regressing the DV on the IV.
• Then we add IV*IV to the model to see if it accounts for a significant amount of additional variance.
• If it does, we add IV*IV*IV to see if it adds variance. We stop when adding a successive power term fails to add to the variance accounted for (the significance test for each added term is sketched below).
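A standard way to formalize "accounts for a significant amount of additional variance" is the hierarchical F test on the change in R² between the reduced and full models (this general formula is a textbook result, not shown on the original slide):

F(k_full − k_red, N − k_full − 1) = [(R²_full − R²_red) / (k_full − k_red)] / [(1 − R²_full) / (N − k_full − 1)]

where k is the number of predictors in each model. When a single power term is added, this F test is equivalent to the t test on that term's b-weight in the PROC REG output, which is what the slides below rely on.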

5. Example
• A sports physiologist is interested in the effects of diet on the strength of athletes. He measures strength and the amount of protein consumed, and he wants to know the relationship between these two variables.
• Form quadratic and cubic terms.
• Run the regressions to test for trends and identify the best model.
• Graph the relationship between X and Y for evidence of nonlinearity.

6. Example program

data d1;
  input protein strength;
  * create power terms;
  protein2 = protein*protein;
  protein3 = protein2*protein;
cards;
…
;

* regressions with linear, quadratic, and cubic models;
* linear;
proc reg;
  model strength = protein;
  plot strength*protein r.*p.;
* quadratic;
proc reg;
  model strength = protein protein2;
  plot r.*p.;
* cubic;
proc reg;
  model strength = protein protein2 protein3;
  plot r.*p.;
run;

7. Output Model 1

Model: MODEL1
Dependent Variable: strength

Analysis of Variance
                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                1          16191        16191     646.01    <.0001
Error              248     6215.86885     25.06399
Corrected Total    249          22407

Root MSE           5.00639    R-Square    0.7226
Dependent Mean   202.56800    Adj R-Sq    0.7215
Coeff Var          2.47146

Parameter Estimates
                        Parameter     Standard
Variable      DF         Estimate        Error    t Value    Pr > |t|
Intercept      1        145.33012      2.27414      63.91      <.0001
protein        1          0.81480      0.03206      25.42      <.0001

8. Model 2 Output

Model: MODEL1
Dependent Variable: strength

Analysis of Variance
                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                2          19145   9572.45217     724.73    <.0001
Error              247     3262.43966     13.20826
Corrected Total    249          22407

Root MSE           3.63432    R-Square    0.8544
Dependent Mean   202.56800    Adj R-Sq    0.8532
Coeff Var          1.79412

Parameter Estimates
                        Parameter     Standard
Variable      DF         Estimate        Error    t Value    Pr > |t|
Intercept      1         22.06447      8.40699       2.62      0.0092
protein        1          4.42387      0.24247      18.24      <.0001
protein2       1         -0.02589      0.00173     -14.95      <.0001

9. Model 3 Output

Analysis of Variance
                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                3          19145   6381.64432     481.20    <.0001
Error              246     3262.41104     13.26183
Corrected Total    249          22407

Root MSE           3.64168    R-Square    0.8544
Dependent Mean   202.56800    Adj R-Sq    0.8526
Coeff Var          1.79776

Parameter Estimates
                        Parameter     Standard
Variable      DF         Estimate        Error    t Value    Pr > |t|
Intercept      1         20.15763     41.90111       0.48      0.6309
protein        1          4.51006      1.87112       2.41      0.0167
protein2       1         -0.02716      0.02742      -0.99      0.3230
protein3       1       0.00000613   0.00013194       0.05      0.9630

10. Conclusions
• The b-weight for the quadratic term is significant in the quadratic model, while the added cubic term is not, so the quadratic equation appears to be the best fit for these data (Y′ = 22.06 + 4.42X − .026X²), accounting for 85% of the variance.
• Looking back at the graph (strength*protein), it appears that the benefit of protein is large at first and then levels off, with athletes receiving little to no additional benefit around the 70 mark.
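As an arithmetic check on where the benefit levels off (not shown on the original slide), the fitted quadratic flattens at its vertex, where the slope of the curve is zero:

X = −b1 / (2·b2) = −4.42387 / (2 × −0.02589) ≈ 85.4

so the fitted curve stops rising in the mid-80s on the protein scale, with the visible flattening already well under way by the 70 mark read off the graph.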

11. Detailed Example
The Events variable is a person's score on a life-event scale, indicating the number and severity of recent life events. The Status variable is a measure of whether a person cohabits with a partner (0 indicates that they do not; 1 indicates that they do). The Stress variable is the score on a self-report measure of experienced stress.

12. Hypotheses
• 1: The more life events, the greater the stress.
• 2: Those who live with their partner will have lower stress than participants who don't live with a partner.
• 3: The relationship between events and stress is predicted to be moderated by status: participants who cohabit with a partner are predicted to be less stressed by life events than those who do not live with a partner.

13. Evaluate Normality
• Check normality of the variables:
proc univariate normal plot;
  var events stress;
run;
• Check normality by Status (sort by status first, since BY-group processing requires sorted data):
proc sort; by status; run;
proc univariate normal plot;
  var events stress;
  by status;
run;

14. Results of normality
• Box plots: the Stress variable looks normal, but Events is positively skewed, with few people having high scores. No evident outliers.
• Shapiro-Wilk supports the visual conclusions: Stress was not significant (W = 0.981, ns) and Events was significant (W = 0.935, p < .05), indicating non-normality, with a small percentage of participants reporting a large number of life events.
• Good distribution of status: 30 in a relationship and 30 not in a relationship.

15. Normality by Status
• Participants not in a relationship had higher means on life events than those in a relationship. Variability on the Events variable was similar in both status groups.
• Participants not in a relationship had higher means on the Stress variable than those in a relationship, providing visual support for hypothesis 2. There were two outliers in the relationship group, and the variance appears smaller in the relationship group.

16. Descriptive stats
• Means, SDs, and correlations:
proc means;
proc corr;
run;

17. Proc means and corr results
• Both independent variables, Status and Events, had significant relationships with stress.
• Status had a significant negative relationship with stress (r(58) = -.49, p < .05; 0 = doesn't cohabit, 1 = does cohabit).
• Events had a significant positive relationship with stress (r(58) = .41, p < .05).
• The independent variables were not significantly correlated with one another (status and events; r(58) = -.12, ns), which indicates that collinearity is not a problem with these data.

18. Linearity, Outliers, and Homoscedasticity
• Look at plots for heteroscedasticity and nonlinearity:
proc gplot;
  plot stress*event;
run;
• And again by group:
proc sort; by status; run;
proc gplot;
  plot stress*event;
  by status;
run;

19. Graphs
• No evidence of heteroscedasticity or nonlinear trends.
• There does appear to be a stronger relationship between stress and events for those participants who do not live with a partner.

20. Statistical test for curvilinear data
• Create the power term (inside the data step):
  event2 = event*event;
• Standardize the variables (m=0 centers each at a mean of zero):
proc standard m=0;
• Run regressions on the linear and quadratic models:
proc reg;
  model stress = event;
proc reg;
  model stress = event event2;
run;
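A note on why the variables are standardized (standard regression reasoning, not spelled out on the slide): a raw predictor and its square are usually highly correlated, and working with centered (m=0) scores when forming the power term greatly reduces that collinearity, which keeps the b-weights in the quadratic model interpretable.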

21. Results of curvilinear analysis
• The linear model is significant and accounts for 17% of the variance in stress (F(1, 58) = 11.85, p < .05).
• The quadratic model is also significant (F(2, 57) = 6.80, p < .05) and accounts for 19% of the variance, but the b-weight for the quadratic term is not significant (b(57) = 1.27, ns).
• Therefore, the linear model appears to be the best fit for these data.
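As a quick cross-check (not on the original slide), plugging the reported R² values into the hierarchical F formula given earlier:

F(1, 57) = [(.19 − .17) / 1] / [(1 − .19) / 57] ≈ .02 / .0142 ≈ 1.41, ns

which agrees with the non-significant b-weight for the quadratic term.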

22. Data fit, outliers, and homoscedasticity
• Run the regression and check for outliers:
proc reg;
  model stress = event status / stb r influence;
  plot p.*r. stress*p.;
run;

23. Results outliers
• The predicted-by-residuals plot showed no apparent heteroscedasticity; the values appeared randomly scattered around the zero residual line.
• The predicted-by-actual plot demonstrates a positive relationship. No apparent outliers.

24. Results outliers (cont.)
• Three outliers were identified with studentized residuals greater than 2: #10, #29, and #54.
• Leverage cutoff: 2(k+1)/N = .10.
• Cook's D cutoff: .2.
• DFBETAS cutoff: .26.
(The cutoff arithmetic is sketched below.)
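For reference, the first and last cutoffs follow from common rules of thumb, with k = 2 predictors and N = 60 (the DFBETAS formula is inferred, since the slide lists only the resulting values):

Leverage cutoff: 2(k + 1)/N = 2(3)/60 = .10
DFBETAS cutoff: 2/√N = 2/√60 ≈ .26

The Cook's D cutoff of .2 is the threshold chosen for this lab rather than a single standard value.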

25. Outlier conclusions
• There don't appear to be any large problems with outliers. Case #29 did have some influence, so we will try running the regression analysis without it at the end and see if the significance changes.

26. Collinearity
• Analyze the regression with collinearity diagnostics included:
proc reg;
  model stress = event status / vif tol collin;
run;
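For reference, the diagnostics these options request are defined as follows (standard definitions, not from the slides): for predictor j, tolerance is TOL_j = 1 − R²_j, where R²_j comes from regressing predictor j on all the other predictors, and VIF_j = 1/TOL_j. VIF and tolerance values near 1 indicate that collinearity is not a concern.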

  27. Collinearity results

28. Analyze Regression Results
• Create the interaction term (inside the data step):
  inter = status*event;
• Run the regression analysis with and without the interaction:
proc reg;
  model stress = status event inter / stb;
run;
• Then follow the flow chart on the next slide.

29. Flow chart (first step), for the model Y′ = a + b1X1(groupvar) + b2X2(continvar) + b3X1*X2(inter):
• Is the model significant?
  • NO: Finished.
  • YES: Is the b for the interaction term (b3) significant?
    • YES: The slopes of the two groups differ: compute a separate regression for each group.
    • NO: Continue on the next slide.

30. Flow chart (continued): The slopes of the two groups are the same. Drop the interaction term and rerun the regression (the rerun matters only if the IVs are highly correlated).
• Is b1 significant?
  • NO: No differences in the intercepts. Compute one regression with all the data pooled (drop the G variable).
  • YES: There is a difference between the means of your two groups: the groups have different intercepts. Compute the intercepts by adding or subtracting b1 from a.
• Is b2 significant?
  • NO: The continuous variable does not have an effect on Y.
  • YES: The continuous variable does exert an influence on Y.
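To see why the branches read this way (algebra implied by the model rather than shown on the slides), substitute the 0/1 group codes into the full equation:

Group coded 0: Y′ = a + b2X2
Group coded 1: Y′ = (a + b1) + (b2 + b3)X2

So b1 is the difference between the groups' intercepts and b3 is the difference between their slopes, which is why a significant b3 sends you to separate regressions and a significant b1 (with the interaction dropped) means different intercepts.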

31. Regression results
• The overall model without the interaction was significant (F(2, 57) = 16.51, p < .05) and accounted for 37% of the variance.
• Both life events (β = .36, t(58) = 3.35, p < .05) and status (β = -.45, t(58) = -4.21, p < .05) were significant predictors of stress.
• The overall model with the interaction was also significant (F(3, 56) = 12.98, p < .05) and accounted for 41% of the variance.
• The interaction was significant (β = -.40, t(58) = -2.00, p < .05), but status was no longer significant (β = -.13, t(58) = -.67, ns).
• Therefore, the slopes of the two groups differ: compute separate regressions for each group.

32. Proc means, correlations by status, and regression lines on the same graph
• Means by group:
proc means;
  by status;
• Correlations by group:
proc corr;
  var stress event;
  by status;
• Overlay the regression lines for the two groups (interpol=rl requests a fitted linear regression line for each plotting symbol):
symbol1 color=blue interpol=rl value=none;
symbol2 color=black interpol=rl value=none;
proc sort; by status; run;
proc gplot;
  plot stress*event = status;
run;

33. Conclusions
• For participants who did not live with a partner, the correlation between stress and life events was not significant (r(28) = .10, ns).
• For participants who did live with a partner, the correlation between stress and life events was significant (r(28) = .62, p < .05).
• The graph of the two regression lines illustrates the interaction effect, with almost no slope for those not living with a partner and a moderate slope for those living with a partner.
• Participants living with a partner did show lower levels of stress (M = 18.3, SD = 5.47) than participants not living with a partner (M = 24.3, SD = 6.14), but this difference was not significant once the interaction was added to the model.

34. Oops, one last thing: we forgot to rerun the model with participant #29 deleted
• Delete participant #29 in the data step and rerun the analysis:
  if _n_ = 29 then delete;

35. Conclusions after deleting
• After deleting that one case, the interaction term is no longer significant (β = -.32, t(57) = -1.62, ns). You would want to look at that one value and see whether it was an error.
• If you believe the data point is a true score, you should probably report the results both before and after deletion.
• A big limitation of this example is the small sample size.
• If the sample size were larger, the interaction would probably be significant; there seemed to be a large effect, and even after the outlier was deleted the correlations for the two groups were .62 and .19.
• One might also test the difference between the two correlations directly (a sketch follows), even though this test generally has less power.
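A minimal sketch of that test, assuming Fisher's r-to-z procedure and the correlations reported above (which group lost case #29 is not stated on the slides, so the group sizes below are assumptions):

* Fisher r-to-z test for the difference between two independent correlations;
data fisher;
  r1 = .62; n1 = 30;  * group living with a partner (r from this slide);
  r2 = .19; n2 = 29;  * group not living with a partner; n = 29 assumes case #29 was in this group;
  z1 = 0.5*log((1 + r1)/(1 - r1));                  * Fisher z transform of r1;
  z2 = 0.5*log((1 + r2)/(1 - r2));                  * Fisher z transform of r2;
  z  = (z1 - z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3));   * normal test statistic;
  p  = 2*(1 - probnorm(abs(z)));                    * two-tailed p-value;
run;
proc print data=fisher; var z1 z2 z p; run;

With these numbers the statistic lands just short of the two-tailed .05 cutoff (z ≈ 1.94), consistent with the slide's caution that this test generally has less power.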
