
Additional Regression techniques




  1. Additional Regression techniques. Scott Harris, October 2009

  2. Learning outcomes
  By the end of this session you should:
  • be aware of two additional regression techniques: Cox regression and logistic regression;
  • know when these techniques are applicable;
  • be able to interpret the results from these regression techniques.

  3. Contents
  • Cox regression
      • Assumptions behind the model
      • Fitting Cox regression models in SPSS
      • Interpreting the model
      • Testing the assumptions
          • Log-log plot
          • Plots of partial residuals against rank time
  • Logistic regression
      • When to use it
      • ‘How to’ in SPSS
      • Interpreting the output

  4. Cox regression

  5. Cox regression
  • Models time-to-event data in the presence of censored cases.
  • Allows the inclusion of predictor variables (covariates), which can be categorical or continuous.
  • Can be extended to allow for time-dependent covariates (not covered here).
  • Also known as the Cox proportional hazards model, or simply the Cox model.
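In standard notation (the symbols are ours, not from the slides), the Cox model for covariates x_1, …, x_k can be written as follows, where h_0(t) is a baseline hazard that the model leaves unspecified:

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k\right)
```

Holding the other covariates fixed, a one-unit increase in x_j multiplies the hazard by e^{β_j}; this is the hazard ratio that SPSS reports in its Exp(B) column.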

  6. Hazard functions [figure; axis label: Hazard]

  7. Hazard rates & ratios
  • The hazard rate is the probability that, if the event in question has not already occurred, it will occur in the next time interval, divided by the length of that interval. The interval is made very short, so that in effect the hazard rate is an instantaneous rate.
  • The hazard ratio is an estimate of the ratio of the hazard rate in the treated group to the hazard rate in the control group.
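Written as formulas (the notation is ours), with T the time to the event, the two definitions above are:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{\Pr\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t},
\qquad
\mathrm{HR}(t) = \frac{h_{\text{treated}}(t)}{h_{\text{control}}(t)}
```

Under the proportional hazards assumption, HR(t) takes the same value at every time t.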

  8. Cox regression: PH assumption
  • Assumption of proportional hazards: the ratio of the hazards being compared (e.g. treated versus control) is constant over time. The hazards themselves may change over time, but they must change in proportion to each other.
  • This can be assessed graphically with the log-log plot: if the PH assumption holds, the curves should be approximately parallel.
  • The residuals (Schoenfeld residuals) can also be examined: if the PH assumption holds, a plot of the residuals against time should be horizontal and centered close to 0.

  9. SPSS – Cox regression: Analyze → Survival → Cox Regression…

  10. SPSS – Cox regression

  11. SPSS – Cox regression
  * Cox regression adjusted for age .
  COXREG Time
    /STATUS=Status(1)
    /CONTRAST (Group)=Indicator(1)
    /METHOD=ENTER Age Group
    /SAVE=PRESID
    /PRINT=CI(95)
    /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
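To show what COXREG estimates, here is a minimal Python sketch (not SPSS's implementation) for a single covariate: it maximises the Cox partial likelihood by Newton-Raphson, treating tied event times with Breslow's approximation (an assumption; we have not checked which method SPSS uses). The toy data and all function names are hypothetical. exp(B) is the hazard ratio for Group B relative to Group A, and the CI is exp(B ± 1.96 SE).

```python
import math


def score_and_information(beta, times, events, x):
    """Score (first derivative) and observed information of the Cox log
    partial likelihood for one covariate, evaluated at coefficient beta.
    Tied event times use Breslow's approximation (the whole risk set)."""
    score = 0.0
    info = 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored cases contribute only through risk sets
        # Risk set: every case still under observation at time ti
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        mean = s1 / s0  # risk-weighted mean of x at time ti
        score += xi - mean
        info += s2 / s0 - mean * mean
    return score, info


def cox_fit_1d(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the partial likelihood.
    Returns (B, SE); exp(B) is the hazard ratio per unit of x."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical toy data: group coded 0 = A (reference), 1 = B
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
status = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]  # 1 = event, 0 = censored
group = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

b, se = cox_fit_1d(times, status, group)
hr = math.exp(b)
ci = (math.exp(b - 1.96 * se), math.exp(b + 1.96 * se))
```

Indicator coding with Group A as 0 plays the same role as /CONTRAST (Group)=Indicator(1) in the syntax above.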

  12. Info: Cox regression in SPSS
  • From the menus select ‘Analyze’ → ‘Survival’ → ‘Cox Regression…’.
  • Put the variable containing the time into the ‘Time:’ box.
  • Put the categorical variable that indicates whether or not a case had the event of interest into the ‘Status:’ box. Then click the ‘Define Event…’ button and enter the single value, or range of values, that indicates that the event occurred. Click ‘Continue’.
  • Add any other variables that you would like included in your model to the ‘Covariates:’ box.
  • If any of the variables in the ‘Covariates:’ box are categorical, click the ‘Categorical…’ button and move each of them to the ‘Categorical Covariates:’ box. In the ‘Change Contrast’ box decide, for each variable, whether the reference category should be the first or the last level, and make any changes needed. Click ‘Continue’.
  • Click the ‘Save…’ button and tick the ‘Partial Residuals’ option in the ‘Diagnostics’ box. Click ‘Continue’.
  • Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Model Statistics’ box. Click ‘Continue’.
  • Finally click ‘OK’ to produce the results, or ‘Paste’ to add the syntax for this to your syntax file.

  13. SPSS – Cox regression: Output
  The output table gives the hazard ratio for each one-unit (one-year) increase in Age, and the hazard ratio for being in Group B relative to Group A (the reference), each with its CI and p value. How the contrast was set up determines how you should interpret the output for categorical variables: here the reference category was set as the first level, which makes Group A the reference.

  14. SPSS – Cox regression
  Here you can see that the hazard is 78% higher for each additional year of age (hazard ratio 1.78), and this effect is highly significant (p=0.003). Having adjusted for age, however, there appears to be a very clear difference between the groups, with a hazard ratio for Group B relative to Group A of 8.80 (95% CI: 1.34 to 57.94) (p=0.024). Notice that this confidence interval is very wide: its lower limit suggests that the true hazard ratio may be as low as 1.34.
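Why is the interval so lopsided around 8.80? A 95% CI for a hazard ratio is computed as exp(B ± 1.96 SE), so it is symmetric on the log scale, not on the ratio scale. The minimal Python sketch below checks this with the figures from the slide, and also shows that a per-year hazard ratio compounds multiplicatively (the 5-year comparison is our own illustration).

```python
import math

# Group B vs Group A, adjusted for age (values from the slide)
hr_group, lower, upper = 8.80, 1.34, 57.94

# Symmetric on the log scale: the HR sits (up to rounding) at the
# geometric mean of the limits, far below their arithmetic mean.
geometric_midpoint = math.sqrt(lower * upper)
arithmetic_midpoint = (lower + upper) / 2
# Distances from ln(HR) down to ln(lower) and up to ln(upper)
log_gap_below = math.log(hr_group) - math.log(lower)
log_gap_above = math.log(upper) - math.log(hr_group)

# Age: the per-year hazard ratio of 1.78 compounds, so a 5-year age
# difference corresponds to 1.78 ** 5, i.e. exp(5 * ln 1.78).
hr_age_5_years = 1.78 ** 5
```

The two log gaps agree to within rounding (about 1.88 each), while a 5-year age difference corresponds to a hazard ratio of roughly 17.9.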

  15. SPSS – Cox regression
  If we take Age out of the model, the estimated group effect is smaller: the hazard ratio for Group B relative to Group A is 2.56 (95% CI: 0.74 to 8.82), which is no longer statistically significant at the 5% level (p=0.136). Model selection for survival models is as important as it is for other modelling procedures and needs to be thought about carefully.
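The reported p values can be approximately recovered from each hazard ratio and its CI, assuming they come from Wald tests: SE = (ln U − ln L) / (2 × 1.96), z = ln(HR) / SE, and the two-sided p = erfc(|z| / √2). A minimal Python sketch (the function name is hypothetical); small discrepancies come from the rounding of the published figures. It also shows the link between "CI includes 1" and "p > 0.05".

```python
import math


def wald_p_from_ci(ratio, lower, upper, z=1.959964):
    """Approximate two-sided Wald p value from a ratio and its 95% CI."""
    b = math.log(ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    z_stat = b / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z_stat) / math.sqrt(2))


p_adjusted = wald_p_from_ci(8.80, 1.34, 57.94)   # slide: p = 0.024
p_unadjusted = wald_p_from_ci(2.56, 0.74, 8.82)  # slide: p = 0.136
```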

  16. The PH assumption: Log-log plot
  The log-log plot is one way to assess graphically whether the assumption of proportional hazards is reasonable. For the assumption to hold, the log-log plot should show the separate lines as approximately parallel to each other.
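Why parallel lines? Writing the survival function as S(t) = exp(−H(t)), with H(t) the cumulative hazard, proportional hazards imply H_1(t) = HR × H_0(t), so ln(−ln S_1(t)) = ln HR + ln(−ln S_0(t)): the two log-log curves differ by the constant ln HR. The short Python sketch below checks this numerically, assuming a hypothetical Weibull baseline survival curve and a hypothetical hazard ratio of 2.5; neither comes from the slides.

```python
import math


def weibull_survival(t, scale=10.0, shape=1.5):
    """Hypothetical baseline survival curve S0(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def log_minus_log(s):
    return math.log(-math.log(s))


hr = 2.5  # hypothetical constant hazard ratio
times = [1, 2, 5, 10, 20]
gaps = []
for t in times:
    s0 = weibull_survival(t)
    s1 = s0 ** hr  # under proportional hazards, S1(t) = S0(t) ** HR
    gaps.append(log_minus_log(s1) - log_minus_log(s0))
# Every gap equals ln(HR): the two log-log curves are parallel
```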

  17. SPSS – The PH assumption: Log-log plot
  To produce an accurate log-log plot in SPSS you need to define the categorical variable as a stratification variable (‘Strata’).
  * Log-log plot .
  COXREG Time
    /STATUS=Status(1)
    /STRATA=Group
    /METHOD=ENTER Age
    /PLOT LML
    /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

  18. Info: Cox regression: Log-log plot in SPSS
  • Follow the information sheet on producing a Cox regression, but stop after point 5.
  • To produce the log-log plot, remove the most important categorical variable from the ‘Covariates:’ box and put it into the ‘Strata:’ box instead. This variable is often the one defining the groups we want to compare.
  • Once a variable is in the ‘Strata:’ box, click the ‘Plots…’ button. Tick the ‘Log minus log’ option in the ‘Plot Type’ box. Click ‘Continue’.
  • Finally click ‘OK’ to produce the plot, or ‘Paste’ to add the syntax for this to your syntax file.

  19. SPSS – The PH assumption: Log-log plot
  There are not enough cases in each stratum: the dataset is too small for this plot to be informative.

  20. SPSS – Cox regression: Aside
  Aside: strata. Fitting the group variable as a stratification variable instead of as a covariate, with no other covariates in the model, replicates the Kaplan-Meier plot if we ask for the survival plot.
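For reference, the Kaplan-Meier (product-limit) estimate mentioned in this aside can be sketched in a few lines of Python. This is our own illustration on hypothetical data, not SPSS code; cases censored at the same time as an event are kept in the risk set for that event, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).
    Returns a list of (event_time, survival) pairs."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every case sharing this time (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # remove them only after this time's step
    return curve


# Hypothetical data: 1 = event, 0 = censored
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Here the estimate steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3.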

  21. SPSS – The PH assumption: Residual plots
  Plot each of the residuals against rank time. If the PH assumption has not been violated, then each of the plots should:
  • not show a clear trend over time (i.e. not be drastically increasing or decreasing);
  • be centered close to 0.
  * Creating the ranks (run this first: it creates the RTime variable used below) .
  RANK VARIABLES=Time (A)
    /RANK
    /PRINT=YES
    /TIES=MEAN .
  * Producing the scatter graphs .
  GRAPH
    /SCATTERPLOT(BIVAR)=RTime WITH PR1_1
    /MISSING=LISTWISE .
  GRAPH
    /SCATTERPLOT(BIVAR)=RTime WITH PR2_1
    /MISSING=LISTWISE .

  22. Info: Cox regression: Residual plots in SPSS
  • Follow the information sheet on producing a Cox regression all the way through to the end. This will save a new set of variables to the dataset containing the residuals (you will get one residual variable for each covariate in the model, and their names will start with PR).
  • We now need to produce a rank time variable. To do this, go to ‘Transform’ → ‘Rank Cases’.
  • Put the time variable into the ‘Variable(s):’ box.
  • Click ‘OK’ to produce the ranks, or ‘Paste’ to add the syntax for this to your syntax file.
  • We now have the two elements needed for the scatter plots. To draw them, go to ‘Graphs’ → ‘Scatter/Dot…’, then select ‘Simple Scatter’ and click ‘Define’. Put the new rank time variable on the x axis and each of the residual variables in turn on the y axis.
  • Finally click ‘OK’ to produce the plot, or ‘Paste’ to add the syntax for this to your syntax file.
  • You can now edit the plot to improve its presentation (see Introduction course notes). It is often useful to add a horizontal reference line at 0 to aid interpretation.
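A minimal Python sketch of the two ingredients of these plots, for a single covariate: Schoenfeld residuals (the covariate value of each case with an event, minus the risk-weighted average of that covariate over everyone still at risk at that time) and ranks in which tied values share their mean rank, mirroring /TIES=MEAN. The function names and toy data are hypothetical, and SPSS's exact computation (for example its handling of tied event times) may differ.

```python
import math


def schoenfeld_residuals(times, events, x, beta):
    """One residual per case with an event: its covariate value minus the
    exp(beta * x)-weighted mean of x over its risk set. Censored cases get
    None, since only cases with an event have a Schoenfeld residual."""
    residuals = []
    for ti, di, xi in zip(times, events, x):
        if not di:
            residuals.append(None)
            continue
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        weighted_mean = sum(wj * xj for wj, xj in zip(w, risk)) / sum(w)
        residuals.append(xi - weighted_mean)
    return residuals


def rank_mean_ties(values):
    """Ascending ranks (1 = smallest); tied values share the mean of the
    ranks they occupy, like SPSS RANK with /TIES=MEAN."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Hypothetical toy data; beta would normally be the fitted coefficient B
times = [1, 2, 3, 4]
status = [1, 1, 0, 1]
x = [1, 0, 1, 0]
residuals = schoenfeld_residuals(times, status, x, beta=0.0)
rank_time = rank_mean_ties(times)

# Points for the residual plot: (rank time, residual), events only
points = [(r, e) for r, e in zip(rank_time, residuals) if e is not None]
```

Dropping the censored cases when building the points plays the same role as /MISSING=LISTWISE in the GRAPH syntax.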

  23. SPSS – The PH assumption: Residual plots These plots don’t seem to indicate any obvious trend and are generally centered close to zero, but we are dealing with a very small example dataset here.

  24. Logistic regression

  25. Logistic regression
  • Logistic regression is used when the outcome variable is binary (categorical with two levels).
  • Allows the inclusion of predictor variables (covariates), which can be categorical or continuous.
  • The modelling is conducted on the log odds scale, but the results should be presented on the odds scale (see categorical notes).
  • Can be extended to outcomes with more than two levels. These models are known as multinomial or ordinal regression (not covered here).
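In symbols (our notation), with p the probability of the modelled level of the outcome and covariates x_1, …, x_k, the model is linear on the log odds scale, and exponentiating a coefficient gives an odds ratio:

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
\mathrm{OR}_j = \frac{\operatorname{odds}(x_j + 1)}{\operatorname{odds}(x_j)} = e^{\beta_j}
```

The odds ratio compares odds for a one-unit increase in x_j with the other covariates held fixed, which is why the Exp(B) column in SPSS's logistic output holds odds ratios.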

  26. SPSS – Logistic regression: Analyze → Regression → Binary Logistic…
  Put the binary outcome variable into the ‘Dependent:’ box and all other covariates into the ‘Covariates:’ box.

  27. SPSS – Logistic regression… If you have any categorical variables then you need to use the ‘Categorical…’ option to set up how to deal with these. ln_yesno is a binary yes/no variable so we move it into the ‘Categorical Covariates:’ box.

  28. SPSS – Logistic regression…
  For each categorical variable you now need to set which level will be the reference category. To check how a variable is coded, right-click it and select ‘Variable information’. Here ‘No’ is the first category (the lowest code), so we set it as the reference.

  29. SPSS – Logistic regression… Go into the options and tick the box for confidence intervals for the odds ratios.

  30. Info: Logistic Regression in SPSS
  • From the menus select ‘Analyze’ → ‘Regression’ → ‘Binary Logistic…’.
  • Put the variable containing the binary outcome into the ‘Dependent:’ box.
  • Add all other variables that you would like included in your model to the ‘Covariates:’ box.
  • If any of the variables in the ‘Covariates:’ box are categorical, click the ‘Categorical…’ button and move each of them to the ‘Categorical Covariates:’ box. In the ‘Change Contrast’ box decide, for each variable, whether the reference category should be the first or the last level, and make any changes needed. Click ‘Continue’.
  • Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Statistics and Plots’ box. Click ‘Continue’.
  • Finally click ‘OK’ to produce the results, or ‘Paste’ to add the syntax for this to your syntax file.
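Clicking ‘OK’ fits the model by maximum likelihood. As a hedged illustration (a minimal Python sketch, not SPSS's own algorithm), the code below fits logit(p) = b0 + b1·x by Newton-Raphson for a single binary covariate on hypothetical 2×2 table data. With one binary covariate, the fitted odds ratio exp(b1) equals the cross-product ratio of the table, here (20×20)/(10×10) = 4, which gives an easy check.

```python
import math


def logistic_fit_1d(x, y, tol=1e-10, max_iter=50):
    """Maximum likelihood fit of logit(p) = b0 + b1 * x by Newton-Raphson.
    Returns (b0, b1); exp(b1) is the odds ratio per unit of x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0    # gradient of the log-likelihood
        a = b = c = 0.0  # information matrix [[a, b], [b, c]]
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            a += w
            b += w * xi
            c += w * xi * xi
        det = a * c - b * b
        # Newton step: solve [[a, b], [b, c]] @ (s0, s1) = (g0, g1)
        s0 = (c * g0 - b * g1) / det
        s1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


# Hypothetical 2x2 table: exposed 20 died / 10 survived; unexposed 10 / 20
x = [1] * 30 + [0] * 30
y = [1] * 20 + [0] * 10 + [1] * 10 + [0] * 20
b0, b1 = logistic_fit_1d(x, y)
odds_ratio = math.exp(b1)
cross_product_ratio = (20 * 20) / (10 * 10)
```

As in SPSS, the outcome coded 1 is the level being modelled, so exp(b1) compares the odds of y = 1.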

  31. SPSS Logistic Regression: Output
  The output includes information on the amount of data used in the analysis and on the coding of the binary outcome. The coding is very important, as it identifies the level of the binary outcome that is being modelled: here the higher level is 1, which was used to indicate subjects who died within 5 years, so this is what our model looks at. Convergence information is also given.

  32. SPSS Logistic Regression: Output…
  The table gives the odds ratios, their 95% confidence intervals and the p values.
  Interpretation: having adjusted for lymph node involvement, each additional year of age multiplies the odds of mortality within 5 years by 0.99 (95% CI 0.97 to 1.01), a slight decrease, although this was not statistically significant (p=0.375). Having adjusted for age, subjects with lymph node involvement have their odds of mortality within 5 years increased by a factor of 2.65 (95% CI 1.49 to 4.72) compared with those with no lymph node involvement. This effect was highly statistically significant (p=0.001).
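Two points about reading these odds ratios can be made concrete with a little arithmetic. First, an odds ratio is not a ratio of probabilities: applying OR = 2.65 to a hypothetical baseline 5-year mortality of 30% (not a figure from the study) gives a probability of about 53%, a risk ratio of about 1.77 rather than 2.65. Second, a per-year odds ratio compounds over several years; with the rounded value 0.99, a 10-year age difference gives about 0.90. A minimal Python sketch (the function name is hypothetical):

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by multiplying the odds of baseline_prob by odds_ratio."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline: 30% mortality within 5 years without lymph node involvement
p_no_ln = 0.30
p_ln = apply_odds_ratio(p_no_ln, 2.65)  # odds ratio from the slide
risk_ratio = p_ln / p_no_ln  # smaller than the odds ratio here

# The per-year odds ratio (rounded to 0.99 on the slide) over 10 years
or_age_10_years = 0.99 ** 10
```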

  33. Summary
  You should now:
  • be aware of two additional regression techniques: Cox regression and logistic regression;
  • know when these techniques are applicable;
  • be able to interpret the results from these regression techniques.

