
Building Risk Adjustment Models Andy Auerbach MD MPH
Overview • Reminder of what we are talking about • Building your own • Model discrimination • Model calibration • Model validation
Risk Adjustment Models • Typically explain only 20-25% of variation in health care utilization • Explaining this amount of variation can be important if remaining variation is extremely random • Example: supports equitable allocation of capitation payments from health plans to providers
Where does ‘risk adjustment’ fit in our model • Donabedian framework (Donabedian A. JAMA 1988;260:1743-8): Structure (community, delivery system, provider, and population characteristics) → Process (technical, care, and interpersonal processes of health care providers; access, equity, and adherence of the public and patients) → Outcomes (health status, functional status, satisfaction, mortality, cost)
Patient severity of illness is a kind of confounder • In the classic diagram, a confounding factor is associated with both the exposure and the outcome; patient severity of illness plays exactly that role
What risk factors are…. • They are…. • Factors that affect the patient’s risk for the outcome independent of the treatment • These factors can also be associated with: • Risks for receiving the treatment (allocation bias; addressed with propensity scores) • Modification of the effect of treatment (interaction terms)
Risk factors are not… • While there may be some in common, risk factors for an outcome given a health condition are not the same as the risk factors for the condition. • Hyperlipidemia is a risk factor for MI but not for survival following an MI
Because in the end your analyses will look like this…. Measure = Predictor + confounders + error term Measure = Predictor + risk of outcome + other confounders + error term
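To make the second equation concrete, here is a minimal stdlib-only sketch (all data simulated, all variable names hypothetical): severity drives both treatment assignment and the outcome, so the unadjusted treatment estimate is biased upward, while including the risk term recovers something close to the true effect.

```python
# Sketch: Measure = Predictor + risk of outcome + error, with simulated
# confounding by indication. Sicker patients are more likely to be
# treated, and severity independently worsens the outcome.
import random

random.seed(1)

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(len(X))) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for c in range(k):                      # forward elimination
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][j] - f * A[c][j] for j in range(k)]
            b[r] -= f * b[c]
    coef = [0.0] * k                        # back substitution
    for c in reversed(range(k)):
        coef[c] = (b[c] - sum(A[c][j] * coef[j]
                              for j in range(c + 1, k))) / A[c][c]
    return coef

n = 2000
severity = [random.gauss(0, 1) for _ in range(n)]
treated = [1 if s + random.gauss(0, 1) > 0 else 0 for s in severity]
# true treatment effect is +1; severity independently raises the measure
outcome = [1.0 * t + 2.0 * s + random.gauss(0, 1)
           for t, s in zip(treated, severity)]

naive = ols([[1, t] for t in treated], outcome)[1]
adjusted = ols([[1, t, s] for t, s in zip(treated, severity)], outcome)[1]
print(f"unadjusted effect: {naive:.2f}")        # inflated by severity
print(f"risk-adjusted effect: {adjusted:.2f}")  # close to the true +1
```

The gap between the two estimates is exactly the confounding that the risk term is there to absorb.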
Building Your Own Risk-Adjustment Model • What population • Generic or disease specific • What time period • Single visit/hospitalization or disease state that includes multiple observations • What outcomes • Must be clearly defined and frequent enough for modeling • What purpose • Implications for how good the model needs to be
Inclusion/Exclusion: Hospital Survival for Pneumonia • Include • Primary ICD-9 code 480-487 (viral/bacterial pneumonias) • Secondary ICD-9 code 480-487 and primary of empyema (510), pleurisy (511), pneumothorax (512), lung abscess (513), or respiratory failure (518) • Exclude • Age <18 years old • Admission in prior 10 days • Other diagnoses of acute trauma • HIV, cystic fibrosis, tuberculosis, postoperative pneumonia
Episode of Care • Does dataset include multiple observations (visits) over time of the same individual? • Re-hospitalizations • Hospital transfers • Can dataset support linking observations (visits) over time? • Inclusion and exclusion criteria should describe handling of multiple observations
Identifying Risk Factors for Model • Previous literature • Expert opinion/consensus • Data dredging (retrospective)
Reliability of Measurement • Is the ascertainment and recording of the variable standardized within and across sites? • Are there audits of the data quality and attempts to correct errors?
Missing Data • Amount • Why is it missing? Biased ascertainment? • Does missing indicate normal or some other value? • Can missing data be minimized by inclusion/exclusion criteria? • May want to impute missing values
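A minimal sketch of one common imputation approach (mean imputation plus a missingness indicator, so the model can test whether missingness itself carries risk; the variable name and values are hypothetical):

```python
# Sketch (hypothetical lab values): impute missing entries with the
# observed mean, and keep a flag marking which values were imputed.
creatinine = [1.0, None, 1.4, 0.9, None, 2.1]

observed = [v for v in creatinine if v is not None]
mean_val = sum(observed) / len(observed)

imputed = [v if v is not None else mean_val for v in creatinine]
was_missing = [1 if v is None else 0 for v in creatinine]

print(imputed)       # missing entries replaced by the observed mean
print(was_missing)   # indicator usable as its own model covariate
```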
Risk Factors: Which Value With Multiple Measurements? • First? Worst? Last? • Consider whether timing of data collection of risk factor accurately reflects relevant health state, could confound rating of quality or number of missing values • May be able to improve estimate of some risk factors using multiple measures
Co-Morbidity or Complication • May be difficult to determine whether a condition is a co-morbidity or a complication • Shock following an MI • Including complications in risk adjustment models gives providers credit for poor quality care • True co-morbidities may be dropped from risk adjustment models out of concern that they sometimes represent complications
Caveats to risk factors: Gaming • Situation in which the coding of risk factors is influenced by coder’s knowledge or assumptions regarding how the data will be used to create a performance report or to calculate payment • The potential for gaming to alter the results (eg quality comparisons of providers) is related to the degree that it occurs similarly or differently across providers
Caveats: Co-morbidities and Complications • In administrative data, preexisting and acute illnesses have been coded without differentiation (e.g. acute vs. preexisting thromboembolism). • Generally not an issue for chronic diseases • Link to earlier records (eg previous admissions) can be helpful • Condition present at admission (CPAA) coding now a standard part of California hospital discharge data
Risk Factors: Patient Characteristics Not Process of Care • Processes of care can be indicative of severity • However treatments also reflect practice style/quality • Process measures can be explored as possible mediators as opposed to risk factors for outcomes
Coronary Artery Disease: Mortality Rates by Race • Adjusted for age, coronary anatomy, ejection fraction, CHF, angina, AMI, mitral regurgitation, peripheral vascular disease, and coexisting illnesses (Peterson et al, NEJM, 1997)
Building Multivariate Models • Start with conceptual framework from literature and expert opinion • Pre-specify statistical significance for retaining variables
Empirical Testing of Risk Factors • Univariate analyses to perform range checks, eliminate invalid values and low frequency factors • Bivariate analyses to identify insignificant or counterintuitive factors • Test variables for linear, exponential, u-shaped, or threshold effects
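The functional-form check in the last bullet can be sketched as follows (simulated data; the u-shaped age-mortality relationship is invented purely for illustration). Tabulating the outcome rate within bins of a continuous risk factor makes linear, u-shaped, or threshold patterns visible before the variable is committed to the multivariate model:

```python
# Sketch: bivariate check of functional form by binning a continuous
# risk factor and comparing outcome rates across bins.
import random

random.seed(2)
ages = [random.uniform(20, 90) for _ in range(5000)]
# simulated u-shaped risk: mortality highest at the age extremes
died = [1 if random.random() < 0.05 + 0.00008 * (a - 55) ** 2 else 0
        for a in ages]

bins = [(20, 40), (40, 55), (55, 70), (70, 90)]
rates = []
for lo, hi in bins:
    group = [d for a, d in zip(ages, died) if lo <= a < hi]
    rates.append(sum(group) / len(group))
    print(f"age {lo}-{hi}: mortality {rates[-1]:.3f}")
# the middle bins show lower rates than the extremes: not a linear term
```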
Building Multivariate Models • Stepwise addition (or subtraction) monitoring for: • 20% or more change in predictor parameter estimate • Statistical significance of individual predictors • Test for connections between risk and outcome/predictors • Add interactions between predictor and risk factors (or between risk factors) • Stratified analyses
CABG Registry in NY State: Significant Risk Factors for Hospital Mortality for Coronary Artery Bypass Graft Surgery in New York State, 1989-1992
Risk Factors in Large Data Sets: Can you have too much power? • Large datasets prone to finding statistical significance • May want to consider whether statistical significance is clinically significant • May also want to select risk factors based on a clinically relevant prevalence… • Conversely, consider forcing in clinically important predictors even if not statistically significant
Counterintuitive findings in risk adjustment • Outcomes of MI treatment • Hypertension is protective - decreased risk of mortality • Perhaps a surrogate for patients on beta blockers • If we don’t believe hypertension is truly protective, then it is best to drop it from the model
Smaller Models are Preferred • Rule of thumb: 10-30 observations per covariate - not generally an issue in large datasets • Smaller models are more comprehensible • Less risk of “overfitting” the data
Evaluating Model’s Predictive Power • Linear regression (continuous outcomes) • Logistic regression (dichotomous outcomes)
Evaluating Linear Regression Models • R2 is percentage of variation in outcomes explained by the model - best for continuous dependent variables • Length of stay • Health care costs • Ranges from 0-100% • Generally more is better
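A quick sketch of the R² computation itself, using hypothetical length-of-stay values and predictions:

```python
# Sketch: R-squared for a continuous outcome such as length of stay,
# computed as 1 - (residual sum of squares / total sum of squares).
observed = [3.0, 5.0, 4.0, 8.0, 6.0]   # hypothetical lengths of stay
predicted = [3.5, 4.5, 4.0, 7.0, 6.0]  # hypothetical model predictions

mean_y = sum(observed) / len(observed)
ss_tot = sum((y - mean_y) ** 2 for y in observed)
ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.2f}")  # share of outcome variation explained
```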
More to Modeling than Numbers • R2 biased upward by more predictors • Approach to categorizing outliers can affect R2 as predicting less skewed data gives higher R2 • Model subject to random tendencies of particular dataset
Evaluating Logistic Models • Discrimination - accuracy of predicting outcomes among all individuals depending on their characteristics • Calibration - how well prediction works across the range of risk
Discrimination • C index - compares all random pairs of individuals in each outcome group (alive vs dead) to see if risk adjustment model predicts a higher likelihood of death for those who died (concordant) • Ranges from 0-1 based on proportion of concordant pairs and half of ties
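The pairwise definition above translates directly into code. A minimal sketch (toy predicted risks, not from any real model):

```python
# Sketch: the c index computed from its definition - over all
# (died, survived) pairs, count pairs where the model assigned the
# higher risk to the patient who died, with ties counted as half.
def c_index(pred_risk, died):
    cases = [p for p, d in zip(pred_risk, died) if d == 1]
    controls = [p for p, d in zip(pred_risk, died) if d == 0]
    conc = ties = 0
    for c in cases:
        for k in controls:
            if c > k:
                conc += 1       # death ranked higher: concordant
            elif c == k:
                ties += 1
    return (conc + 0.5 * ties) / (len(cases) * len(controls))

risk = [0.9, 0.8, 0.3, 0.3, 0.1]
died = [1,   0,   1,   0,   0]
print(c_index(risk, died))  # 4 concordant + half of 1 tie, over 6 pairs
```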
Adequacy of Risk Adjustment Models • C index of 0.5 is no better than random; 1.0 indicates perfect prediction • Typical risk adjustment models: 0.7-0.8 • 0.5 SDs better than chance results in c statistic = 0.64 • 1.0 SDs better than chance results in c statistic = 0.76 • 1.5 SDs better than chance results in c statistic = 0.86 • 2.0 SDs better than chance results in c statistic = 0.92
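The SD-to-c-statistic correspondence above matches what one gets by assuming normally distributed model scores with equal variance in survivors and decedents, separated by d standard deviations; that assumption is mine, not stated on the slide, but it reproduces the quoted numbers:

```python
# Sketch: under the equal-variance normal assumption, c = Phi(d / sqrt(2)),
# where Phi is the standard normal CDF and d is the separation in SDs.
import math

def c_from_sd_gap(d):
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), evaluated at x = d / sqrt(2)
    return 0.5 * (1 + math.erf((d / math.sqrt(2)) / math.sqrt(2)))

for d in (0.5, 1.0, 1.5, 2.0):
    print(f"{d} SD separation -> c = {c_from_sd_gap(d):.2f}")
```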
Best Model Doesn’t Always Have Biggest C statistic • Adding health conditions that result from complications will raise c statistic of model but not make the model better for predicting quality.
Spurious Assessment of Model Performance • Missing values can lead to some patients being dropped from models • Be certain when comparing models that the same group of patients is being used for all models otherwise comparisons may reflect more than model performance
Calibration - Hosmer-Lemeshow • Size of C index does not indicate how well model performs across range of risk • Stratify individuals into groups (e.g. 10 groups) of equal size according to predicted likelihood of adverse outcome (eg death) • Compare actual vs expected outcomes for each stratum • Want a non-significant p value for each stratum and across strata (Hosmer-Lemeshow statistic)
Stratifying by Risk • Hosmer Lemeshow provides a summary statistic of how well model is calibrated • Also useful to look at how well model performs at extremes (high risk and low risk)
Hosmer-Lemeshow • For k strata the chi squared has k-2 degrees of freedom • Can obtain a false negative (non-significant p value despite poor calibration) when strata contain too few cases
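A minimal sketch of the statistic itself, over deciles of predicted risk (simulated, well-calibrated data, so the resulting chi-squared should be unremarkable relative to its k-2 degrees of freedom):

```python
# Sketch: Hosmer-Lemeshow statistic - sort by predicted risk, form k
# equal-size strata, and sum observed-vs-expected contributions for
# both deaths and survivals in each stratum.
import random

def hosmer_lemeshow(pred, died, k=10):
    pairs = sorted(zip(pred, died))                # order by predicted risk
    size = len(pairs) // k
    chi2 = 0.0
    for i in range(k):
        stratum = pairs[i*size:(i+1)*size] if i < k - 1 else pairs[i*size:]
        n = len(stratum)
        exp = sum(p for p, _ in stratum)           # expected deaths
        obs = sum(d for _, d in stratum)           # observed deaths
        chi2 += (obs - exp) ** 2 / exp + \
                ((n - obs) - (n - exp)) ** 2 / (n - exp)
    return chi2   # compare to chi-squared with k - 2 degrees of freedom

random.seed(3)
pred = [random.uniform(0.05, 0.5) for _ in range(2000)]
died = [1 if random.random() < p else 0 for p in pred]  # well calibrated
print(f"HL chi-squared = {hosmer_lemeshow(pred, died):.1f}")
```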
Individual’s CABG Mortality Risk • 65 y.o. obese non-white woman with diabetes and a serum creatinine of 1 mg/dl presents with an urgent need for CABG surgery. What is her risk of death?
Calculating Expected Outcomes • Solve the multivariate model incorporating an individual’s specific characteristics • For continuous outcomes the predicted values are the expected values • For dichotomous outcomes the sum of the derived predictor variables produces a “logit” which can be algebraically converted to a probability: p = e^logit / (1 + e^logit)
Individual’s Predicted CABG Mortality Risk • 65 y.o. obese non-white woman with diabetes presents with an urgent need for CABG surgery. What is her risk of death? • Log odds = -9.74 + 65(0.06) + 1(.37) + 1(.16) + 1(.42) + 1(.26) + 1(1.15) + 1(.09) = -3.39 • Probability of death = e^logodds/(1+e^logodds) = 0.034/1.034 = 3.3%
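The arithmetic can be checked in code (coefficients exactly as given on the slide; note that the sum of terms is negative, which is what yields a probability near 3.3%):

```python
# Sketch: verify the worked logit-to-probability example.
import math

log_odds = (-9.74 + 65 * 0.06 + 0.37 + 0.16 + 0.42 + 0.26 + 1.15 + 0.09)
prob = math.exp(log_odds) / (1 + math.exp(log_odds))
print(f"log odds = {log_odds:.2f}")          # -3.39
print(f"predicted mortality = {prob:.1%}")   # about 3.3%
```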
Observed CABG Mortality Risk • Actual outcome of whether individual lived or died • Observed rate for a group is number of deaths per the number of people in that group
Actual and Expected CABG Surgery Mortality Rates by Patient Severity of Illness in New York • Chi squared p = .16
Validating Model • Eyeball test • Face validity/Content validity • Does empirically derived model correspond to a pre-determined conceptual model? • If not is that because of highly correlated predictors? A dataset limitation? A modeling error? • Internal validation in split sample • Test in a different data set
Internal Validation • Take advantage of the large size of administrative datasets • Establish development and validation data sets - Randomly split samples • Samples from different time periods/areas • Determine stability of model’s predicting power • Re-estimate model using all available data
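A sketch of the split-sample mechanics (simulated cohort; the "model" here is a single severity score rather than a full stepwise model, to keep the example stdlib-only). When the model is stable, its predictive power, measured here with the c index, should be similar in the development and validation halves:

```python
# Sketch: random split-sample internal validation, comparing
# discrimination in the development vs. held-out validation halves.
import math
import random

random.seed(4)

def c_index(pred, died):
    cases = [p for p, d in zip(pred, died) if d]
    ctrls = [p for p, d in zip(pred, died) if not d]
    s = sum((c > k) + 0.5 * (c == k) for c in cases for k in ctrls)
    return s / (len(cases) * len(ctrls))

n = 3000
sev = [random.gauss(0, 1) for _ in range(n)]          # severity score
died = [1 if random.random() < 1 / (1 + math.exp(2.5 - s)) else 0
        for s in sev]                                  # severity drives death

idx = list(range(n))
random.shuffle(idx)                                    # random split
dev, val = idx[: n // 2], idx[n // 2:]

c_dev = c_index([sev[i] for i in dev], [died[i] for i in dev])
c_val = c_index([sev[i] for i in val], [died[i] for i in val])
print(f"c (development) = {c_dev:.2f}, c (validation) = {c_val:.2f}")
```

A large drop from the development to the validation c statistic would be the overfitting signature described on the next slide.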
Overfitting Data: Overspecified Model • Model performs much better in fitted data set than validation data set • May be due to • Infrequent predictors • Unreliable predictors • Including variables that do not meet pre-specified statistical significance