
Kate Reinhalter Bazinsky Michael Bailit September 10, 2013





Presentation Transcript


  1. The Significant Lack of Alignment Across State and Regional Health Measure Sets: An Analysis of 48 State and Regional Measure Sets, Resource Document Kate Reinhalter Bazinsky Michael Bailit September 10, 2013

  2. Executive summary • There are many state/regional performance measures for providers in use today. • 1367 measures identified across 48 measure sets. • Unfortunately, current state and regional measure sets are not aligned. • Only 20% of all measures were used by more than one program. • Non-alignment persists despite the tendency to use standard, NQF-endorsed and/or HEDIS measures. • Although 59% of the measures come from standard sources, programs are selecting different subsets of these standard measures for use. • The most frequently used measure was used by only 63% of the programs.

  3. Executive summary (cont’d) • With few exceptions, regardless of how we analyzed the data, the programs’ measures were not aligned. • This lack of alignment persists across programs of the same type and for the same purpose. • Medicaid MCOs are the exception and use far more of the same measures than any other type of program. This is partially because they rely almost exclusively on HEDIS measures. • We also found that California has more alignment. This may be due to our sample or the work the state has done to align measures. • While many programs use measures from the same domains, they are not selecting the same measures within these domains. • This suggests that simply specifying the domains from which programs should select measures will not facilitate measure set alignment.

  4. Executive summary (cont’d) • Even when the measures are “the same,” the programs often modify the traditional specifications for the standard measures. • 83% of the measure sets contained at least one modified measure. • Two of the programs modified every single measure and six of the programs modified at least 50% of their measures. • Many programs create their own “homegrown” measures. • 40% of the programs created their own homegrown measures. • Some of these may be measure concepts, rather than measures that are ready to be implemented. • Unfortunately, most of these homegrown measures do not represent true innovation in the measures space. • There appears to be a need for new standardized measures in the areas of self-management, cost, and care management and coordination.

  5. Conclusions • Bottom line: Measure sets appear to be developed independently without an eye towards alignment with other sets. • The diversity in measures allows states and regions interested in creating measure sets to select measures that they believe best meet their local needs. Even the few who seek to create alignment struggle due to a paucity of tools to facilitate such alignment. • The result is “measure chaos” for providers subject to multiple measure sets and related accountability expectations and performance incentives. Mixed signals make it difficult for providers to focus their quality improvement efforts.

  6. Purpose • Goal: Paint a picture of the measures landscape across states and regions to inform development of the emerging Buying Value measure set. • Process: Identify and collect 48 measure sets used by 25 states for a range of purposes and conduct a multi-pronged analysis: • Provide basic summary information to describe the 48 measure sets • Provide an overview of the measures included in the 48 measure sets • Analyze the non-NQF endorsed measures • Analyze the measures by measure set type • Analyze the measures by measure set purpose • Analyze the measures by domain/clinical areas • Assess the extent of alignment within the states of CA and MA

  7. Methodology • We used a convenience sample of measure sets from states, by requesting assistance from our contacts in states and by: • Obtaining sets through state websites: • Patient-Centered Medical Home (PCMH) projects • Accountable Care Organization (ACO) projects • CMS’ Comprehensive Primary Care Initiative (CPCI) • Soliciting sets from the Buying Value measures work group • We also included measure sets from specific regional collaboratives. • We have not surveyed every state, nor have we captured all of the sets used by the studied states. • We did not include any hospital measure sets in our analysis. • Excluded 53 hospital measures from the analysis.

  8. Methodology (cont’d) • Organized the measures by: • Measure steward • NQF status/number • Age of the population of interest • Program type (e.g., ACO, PCMH, health home) • Program purpose (e.g., payment or reporting) • Domain (used the NQS tagging taxonomy) • Clinical areas of interest (used NQF taxonomy detail) • Unduplicated the total measures list to identify the “distinct” measures • If a measure showed up in multiple measure sets, we only counted it once. • If a program used a measure multiple times (variations on a theme) we also only counted it once.
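The unduplication logic described above can be sketched in a few lines of code. This is a hedged illustration only: the measure sets, measure names, and the normalization rule for collapsing "variations on a theme" are all invented for the example, not taken from the study's actual inventory or matching criteria.

```python
from collections import Counter

# Hypothetical toy data: each measure set mapped to the measure names it uses.
# Set names and measures are illustrative, not the study's 48-set inventory.
measure_sets = {
    "State A PCMH": ["Breast Cancer Screening", "HbA1c Testing",
                     "HbA1c Testing (ages 18-64)"],   # a within-set variation
    "State B ACO": ["Breast Cancer Screening", "Controlling High Blood Pressure"],
}

def normalize(name: str) -> str:
    # Assumed rule: strip parenthetical qualifiers to collapse variations
    return name.split(" (")[0]

usage = Counter()
for program, measures in measure_sets.items():
    # Dedup within each set first, so "variations on a theme" count once
    for name in {normalize(m) for m in measures}:
        usage[name] += 1

distinct = len(usage)                             # distinct measures overall
shared = sum(1 for n in usage.values() if n > 1)  # used by more than one program
print(distinct, shared)  # 3 1
```

On the toy data, three distinct measures emerge and only one (Breast Cancer Screening) is shared across programs, mirroring in miniature the study's 509-distinct/20%-shared finding.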

  9. Methodology (cont’d) • Assessed whether the measure is standard, modified, homegrown or undetermined. • If we did not have access to the specifications, but the measure appeared to be standard through combination of steward and title or NQF#, we considered it to be a “standard” measure. This approach is likely to underestimate the number of modified measures. • We labeled measures “modified” if they were standard measures with a change to the traditional specifications. • We labeled measures “homegrown” if they were indicated on the source document as having been created by the developer of the measure set. • We labeled measures “undetermined” if the source of the measure was unclear. Some of these measures may be “homegrown” while others may be drawn from niche sources.
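The labeling rules above amount to a small decision procedure, sketched below. The dictionary field names (`created_by_set_developer`, `steward`, `nqf_number`, `specs_changed`) are hypothetical conveniences for the illustration; the study applied these rules by hand against source documents, not with code.

```python
# Illustrative sketch of the standard/modified/homegrown/undetermined rules.
# All field names are assumptions made for this example.
def classify(measure: dict) -> str:
    if measure.get("created_by_set_developer"):
        return "homegrown"      # attributed to the measure set's developer
    if measure.get("steward") or measure.get("nqf_number"):
        # Known source: "standard" unless the traditional specs were changed
        return "modified" if measure.get("specs_changed") else "standard"
    return "undetermined"       # source unclear from the documents

print(classify({"steward": "NCQA", "specs_changed": True}))  # modified
print(classify({"nqf_number": "0031"}))                      # standard
print(classify({}))                                          # undetermined
```

Note the ordering mirrors the slide: a measure claimed by its set's developer is "homegrown" regardless of other fields, and only measures with an identifiable external source can be "standard" or "modified".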

  10. Table of contents 1. Overview of measure sets 2. Overview of measures 3. Non-standard measures 4. Analysis by measure set type 5. Analysis by measure set purpose 6. Analysis by measure domain/clinical area 7. Intrastate analysis of CA and MA 8. Conclusions / recommendations

  11. 1. Overview of measure sets • Goal: provide some basic summary information to describe the group of measure sets and answer the following questions: • How many measures are included across the measure sets? • How many measures are included in the average measure set?

  12. Measure sets by state • Reviewed 48 measure sets used by 25 states. • Intentionally gave a closer look at two states: CA and MA. • AR • CA (7) • CO • FL • IA (2) • ID • IL • LA • MA (8) • MD • ME (2) • MI • MN (2) • MO (3) • MT • NY • OH • OK • OR • PA (4) • RI • TX • UT (2) • WA • WI Note: If we reviewed more than one measure set from a state, the number of sets included in the analysis is noted above.

  13. Program types • Note: these categories are meant to be mutually exclusive. Each measure set was only included in one category. • ACO: Measure sets used by states to evaluate Accountable Care Organizations, organizations of providers that agree to be accountable for the clinical care and cost of a specific attributed population. • Alignment Initiative: Measure sets created by statewide initiatives in an attempt to align the various measures being used throughout the state by various payers or entities. • Commercial Plans: Measure sets used by states to evaluate insurers serving commercial members. • Duals: Measure sets used by state Medicaid agencies in programs serving beneficiaries who are dually eligible for Medicare and Medicaid. • Exchange: Measure sets used to assess plan performance in a state-operated marketplace for individuals buying health insurance coverage.

  14. Program types (cont’d) • Medicaid: Measure sets used by states to evaluate the Medicaid agency’s performance • Medicaid MCO: Measure sets used by state Medicaid agencies to assess performance of their contracted managed care organizations • Medicaid BH MCO: Measure sets used by state Medicaid agencies to assess performance of their contracted behavioral health managed care organizations • PCMH: Measure sets used by patient-centered medical home initiatives • Other Provider: Measure sets used by states to assess performance at the provider level, but not for assessing ACO, PCMH or Health Home initiatives • Regional Collaboratives: Measure sets used by coalitions of organizations coordinating measurement efforts at a regional level, often with the purpose of supporting health and health care improvement in the geographic area

  15. Measure sets by program type

  16. Measure sets by purpose Defining Terms • Reporting: measure sets used for performance reporting; this reporting may be public or may be for internal use only • Payment: measure sets used for payment distribution to providers (e.g., pay for performance, shared savings, etc.) • Reporting and Other: measure sets used for reporting and an additional non-payment purpose, such as tiering providers or contract management • Alignment: measure sets resulting from state initiatives to establish a core measure set for the state

  17. Measure sets ranged significantly in size [max] 108 measures [avg] 29 measures [min] 3 measures • Note: This is counting the measures as NQF counts them (or if the measure was not NQF-endorsed, as the program counted them).

  18. Table of contents 1. Overview of measure sets 2. Overview of measures 3. Non-standard measures 4. Analysis by measure set type 5. Analysis by measure set purpose 6. Analysis by measure domain/clinical area 7. Intrastate analysis of CA and MA 8. Conclusions / recommendations

  19. 2. Overview of measures Goals: • To describe the measures used across the sets and answer the following questions: • Are the measures used primarily standard measures? • To what extent are measures NQF-endorsed? • What are the primary sources of the measures? • Into which domains do most of the measures fall? • To what extent do the measures cover all age ranges? • To assess the extent of alignment across the measure sets • To what extent are measures shared? • What are the most frequently shared measures?

  20. Finding: Many state/regional performance measures for providers in use today • In total, we identified 1367 measures across the 48 measure sets • This is counting the measures as NQF counts them (or if the measure was not NQF-endorsed, as the program counted them) • We identified 509 distinct measures • If a measure showed up in multiple measure sets, we only counted it once • If a program used a measure multiple times (variations on a theme) we also only counted it once • We excluded 53 additional hospital measures from the analysis.

  21. Programs use measures across all of the domains Total measures by domain n = 1367

  22. The distinct measures actually are more evenly distributed across the domains Distinct measures by domain n = 509

  23. Most implemented measures are for adults • But there does not appear to be a deficiency in the number of measures that could be used in the pediatric or the 65+ population. Measures by age group n = 1367

  24. Finding: Little alignment exists across the measure sets • Programs have very few measures in common or “sharing” across the measure sets • Of the 1367 measures, 509 were “distinct” measures • Only 20% of these distinct measures were used by more than one program Number of distinct measures shared by multiple measure sets n = 509 * By “shared,” we mean that the programs have measures in common with one another, not that they are working together.

  25. How often are the “shared measures” shared? Not that often… • Most measures are not shared • Only 19 measures were shared by at least 1/3 (16+) of the measure sets

  26. Categories of 19 most frequently used measures • 7 Diabetes Care: Comprehensive Diabetes Care (CDC): LDL-C Control <100 mg/dL; CDC: Hemoglobin A1c (HbA1c) Control (<8.0%); CDC: Medical Attention for Nephropathy; CDC: HbA1c Testing; CDC: HbA1c Poor Control (>9.0%); CDC: LDL-C Screening; CDC: Eye Exam • 6 Preventative Care: Breast Cancer Screening; Cervical Cancer Screening; Childhood Immunization Status; Colorectal Cancer Screening; Weight Assessment and Counseling for Children and Adolescents; Tobacco Use: Screening & Cessation Intervention • 4 Other Chronic Conditions: Controlling High Blood Pressure; Use of Appropriate Medications for People with Asthma; Cardiovascular Disease: Blood Pressure Management <140/90 mmHg; Cholesterol Management for Patients with Cardiovascular Conditions • 1 Mental Health/Substance Abuse: Follow-up after Hospitalization for Mental Illness • 1 Patient Experience: CAHPS Surveys (various versions)

  27. Finding: Non-alignment persists despite preference for standard measures Defining Terms Standard: measures from a known source (e.g., NCQA, AHRQ) Modified: standard measures with a change to the traditional specifications Homegrown: measures that were indicated on the source document as having been created by the developer of the measure set Undetermined: measures that were not indicated as “homegrown”, but for which the source could not be identified Other: a measure bundle or composite Measures by measure type n = 1367

  28. In particular, states show a preference for NQF-endorsed measures Percentage of total measures that are NQF-endorsed n = 1367

  29. But looking at the distinct measures, states are clearly willing to use non-NQF measures What are “distinct” measures? • If a measure showed up in multiple measure sets, we only counted it once (e.g., breast cancer screening was counted 30 times in the total measures chart since it appeared in 30 different measure sets; here it is counted once) • If a program used a measure multiple times (variations on a theme) we also only counted it once (e.g., MA PCMH used 3 different versions of the tobacco screening measure; here it is counted once) Percentage of distinct measures that are NQF-endorsed n = 509

  30. NCQA (HEDIS) is clearly the most common source of measures Total measures by source n = 1367

  31. But only 16% of the distinct measures come from HEDIS In other words, the 81 HEDIS measures are used by multiple programs. Distinct measures by source n = 509

  32. There is a lot of overlap between NQF and HEDIS but it is not 100% [Venn diagram: NQF vs. HEDIS measures]

  33. Why HEDIS measures are often the first choice for programs • HEDIS measures are known and trusted • They have been available and in use for a long time • The specifications are widely available and clearly defined • NCQA offers national and regional benchmark information • Although information is at the health plan level, programs can get a sense of how to define “good performance” • They are already used by most health plans, thus providing some information about baseline performance relative to the benchmark • It’s good for the health plans if other programs use HEDIS • If health plan success is being measured on the basis of the HEDIS set, the health plans have an interest in getting other parties to engage in improving scores of those measures • NCQA regularly updates the specifications in response to use, feedback and changes in guidelines • Since another organization is doing this work, it takes the burden off of the program managers

  34. Programs are selecting different subsets of standard measures • While the programs may be primarily using standard, NQF-endorsed measures, they are not selecting the same standard measures • Not one measure was used by every program • Breast Cancer Screening is the most frequently used measure and it is used by only 30 of the programs (63%) [Venn diagram: overlapping measure selections of Programs A through E]

  35. Finding: Even shared measures aren’t always the same - the problem of modification! • Most state programs modify measures • 23% of the identifiable standardized measures were modified (237/1051) • 40 of the 48 measure sets modified at least one measure • Two programs modified every single measure • RI PCMH • UT Department of Health • Six programs modified at least 50% of their measures • CA Medi-Cal Managed Care Specialty Plans (67%) • WA PCMH (67%) • MA PCMH (56%) • PA Chronic Care Initiative (56%) • OR Coordinated Care Organizations (53%) • WI Regional Collaborative (51%)

  36. Do modifications indicate a problem with the measure specifications? • Perhaps… some types of modifications suggest that the measure deserves a closer look: • Adding additional detail to or changing details in the specifications • Eliminating detail from the specifications • Changes in the CPT codes used in the measure specifications • Changes in the source of the data (i.e., from hybrid/clinical records to claims) • However, we found that there are many modifications that programs make that don’t necessarily indicate a fundamental problem with the measure. For example, frequent modifications include: • Reporting only some of the rates/components of the measure (e.g., if the measure has two components: screening and follow-up, they may only do the screening component of the measure) • Narrowing or expanding the age of the population measured • Applying the measure to a new or sub-population • Applying the measure to an alternative setting

  37. Frequency of modification type Note: some of the measures were modified in more than one way and each modification is represented on this chart

  38. Why do organizations modify measures? • To tailor the measure to a specific program • If the program is specific to a subpopulation, then the organization may alter the measure to apply it to the population of interest • To make implementation easier • The systems that the organizations have in place may make an alternative approach to implementing the measure easier • To obtain buy-in and consensus on a measure • Sometimes providers have strong opinions about the particular CPT codes that should be included in a measure in order to make it more consistent with their experiences. In order to get consensus on the measure, the organization may agree to modify the specifications. • Sometimes providers are anxious about being evaluated on a particular measure and request changes that they believe reflect best practice

  39. Most frequently modified measures

  40. Most frequently modified measures (cont’d)

  41. Table of contents 1. Overview of measure sets 2. Overview of measures 3. Non-standard measures 4. Analysis by measure set type 5. Analysis by measure set purpose 6. Analysis by measure domain/clinical area 7. Intrastate analysis of CA and MA 8. Conclusions / recommendations

  42. Finding: Many programs use non-standard measures Distinct measures by type n = 509

  43. Some measures were from “undetermined” sources • 78 of the measures were from “undetermined” sources across 12 measure sets • These measures are in this category due to difficulty interpreting the source documents. • Source was not indicated in the source document • The measure did not include an NQF# • The measure did not use a recognizable measure name • 11 VT ACO utilization measures are considered “undetermined” because the specifications for these measures have not been finalized. They are undetermined from the program’s perspective.

  44. There were 78 undetermined measures across 12 measure sets 69% of the undetermined measures come from two sources.

  45. Finding: Many programs create homegrown measures What are “homegrown” measures? Homegrown measures are measures that were indicated on the source document as having been created by the developer of the measure set. If a measure was not clearly attributed to the developer, the source was considered to be “undetermined” rather than “homegrown.” Distinct measures by type n = 509

  46. 40% of the programs created at least one homegrown measure There were 198 homegrown measures across 19 measure sets

  47. Programs create homegrown measures across all domains Homegrown measures by domain n = 198

  48. Four basic types of homegrown measures Homegrown measures by type n = 198

  49. Some homegrown measures are specific to one program • 81 programmatic measures: measures related to infrastructure, utilization, geographic access, and program oversight • Percent Eligibility Determination Done at State Level • Child Psychiatrist Count • Provider Satisfaction • These measures are unlikely to become standardized because they are specific to the management or structure of a particular program.

  50. Other homegrown measures may be “reinventing the wheel” • Of these 198 measures, there were 28 measures (14%) for which it was not readily apparent why the program created the measure, as these measures appeared to replicate standard measures. • Perhaps the programs were unaware of the availability of the standard measures • Adherence to prescription medications for asthma and/or COPD (could have used NQF #1799: Medication management for people with asthma) • ED appropriate utilization: reduce all ED visits (could have used the ED rates from the HEDIS Ambulatory Care measure) • Emergency Department Visits: Previously Diagnosed Asthma (ages 2 - 17) (could have used NQF #1381 Asthma Emergency Department Visits) • Fall Prevention (could have used NQF #35 Fall Risk Management)
