Has the Data Cleaning Hidden the Cost Effect?

Has the Data Cleaning Hidden the Cost Effect? Or Has the Baby Gone with the Bath Water?

Introduction
• ABF model development is impeded by:
  • inconsistency in health product definition, and
  • inconsistency in the attribution of incurred cost.
• This leads to the removal of “bad” data from analyses aimed at establishing model parameters.
• By striving for reliability, the analyst may sacrifice validity, especially in the assessment of effects (such as Indigenous Status) that may markedly affect outcomes.
• This paper presents a meta-analytic method for testing whether effects have been concealed by data editing.

Method Part 1
• Inconsistencies in product definition and errors in cost allocation tend to occur at establishment level, so studies within an establishment are less affected.
• The original unedited establishment data provide the basis for validly assessing cost effects such as Indigenous Status.
• Phase 1: when data from many establishments are available, the findings at establishment level may be pooled in a meta-analysis:
  • Partition the data in each establishment between contacts with and without the study characteristic (e.g. Indigenous Status).
  • Form the ratio of the cost of each partition to its modelled cost, and then the ratio of these two ratios to each other.
  • Regard a ratio of ratios greater than 1 as a success in a Binomial trial.
  • Pool the results across establishments and test the effect hypothesis against the Binomial distribution.
  • Record the number of successes.
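The Phase 1 steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record layout `(establishment, status, cost, modelled_cost)`, the function names, and the choice of a doubled-tail exact test are all assumptions made for the example, and the toy figures are invented.

```python
from collections import defaultdict
from math import comb

def ratio_of_ratios(records):
    """Map each establishment to (cost/modelled, status 1) / (cost/modelled, status 0)."""
    # totals[establishment][status] = [actual cost, modelled cost]
    totals = defaultdict(lambda: {0: [0.0, 0.0], 1: [0.0, 0.0]})
    for establishment, status, cost, modelled in records:
        totals[establishment][status][0] += cost
        totals[establishment][status][1] += modelled
    result = {}
    for establishment, parts in totals.items():
        (c0, m0), (c1, m1) = parts[0], parts[1]
        if min(c0, m0, c1, m1) <= 0:
            continue  # assumption: an empty partition yields no trial
        result[establishment] = (c1 / m1) / (c0 / m0)
    return result

def binomial_two_sided_p(successes, trials):
    """Exact two-sided p-value against Bin(trials, 0.5), doubling the smaller tail."""
    k = min(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** trials)

# Toy data (invented): (establishment, status, cost, modelled cost)
records = [
    ("A", 1, 120.0, 100.0), ("A", 0, 100.0, 100.0),
    ("B", 1, 90.0, 100.0), ("B", 0, 100.0, 100.0),
    ("C", 1, 150.0, 100.0), ("C", 0, 300.0, 250.0),
]
ratios = ratio_of_ratios(records)
successes = sum(r > 1 for r in ratios.values())  # ratio of ratios above 1
p_value = binomial_two_sided_p(successes, len(ratios))
```

Because each establishment contributes only the sign of its ratio of ratios minus 1, an establishment-wide costing error that scales both partitions equally cancels out of the statistic.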

Method Part 2
• Phase 2: working with a copy of the data:
  • Set the Status of all contacts to the value 0.
  • Within each Hospital by Case-type cell, randomly assign Status value 1 to the same number of contacts that had Status value 1 in the original data.
  • Repeat Phase 1 with these modified data.
• Phase 3: repeat Phase 2 a sufficiently large number of times.
• Final Phase: compare the outcome of Phase 1 with the distribution of outcomes generated in Phase 3.
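Phases 2 and 3 amount to a permutation (randomization) test. The sketch below is one possible reading, under stated assumptions: records are tuples `(hospital, case_type, status, cost, modelled_cost)`, each Hospital by Case-type cell is one trial, and the final comparison is summarised as the share of randomized outcomes at or below the observed count. All names and the simulated data are hypothetical.

```python
import random
from collections import defaultdict

def count_successes(records):
    """Phase 1 statistic: number of cells whose ratio of ratios exceeds 1."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])  # c0, m0, c1, m1
    for hospital, case_type, status, cost, modelled in records:
        t = totals[(hospital, case_type)]
        t[2 * status] += cost
        t[2 * status + 1] += modelled
    successes = 0
    for c0, m0, c1, m1 in totals.values():
        if min(c0, m0, c1, m1) > 0 and (c1 / m1) / (c0 / m0) > 1:
            successes += 1
    return successes

def permute_status(records, rng):
    """Phase 2: shuffle Status within each cell.

    Shuffling is equivalent to zeroing every Status and then assigning 1
    at random to the same number of contacts as in the original cell.
    """
    by_cell = defaultdict(list)
    for i, (hospital, case_type, _, _, _) in enumerate(records):
        by_cell[(hospital, case_type)].append(i)
    shuffled = list(records)
    for idx in by_cell.values():
        statuses = [records[i][2] for i in idx]
        rng.shuffle(statuses)
        for i, s in zip(idx, statuses):
            h, c, _, cost, modelled = records[i]
            shuffled[i] = (h, c, s, cost, modelled)
    return shuffled

def permutation_reference(records, repeats=1000, seed=0):
    """Phase 3: repeat Phase 2 and collect the Phase 1 statistic each time."""
    rng = random.Random(seed)
    return [count_successes(permute_status(records, rng)) for _ in range(repeats)]

# Simulated long-tailed costs with no true Status effect (invented data).
data_rng = random.Random(1)
records = [(h, u, int(data_rng.random() < 0.3), data_rng.lognormvariate(5, 1), 200.0)
           for h in range(5) for u in range(4) for _ in range(30)]
observed = count_successes(records)
reference = permutation_reference(records, repeats=200)
# Final Phase: where does the observed count sit in the randomized distribution?
share_at_or_below = sum(s <= observed for s in reference) / len(reference)
```

The reference distribution inherits the skewness of the real cost data, so it does not rely on the symmetric Bin(n, 0.5) null used in Phase 1.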

Exemplar System
• The Casemix System chosen to demonstrate the approach is one for Emergency Care (EC), using:
  • patient level costing (PLC) data from the National Hospital Cost Data Collection, Round 15 (NHCDC15), and
  • Urgency Related Groups v1p3 (URGs).
• The work was conducted for the Independent Hospital Pricing Authority (IHPA) in Australia.
• The model was based on presentations moderated by Indigenous Status.

Demonstration Part 1
First summarise the PLC data:

Demonstration Part 2
Then calculate the ratios and compare each with 1.

Demonstration Results
• Count the number of trials (Hospital by Case-type combinations in the test data): observed 4,734 (4,712 for cleaned data).
• Determine the p = 0.01, two-tailed “Acceptance Range” (AR) for Bin(n = 4,734, p = 0.5): [2278, 2455] ([2259, 2452] for cleaned data).
• Sum the trial outcomes (ratios of ratios greater than 1): observed 1,847 (1,853 for cleaned data).
• The observed value is outside the AR.
• Provisionally reject the hypothesis “No Adjustment Needed” and go on to Phases 2 and 3 of the method.
• We find that the result is not unusual within the set of outcomes obtained through randomization.
• Reverse the decision and determine that the need for an adjustment has not been confirmed.
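An acceptance range of this kind can be computed exactly with integer arithmetic. The sketch below assumes an equal-tailed range for Bin(n, 0.5), taking the lower bound as the smallest k with P(X <= k) > alpha/2 and the upper bound by symmetry. The function name is illustrative, and endpoint conventions vary, so the bounds it returns may differ by one from those quoted above.

```python
from fractions import Fraction

def acceptance_range(n, alpha="0.01"):
    """Equal-tailed acceptance range (lo, hi) for X ~ Bin(n, 0.5).

    lo is the smallest k with P(X <= k) > alpha/2; hi = n - lo by symmetry.
    alpha is passed as a string (or Fraction) to keep the arithmetic exact.
    """
    half = Fraction(alpha) / 2
    total = 2 ** n
    coeff, cumulative = 1, 0  # coeff holds C(n, k), updated iteratively
    for k in range(n + 1):
        cumulative += coeff
        # Integer form of: cumulative / total > half
        if cumulative * half.denominator > half.numerator * total:
            return k, n - k
        coeff = coeff * (n - k) // (k + 1)
    raise ValueError("alpha / 2 must be below 1")

lo, hi = acceptance_range(4734)
observed = 1847
outside = not (lo <= observed <= hi)
```

Working with integer counts out of 2^n avoids the floating-point underflow that 0.5 ** 4734 would otherwise cause.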

Discussion
• The Phase 1 analysis appeared to confirm an effect of Indigenous Status in the IHPA model for emergency services. However, its direction corresponds to that expected in long-tailed cost distributions, irrespective of any Status effect.
• A further use of data in the second summary form is to calculate the level of adjustment needed.
• This is calculated as the inverse-variance weighted average of the ratios of ratios, with variance estimate (n+1)/(n1*(n-n1)), where n is the total number of presentations and n1 the number of Indigenous presentations.
• This returns an uplift multiplier estimate of 1.0035 (1.0046 for cleaned data).
• The apparent contradiction between the Binomial finding and the estimated uplift indicated the need for the later Phase analyses.
• Note, however, that when the IHPA regression model approach is followed on the cleaned data the estimate is 1.0398, very close to the 1.04 value determined by IHPA.
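The inverse-variance weighting can be illustrated as follows. This is a sketch assuming each cell is summarised as `(ratio_of_ratios, n, n1)` and that cells with no Indigenous or no non-Indigenous presentations are skipped because the variance formula is undefined there. The function name, the skipping rule, and the example values are assumptions, not taken from the study.

```python
def pooled_uplift(cells):
    """Inverse-variance weighted mean of ratios of ratios.

    cells: iterable of (ratio_of_ratios, n, n1), where n is the total number
    of presentations in the cell and n1 the number of Indigenous presentations.
    """
    numerator = denominator = 0.0
    for ror, n, n1 in cells:
        if n1 <= 0 or n1 >= n:
            continue  # assumption: variance undefined, so the cell is skipped
        variance = (n + 1) / (n1 * (n - n1))
        weight = 1.0 / variance
        numerator += weight * ror
        denominator += weight
    if denominator == 0:
        raise ValueError("no cell contains both groups")
    return numerator / denominator

# Invented example: weights are 18/10 = 1.8 and 60/20 = 3.0,
# giving (1.2 * 1.8 + 0.9 * 3.0) / 4.8 = 1.0125.
uplift = pooled_uplift([(1.2, 9, 3), (0.9, 19, 4)])
```

Cells with more balanced and larger groups receive more weight, since n1*(n-n1) grows with both the cell size and the balance between the two groups.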

Conclusion
Data editing has not affected the validity of the Indigenous Status estimate. Despite its simplicity, the Phase 1 method allows cross-validation of findings (or otherwise), even when the model development used edited data, provided that the cost ratios have bell-shaped probability curves with median close to mean. The Phase 1 Binomial results allow assessment of the bell-curve assumption. If it does not hold, then results from the (Empirical) Conditional Likelihood approach (Phases 2 and 3) must be used.

Postscript 1
• Note that the Binomial test result suggests that Indigenous Status is associated with lower spending within Hospital by URG cell.
• When Hospital ratios rather than Hospital by URG ratios are used:
  • 107 trials, 34 successes (p = 0.0001), uplift multiplier estimate of 0.980.
  • This is a very similar outcome to the more detailed cell analysis.
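A p-value of this kind can be checked with an exact Binomial tail. The sketch below assumes a one-sided, lower-tail probability P(X <= 34) under Bin(107, 0.5). That reading is an assumption about how the quoted figure was obtained, and `lower_tail` is an illustrative name.

```python
from math import comb

def lower_tail(trials, successes):
    """P(X <= successes) for X ~ Bin(trials, 0.5), computed exactly."""
    favourable = sum(comb(trials, k) for k in range(successes + 1))
    return favourable / 2 ** trials

# Hospital-level analysis: 107 trials, 34 successes.
p_hospital = lower_tail(107, 34)
```

A two-sided version would double this tail, so which convention applies should be confirmed before comparing against a significance threshold.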

Postscript 2
• When URG ratios rather than Hospital by URG ratios are used:
  • 65 trials, 27 successes (p = 0.1073), uplift multiplier estimate of 1.031.
  • This is a quite different outcome from the more detailed cell analysis.
  • It is in line with the IHPA determination.
  • It is in line with iterative regression estimates.
• So what is the real story here?
• The answer is to be found in the shapes of the distributions of cost ratios.
