
Multivariate Detection of Aberrant Billing: An Evaluation

Maharaj Singh, Ted Wallace & Martin Schrager

National Government Services, Inc.


Outline of the study

Outlier defined

Multivariate method for detecting Outlier

Detecting outlier billing providers

An evaluation of the methodology

Other factors considered for future application

Conclusion


An Outlier

  • An apparent outlier is not necessarily a true outlier. Maybe we have not yet found the ‘right’ distribution.


Multivariate Method

  • We computed a Mahalanobis distance for each observation in the data set, treating each observation as a multivariate vector.


Mahalanobis Distance

  • Mahalanobis distance is a distance measure introduced by Prasanta Chandra Mahalanobis in 1936.

  • It is based on the correlations between variables, by which different patterns can be identified and analyzed.

  • It is a useful way of determining the similarity of an unknown sample set to a known one.


Mahalanobis distance

  • Mahalanobis distance
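The transcript does not preserve the content of this slide. As a reminder, and assuming the standard textbook definition (the symbols below are not taken from the slides), the squared Mahalanobis distance of an observation from the centroid can be written as:

```latex
D^2(\mathbf{x}) = (\mathbf{x} - \bar{\mathbf{x}})^{\mathsf{T}} \, \mathbf{S}^{-1} \, (\mathbf{x} - \bar{\mathbf{x}})
```

Here x is the vector of variables for one observation, x̄ is the mean vector, and S is the covariance matrix of the variables. With a single variable this reduces to the squared z-score, ((x − x̄)/s)².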


Mahalanobis distance and Multivariate Outliers

  • Mahalanobis D² is a multidimensional version of a z-score. It measures the distance of a case from the centroid (multidimensional mean) of a distribution, given the covariance (multidimensional variance) of the distribution.


D²

  • A case is a multivariate outlier if the probability associated with its D² is 0.05 or less. D² follows a chi-square distribution with degrees of freedom equal to the number of variables included in the calculation.

  • Mahalanobis D² requires that the variables be metric, i.e. interval-level variables or ordinal-level variables that are treated as metric.
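The flagging rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it uses just two variables (so the chi-square upper-tail probability has the closed form exp(−D²/2)), and the data, variable names, and function names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def mahalanobis_d2(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid."""
    mx, my = mean(xs), mean(ys)
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    det = sxx * syy - sxy * sxy  # determinant of the 2x2 covariance matrix
    d2 = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # (dx, dy) * inverse(cov) * (dx, dy)^T, written out for the 2x2 case
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2

def chi2_upper_tail_2df(value):
    """Upper-tail chi-square probability; exp(-x/2) is exact for 2 df only."""
    return math.exp(-value / 2.0)

# Hypothetical data: the last pair breaks the pattern shared by the others.
units = [10, 12, 11, 13, 12, 14, 13, 15, 14, 30]
paid = [20, 25, 21, 27, 23, 29, 26, 31, 27, 10]

d2 = mahalanobis_d2(units, paid)
p_values = [chi2_upper_tail_2df(v) for v in d2]
flagged = [i for i, p in enumerate(p_values) if p <= 0.05]
```

With more than two variables the same idea applies, but the covariance matrix has to be inverted numerically and the probability taken from a chi-square routine with the matching degrees of freedom.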


Mahalanobis Distance from Ellipsoid

  • The Mahalanobis distance measure is based on correlations among the variables, by which different patterns can be identified and analyzed.

  • The region of constant Mahalanobis distance around the mean forms an ellipse for two variables and an ellipsoid when more than two variables are used.


Multivariate trimming …

  • The χ² plot for multivariate data is not resistant to the effect of outliers.

  • A few discrepant observations can affect the mean vector, and can potentially influence the outcome.

  • To avoid the effect of a few discrepant observations, we used multivariate trimming: an iterative process in which the observations with the largest squared distances are set aside and the trimmed statistics are computed from the remaining observations.

  • At the end of this iterative process, the new squared distance values are computed using the robust statistics.
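The trimming loop described above can be sketched as follows. This is a minimal illustration for two variables, not the authors' procedure: the number of observations set aside per pass (`n_trim`), the number of passes, and the data are all assumptions made for the example.

```python
def mean_and_cov(points):
    """Centroid and 2x2 sample covariance terms of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return mx, my, sxx, syy, sxy

def squared_distances(points, stats):
    """Squared distance of every point from the supplied centroid/covariance."""
    mx, my, sxx, syy, sxy = stats
    det = sxx * syy - sxy * sxy
    result = []
    for x, y in points:
        dx, dy = x - mx, y - my
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result

def trimmed_squared_distances(points, n_trim=1, passes=2):
    """Set aside the n_trim most distant points, re-estimate, and repeat."""
    stats = mean_and_cov(points)
    for _ in range(passes):
        d2 = squared_distances(points, stats)
        ranked = sorted(range(len(points)), key=lambda i: d2[i])
        keep = ranked[: len(points) - n_trim]  # drop the largest distances
        stats = mean_and_cov([points[i] for i in keep])
    # Final distances for ALL points, using the robust (trimmed) statistics.
    return squared_distances(points, stats)

# Hypothetical (units, paid) pairs; the last one is the discrepant observation.
points = [(10, 20), (12, 25), (11, 21), (13, 27), (12, 23),
          (14, 29), (13, 26), (15, 31), (14, 27), (30, 10)]

classical = squared_distances(points, mean_and_cov(points))
robust = trimmed_squared_distances(points)
```

In this hypothetical sample the discrepant point's classical squared distance is limited because the point itself inflates the covariance estimate. Once it is set aside, the distance computed from the trimmed statistics is far larger, which is the masking effect trimming is meant to remove.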


Chi Square plot for the dataset


The Data Set: Paid Claims


Indices for the ID of Observations

  • Billing providers

    • Location

    • Size

    • Specialty

  • HCPCS used

  • Primary Diagnoses


Matrix of Utilization Variables: Amount

  • Cost

    • Charges Billed

    • Charges denied

    • Reimbursement


Matrix of Utilization Variables: Rate

  • Rate

    • Reimbursement per beneficiary

    • Service Units per beneficiary

    • Service units per service date per beneficiary


Matrix of Utilization Variables: Volume

  • Volume

    • Number of claims

    • Number of beneficiaries

    • Number of service units rendered

    • Number of service days


The cost: Medicare trust fund $$$

  • For each observation the amount paid is a function of the rate and the volume.

  • However, for each observation ID, the rate and volume variables are also highly inter-correlated.


Methodology for Paid Claim Data Set


Data Steps

  • The line-level (detailed) paid claim data were summarized by ID (provider-HCPCS combination), producing summary utilization variables (cost, rate, and volume) for each ID.
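A summarization step of this kind might look like the sketch below. The record layout, field names, and values are hypothetical and chosen only to mirror the cost, volume, and rate variables listed earlier. The actual claim file structure is not described in the slides.

```python
from collections import defaultdict

# Hypothetical line-level records; field names are illustrative only.
claim_lines = [
    {"provider": "P1", "hcpcs": "99213", "bene": "B1", "claim": "C1",
     "date": "2007-10-01", "units": 1, "billed": 100.0, "paid": 60.0},
    {"provider": "P1", "hcpcs": "99213", "bene": "B2", "claim": "C2",
     "date": "2007-10-01", "units": 2, "billed": 200.0, "paid": 120.0},
    {"provider": "P1", "hcpcs": "99213", "bene": "B1", "claim": "C3",
     "date": "2007-10-15", "units": 1, "billed": 100.0, "paid": 60.0},
    {"provider": "P2", "hcpcs": "99213", "bene": "B3", "claim": "C4",
     "date": "2007-11-02", "units": 1, "billed": 100.0, "paid": 55.0},
]

def summarize(lines):
    """Roll line-level claims up to one row per (provider, HCPCS) ID."""
    groups = defaultdict(list)
    for line in lines:
        groups[(line["provider"], line["hcpcs"])].append(line)

    summary = {}
    for key, rows in groups.items():
        n_benes = len({r["bene"] for r in rows})
        paid = sum(r["paid"] for r in rows)
        units = sum(r["units"] for r in rows)
        summary[key] = {
            # Cost (amount) variables
            "charges_billed": sum(r["billed"] for r in rows),
            "reimbursement": paid,
            # Volume variables
            "n_claims": len({r["claim"] for r in rows}),
            "n_benes": n_benes,
            "n_units": units,
            "n_service_days": len({r["date"] for r in rows}),
            # Rate variables
            "paid_per_bene": paid / n_benes,
            "units_per_bene": units / n_benes,
        }
    return summary

summary = summarize(claim_lines)
```

Each resulting row is one observation for the distance calculation, with the summary variables as its coordinates.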


Principal Components

  • The variables in the matrix of the paid claims dataset were converted into principal components.

  • The squared distance was computed as the sum of squares of the principal component scores.
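The link between the principal components and the squared distance can be made explicit. The derivation below is a sketch under the assumption that the component scores are standardized to unit variance. The symbols (S, V, Λ, y, z) are introduced here for illustration and do not appear in the slides.

```latex
\mathbf{S} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^{\mathsf{T}}, \qquad
\mathbf{y} = \mathbf{V}^{\mathsf{T}} (\mathbf{x} - \bar{\mathbf{x}}), \qquad
D^2 = (\mathbf{x} - \bar{\mathbf{x}})^{\mathsf{T}} \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}})
    = \sum_{k=1}^{p} \frac{y_k^2}{\lambda_k}
    = \sum_{k=1}^{p} z_k^2, \qquad z_k = \frac{y_k}{\sqrt{\lambda_k}}
```

Here S is the covariance matrix of the p variables, the columns of V are its eigenvectors, Λ is the diagonal matrix of eigenvalues λ_k, y_k are the raw component scores, and z_k are the standardized scores. Because the components are uncorrelated, summing the squared standardized scores gives the same value as the Mahalanobis D².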


Multivariate Trimming

  • The iterative process of multivariate trimming was used.


D² and Expected Chi-Square Value

  • For each squared distance, the corresponding expected chi-square value and its probability were computed.


Outlier Observations

  • The observations with probability < .05 are treated as outliers and are flagged.

  • The flagged observations are treated as candidates for probe by medical review and/or treated as potential CERT errors and referred to the Provider Education Unit.


Outlier Observations Prioritized

  • Finally, the outlier observations were prioritized by the magnitude of the distance measure, the expected chi-square value, and the probability associated with the measure.
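One simple way to express such a ranking is sketched below. The IDs and distances are hypothetical, and the probability is computed with the two-variable closed form exp(−D²/2) purely to keep the example self-contained.

```python
import math

# Hypothetical flagged observations: (ID, squared distance).
flagged = [("P1-99213", 14.2), ("P7-A0428", 7.5), ("P3-97110", 22.0)]

# Attach the chi-square upper-tail probability (closed form valid for 2 df only).
scored = [(obs_id, d2, math.exp(-d2 / 2.0)) for obs_id, d2 in flagged]

# Largest squared distance first.
prioritized = sorted(scored, key=lambda row: row[1], reverse=True)
review_order = [obs_id for obs_id, _, _ in prioritized]
```

For a fixed number of variables the probability decreases as the squared distance grows, so ranking by distance and ranking by probability give the same order.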


Evaluation of Outlier Classification

  • Once each observation in the dataset had been classified as an outlier or non-outlier using the chi-square distribution, we used logistic regression to estimate the goodness of fit of the model.


C Statistics

  • To assess how accurately we were able to identify the outlier observations, we used the c statistic.


C Statistics…

  • The value of the c statistic ranges from 0.5 (no better than randomly assigning observations to one category or the other) to 1.0 (all observations correctly assigned to their categories).
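The c statistic can be computed directly from pairs of observations, as in the sketch below. It assumes the common convention that tied pairs count one half. The predicted probabilities and class labels are hypothetical.

```python
def c_statistic(scores, labels):
    """Concordance over all (outlier, non-outlier) pairs; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = discordant = tied = 0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1   # the outlier received the higher probability
            elif p < q:
                discordant += 1   # the non-outlier received the higher probability
            else:
                tied += 1
    pairs = len(pos) * len(neg)
    return (concordant + 0.5 * tied) / pairs, concordant, discordant, tied

# Hypothetical predicted probabilities and observed classes (1 = outlier).
scores = [0.9, 0.8, 0.4, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]

c, n_concordant, n_discordant, n_tied = c_statistic(scores, labels)
```

The pairwise loop is quadratic in the number of observations, so it is meant to show the definition rather than to scale to a data set with hundreds of thousands of lines.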


An example from Paid Claim Dataset


Think about Some Random Numbers??

4 - 100 - 40 - 60


Outlier Proportion of NGS Utilization

  • Provider HCPC Lines: 3.46%

  • Provider Counts: 99.49%

  • Total Reimbursement: 37.90%

  • Total Units: 58.78%

  • Provider HCPC Benes: 44.21%

  • Provider HCPC Claims: 48.21%


Model Evaluation

  • The outlier model for NGS data was evaluated by using goodness of fit test.

  • The NGS combined data set has 930,260 Provider HCPC Lines.

  • Of the total lines, 32,145 were outlier lines.

  • The chi-square for the model was 344,144.04, with probability < 0.0001.
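As a quick consistency check, the line counts above reproduce the outlier share reported for Provider HCPC lines on the Outlier Proportion slide:

```latex
\frac{32{,}145}{930{,}260} \approx 0.0346 = 3.46\%
```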


C statistics

  • Association of Predicted Probabilities and Observed Responses

    Percent Concordant: 94.9%

    Percent Discordant: 3.6%

    Percent Tied: 1.5%

    c statistic: 0.956
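The reported c statistic can be related to the three percentages. Assuming the usual convention in logistic regression output that tied pairs count one half (an assumption, since the slides do not state the formula):

```latex
c = \frac{\text{concordant} + 0.5 \times \text{tied}}{100}
  = \frac{94.9 + 0.5 \times 1.5}{100}
  = 0.9565
```

This agrees with the reported value of 0.956 to within the rounding of the published percentages.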


Past and Future Application


Current Application

  • Used as a single factor in determining multivariate statistical outliers

    • Problem areas

    • HCPC codes

    • Individual Providers


Current Application

  • Positives

    • Confidence (statistically valid methodology)

    • Consistent methodology regardless of problem area

    • Lack of clinical bias

  • Negatives

    • Difficult to interpret

    • Volume of provider/HCPC combinations required for valid analysis

    • Lack of clinical bias


Future Application

  • Using the squared distance as a factor in determining outlier problem areas

  • Using the squared distance as a factor in determining the aberrancy index of a provider


Future Application – Problem Areas


Future Application – Problem Providers


The multivariate model

  • Using the multivariate model, only about 4% of the total provider-HCPC combination lines were identified as outliers.

    • However, that 4% of the total lines captured almost 100% of the NGS providers and called into question about 40% of their payments in Quarter 4 of 2007.


Testing of the model as classifier

  • Using the multivariate model with multivariate trimming, we were able to classify each observation (provider-HCPCS combination) as an outlier or non-outlier.

  • Using this method, we were able to identify outliers with a very high concordance (94.6%).


Conclusion

  • We used a multivariate statistical method to identify aberrant billing and utilization in the claim data set and tested the validity of the method using logistic regression.

  • We also noted that a statistical method alone is not enough; other factors need to be added to bring value to the process of identifying problem areas as well as finding high-value targets.

