
Health Equity Analytics Solution



Presentation Transcript


  1. Health Equity Analytics Solution - Team PowderQuants. Team lead: Ben Taylor; Analyst: Justin Powell; bentaylorche@gmail.com

  2. Outline • Define the objective • Data Formatting • Data Clustering • Predictive Analytics Model • Solution • ROI • Looking Forward

  3. Define the objective - Brief background • Descriptive Analytics • This is the most basic solution: nothing more than a graphical visualization resting on top of a database. If data visualization is needed, there are many plug-and-play vendors such as Tableau, Domo, etc. • Predictive Analytics • Using the data from the descriptive analytics, can a model be built to predict account spend rate? This requires a background in modeling and proper metrics for success to ensure overfitting is not an issue. • Prescriptive Analytics • Rather than just firing a prediction or threshold to react to the data, prescriptive analytics attempts to use the model for insight to change the future outcome. An example of this would be targeting chronic diabetic customers to reduce the risk of limb amputation.

  4. Define the objective • The problem objective is to develop a model that can predict account balance risk for preemptive notification. • Challenge • Focusing too much on the end goal can distract and confuse. Simplifying the problem into tractable pieces reveals where the focus of the algorithm should be: predicting the likely spend rate of each individual. The rest of the math after that is simple. [Diagram: account balance, expected contribution, and predicted spend feed a simple "fund?" / "inputs OK" decision]
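
To make the "rest of the math is simple" point concrete, here is a minimal sketch of the balance projection once a daily spend rate has been predicted. The function and field names (months_until_empty, expected_monthly_contribution, the 3-month horizon) are illustrative assumptions, not taken from the actual solution.

def months_until_empty(balance, expected_monthly_contribution, predicted_daily_spend,
                       days_per_month=30.4):
    """Project how many months until the account balance hits zero."""
    net_monthly_burn = predicted_daily_spend * days_per_month - expected_monthly_contribution
    if net_monthly_burn <= 0:
        return float("inf")  # contributions cover predicted spend; no funding risk
    return balance / net_monthly_burn

def needs_funding_notification(balance, expected_monthly_contribution,
                               predicted_daily_spend, horizon_months=3):
    """Notify only if the account is likely to run dry within the horizon."""
    return months_until_empty(balance, expected_monthly_contribution,
                              predicted_daily_spend) < horizon_months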

  5. Define the objective • Common pitfalls with predictive analytics • Default objective is incorrect • Many novice users will use default algorithms without much thought about the algorithm's underlying objective. This can cause problems if the objective is simply tied to an overall error such as R^2 or RMSE, which is not robust to outlier influence or scaling issues and gives the end user no sense of model confidence. [PowderQuants use 3 metrics for comparison] • Overfitting => solution confidence / quality • "Any solution without an associated confidence is no solution at all." An R^2 of 1 can be achieved given enough input variables in a model, but it offers poor predictive power beyond the training set. Cross-validation / bootstrapping can aid model confidence assessment. [PowderQuants provide robust confidence metrics] [Diagram: inputs feed a model that outputs a predicted spend rate]
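
As a hedged illustration of the point about multiple metrics and hold-out confidence, the sketch below scores a model with cross-validation on three metrics at once. scikit-learn is assumed, and the model and the three metrics stand in for whatever the team actually uses; the data is a placeholder.

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                   # placeholder features (age, gender, CPT counts, ...)
y = np.abs(X[:, 0] * 3 + rng.normal(size=500))   # placeholder daily spend target

# Report multiple hold-out metrics rather than trusting a single training R^2.
scores = cross_validate(
    GradientBoostingRegressor(),
    X, y, cv=5,
    scoring=["r2", "neg_root_mean_squared_error", "neg_median_absolute_error"],
)
for name, vals in scores.items():
    if name.startswith("test_"):
        print(f"{name}: mean={vals.mean():.3f}  std={vals.std():.3f}")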

  6. Data Formatting • Joining up the data • Unique [MemberID.dependent x CPT claim] • Combine all other data into a single table keyed off of either claimID or memberID (see the join sketch below)
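
A minimal pandas sketch of this join step, assuming hypothetical file and column names (claims.csv, members.csv, accounts.csv, memberID, dependent, cpt_code) since the real schema is not shown:

import pandas as pd

claims = pd.read_csv("claims.csv")        # one row per CPT claim, keyed by claimID + memberID
members = pd.read_csv("members.csv")      # demographics, keyed by memberID
accounts = pd.read_csv("accounts.csv")    # balances / contributions, keyed by memberID

# Single flat table keyed off claimID / memberID
flat = (
    claims
    .merge(members, on="memberID", how="left")
    .merge(accounts, on="memberID", how="left")
)

# Unique [memberID.dependent x CPT] usage matrix, as input to the clustering step
usage = pd.crosstab([flat["memberID"], flat["dependent"]], flat["cpt_code"])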

  7. Clustering Data • Looking at the sparse raw data (left) it is nearly impossible to see the value. Clustering using a self-organizing map (right) allows areas of interest to come to life: procedures with high use counts among members, and correlations between procedures, become readily visible along the diagonal. [Figure: unique members x unique CPT codes, raw (sparse, shuffled) vs. reordered so clusters fall along the diagonal]
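
The deck uses a self-organizing map for the reordering; as a lighter-weight stand-in, the sketch below gets the same "clusters along the diagonal" effect with hierarchical-clustering leaf ordering (scipy and matplotlib assumed, placeholder data):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list

# usage: members x CPT-code count matrix (see the join sketch above); placeholder here
usage = np.random.poisson(0.05, size=(300, 120)).astype(float)

# Reorder rows (members) and columns (CPT codes) by cluster-tree leaf order
row_order = leaves_list(linkage(usage, method="ward"))
col_order = leaves_list(linkage(usage.T, method="ward"))
reordered = usage[np.ix_(row_order, col_order)]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(usage, aspect="auto", cmap="viridis")
axes[0].set_title("Raw (sparse, shuffled)")
axes[1].imshow(reordered, aspect="auto", cmap="viridis")
axes[1].set_title("Reordered: clusters on the diagonal")
plt.show()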

  8. A closer look. Cool…

  9. Cluster + age underlay • Age-specific clusters can be visualized as well

  10. Cluster drill-down by CPT codes • These codes were not available to us, but I promise you they are closely related and provide insight

  11. Cluster drill-down by CPT codes • 99051: Service(s) provided in the office during regularly scheduled evening, weekend, or holiday office hours, in addition to basic service

  12. Cluster drill-down by CPT codes

  13. Cluster drill-down by CPT codes • These codes were not available to us, but I promise you they are closely related and provide insight into spending behavior

  14. Cluster drill-down by CPT codes

  15. Training / Validation • All model building should utilize some sort of holdout set for confidence assessment (70% train / 30% validation).
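
A minimal sketch of the 70/30 hold-out split (scikit-learn assumed). Splitting on memberID so that all of a member's claims land on one side of the split is an assumption; the deck does not say how the split was performed.

from sklearn.model_selection import train_test_split

# flat: the joined table from the data-formatting sketch
member_ids = flat["memberID"].unique()
train_ids, validate_ids = train_test_split(member_ids, test_size=0.30, random_state=42)

train_df = flat[flat["memberID"].isin(train_ids)]       # 70% of members for training
validate_df = flat[flat["memberID"].isin(validate_ids)]  # 30% held out for validation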

  16. Define Bucket Classifications • Looking at average daily spending behavior across all members, we can create a histogram and define classification buckets. [Histogram of average daily spend ($USD) with cut points at the 50th, 75th, 95th, and 99.9th percentiles, defining buckets 1-5: low, med, med-high, high, extreme; wellness (intervention) flagged at the tail]
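
A small sketch of the bucketing step: the cut points (50th/75th/95th/99.9th percentiles of average daily spend) follow the slide, while the data here is a placeholder.

import numpy as np

# avg_daily_spend: one value per member, $USD/day (placeholder distribution here)
avg_daily_spend = np.random.lognormal(mean=0.0, sigma=1.5, size=10_000)

cuts = np.percentile(avg_daily_spend, [50, 75, 95, 99.9])
labels = np.array(["low", "med", "med-high", "high", "extreme"])

bucket_idx = np.digitize(avg_daily_spend, cuts)   # 0..4, one bucket per member
bucket = labels[bucket_idx]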

  17. Simple baseline to compare against • Assume the training bucket classification persists • Validation results: • Absolute prediction error: mean = $7.43 USD/day, median = $1.40 USD/day • Hit rate: 49% match the bucket, 84% within 1 bucket, 98% within 2 buckets • Penalty error (over-estimates weighted 1/2, under-estimates weighted 2x): 1.07 • We would rather over-estimate than under-estimate (this allows potential intervention)
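
One plausible reading of the penalty-error metric (over-estimates weighted by 1/2, under-estimates by 2), sketched as code; the exact definition used by the team may differ.

import numpy as np

def penalty_error(actual, predicted):
    """Asymmetric error: cheap to over-estimate spend, expensive to under-estimate it."""
    err = np.abs(np.asarray(predicted) - np.asarray(actual))
    weight = np.where(np.asarray(predicted) >= np.asarray(actual), 0.5, 2.0)
    return np.mean(weight * err)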

  18. Bayesian Bootstrap • Bootstrap 100x • CPT posterior-probability matrices built per partition: 0-10 yrs male/female (YM/YF), 10-40 yrs (AM/AF), >40 yrs (EM/EF) • Probability of a >5% cumulative price increase
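
A minimal sketch of a Bayesian bootstrap for one partition: each of the 100 replicates re-weights the observations with Dirichlet(1, ..., 1) weights instead of resampling. The partition data and the >5% comparison baseline are placeholders, not the team's actual inputs.

import numpy as np

rng = np.random.default_rng(0)

def bayesian_bootstrap_mean(values, n_boot=100, rng=rng):
    """Posterior draws of a partition's mean via Dirichlet-weighted averages."""
    values = np.asarray(values, dtype=float)
    weights = rng.dirichlet(np.ones(len(values)), size=n_boot)   # (n_boot, n)
    return weights @ values                                       # (n_boot,) posterior draws

# placeholder: daily spend for one partition (e.g. adult males, "AM")
am_spend = rng.lognormal(0.0, 1.5, size=800)
draws = bayesian_bootstrap_mean(am_spend)                         # 100 posterior draws
prob_gt_5pct = np.mean(draws > 1.05 * am_spend.mean())            # P(>5% above point estimate)
print(f"P(mean spend > 5% above point estimate) = {prob_gt_5pct:.2f}")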

  19. Flowchart • Training: ETL all historical data on a c3.4xlarge ($0.840/hr, intermittent use), transform, and process posterior-probability matrices for each partition (YF/YM, AF/AM, EF/EM) • Prediction: for candidate i, combine the predicted spend rate category with historical contributions and account balance; simple logic then decides whether to educate (True) or not (False)

  20. LAUNCH AWS Demo • Here I will launch my AWS instance and run the demo, showing the distributed Bayesian bootstrap code running in memory on 16 cores, and compare that to my local machine's runtime (~160 hrs).
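
The demo itself is not reproduced here, but the parallelization pattern is roughly this: farm the bootstrap replicates out across cores (16 on the AWS instance). A toy multiprocessing sketch with placeholder data, not the actual demo code:

import numpy as np
from multiprocessing import Pool

def one_replicate(seed):
    """One Bayesian-bootstrap replicate for one partition (placeholder data)."""
    rng = np.random.default_rng(seed)
    values = rng.lognormal(0.0, 1.5, size=800)
    w = rng.dirichlet(np.ones(len(values)))
    return w @ values

if __name__ == "__main__":
    with Pool(processes=16) as pool:                   # 16 cores, as on the demo instance
        draws = pool.map(one_replicate, range(100))    # 100 replicates in parallel
    print(np.percentile(draws, [2.5, 50, 97.5]))       # posterior interval for mean spend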

  21. Application • [Figure: real account balance trajectories with highlighted examples: low risk; medium risk, OK; high balance, OK; account running out: fund] • Here is a subsample of real customer account balance estimates. We have highlighted interesting accounts to demonstrate different behaviors. The top line shows a low-risk individual that continuously funds their account; even if the model determined they were high risk for healthcare cost, they still would not trigger a funding notification because their balance is so high. Funding notifications are only sent out if the account is at risk of being empty within the next few months, based on spending-rate predictions coupled with recent funding behavior.

  22. Investment • Engineering cost • <$20,000 for consultants to set up AWS infrastructure and provide full integration • Cloud cost (depends on training frequency and optimization) • Lowest cost could be $100-200/month in cloud resources, assuming 10 hrs/month of training plus wireframe infrastructure (ETL, email, etc.) • Highest cost could go up to $1,000/month for optimization and frequent training

  23. Return • Assuming a 5% reduction in health care costs • The reduction will come from wellness awareness and insight into clustered medical spending (discovered risks). With Bayesian bootstrapping you are essentially giving your customers rich, tailored probability maps; do what you want with that information (e.g., "I am going in for X surgery: what are the risks or complications, and what are the costs of those risks for my demographic?"). • Patient responsibility: $21,979,894.32 x 0.05 = $1,098,994.72 savings • Negotiated price: $144,633,170.61 x 0.05 = $7,231,658.53 savings

  24. Future Opportunities • The operations involved make this type of problem a GPU candidate • Running on GPUs can offer anywhere from a 10-100x speed-up. This could be a cost-saving opportunity if frequent trainings are needed. • Bucket thresholds can be optimized • The 50th, 75th, etc. thresholds are arbitrary and can be refined for greater predictive power (see the sketch below). • More specific age/gender Bayesian maps can be created, including location, given enough data • Increase resolution, add more age groups, and use smarter age transitions. Including health assessment data would also improve this type of risk clustering. • Clustering can be magnified for easier visualization, and automated cluster-threshold data mining methods can be used to automate insight mining in the clusters • These clusters provide a wealth of knowledge on common procedures and the largest pain points in the sector. Spending time to develop cluster evaluation techniques would be worthwhile.
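
For the bucket-threshold refinement mentioned above, a toy sketch of what "optimize the thresholds" could look like: grid-search the percentile cut points against validation hit rate. Placeholder data and a deliberately small search space; a real optimizer would be more careful.

import itertools
import numpy as np

def hit_rate(actual_spend, predicted_spend, cuts):
    """Fraction of members whose predicted bucket matches their actual bucket."""
    return np.mean(np.digitize(actual_spend, cuts) == np.digitize(predicted_spend, cuts))

rng = np.random.default_rng(1)
actual = rng.lognormal(0.0, 1.5, size=5_000)                 # placeholder validation spend
predicted = actual * rng.lognormal(0.0, 0.4, size=5_000)     # placeholder predictions

# Try every increasing combination of 4 cut points from a small candidate set
best = max(
    (hit_rate(actual, predicted, np.percentile(actual, list(pcts))), pcts)
    for pcts in itertools.combinations([40, 50, 60, 70, 75, 80, 90, 95, 99, 99.9], 4)
)
print(f"best hit rate {best[0]:.3f} at percentiles {best[1]}")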

  25. Code location • git clone https://bitbucket.org/bentaylorche/heqdatacomp.git • Code is partial; I will check in the AWS demo at the presentation.
