
Issues in Deterministic and Probabilistic Record Linkage



Presentation Transcript


  1. Issues in Deterministic and Probabilistic Record Linkage. Scott DuVall, Salt Lake City VHA MC

  2. the age of information

  3. informatician information = information = information

  4. Linkage Adds Information

  5. Linkage Corrects Errors

  6. Missing information affects patient care.[1]
     • Transitions in care cause breakdowns in communication.[2]
     [1] Stiell et al. Prevalence of information gaps in the emergency department and the effect on patient outcomes. CMAJ 2003;169(10):1023-8.
     [2] Coleman et al. Lost in transition: challenges and opportunities for improving the quality of transitional care. Ann Intern Med 2004;141(7):533-6.

  7. Resolving duplicates can cost $60 per case.[1]
     [1] Thornton SN, Hood SK. Reducing Duplicate Patient Creation Using a Probabilistic Matching Algorithm in an Open-access Community Data Sharing Environment. Proc AMIA Symp 2005:1135.

  8. “between $0.30 and $0.40 of every dollar spent on health care is wasted on overuse, under use, misuse, duplication, system failures, unnecessary repetition, poor communications and inefficiency.”[1]
     [1] Reid PP, Compton WD, Grossman JH, Fanjiang G. Building a Better Delivery System: A New Engineering/Health Care Partnership. National Academies Press, 2005:99.

  9. Key element of health care information exchange and interoperability, which is estimated to reduce costs by $77.8 billion annually.[1]
     [1] Walker J, Pan E, Johnston D, Adler-Milstein J, Bates DW, Middleton B. The value of health care information exchange and interoperability. Health Aff (Millwood). 2005 Jan-Jun;Suppl Web Exclusives:W5-10-W5-18.

  10. Record Matching
      • Many systems have record-matching software.
      • Errors still exist:
        • 50% missed in CDC survey[1]
        • 5% missed in 1.5 million records = 75,000 records[2]
      [1] User Manual for the CDC Deduplication Evaluation Toolkit.
      [2] Snow LA, DuVall SL. Clinical Data Exchange Through A Looking Glass: A Gray-Box Approach To Record Linkage. NLM 2005.

  11. Old Technology

  12. Misunderstood Technology

  13. Misunderstood Technology

  14. Score Is Not Probability
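A linkage score built from field weights is a log-likelihood ratio, not a probability: turning it into one needs a prior guess at how often a compared pair is a true duplicate. The Python sketch below assumes the score is a sum of base-2 weights of the LOG(M/U) kind used in probabilistic linkage and that fields are conditionally independent; `match_probability` and the prior values are illustrative, not taken from the presentation.

```python
def match_probability(score_bits: float, prior_match_rate: float) -> float:
    """Posterior probability that a pair is a duplicate, given its match score.

    Assumes score_bits = log2(P(comparison | dup) / P(comparison | non-dup)),
    e.g. a sum of base-2 agreement/disagreement weights.
    """
    prior_odds = prior_match_rate / (1.0 - prior_match_rate)
    posterior_odds = prior_odds * 2.0 ** score_bits  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


# The same score of 10 bits means very different things under different priors.
print(round(match_probability(10, 0.5), 3))   # 0.999
print(round(match_probability(10, 1e-4), 3))  # 0.093
```

With one duplicate per ten thousand candidate pairs, a 10-bit score still leaves the pair more likely to be distinct than duplicate, so a fixed score cutoff cannot be read as a fixed confidence level.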

  15. Information is not Used

  16. Name + Date of Birth + Social Security Number → MPI (master patient index)

  17. MPI

  18. Deterministic Linkage
      1) IF r1.social_security_number = r2.social_security_number THEN match.
      2) IF SoundexCompare(r1.last_name, r2.last_name)
         AND SoundexCompare(r1.first_name, r2.first_name)
         AND EditDistance(r1.birth_place, r2.birth_place) < 2
         AND r1.birth_date = r2.birth_date
         AND r1.multiplicity = r2.multiplicity
         AND r1.birth_order = r2.birth_order
         THEN match.
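The two deterministic rules can be sketched in Python. Here `soundex` implements the standard American Soundex code and `edit_distance` is plain Levenshtein distance; the slide does not say which variants `SoundexCompare` and `EditDistance` use, so both are assumptions, as are the `Record` field types, the meaning given to `multiplicity` and `birth_order`, and the choice that a missing SSN never counts as agreement.

```python
from dataclasses import dataclass
from typing import Optional

# American Soundex digit for each consonant; vowels, Y, H and W have no digit.
_SOUNDEX = {c: d for d, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                    "4": "L", "5": "MN", "6": "R"}.items()
            for c in letters}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0], _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _SOUNDEX.get(c, "")  # vowels map to "" and reset `prev`
        if digit and digit != prev:
            code += digit
        prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Record:
    social_security_number: Optional[str]
    last_name: str
    first_name: str
    birth_place: str
    birth_date: str     # assumed ISO "YYYY-MM-DD"
    multiplicity: int   # assumed: number of children in the birth (1, 2 for twins, ...)
    birth_order: int    # assumed: position within a multiple birth


def deterministic_match(r1: Record, r2: Record) -> bool:
    # Rule 1: identical SSN (assumption: a missing SSN never counts as agreement).
    if r1.social_security_number and r1.social_security_number == r2.social_security_number:
        return True
    # Rule 2: phonetic names, near-identical birth place, identical birth details.
    return (soundex(r1.last_name) == soundex(r2.last_name)
            and soundex(r1.first_name) == soundex(r2.first_name)
            and edit_distance(r1.birth_place, r2.birth_place) < 2
            and r1.birth_date == r2.birth_date
            and r1.multiplicity == r2.multiplicity
            and r1.birth_order == r2.birth_order)


a = Record(None, "Smith", "Jon", "Provo", "1980-04-12", 1, 1)
b = Record(None, "Smyth", "John", "Provo", "1980-04-12", 1, 1)
print(deterministic_match(a, b))  # True
```

The conjunction in rule 2 is brittle: a single transposed digit in `birth_date` makes it fail even when every other field agrees.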

  19. IF contains(0..9) THEN NUMBER
      ELSE IF contains(North, South, East, West) THEN DIRECTION
      ELSE IF contains(Street, Road, Lane, Drive, ...) THEN STREET_TYPE
      ELSE STREET_NAME
      IF (NUMBER = NUMBER) AND (DIRECTION = DIRECTION) AND (STREET_NAME = STREET_NAME) AND (STREET_TYPE = STREET_TYPE) THEN MATCH
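A Python reading of the address rules, assuming whitespace tokenization and that each token is classified on its own. The keyword sets hold only the examples printed on the slide (its street-type list is elided with "..."), and `parse_address` and `address_match` are hypothetical names.

```python
DIRECTIONS = {"NORTH", "SOUTH", "EAST", "WEST"}
STREET_TYPES = {"STREET", "ROAD", "LANE", "DRIVE"}  # slide's list continues with "..."


def parse_address(address: str) -> dict:
    """Classify each token as NUMBER, DIRECTION, STREET_TYPE or part of STREET_NAME."""
    parts = {"NUMBER": None, "DIRECTION": None, "STREET_TYPE": None}
    name_tokens = []
    for token in address.upper().replace(",", " ").split():
        if any(ch.isdigit() for ch in token):
            parts["NUMBER"] = token  # a later numeric token overwrites an earlier one
        elif token in DIRECTIONS:
            parts["DIRECTION"] = token
        elif token in STREET_TYPES:
            parts["STREET_TYPE"] = token
        else:
            name_tokens.append(token)
    parts["STREET_NAME"] = " ".join(name_tokens)
    return parts


def address_match(a: str, b: str) -> bool:
    pa, pb = parse_address(a), parse_address(b)
    return all(pa[k] == pb[k] for k in ("NUMBER", "DIRECTION", "STREET_NAME", "STREET_TYPE"))


print(address_match("123 North Main Street", "123 north main street"))  # True
print(address_match("123 North Main Street", "123 N Main St"))         # False
```

The second comparison shows the weakness of exact rules: abbreviations such as "N" and "St" that a human reviewer would accept defeat the match unless someone writes another rule for them.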

  20. Probabilistic Linkage
      • Each field is given an AGREEMENT weight and a DISAGREEMENT weight.
      • The weights reflect the field’s DISCRIMINATION and RELIABILITY.
      • Many more parameters; possibility of better matching.

  21. Record Matching: Understand Your Data + Understand Mistakes in Your Data = Good Strategy for Linkage
      MANUAL REVIEW

  22. Understanding the Data
      • Compare characteristics of records in the duplicate subset with records in the full enterprise data warehouse.
      • Describe instances where records in the duplicate subset are not typical of the database at large.
      • Provide considerations for others looking at duplicate records in master patient indexes.

  23. Extension of the Probabilistic Model for Approximate Field Comparators

  24. Probabilistic Model
      • Field in Record A = Field in Record B → Agreement Weight
      • Field in Record A ≠ Field in Record B → Disagreement Weight

  25. M – probability that the field matches in a dup pair
      U – probability that the field matches in a non-dup pair

  26. Agreement Weight = LOG(M/U)
      Disagreement Weight = LOG((1-M)/(1-U))
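The two weights can be sketched in Python and summed over fields to score a record pair. Base-2 logarithms are an assumption (the slide writes only LOG), and the M and U values below are made up for illustration, not estimates from any data set in the talk.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    return math.log2((1 - m) / (1 - u))


def pair_score(agreements: dict, params: dict) -> float:
    """Sum agreement or disagreement weights over the compared fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = params[field]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total


# Hypothetical (M, U) per field.
params = {
    "last_name": (0.95, 0.01),   # reliable and discriminating
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),          # reliable but barely discriminating
}
print(round(agreement_weight(*params["sex"]), 2))        # 0.97
print(round(agreement_weight(*params["last_name"]), 2))  # 6.57
```

Agreement on sex adds under one bit because half of all non-duplicate pairs agree on it too (U = 0.5), while agreement on last name adds several bits; U carries the DISCRIMINATION and M the RELIABILITY named on slide 20.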

  27. Field in Record A ≈ Field in Record B?

  28. Approximate Comparator: Edit Distance, ED(Johnathan, Jonathan) = 1

  29. Approximate Comparator Weight = LOG(Mδ/Uδ)

  30. Mδ – probability that the field approximately matches by δ in a dup pair
      Uδ – probability that the field approximately matches by δ in a non-dup pair
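One way to estimate Mδ and Uδ is to count, in pairs already labeled as duplicates and non-duplicates, how often a field's values are exactly δ edits apart. The sketch below makes that assumption, pools every distance above `max_delta` into one bucket, and floors zero estimates so the logarithm stays finite; the sample pairs, `max_delta` and the floor are illustrative choices, not values from the presentation.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, as in ED(Johnathan, Jonathan) = 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def delta_probabilities(pairs, max_delta: int) -> dict:
    """Share of pairs at distance d for d = 0..max_delta; bucket max_delta + 1 pools the rest."""
    counts = Counter(min(edit_distance(a.lower(), b.lower()), max_delta + 1) for a, b in pairs)
    return {d: counts[d] / len(pairs) for d in range(max_delta + 2)}


def approximate_weight(delta: int, m_delta: dict, u_delta: dict, max_delta: int,
                       floor: float = 1e-6) -> float:
    """LOG(M_delta / U_delta) in base 2, with zero estimates floored."""
    d = min(delta, max_delta + 1)
    return math.log2(max(m_delta[d], floor) / max(u_delta[d], floor))


dup_pairs = [("Jonathan", "Jonathan"), ("Johnathan", "Jonathan"), ("Ann", "Anne"), ("Smith", "Smith")]
non_dup_pairs = [("Smith", "Jones"), ("Ann", "Bob"), ("Jonathan", "Joanna"), ("Anne", "Anna")]
m_delta = delta_probabilities(dup_pairs, max_delta=2)
u_delta = delta_probabilities(non_dup_pairs, max_delta=2)
print(approximate_weight(1, m_delta, u_delta, max_delta=2))  # 1.0
```

Under these toy counts a one-edit difference still earns a positive weight, while a distance of three or more earns a strongly negative one: graded behavior that a plain agree/disagree comparison cannot express.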

  31. Load and randomize training set → Initial Parameters → Classify with estimated parameters → Estimate Dups and Non-Dups → Update Parameters

  32. Load and randomize training set → Updated Parameters → Classify with updated parameters → Re-estimate Dups and Non-Dups → Update Parameters

  33. Load and randomize validation set → Training Set Parameters → Classify with training set parameters → Classified Dups and Non-Dups
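The training and validation loop on slides 31 to 33 can be sketched as a hard-assignment iteration: classify pairs with the current M and U, treat the result as the new duplicate and non-duplicate sets, re-estimate M and U from those sets, and repeat until the classification stops changing. The slides do not specify the estimator, threshold, or stopping rule, so the add-one smoothing, the score threshold of 0, the iteration cap and the synthetic agreement patterns below are all assumptions; a soft-assignment EM algorithm is a common alternative.

```python
import math
import random


def score(pattern, m, u):
    """Sum of base-2 agreement/disagreement weights for one comparison pattern."""
    return sum(math.log2(mi / ui) if agrees else math.log2((1 - mi) / (1 - ui))
               for agrees, mi, ui in zip(pattern, m, u))


def classify(patterns, m, u, threshold=0.0):
    return [score(p, m, u) >= threshold for p in patterns]


def reestimate(patterns, labels, n_fields):
    dups = [p for p, is_dup in zip(patterns, labels) if is_dup]
    non_dups = [p for p, is_dup in zip(patterns, labels) if not is_dup]
    # Add-one smoothing keeps every estimate strictly between 0 and 1.
    m = [(sum(p[i] for p in dups) + 1) / (len(dups) + 2) for i in range(n_fields)]
    u = [(sum(p[i] for p in non_dups) + 1) / (len(non_dups) + 2) for i in range(n_fields)]
    return m, u


def train(patterns, m, u, threshold=0.0, max_iter=20, seed=0):
    patterns = list(patterns)
    random.Random(seed).shuffle(patterns)  # load and randomize training set
    labels = None
    for _ in range(max_iter):
        new_labels = classify(patterns, m, u, threshold)
        if new_labels == labels:
            break  # classification is stable, so the parameters are too
        labels = new_labels
        m, u = reestimate(patterns, labels, len(m))
    return m, u


# Synthetic training set: one agreement flag per field (e.g. name, birth date, SSN).
training = ([(True, True, True)] * 10 + [(False, False, False)] * 40
            + [(True, False, False)] * 5 + [(False, True, False)] * 5)
m, u = train(training, m=[0.9] * 3, u=[0.1] * 3)

# Validation: classify held-out pairs with the training-set parameters.
validation = [(True, True, True), (True, True, False), (False, False, False)]
print(classify(validation, m, u))  # [True, True, False]
```

Because the loop feeds its own classifications back in as truth, poor initial parameters or a badly placed threshold can lock in early mistakes; checking the resulting parameters on a separate validation set, as slide 33 does, guards against that.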

  34. Questions?
