(De-Identified) Record Linkage

Dongqiuye Pu, Ashraf Farrag, Javed Mostafa

Background
• Identify duplicate records within a file or across files
• Also known as: object identification, data cleaning, entity resolution, etc.

Motivation
• Records often lack a unique identifier
• Spelling variations, misspellings, typos, etc.

For Instance… (A) (B)

Methods in a Nutshell
• Deterministic matching: straightforward, no human review needed, but suffers from low recall
• Approximate matching: harder to implement, human review needed, but achieves higher recall
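The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the standard-library difflib.SequenceMatcher as the similarity measure, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def deterministic_match(a: str, b: str) -> bool:
    """Deterministic rule: the normalized values must be identical."""
    return normalize(a) == normalize(b)


def approximate_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Approximate rule: string similarity must reach a threshold.

    The threshold of 0.85 is an assumed value, not one taken from the slides.
    """
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold


name_a = "John Smith"
name_b = "Jon  Smith"  # one dropped letter and a doubled space

print("deterministic:", deterministic_match(name_a, name_b))
print("approximate:  ", approximate_match(name_a, name_b))
```

On this pair the deterministic rule misses the match (the recall problem noted above), while the approximate rule accepts it. A pair accepted only by the approximate rule is the kind that would go to human review.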

Research Plan
• Exact matching first
• Fuzzy matching for the remaining unmatched records
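One way to read this plan is as a two-pass pipeline: an exact pass over a linkage key, then a fuzzy pass restricted to whatever the first pass left unmatched. The Python sketch below assumes hypothetical record fields (name, dob), requires an equal date of birth before comparing names, and uses an assumed similarity threshold of 0.85. None of these details come from the slides.

```python
from difflib import SequenceMatcher


def key(record: dict) -> tuple:
    """Hypothetical linkage key: normalized name plus date of birth."""
    return (" ".join(record["name"].lower().split()), record["dob"])


def link(file_a: list, file_b: list, threshold: float = 0.85):
    """Return (exact_pairs, fuzzy_pairs) as lists of (index_in_a, index_in_b)."""
    # Pass 1: exact matching on the full key.
    index_b = {}
    for j, rec in enumerate(file_b):
        index_b.setdefault(key(rec), []).append(j)

    exact, leftovers_a, used_b = [], [], set()
    for i, rec in enumerate(file_a):
        candidates = [j for j in index_b.get(key(rec), []) if j not in used_b]
        if candidates:
            exact.append((i, candidates[0]))
            used_b.add(candidates[0])
        else:
            leftovers_a.append(i)

    # Pass 2: fuzzy matching on names, only for records left over from pass 1.
    fuzzy = []
    for i in leftovers_a:
        best_j, best_score = None, threshold
        for j, rec_b in enumerate(file_b):
            # Assumed blocking rule: skip used records and differing birth dates.
            if j in used_b or rec_b["dob"] != file_a[i]["dob"]:
                continue
            score = SequenceMatcher(None, key(file_a[i])[0], key(rec_b)[0]).ratio()
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            fuzzy.append((i, best_j))
            used_b.add(best_j)
    return exact, fuzzy


file_a = [
    {"name": "Mary Jones", "dob": "1980-02-14"},
    {"name": "Jonathan Smith", "dob": "1975-07-01"},
    {"name": "Alice Wong", "dob": "1990-11-30"},
]
file_b = [
    {"name": "mary jones", "dob": "1980-02-14"},
    {"name": "Jonathon Smith", "dob": "1975-07-01"},
    {"name": "Bob Lee", "dob": "1990-11-30"},
]

exact_pairs, fuzzy_pairs = link(file_a, file_b)
print("exact:", exact_pairs)
print("fuzzy:", fuzzy_pairs)
```

Running the exact pass first keeps the more expensive pairwise comparison small, and it keeps the set of pairs that may need review separate from the pairs that matched outright.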

Evaluating Accuracy of Anonymous Record Linkage
• Evaluate the collision rate of the hashing algorithm (most likely zero)
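A collision check of this kind could be sketched as follows. The slides do not name the hashing algorithm, so the keyed HMAC-SHA-256 used here, the placeholder secret key, and the "name|date of birth" identifier format are assumptions for illustration only. The collision rate is taken to be the fraction of distinct identifiers that do not receive a digest of their own.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would manage this key securely.
SECRET_KEY = b"example-key-not-for-production"


def normalize(identifier: str) -> str:
    """Lowercase and collapse whitespace before hashing."""
    return " ".join(identifier.lower().split())


def deidentify(identifier: str) -> str:
    """Replace an identifier with a keyed SHA-256 digest (assumed algorithm)."""
    message = normalize(identifier).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def collision_rate(identifiers) -> float:
    """Fraction of distinct identifiers lost because two of them share a digest."""
    distinct_inputs = {normalize(s) for s in identifiers}
    if not distinct_inputs:
        return 0.0
    distinct_hashes = {deidentify(s) for s in distinct_inputs}
    return (len(distinct_inputs) - len(distinct_hashes)) / len(distinct_inputs)


sample = [
    "John Smith|1975-07-01",
    "Jon Smith|1975-07-01",
    "Mary Jones|1980-02-14",
    "john smith|1975-07-01",  # same person as the first entry after normalization
]

rate = collision_rate(sample)
print("collision rate:", rate)
```

Because equal normalized identifiers always yield equal digests, exact matching can run on the digests alone. Similar but unequal identifiers yield unrelated digests, so approximate matching would need a different encoding.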

