finding a person n.
Skip this Video
Loading SlideShow in 5 Seconds..
Finding a Person! PowerPoint Presentation
Download Presentation
Finding a Person!

Loading in 2 Seconds...

play fullscreen
1 / 16

Finding a Person! - PowerPoint PPT Presentation

  • Uploaded on

Finding a Person!. Building an algorithm to search for existing people in a system. Rahn Lieberman Manager Emdeon Corp ( Understanding the problem. We work with hospitals around the county

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Finding a Person!' - javier

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
finding a person
Finding a Person!

Building an algorithm to search for

existing people in a system

Rahn Lieberman


Emdeon Corp (

understanding the problem
Understanding the problem
  • We work with hospitals around the county
  • These hospitals (our customers) send us sets of people to work with
  • We put these people into our internal system for our workflow

understanding t he problem
Understanding the problem
  • A little bit of complication:
  • A person can visit a hospital more than one time.
  • When this happens, we need to recognize the person so we don’t load them repeatedly
  • And more complication:

- The person can then visit another hospital, and we still need to recognize them

our environment
Our Environment
  • We get data from customers in batch files and through real-time (HL7) interfaces
  • Data base be entered into our system via our interfaces or manually
  • We process upwards of a quarter million records a day, importing approximately 10K new records.
  • Electronic entry accounts for approximately 85% of the data entry

  • We get reports of duplicate data being entered approximate once every 2-3 months.
  • On investigation, these are almost always user errors.
  • These are so rare, I don’t have statistics on actual number of errors. 

how s it done
How’s it Done?
  • What to Know:
    • Know your data!
    • Know that it will take some time to get it right
    • And of course, know your users
  • The data is the actual elements of what you’re working with. Names, birthdates, etc.

how s it done cont
How’s it Done? (cont.)
  • Start with your basic object first. In my case, it’s a person
  • Take stock of common elements your object that should always be know, and are commonly part of your data:
    • First Name, Last Name, Middle Initial
    • Date of Birth
    • Gender
  • Is that enough to uniquely ID someone?
    • If not, keep adding, but realize you may not get everything you want

how s it done cont1
How’s it Done? (cont.)
  • How about adding an address and a social security number?
  • That gets close to giving enough information.
  • What? SSN’s are unique!
      • Not everyone has one, or know what it is
      • People may share them or make them up

how s it done cont2
How’s it Done? (cont.)

Do the same thing for other items in your domain.

In my case, hospitals have standard data elements:

  • Account Number, Medical Record Number
  • Unique identifier in our system to distinguish the hospitals

Once you have these core data elements, setup a query in your database for them, allowing elements to by null

  • You’ll need to know how data is stored (null versus empty strings, default values, etc.)

how s it done cont3
How’s it Done? (cont.)
  • Search and weigh the results
    • Go through all reasonable variations of your data, and collect the PersonID found in your system if one is found.
    • Create a dictionary of all values found
    • After all values are collected, sort them and count up the number of like PersonID’s
    • The PersonID with the most hits is most likely the correct person
    • Note that if none are found, then you end up with a “0” as the most common PersonID. Be sure to account for this.

example of weighing
Example of weighing
  • First + Last + DOB + Account Number + CustomerID: 20
  • Last + SSN: 20
  • Last + Account Number + DOB: 20
  • First + Medical Record: 19
  • First + Address + CustomerID: 19
  • Total count of 20: 3
  • Total count of 19: 2
  • Our person is PersonID 20

more details
More details
  • The key is the number of searches, and making sure the searches provide unique results.
    • In my production system, we do 30+ searches
    • F+M+L names + DOB + Address information
    • SSN searches with first and last names (need to account for last name changes)
    • Customer searches based on our entire customer hierarchy
    • I even go as far as cleaning the data to remove dashes, hyphens and other possible data that may not be entered consistently

  • Search for a know person in your database (i.e. PersonID 1, where you know all the data about them)
  • More complicated tests:
    • Pull X number of records, remove some of the data from them, and try to find them again.
    • This works well because you always know what you’re expected to find
    • Sorry, SQL Compact doesn’t support TOP, so I can’t demonstrate. 

known issues
Known issues
  • Babies are hard to match because they don’t have SSN’s and they don’t have names
  • Most hospitals will enter a baby with a generic name, Baby Boy Smith, then change it in their system at a later date(after paperwork has been filed)
  • If a Senior and a Junior live at the same address, and we don’t have both of them in our system already, we may get a false result
  • Speed! 30 queries take a long time to run. I’m looking into ways to speed this up
  • Running multiple instances of the import/search at the same time will fail. We have safeguards against this in place

  • This sample data is ALL FAKE!
  • Do not think these are real people. If they are, it’s purely a coincidence
  • Social Security numbers are the phone numbers minus 1 digit
  • This is not my full production system. It’s been simplified to show proof of concept, and it uses SQL Compact for ease of demonstrating offline

  • Slides and code available at