1 / 16

Finding a Person!

Finding a Person!. Building an algorithm to search for existing people in a system. Rahn Lieberman Manager Emdeon Corp (Emdeon.com). rahn@theliebermans.com http://rahnsworld.com. Understanding the problem. We work with hospitals around the county

javier
Download Presentation

Finding a Person!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Finding a Person! Building an algorithm to search for existing people in a system Rahn Lieberman Manager Emdeon Corp (Emdeon.com) rahn@theliebermans.com http://rahnsworld.com

  2. Understanding the problem • We work with hospitals around the county • These hospitals (our customers) send us sets of people to work with • We put these people into our internal system for our workflow rahn@theliebermans.com http://rahnsworld.com

  3. Understanding the problem • A little bit of complication: • A person can visit a hospital more than one time. • When this happens, we need to recognize the person so we don’t load them repeatedly • And more complication: - The person can then visit another hospital, and we still need to recognize them rahn@theliebermans.com http://rahnsworld.com

  4. Our Environment • We get data from customers in batch files and through real-time (HL7) interfaces • Data base be entered into our system via our interfaces or manually • We process upwards of a quarter million records a day, importing approximately 10K new records. • Electronic entry accounts for approximately 85% of the data entry rahn@theliebermans.com http://rahnsworld.com

  5. Bragging • We get reports of duplicate data being entered approximate once every 2-3 months. • On investigation, these are almost always user errors. • These are so rare, I don’t have statistics on actual number of errors.  rahn@theliebermans.com http://rahnsworld.com

  6. How’s it Done? • What to Know: • Know your data! • Know that it will take some time to get it right • And of course, know your users • The data is the actual elements of what you’re working with. Names, birthdates, etc. rahn@theliebermans.com http://rahnsworld.com

  7. How’s it Done? (cont.) • Start with your basic object first. In my case, it’s a person • Take stock of common elements your object that should always be know, and are commonly part of your data: • First Name, Last Name, Middle Initial • Date of Birth • Gender • Is that enough to uniquely ID someone? • If not, keep adding, but realize you may not get everything you want rahn@theliebermans.com http://rahnsworld.com

  8. How’s it Done? (cont.) • How about adding an address and a social security number? • That gets close to giving enough information. • What? SSN’s are unique! • Not everyone has one, or know what it is • People may share them or make them up rahn@theliebermans.com http://rahnsworld.com

  9. How’s it Done? (cont.) Do the same thing for other items in your domain. In my case, hospitals have standard data elements: • Account Number, Medical Record Number • Unique identifier in our system to distinguish the hospitals Once you have these core data elements, setup a query in your database for them, allowing elements to by null • You’ll need to know how data is stored (null versus empty strings, default values, etc.) rahn@theliebermans.com http://rahnsworld.com

  10. How’s it Done? (cont.) • Search and weigh the results • Go through all reasonable variations of your data, and collect the PersonID found in your system if one is found. • Create a dictionary of all values found • After all values are collected, sort them and count up the number of like PersonID’s • The PersonID with the most hits is most likely the correct person • Note that if none are found, then you end up with a “0” as the most common PersonID. Be sure to account for this. rahn@theliebermans.com http://rahnsworld.com

  11. Example of weighing • First + Last + DOB + Account Number + CustomerID: 20 • Last + SSN: 20 • Last + Account Number + DOB: 20 • First + Medical Record: 19 • First + Address + CustomerID: 19 • Total count of 20: 3 • Total count of 19: 2 • Our person is PersonID 20 rahn@theliebermans.com http://rahnsworld.com

  12. More details • The key is the number of searches, and making sure the searches provide unique results. • In my production system, we do 30+ searches • F+M+L names + DOB + Address information • SSN searches with first and last names (need to account for last name changes) • Customer searches based on our entire customer hierarchy • I even go as far as cleaning the data to remove dashes, hyphens and other possible data that may not be entered consistently rahn@theliebermans.com http://rahnsworld.com

  13. Testing • Search for a know person in your database (i.e. PersonID 1, where you know all the data about them) • More complicated tests: • Pull X number of records, remove some of the data from them, and try to find them again. • This works well because you always know what you’re expected to find • Sorry, SQL Compact doesn’t support TOP, so I can’t demonstrate.  rahn@theliebermans.com http://rahnsworld.com

  14. Known issues • Babies are hard to match because they don’t have SSN’s and they don’t have names • Most hospitals will enter a baby with a generic name, Baby Boy Smith, then change it in their system at a later date(after paperwork has been filed) • If a Senior and a Junior live at the same address, and we don’t have both of them in our system already, we may get a false result • Speed! 30 queries take a long time to run. I’m looking into ways to speed this up • Running multiple instances of the import/search at the same time will fail. We have safeguards against this in place rahn@theliebermans.com http://rahnsworld.com

  15. Demo • This sample data is ALL FAKE! • Do not think these are real people. If they are, it’s purely a coincidence • Social Security numbers are the phone numbers minus 1 digit • This is not my full production system. It’s been simplified to show proof of concept, and it uses SQL Compact for ease of demonstrating offline rahn@theliebermans.com http://rahnsworld.com

  16. Questions • Slides and code available at http://rahnsworld.com rahn@theliebermans.com http://rahnsworld.com

More Related