De-Identification

selina

Presentation Transcript


  1. De-Identification

  2. Privacy in Organizational Processes • Patient medical bills • Patient information • Hospital • Insurance Company • Drug Company • Aggregate anonymized patient information • Advertising • Complex Process within a Hospital • PUBLIC • Patient

  3. Transfer and Use Between Organizations Achieve organizational purpose while respecting privacy expectations in the transfer and use of personal information (individual and aggregate) within and across organizational boundaries

  4. We Use Health Data for Research in Many Ways

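  5. Two Swords in Health Research • Informed Consent Form • De-Identification • Clinical research currently relies on two approaches to convince the IRB and the public that it has done its duty to protect subjects and their data. • The first is for subjects to sign an ICF, through which the subjects (those being studied) give up, in advance or conditionally, privacy rights they would otherwise hold. • The second is to ensure that researchers cannot obtain a subject's personal medical records by cross-matching data and thereby harm the subject's privacy.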

  6. HIPAA Background • Commercial Healthcare Insurance • Pharmacy Benefit Managers (Intruders) • Health Maintenance Organizations Holding Shares in Hospitals or Acquiring Hospitals (M&A) • Research Fraud and Clinical Trial Scandals • Who can market our medical record data?

  7. Health Insurance Portability and Accountability Act • HIPAA, enacted by the US Congress in 1996 • Title I: Health Care Access, Portability, and Renewability • Title II: Preventing Health Care Fraud and Abuse; Administrative Simplification; Medical Liability Reform • Privacy Rule • Transactions and Code Sets Rule • Security Rule • Unique Identifiers Rule • Enforcement Rule • HITECH Act: Privacy Requirements

  8. ICF Template (KUSO/Parody Version) I agree to participate in the clinical trial of drug XXX. Whether I live or die, I accept my fate. Subject: X X X

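  9. De-Identification and Re-Identification • Obstetrician-Gynecologist • Tzu Chi Hospital • Institutional Review Board • PhD in Business Administration • Assistant Professor, Department of Healthcare Management • 林錦鴻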

  10. What items must be removed before disclosure?

  11. HIPAA Privacy Rule and Research with De-identified Information (1) (1) Names (2) All geographic subdivisions smaller than a State, including street address, city, county, precinct, and ZIP code. The first three digits of the ZIP code may be used if the geographic unit formed by all ZIP codes sharing those three digits contains more than 20,000 people; if it contains 20,000 or fewer people, "000" must be used as the ZIP code. (3) All elements of dates (except year) directly related to an individual, including birth date, admission date, discharge date, and date of death. For individuals over 89 years of age, the year of birth cannot be used; all such ages must be aggregated into a single category of 90 or older.
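
A minimal Python sketch of the ZIP code and age rules above. The population table and its prefixes are hypothetical placeholders; a real implementation would look up current census counts for each three-digit prefix. The function names are illustrative, not from any library.

```python
from datetime import date

# Hypothetical populations per three-digit ZIP prefix (placeholder numbers);
# a real system would load current census figures instead.
ZIP3_POPULATION = {"111": 250_000, "222": 12_000}

def generalize_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Keep the first three digits only if that area has more than 20,000 people."""
    prefix = zip_code[:3]
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"  # small or unknown areas are replaced with "000"

def generalize_birth_date(birth: date, today: date) -> str:
    """Keep only the birth year; anyone over 89 falls into a single "90+" category."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    if age > 89:
        return "90+"
    return str(birth.year)

print(generalize_zip("11105", ZIP3_POPULATION))                   # 111
print(generalize_zip("22201", ZIP3_POPULATION))                   # 000
print(generalize_birth_date(date(1930, 5, 1), date(2024, 1, 1)))  # 90+
```

Unknown prefixes default to "000", which errs on the side of suppression.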

  12. HIPAA Privacy Rule and Research with De-identified Information (2) (4) Telephone numbers (5) Fax numbers (6) Email addresses (7) Social Security numbers (8) Medical record numbers (9) Health plan beneficiary numbers (10) Account numbers (11) Certificate/license numbers (12) Vehicle identifiers and serial numbers, including license plate numbers (13) Device identifiers and serial numbers (14) Web Universal Resource Locators (URLs) (15) Internet Protocol (IP) addresses (16) Biometric identifiers, including finger and voice prints (17) Full-face photographs and any comparable images (18) Any other unique identifying number, characteristic, or code, and

  13. Following the HIPAA Regulation • Is "de-identification" really a safe procedure? (Yes or No) • Are you sure researchers can still carry out their research after these tags or codes are deleted? (Yes or No)

  14. 王 X 明 (Newspaper A)

  15. 王小 x (Newspaper B)

  16. X 小明 (Newspaper C)

  17. 王 小 明: Re-Identification
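
Slides 14 to 17 show three newspapers, each masking a different character of the same name, which together reveal the full name. The sketch below assumes the masked versions are aligned character by character and use "X" (either case) as the mask; `combine_masked` is a hypothetical helper that merges them position by position.

```python
def combine_masked(versions: list[str], mask: str = "X") -> str:
    """Merge equal-length masked names position by position.

    Each position takes the first character that is not the mask;
    positions masked in every source stay masked.
    """
    length = len(versions[0])
    if any(len(v) != length for v in versions):
        raise ValueError("all versions must have the same length")
    merged = []
    for chars in zip(*versions):
        revealed = [c for c in chars if c.upper() != mask]
        merged.append(revealed[0] if revealed else mask)
    return "".join(merged)

print(combine_masked(["王X明", "王小x", "X小明"]))  # 王小明
```

Each source alone hides one character, yet no single character is hidden by all three, so the union reveals everything.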

  18. Example • Tracking cervical cancer subjects by comparing ICD-9 codes with SCC data (date, tag, and result) • Age and location are very important influencing factors. Will this data-link decoding spoil your research?

  19. Categories of Variables in a Data Set • Directly Identifying Variables • Quasi-identifiers • Sensitive variables: for example, the financial or health status of an individual • How many sensitive variables are allowed in a limited database?

  20. Direct Identifiers • Direct identifiers are variables that can be linked directly to a subject's personal data through public information infrastructure: name, account number, medical record number, ID number, …

  21. Indirect Identifiers (Quasi-Identifiers) • Location (address, ZIP code) • Communication identifiers (telephone, fax) • Internet identifiers (IP address, email, machine code) • Any unique identifying number, characteristic, or code

  22. Quasi-Identifier • Date of Birth (DoB) • DoB – Month and Year • Day, Month and Year of Admission, Discharge or Operation • Gender • Initials • Address • City • Region • Postal Code

  23. The Difference • Anonymous • Confidential • De-identified The IRB often finds that the terms anonymous, confidential, and de-identified are used incorrectly. These terms are described below as they relate to an individual's participation in the research and the way that their data are collected and maintained for analysis.

  24. Anonymous • It is impossible to know directly whether or not an individual participated in the study. • However, a study participant who is a member of a minority ethnic group might be identifiable even from a large data pool. • Information about other unique individual characteristics (indirect identifiers) might make it possible to identify an individual from a pooled dataset.

  25. Example A • Taiwan health insurance claims data used commercially to analyze physicians' prescribing behavior (PBMs learn which physicians prescribed their medications)

  26. Confidential • The research team is obligated to protect the data from disclosure outside the research according to the terms of the research protocol and the informed consent document. • In order to protect against accidental disclosure, the subject’s name or other identifiers should be stored separately from their research data and replaced with a unique code to create a new identity for the subject. • Note that coded data are not anonymous.
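
A minimal sketch of the coding step described on slide 26, assuming records arrive as Python dicts with the identifier in one field (the function name and the `subject_code` field are illustrative). Each person gets a random code; the key table linking codes back to identities is returned separately so it can be stored apart from the research data. Because the key exists, the output is coded, not anonymous.

```python
import secrets

def pseudonymize(records: list[dict], id_field: str) -> tuple[list[dict], dict[str, str]]:
    """Replace the identifier in each record with a random code.

    Returns (research_rows, key_table). The key table maps code -> identifier
    and should be stored separately, under different custody.
    """
    code_for: dict[str, str] = {}
    research_rows = []
    for row in records:
        ident = row[id_field]
        if ident not in code_for:
            # Random code, not derived from the identifier, so it cannot be
            # reversed without the key table.
            code_for[ident] = secrets.token_hex(8)
        coded = {k: v for k, v in row.items() if k != id_field}
        coded["subject_code"] = code_for[ident]
        research_rows.append(coded)
    key_table = {code: ident for ident, code in code_for.items()}
    return research_rows, key_table
```

The same person always receives the same code within one run, so repeated visits stay linkable for analysis without exposing the name.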

  27. Example B • Use mechanisms of mutual distrust or conflict of interest between different individuals or branches • Congressmen and Officers • Accounting and Finance Branches • Marketing and Sales • IRB and Researchers

  28. De-identified • Data are de-identified when all direct and indirect identifiers, and any codes linking the data to an individual subject's identity, have been destroyed. There is then no risk of re-identification. From a research perspective, however, many details and facts are ignored and lost.

  29. Limited Safe De-Identified Confidential Anonymous

  30. Re-Identification • Re-linking data through an identifier or quasi-identifier to recover the original identity. • Evaluating the risk of re-identification is a matter of judgment or consensus among reviewers.

  31. Limited or De-Identified • Contract or not? (Non-Disclosure Agreement) • Regulation or not? • Expedited or full-board review? • Retention or available time period? • Indefinite • With an expiration date • Database access committee? • Database administrator?

  32. Heuristics A Perfect Data Security Management & Infrastructure IRB Role and Review FAQ

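  33. Are subjects identifiable by their age, gender, and residence? For Indigenous peoples, ethnic minorities, and people with special diseases, cross-matching different databases can re-link the personal data of the subjects (those being studied). Some studies need age, gender, and residence data; age can be restricted to fixed intervals, such as 10-year or 5-year units, and ZIP codes should be recoded.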

  34. Can a person be re-identified from their diagnosis codes? • Many data sets also include diagnosis codes (for example, ICD-10 codes). • Hospital medical record abstract data are almost publicly available. • A set of diagnosis codes can make an individual unique. • Some of the records in the disclosed data set have diagnosis codes for rare and visible diseases or conditions.
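
To see how a set of diagnosis codes can single out a record, the sketch below treats each record's codes as an unordered set and flags sets that occur only once. The record IDs and code lists are hypothetical examples.

```python
from collections import Counter

def unique_code_sets(records: dict[str, list[str]]) -> list[str]:
    """Return IDs of records whose set of diagnosis codes appears in no other record."""
    signatures = {rid: frozenset(codes) for rid, codes in records.items()}
    counts = Counter(signatures.values())
    return [rid for rid, sig in signatures.items() if counts[sig] == 1]

# Hypothetical records: r1 and r2 share the same codes; r3 has one extra code.
records = {
    "r1": ["E11.9", "I10"],
    "r2": ["I10", "E11.9"],
    "r3": ["E11.9", "I10", "Q85.0"],
}
print(unique_code_sets(records))  # ['r3']
```

A single additional code is enough to make r3 unique, which is why long code lists raise re-identification risk.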

  35. Can a claims database be used for re-identification? • Much of the literature makes the point that claims databases can be used for re-identification; however, whether this holds will depend on your jurisdiction. • Other sources of public information can still be very useful for re-identification.

  36. Can individuals be re-identified from disease maps?

  37. Do these maps risk identifying any of the individuals? • Three questions need to be answered to determine the risk: • Is the disease visible? • Is the disease rare in the geography? • If I re-identify an individual, will I learn something new about them?
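
The three questions on slide 37 can be written as a simple screening rule. This sketch assumes a map cell is at risk only when all three answers point to risk, and it uses a hypothetical threshold of fewer than 5 cases for "rare"; both choices are assumptions, not rules stated in the slides. `MapCell` and `cell_at_risk` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class MapCell:
    region: str
    cases: int
    disease_visible: bool    # could an outsider notice the condition?
    reveals_new_info: bool   # would re-identifying a case teach an intruder something new?

def cell_at_risk(cell: MapCell, rare_threshold: int = 5) -> bool:
    """Flag a cell only if the disease is visible, rare there, and identification is informative."""
    is_rare = 0 < cell.cases < rare_threshold  # the threshold is an assumption
    return cell.disease_visible and is_rare and cell.reveals_new_info

print(cell_at_risk(MapCell("Region A", 2, True, True)))   # True
print(cell_at_risk(MapCell("Region B", 2, False, True)))  # False
```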

  38. Can postal codes re-identify individuals? • Five-digit postal codes are the smallest geographic unit that Taiwan's postal service uses to deliver mail. In a health care context they are the most common geographic unit because they are what patients know and are able to provide. • The postal code is the only demographic information disclosed in this data set. • The smallest postal codes in all provinces and territories have very few residents, so any information about such a postal code pertains to a very small number of individuals.

  39. Definition: a dataset is identifiable if a person can find their own record(s) in it • Who is most sensitive to data de-identification? (The individual or the reviewer?) • The best de-identification of a dataset is one in which an individual cannot point out his or her own record.
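
The criterion on slide 39 is closely related to k-anonymity, a standard technique not named in the slides: a record whose combination of quasi-identifiers is shared by no other record is one its owner, or anyone who knows those attributes, can point to. A minimal sketch, with illustrative function names and hypothetical rows, that computes the size of each record's group:

```python
from collections import Counter

def equivalence_class_sizes(rows: list[dict], quasi_identifiers: list[str]) -> list[int]:
    """For each row, count how many rows share its quasi-identifier values."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return [counts[key] for key in keys]

def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Smallest group size; k == 1 means at least one record is unique."""
    return min(equivalence_class_sizes(rows, quasi_identifiers), default=0)

rows = [
    {"age": "30-34", "sex": "F", "zip": "Z001"},
    {"age": "30-34", "sex": "F", "zip": "Z001"},
    {"age": "35-39", "sex": "M", "zip": "Z001"},
]
qi = ["age", "sex", "zip"]
print(equivalence_class_sizes(rows, qi))  # [2, 2, 1]
print(k_anonymity(rows, qi))              # 1
```

The third record is alone in its group, so its owner could point it out; further generalization or suppression would be needed before release.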

  40. How can I de-identify longitudinal records? • A time-series record is like a DNA sequence: a unique sequential dataset. • It can easily be re-identified. • It should be treated as a limited database. • Intervals are less likely to be unique than actual dates.
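
A minimal sketch of the interval idea on slide 40, assuming each subject's events are given as calendar dates: absolute dates are replaced by the number of days since the previous event, with the first event anchored at 0. `to_intervals` is an illustrative name.

```python
from datetime import date

def to_intervals(event_dates: list[date]) -> list[int]:
    """Replace absolute dates with days since the previous event (first event = 0)."""
    if not event_dates:
        return []
    ordered = sorted(event_dates)
    intervals = [0]
    for prev, cur in zip(ordered, ordered[1:]):
        intervals.append((cur - prev).days)
    return intervals

print(to_intervals([date(2020, 1, 10), date(2020, 1, 1), date(2020, 3, 1)]))  # [0, 9, 51]
```

The output keeps the spacing between visits that most longitudinal analyses need while dropping the calendar dates that make the sequence easy to match.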

  41. How can I safely release data to multiple researchers? • Re-numbering • Re-ranking • Different sampling • Shuffle your data before disclosure • Strong disincentives against matching the two data sets • Change formats (e.g., 0.4 to 40%, imperial to metric units)
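
A sketch combining three items from slide 41 (different sampling, shuffling, and re-numbering), assuming each recipient gets its own random generator. The function name and parameters are hypothetical. For real releases any seed should be kept secret, since anyone who knows it can reproduce the sample and the numbering.

```python
import random

def release_copy(rows: list[dict], id_field: str, rng: random.Random,
                 fraction: float = 1.0) -> list[dict]:
    """Build one recipient's copy: a random sample, in random order, with fresh IDs."""
    k = round(len(rows) * fraction)
    sample = rng.sample(rows, k)  # recipient-specific subset, already in random order
    released = []
    for new_id, row in enumerate(sample, start=1):
        out_row = dict(row)
        out_row[id_field] = new_id  # re-number so IDs differ between releases
        released.append(out_row)
    return released

rows = [{"id": f"P{i}", "age": 30 + i} for i in range(10)]
copy_a = release_copy(rows, "id", random.Random(), fraction=0.8)
copy_b = release_copy(rows, "id", random.Random(), fraction=0.8)
```

Re-numbering alone does not stop matching on the remaining columns; the slide's other measures (disincentives, format changes) address that.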

  42. Is sampling sufficient to de-identify a data set? • Both statistical significance and re-identification risk should be taken into consideration. • An intruder may not know whether their target is in the disclosed database. • What if the sampling fraction is high? (The sample then resembles a public database.)

  43. Is there a secondary-use market for health information? • Yes or No • Pharmacy Benefit Managers • Private Health Insurance Services • Other Services (Women and Children)

  44. Should de-identified data go through a research ethics review? • In the first approach, the IRB form has a checkbox question asking the investigator whether the data are de-identified (UM forms). • If the investigator checks that box, the IRB does not review the protocol and it is automatically approved. • The reasoning is that the data are de-identified and therefore there is no requirement to review the protocol.

  45. Should IRBs decide if a data set is de-identified? • Yes or No? (No) • We don't have a privacy expert. • Determining whether a particular data set is identifiable, and resolving any re-identification risk concerns, is an iterative process. • If these interactions are attempted, they can be very slow and consequently frustrating.

  46. Should we de-identify if technology is moving so fast? • Re-identification technology moves faster than de-identification • As the defender rises a foot, the attacker rises ten (道高一尺,魔高一丈) • Education in data security is cheaper than new technology. • High technology means high risk
