
QUAKE: Quadruple Key and Encryption

QUAKE: Quadruple Key and Encryption. Centers for Disease Control and Prevention Third Annual National Early Hearing Detection and Intervention Conference, Washington, DC, February, 2004.


Presentation Transcript


  1. QUAKE: Quadruple Key and Encryption Centers for Disease Control and Prevention Third Annual National Early Hearing Detection and Intervention Conference, Washington, DC, February, 2004.

  2. Background • University of Maine research team involved in research in informatics and developmental epidemiology • Contact Information • Craig A. Mason: craig.mason@umit.maine.edu • Shihfen Tu: shihfen.tu@umit.maine.edu

  3. To Link or Not to Link… • Data linkage provides a huge opportunity for public health research • Integrate large, complex, longitudinal datasets • Address questions impossible to answer any other way • This was impractical 10 or 15 years ago • Leads to fears of “Big Brother” • Abuse of information • Has identifiable information been released by researchers? • Individual rights versus public good • At what point does the public right to health trump my right to privacy? (assuming either of these exists)

  4. Strategies for Addressing Concerns • Legislative • Procedural • Educational • Our focus: Technological • Review linkage strategies • Review encryption issues

  5. Deterministic Linkage • A series of common identifying fields is selected across two databases • Records are matched across databases based on these fields • Two records must have identical values across all of these fields in order to be linked • Example: these two records differ only in the spelling of the first name, so they would not be linked • “John”, “Bartholomew”, “Szapoznick” • “Jon”, “Bartholomew”, “Szapoznick”
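
A minimal Python sketch of the exact-match rule described on this slide. The field names (`first`, `middle`, `last`) and the `service` and `grade` payload values are hypothetical, chosen only for illustration; the two school records reuse the name variants shown on the slide.

```python
def deterministic_link(left, right, fields):
    """Pair records whose values are identical on every field in `fields`."""
    # Index the right-hand file by the tuple of linking fields.
    index = {}
    for rec in right:
        key = tuple(rec[f] for f in fields)
        index.setdefault(key, []).append(rec)

    # Look up each left-hand record; only exact matches on all fields pair up.
    pairs = []
    for rec in left:
        key = tuple(rec[f] for f in fields)
        for match in index.get(key, []):
            pairs.append((rec, match))
    return pairs


FIELDS = ("first", "middle", "last")

service = [
    {"first": "John", "middle": "Bartholomew", "last": "Szapoznick", "service": "screening"},
]
school = [
    {"first": "Jon", "middle": "Bartholomew", "last": "Szapoznick", "grade": 3},
    {"first": "John", "middle": "Bartholomew", "last": "Szapoznick", "grade": 4},
]

links = deterministic_link(service, school, FIELDS)
for svc, sch in links:
    print(svc["service"], "<->", sch["grade"])
```

Only the "John" record links; the "Jon" record is missed because a single character differs, which is the weakness that probabilistic linkage is meant to address.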

  6. Probabilistic Linkage • Two records do not have to match across all fields in order to be linked • For a possible pairing, a value is calculated that reflects the likelihood that the two records are (or are not) the same person • Based upon the frequencies of values and the quality of the data

  7. Factors Influencing Probabilistic Linkage • Reliability of data fields • Greater reliability results in increased odds of a correct match • If a field is pure noise, correct matches will be random • Frequency of field values • The more common the value in a field, the greater the odds that the records will be erroneously matched • E.g., a match based on the name Szapocznik is more likely to reflect a correct match than is a match on the name Smith • Number of matches • The greater the number of individuals in one database that also appear in the other database, the greater probability of linkage across databases. • If two databases have no individuals in common, the probability of a linkage across the databases must be zero
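
The presentation does not say which scoring formula is used. As one common way to express the first two factors, the sketch below uses the Fellegi-Sunter agreement weight, log2(m/u), where `m` is the probability that a field agrees when two records are truly the same person (field reliability) and `u` is the probability that it agrees by chance (roughly the frequency of the value). All numeric values are invented for illustration.

```python
import math


def agreement_weight(m, u):
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Weight added (usually negative) when a field disagrees."""
    return math.log2((1 - m) / (1 - u))


def match_score(comparisons):
    """Sum the weights over fields; `comparisons` holds (agrees, m, u) tuples."""
    return sum(
        agreement_weight(m, u) if agrees else disagreement_weight(m, u)
        for agrees, m, u in comparisons
    )


# Hypothetical values: the surname field is 95% reliable,
# "Smith" is common and "Szapocznik" is rare.
M_SURNAME = 0.95
U_SMITH = 0.01
U_SZAPOCZNIK = 0.00001

w_smith = agreement_weight(M_SURNAME, U_SMITH)
w_rare = agreement_weight(M_SURNAME, U_SZAPOCZNIK)

# A pure-noise field agrees just as often for non-matches as for matches
# (m == u), so agreeing on it carries no evidence at all.
w_noise = agreement_weight(0.01, 0.01)

print(round(w_smith, 2), round(w_rare, 2), w_noise)
```

Under these assumptions, agreement on the rare surname earns a much larger weight than agreement on the common one, and the pure-noise field contributes nothing.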

  8. Statisticians Anonymous “I’m David, and I’m a bean-counter”

  9. Encryption • Ecretsay odecay • Information is coded so that true values are not obvious • Ancient field • Modern era focus on electronic transmission of sensitive data • Notice the little yellow padlock in the bottom corner of your browser when shopping on eBay?

  10. Encryption Techniques • Asymmetric or public key • Different key for encryption and decryption • Encryption key is public • Decryption key is private • Decryption key cannot be derived from encryption key • Provide security of data transmission • Anyone can use the public key to code a message • Only I can decrypt it • Typically based on product of large primes

  11. Challenge of Factorization • Factors hard to find • But once you know one, the other is easy to find • Public Key: 114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,362,562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,058,989,075,147,599,290,026,879,543,541 • Private Key Based on Factors: 3,490,529,510,847,650,949,147,849,619,903,898,133,417,764,638,493,387,843,990,820,577 and 32,769,132,993,266,709,549,961,988,190,834,461,413,177,642,967,992,942,539,798,288,533
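
The claim that the second factor is easy to find once the first is known can be checked directly with the numbers on this slide. The sketch below copies the three values from the slide and performs a single integer division; the helper name `parse` is just for this example.

```python
def parse(digits: str) -> int:
    """Strip commas, spaces and line breaks from a printed number."""
    return int("".join(ch for ch in digits if ch.isdigit()))


# Values copied from the slide.
N = parse("""
114,381,625,757,888,867,669,235,779,976,146,
612,010,218,296,721,242,362,562,561,842,935,706,935,245,
733,897,830,597,123,563,958,705,058,989,075,147,599,290,
026,879,543,541
""")
P = parse("""
3,490,529,510,847,650,949,147,849,619,903,898,
133,417,764,638,493,387,843,990,820,577
""")
Q = parse("""
32,769,132,993,266,709,549,961,988,190,834,461,
413,177,642,967,992,942,539,798,288,533
""")

# Knowing one factor, the other falls out of one division.
recovered_q, remainder = divmod(N, P)

print(remainder == 0, recovered_q == Q)
```

Going the other way, from the public number alone to either factor, is the hard direction that public key schemes of this kind rely on.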

  12. Encryption Techniques • Symmetric key • Same key for encryption and decryption • Key is not made public • Secret key - One Key to Rule Them All • More secure than asymmetric key • Nothing suggesting a possible key is published • Asymmetric key must be 6 to 30 times longer than symmetric key for equivalent security • Useful if you know in advance exactly who will want to encrypt a message to you

  13. Encryption Techniques • Security often described in terms of bits • 128-bit encryption indicates 2^128 possible keys • 340,282,366,920,938,463,463,374,607,431,768,211,456 • A lot of possibilities… • Widespread use of 1024 and 2048 bit encryption on the horizon • 128 bit symmetric = 2304 bit asymmetric (Cryptography, p.166)
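
The size of a key space is simple to compute exactly, since Python integers have arbitrary precision. A short sketch (the function name `key_space` is illustrative):

```python
def key_space(bits: int) -> int:
    """Number of distinct keys of the given length in bits."""
    return 2 ** bits


keys_128 = key_space(128)

print(f"{keys_128:,}")    # exact count with thousands separators
print(len(str(keys_128)))  # number of decimal digits in that count
```

Each additional bit doubles the count, so `key_space(129)` is exactly twice `key_space(128)`.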

  14. A Dirty Little Secret.. • These big numbers hide the fact that the security is only as good as the algorithm • Think reliability of DNA testing • Plaintext attack (and its variations) • If the only unique name in the data set is Szapocznik • And the only unique variation in the encrypted data set is “X*GFfF825d=”… • The key can be resolved
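
The first step of the attack described on this slide can be sketched as frequency matching. The sketch assumes that each name always encrypts to the same value and that the attacker knows the names and how often each occurs. The encrypted values other than “X*GFfF825d=” are made up for the example.

```python
from collections import Counter


def unique_by_count(values):
    """Map each occurrence count to its value, keeping only counts held by one value."""
    by_count = {}
    for value, n in Counter(values).items():
        by_count.setdefault(n, []).append(value)
    return {n: vals[0] for n, vals in by_count.items() if len(vals) == 1}


def frequency_match(known_names, encrypted_values):
    """Pair a name with a ciphertext when both have the same, otherwise unshared, count."""
    names = unique_by_count(known_names)
    ciphers = unique_by_count(encrypted_values)
    return {names[n]: ciphers[n] for n in names if n in ciphers}


known_names = ["Smith", "Smith", "Jones", "Jones", "Szapocznik"]
encrypted = ["q1Zz", "q1Zz", "r7Yy", "r7Yy", "X*GFfF825d="]  # hypothetical ciphertexts

pairs = frequency_match(known_names, encrypted)
print(pairs)
```

Once the attacker holds a known plaintext and its ciphertext, a weak algorithm may let the key be worked out from that pair, which is the point of the slide.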

  15. A Dirty Little Secret.. • Even without the key, you can determine my grade • Some computational or physical wall between decrypted and encrypted data

  16. One-to-One Encryption • Identifiers are encrypted into a unique value • Diagram: “Craig” + Encryption Key 93812….2431 → H3~f9(-d

  17. One-to-Many Encryption • Identifiers are encrypted into one of multiple values • Lack of uniqueness increases challenge of decryption • Diagram: “Craig” + Encryption Key 93812….2431 → 9Dj1D[d or H3~f9(-d or dfR1”d/G
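
The difference between the two modes can be illustrated with a toy cipher. This is not the authors' algorithm and is not meant to be secure: it XORs the plaintext with a keystream derived from SHA-256, and gets the one-to-many behaviour by mixing in a random nonce that is stored in front of the ciphertext. The key and all function names are hypothetical.

```python
import hashlib
import secrets

NONCE_BYTES = 8


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_one_to_one(key: bytes, plaintext: bytes) -> bytes:
    """Same key and plaintext always give the same ciphertext."""
    return xor(plaintext, keystream(key, b"", len(plaintext)))


def encrypt_one_to_many(key: bytes, plaintext: bytes) -> bytes:
    """A fresh random nonce makes each encryption of the same plaintext differ."""
    nonce = secrets.token_bytes(NONCE_BYTES)
    return nonce + xor(plaintext, keystream(key, nonce, len(plaintext)))


def decrypt_one_to_many(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_BYTES], ciphertext[NONCE_BYTES:]
    return xor(body, keystream(key, nonce, len(body)))


KEY = b"hypothetical-demo-key"
NAME = b"Craig"

first = encrypt_one_to_many(KEY, NAME)
second = encrypt_one_to_many(KEY, NAME)
print(first.hex(), second.hex())
```

The one-to-one form keeps equal identifiers equal after encryption, which is what makes linkage on encrypted values possible and also what exposes it to frequency-based attacks. The one-to-many form hides those frequencies but cannot be matched without decrypting.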

  18. That’s nice, but how can this help with data linkage? • All right. But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh water system, and public health… What have the Romans ever done for us? --- Reg, spokesman for the People’s Front of Judea Monty Python Life of Brian (and Martin White, UC Berkeley)

  19. The Politics of Linkage • Two data systems contain information on same individuals • Would like to link data for public health research Service Data: Craig A. Mason…. School Data: Craig A. Mason….

  20. The Politics of Linkage • I may not want schools to know about health services I have received Service Data: Craig A. Mason…. School Data: Craig A. Mason….

  21. The Politics of Linkage • What solution may allow data to be linked, yet prevent sources from seeing each other’s identifying data Service Data: Craig A. Mason…. School Data: Craig A. Mason….

  22. Quake • QUAdruple Key and Encryption Service Data: Craig A. Mason…. School Data: Craig A. Mason….

  23. Quake • Requires algorithms to be reversible • You can “undo” a process to come back to original value

  24. Quake • Requires algorithms to be commutative • You get the same answer even if you do the problem backwards
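
The slides do not name the algorithm. One operation that is both reversible and commutative is exponentiation modulo a prime, used here purely as an illustrative stand-in. The modulus, the two keys, and the helper names are assumptions of this sketch; it needs Python 3.8 or later for `pow(key, -1, modulus)`.

```python
from math import gcd

P = 2**127 - 1  # a Mersenne prime, chosen here only for convenience


def check_key(key: int) -> int:
    """A key is reversible only if it is coprime to P - 1."""
    if gcd(key, P - 1) != 1:
        raise ValueError("key must be coprime to P - 1")
    return key


def encrypt(value: int, key: int) -> int:
    return pow(value, check_key(key), P)


def decrypt(value: int, key: int) -> int:
    # Undo the exponent with its inverse modulo P - 1.
    return pow(value, pow(check_key(key), -1, P - 1), P)


def to_int(text: str) -> int:
    n = int.from_bytes(text.encode("utf-8"), "big")
    if not 0 < n < P:
        raise ValueError("text too long for this toy modulus")
    return n


def to_text(n: int) -> str:
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode("utf-8")


KEY_A = 65537  # hypothetical key for one party
KEY_B = 257    # hypothetical key for the other party

x = to_int("Craig A. Mason")
a_then_b = encrypt(encrypt(x, KEY_A), KEY_B)
b_then_a = encrypt(encrypt(x, KEY_B), KEY_A)

# Commutative: the order of the two keys does not matter.
print(a_then_b == b_then_a)
# Reversible: removing both keys, in either order, restores the original.
print(to_text(decrypt(decrypt(a_then_b, KEY_A), KEY_B)))
```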

  25. Quake • Each provider selects their own unique encryption key that is used to encrypt identifiers prior to linkage • Diagram: provider keys 052385043…9471 and 757260024…2512; Service Data: Craig A. Mason….; School Data: Craig A. Mason….

  26. Quake • Community members representing individuals in each dataset also select their own unique encryption keys • Diagram: community representative keys 420504763….8372 and 850258434…3435; provider keys 052385043…9471 and 757260024…2512; Service Data: Craig A. Mason….; School Data: Craig A. Mason….

  27. Quake • The encryption keys for the community representatives and the providers are entered separately, and the combined keys are hidden from the users • Diagram: community representative keys 420504763….8372 and 850258434…3435; Hidden Keys 342002330…2852 and 147742268…0042; provider keys 052385043…9471 and 757260024…2512; Service Data: Craig A. Mason….; School Data: Craig A. Mason….

  28. Quake • These combined encryption keys are used to encrypt identifiers in each file prior to linkage • Diagram: same keys as in slide 27; Service Data: *Bj&!33t….; School Data: yy#K66….

  29. Quake • Symmetric key with 1:many encryption • Diagram: same keys as in slide 27; Service Data: *Bj&!33t….; School Data: yy#K66….

  30. Quake • The combined encryption keys are not stored so neither party can decrypt on their own • Diagram: same keys as in slide 27; Service Data: *Bj&!33t….; School Data: yy#K66….

  31. Illustration of Security • To see why, consider the following simple keys • Service provider key: 7 • Community representative key: 3 • Combined key: 3 x 7 = 21 • Simple message to encrypt, “A” • Simple encryption algorithm • Each letter has a value 1-26, repeating • “A”=1, “Z”=26, “A”=27… • Multiply that value by the encryption key in order to obtain the new value • Diagram: Rep Key: 3; Provider Key: 7; Hidden Combined Key: 21

  32. Illustration of Security • Once encrypted, “A” becomes “U” (1 x 21 = 21, and “U” is the letter with value 21) • Diagram: Rep Key: 3; Provider Key: 7; Hidden Combined Key: 21; Original Message: A; Encrypted Message: U

  33. Illustration of Security • If the community representative applied only their own key to the encrypted message, they would see “G” • 21 ÷ 3 = 7 • “G” is the letter with value 7 • Diagram: Rep Key: 3; Provider Key: 7; Hidden Combined Key: 21; Encrypted Message: U; Decrypted Message: G

  34. Illustration of Security • If the service provider applied only their own key to the encrypted message, they would see “C” • 21 ÷ 7 = 3 • “C” is the letter with value 3 • Diagram: Rep Key: 3; Service Provider Key: 7; Hidden Combined Key: 21; Encrypted Message: U; Decrypted Message: C

  35. Illustration of Security • Only by working together can the message be decrypted • Diagram: Rep Key: 3; Service Provider Key: 7; Hidden Combined Key: 21; Encrypted Message: U; Partially Decrypted Message: G; Fully Decrypted Message: A
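
The worked example on these slides can be reproduced in a few lines. The slides undo a key by plain division; the sketch below makes one assumption beyond that, using the modular inverse of the key modulo 26 so that "division" also works when letter values wrap around. It needs Python 3.8 or later.

```python
MODULUS = 26  # letter values repeat every 26, as on the slide


def letter_value(letter: str) -> int:
    """'A' -> 1 ... 'Z' -> 26."""
    return ord(letter.upper()) - ord("A") + 1


def value_letter(value: int) -> str:
    """Wrap any positive value back onto 'A'..'Z' (27 -> 'A', and so on)."""
    return chr((value - 1) % MODULUS + ord("A"))


def encrypt(letter: str, key: int) -> str:
    return value_letter(letter_value(letter) * key)


def decrypt(letter: str, key: int) -> str:
    # "Dividing" by the key means multiplying by its inverse modulo 26.
    # This requires a key with no factor in common with 26.
    return value_letter(letter_value(letter) * pow(key, -1, MODULUS))


REP_KEY = 3
PROVIDER_KEY = 7
COMBINED_KEY = REP_KEY * PROVIDER_KEY  # 21, hidden from both parties

encrypted = encrypt("A", COMBINED_KEY)
rep_only = decrypt(encrypted, REP_KEY)
provider_only = decrypt(encrypted, PROVIDER_KEY)
together = decrypt(rep_only, PROVIDER_KEY)

print(encrypted, rep_only, provider_only, together)  # U G C A
```

Either party acting alone recovers only a different wrong letter; the original appears only after both keys have been removed.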

  36. Quake • Once each dataset is encrypted, there are several possible methods for linking • Diagram: same keys as in the preceding Quake slides; Service Data: *Bj&!33t….; School Data: yy#K66….

  37. Linking Encrypted Files • Simple approach • Bring both encrypted files together on independent, non-networked machine • Each of the four parties enters their own key • Respective files internally decrypted and linked • New, de-identified linked file containing fields of interest created • Record of identifiers and keys electronically or physically erased • DoD 5220.22-M protocol

  38. Linking Encrypted Files • Benefits • Flexible linkage strategies (partial names, etc.) • Easiest to perform • Once completed no identifiers to enable plaintext attack • Issues • Process of encryption/decryption can be computationally demanding • Potential record of encrypted data and all keys • Can be destroyed, but time consuming

  39. Variation of Quake • Each provider selects own unique encryption key used to encrypt identifiers prior to linkage Key: 052385043…9471 Key: 757260024…2512 Service Data: Craig A. Mason School Data: Craig A. Mason

  40. Variation • Identifiers in their file encrypted with a 1:1 symmetric key Key: 052385043…9471 Key: 757260024…2512 Service Data: *Bj&!33t…. School Data: yy#K66….

  41. Variation • Parties then switch encrypted files • If identifying fields in both files are all equal.. • May be prone to variations of a plaintext attack • Inclusion of additional records whose identifiers contain random noise can nearly eliminate this risk Key: 052385043…9471 Key: 757260024…2512 School Data: yy#K66…. Service Data: *Bj&!33t….
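
One way to read the suggestion about noise records is to pad a file with decoy identifiers before encrypting and sending it, so that counts and unique values in the encrypted file no longer mirror the real data. The sketch below is an assumption about how that might look; the function name, decoy format, and sample names are all hypothetical.

```python
import random
import string


def add_decoys(identifiers, n_decoys, rng):
    """Return the identifiers mixed with random decoys, plus the set of decoys.

    The owner keeps the decoy set private so any "links" to decoys
    can be discarded after linkage.
    """
    decoys = []
    for _ in range(n_decoys):
        length = rng.randint(5, 12)
        decoys.append("".join(rng.choice(string.ascii_uppercase) for _ in range(length)))
    padded = list(identifiers) + decoys
    rng.shuffle(padded)  # do not leave the decoys grouped at the end
    return padded, set(decoys)


real = ["SMITH", "SMITH", "JONES", "SZAPOCZNIK"]
padded, decoy_set = add_decoys(real, 6, random.Random(2004))

print(len(padded), len(decoy_set))
```

With decoys present, the rare real name is no longer the only value that occurs once, so a single unique ciphertext cannot be tied to it by count alone.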

  42. Variation • Each party then applies their own key to the other party’s already-encrypted file • Identifiers in each file will have the same value • Cannot determine key used by other source • Diagram: Key: 052385043…9471; Key: 757260024…2512; School Data: Jf*72Coo….; Service Data: Jf*72Coo….

  43. Variation • If the files are brought together by one of the parties • They may be able to conduct a plaintext attack • May then be able to determine key used by other party • Instead, both files are linked by a trusted third party • Diagram: Key: 052385043…9471; Key: 757260024…2512; School Data: Jf*72Coo….; Service Data: Jf*72Coo….

  44. Variation • Again, may bring in community representatives Key: 052385043…9471 Key: 757260024…2512 School Data: Jf*72Coo…. Service Data: Jf*72Coo…. Linked Data: Jf*72Coo, Services, Grades Final Linked Data: Services, Grades

  45. Variation • Link based upon the encrypted identifier fields • No need to decrypt files when linking • Apply deterministic and probabilistic algorithms to encrypted data • No machine ever sees all keys • Final file contains no identifiers and only a limited number of fields of interest
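
The whole variation (encrypt, switch files, encrypt again, link on the encrypted identifier, drop the identifier) can be sketched end to end. As an assumption, the sketch uses exponentiation modulo a prime as the commutative 1:1 operation and hashes each identifier to a number first, which is enough for linking but cannot be reversed. Keys, names other than the one on the slides, and payload values are hypothetical.

```python
import hashlib

P = 2**127 - 1  # a Mersenne prime used as the modulus (an assumption of this sketch)


def identifier_to_int(identifier: str) -> int:
    """Map an identifier to an integer in 1..P-1 via SHA-256 (one-way)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt_file(rows, key):
    """Apply `key` to the identifier of every (identifier, payload) row."""
    return [(pow(ident, key, P), payload) for ident, payload in rows]


SERVICE_KEY = 65537  # known only to the service provider
SCHOOL_KEY = 257     # known only to the school

service_rows = [("Craig A. Mason", "service record 1"), ("Jane Q. Public", "service record 2")]
school_rows = [("Craig A. Mason", "grade record 1"), ("John Q. Sample", "grade record 2")]

# Step 1: each party encrypts its own identifiers with its own key.
service_once = encrypt_file([(identifier_to_int(n), p) for n, p in service_rows], SERVICE_KEY)
school_once = encrypt_file([(identifier_to_int(n), p) for n, p in school_rows], SCHOOL_KEY)

# Step 2: the files are switched and each party applies its key to the other's file.
service_twice = encrypt_file(service_once, SCHOOL_KEY)   # done by the school
school_twice = encrypt_file(school_once, SERVICE_KEY)    # done by the service provider

# Step 3: a third party links on the doubly encrypted identifier, then drops it.
grades_by_id = dict(school_twice)
linked = [(svc, grades_by_id[ident]) for ident, svc in service_twice if ident in grades_by_id]

print(linked)
```

Because the operation is commutative, the same person ends up with the same doubly encrypted value in both files, so the linking party never needs either key or any plaintext identifier.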

  46. Variation of Quake • Issues • Requires 1:1 encryption algorithm • Can be addressed, but adds level of complexity • Cannot examine partial strings • Specific partial strings can be generated prior to encryption • Month of birth, day of birth • First letter of first name
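
Generating partial strings before encryption might look like the sketch below. As a stand-in for the 1:1 encryption step it uses HMAC-SHA256, which is a keyed one-way hash and not the reversible, commutative encryption the slides call for; the point here is only the order of operations. The field names, date format, and key are hypothetical.

```python
import hashlib
import hmac


def protect(field: str, value: str, key: bytes) -> str:
    """Keyed 1:1 transformation of one field (the field name is mixed in)."""
    message = f"{field}:{value}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def linkage_fields(first_name: str, birth_date: str, key: bytes) -> dict:
    """Derive the partial strings first, then protect each one separately.

    `birth_date` is expected as 'YYYY-MM-DD'.
    """
    _year, month, day = birth_date.split("-")
    name = first_name.strip().upper()
    return {
        "first_name": protect("first_name", name, key),
        "first_initial": protect("first_initial", name[:1], key),
        "birth_month": protect("birth_month", month, key),
        "birth_day": protect("birth_day", day, key),
    }


KEY = b"hypothetical-shared-key"

a = linkage_fields("Craig", "1990-02-14", KEY)
b = linkage_fields("Christopher", "1991-02-03", KEY)

# The full names differ, but the protected initial and birth month still agree.
print(a["first_initial"] == b["first_initial"], a["birth_month"] == b["birth_month"])
```

Fields with very few possible values, such as an initial or a month, are easy targets for the frequency matching discussed earlier, so they trade some protection for linkage flexibility.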

  47. Advanced Linkage Protocols for Addressing Confidentiality Concerns • Encrypted Linkage Protocols • Unique encryption keys administered by each database administrator and community liaisons • No one at any time sees the other person’s identifiers • Person conducting the linkage never sees any identifiers • Resulting linked set includes no decrypted identifiers • Resulting file cannot be decoded, expanded, or relinked without agreement and cooperation of all parties • The community participates in the process • Technology that creates confidentiality concerns may provide means for reducing those concerns
