QUAKE: Quadruple Key and Encryption

Presentation Transcript

  1. QUAKE: Quadruple Key and Encryption • Centers for Disease Control and Prevention • Third Annual National Early Hearing Detection and Intervention Conference, Washington, DC, February 2004

  2. Background • University of Maine research team involved in research in informatics and developmental epidemiology • Contact Information • Craig A. Mason: craig.mason@umit.maine.edu • Shihfen Tu: shihfen.tu@umit.maine.edu

  3. To Link or Not to Link… • Data linkage provides huge opportunity for public health research • Integrate large, complex, longitudinal datasets • Address questions impossible to answer any other way • This was impractical 10 or 15 years ago • Leads to fears of “Big Brother” • Abuse of information • Has identifiable information been released by researchers? • Individual rights versus public good • At what point does the public right to health trump my right to privacy? (assuming either of these exist)

  4. Strategies for Addressing Concerns • Legislative • Procedural • Educational • Our focus: Technological • Review linkage strategies • Review encryption issues

  5. Deterministic Linkage • A series of common identifying fields are selected across two databases • Records are matched across databases based on these fields • Two records must have identical values across all of these fields in order to be linked • Example: “John”, “Bartholomew”, “Szapocznik” will not link to “Jon”, “Bartholomew”, “Szapocznik”
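A minimal sketch of the exact-match rule described on this slide, written in Python. The function name, field names, and record contents are hypothetical and chosen only for illustration; they are not taken from the presentation.

```python
def deterministic_link(records_a, records_b, fields):
    """Return pairs of records that agree exactly on every field in `fields`."""
    index = {}
    for rec in records_b:
        index.setdefault(tuple(rec[f] for f in fields), []).append(rec)
    return [
        (rec, match)
        for rec in records_a
        for match in index.get(tuple(rec[f] for f in fields), [])
    ]


FIELDS = ("first", "middle", "last")  # hypothetical identifying fields

service = [{"first": "John", "middle": "Bartholomew", "last": "Szapocznik"}]
school = [{"first": "Jon", "middle": "Bartholomew", "last": "Szapocznik"}]

# One differing character in one field is enough to prevent the link.
links = deterministic_link(service, school, FIELDS)
self_links = deterministic_link(service, service, FIELDS)
print(len(links), len(self_links))
```

The single spelling difference in the first name leaves the two records unlinked, which is the weakness that probabilistic linkage is meant to address.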

  6. Probabilistic Linkage • Two records do not have to match across all fields in order to be linked • For a possible pairing, a value is calculated that reflects the likelihood that the two records are (or are not) the same person • Based upon the frequencies of values and the quality of the data

  7. Factors Influencing Probabilistic Linkage • Reliability of data fields • Greater reliability results in increased odds of a correct match • If a field is pure noise, correct matches will be random • Frequency of field values • The more common the value in a field, the greater the odds that the records will be erroneously matched • E.g., a match based on the name Szapocznik is more likely to reflect a correct match than is a match on the name Smith • Number of matches • The greater the number of individuals in one database that also appear in the other database, the greater the probability of linkage across databases • If two databases have no individuals in common, the probability of a linkage across the databases must be zero
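The slides do not name a scoring formula. As a sketch only, the following assumes the log-ratio agreement weights commonly used in probabilistic linkage, where `m` stands for field reliability (chance the field agrees for the same person) and `u` for value frequency (chance it agrees for two different people). All `m` and `u` values below are invented for illustration.

```python
import math


def field_weight(agrees, m, u):
    """Log2 weight contributed by one field comparison.

    m: assumed P(field agrees | same person), reflects reliability
    u: assumed P(field agrees | different people), reflects value frequency
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(comparisons):
    """Sum the weights of (agrees, m, u) triples for one candidate pair."""
    return sum(field_weight(a, m, u) for a, m, u in comparisons)


# Agreement on a rare surname counts for more than on a common one.
rare_surname = match_score([(True, 0.95, 0.0001)])
common_surname = match_score([(True, 0.95, 0.01)])

# A pure-noise field (m == u) contributes nothing either way.
noise = field_weight(True, 0.3, 0.3)

# "John" vs "Jon": first name disagrees, middle and last names agree.
john_vs_jon = match_score([
    (False, 0.90, 0.01),
    (True, 0.90, 0.005),
    (True, 0.95, 0.0001),
])
print(rare_surname > common_surname, noise, john_vs_jon > 0)
```

Under these assumed values the pair that deterministic linkage rejected still receives a positive score, because agreement on two informative fields outweighs one disagreement.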

  8. Statisticians Anonymous “I’m David, and I’m a bean-counter”

  9. Encryption • Ecretsay odecay • Information is coded so that true values are not obvious • Ancient field • Modern era focus on electronic transmission of sensitive data • Notice the little yellow padlock in the bottom corner of your browser when shopping on eBay?

  10. Encryption Techniques • Asymmetric or public key • Different key for encryption and decryption • Encryption key is public • Decryption key is private • Decryption key cannot feasibly be derived from encryption key • Provides security of data transmission • Anyone can use the public key to code a message • Only I can decrypt it • Typically based on product of large primes

  11. Challenge of Factorization • Factors hard to find • But once you know one, the other is easy to find • Public Key: 114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,362,562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,058,989,075,147,599,290,026,879,543,541 • Private Key Based on Factors: 3,490,529,510,847,650,949,147,849,619,903,898,133,417,764,638,493,387,843,990,820,577 and 32,769,132,993,266,709,549,961,988,190,834,461,413,177,642,967,992,942,539,798,288,533
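The claim that the second factor is easy to find once the first is known can be checked directly with one integer division. The sketch below copies the three numbers from the slide (the modulus appears to be the published challenge number commonly called RSA-129); the helper name `parse` is hypothetical.

```python
def parse(grouped):
    """Turn a comma-grouped decimal string into an int."""
    return int(grouped.replace(",", "").replace(" ", ""))


n = parse(
    "114,381,625,757,888,867,669,235,779,976,146,"
    "612,010,218,296,721,242,362,562,561,842,935,706,935,245,"
    "733,897,830,597,123,563,958,705,058,989,075,147,599,290,"
    "026,879,543,541"
)
p = parse(
    "3,490,529,510,847,650,949,147,849,619,903,898,133,"
    "417,764,638,493,387,843,990,820,577"
)
q = parse(
    "32,769,132,993,266,709,549,961,988,190,834,461,"
    "413,177,642,967,992,942,539,798,288,533"
)

# Knowing p, a single division recovers the other factor.
q_recovered, remainder = divmod(n, p)
print(remainder == 0 and q_recovered == q)
```

Going the other way, from `n` alone to `p` and `q`, is the hard direction that public key schemes of this kind rely on.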

  12. Encryption Techniques • Symmetric key • Same key for encryption and decryption • Key is not made public • Secret key - One Key to Rule Them All • More secure than asymmetric key • Nothing suggesting a possible key is published • Asymmetric key must be 6 to 30 times longer than symmetric key for equivalent security • Useful if you know in advance exactly who will want to encrypt a message to you

  13. Encryption Techniques • Security often described in terms of bits • 128 bit encryption indicates 2^128 possible keys • 340,282,366,920,938,463,463,374,607,431,768,211,456 (about 3.4 x 10^38) • A lot of possibilities… • Widespread use of 1024 and 2048 bit encryption on the horizon • 128 bit symmetric = 2304 bit asymmetric (Cryptography, p.166)
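The size of a 128 bit key space can be computed exactly. In the sketch below, the guessing rate of one trillion keys per second is an arbitrary assumption used only to give a sense of scale.

```python
keys_128 = 2 ** 128
formatted = f"{keys_128:,}"
print(formatted)

# Hypothetical attacker speed, chosen only for illustration.
GUESSES_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 365 * 24 * 3600

years_to_try_all = keys_128 / GUESSES_PER_SECOND / SECONDS_PER_YEAR
print(f"{years_to_try_all:.2e} years to try every key")
```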

  14. A Dirty Little Secret.. • These big numbers hide the fact that the security is only as good as the algorithm • Think reliability of DNA testing • Plaintext attack (and its variations) • If the only unique name in the data set is Szapocznik • And the only unique value in the encrypted data set is “X*GFfF825d=”….. • Then that pairing is known, and the key can be resolved
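A sketch of the frequency-matching idea behind this kind of plaintext attack. It assumes a 1:1 encoding in which equal names always produce equal codes; a keyed hash (HMAC) stands in for the unspecified algorithm, and the names, counts, and key are invented. The attacker never touches the key, only the frequency of values on each side.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(name, key):
    """Stand-in 1:1 encoding: the same name always yields the same code."""
    return hmac.new(key, name.encode(), hashlib.sha256).hexdigest()[:12]


def singletons(values):
    """Values that occur exactly once."""
    return [v for v, count in Counter(values).items() if count == 1]


names = ["Smith", "Smith", "Smith", "Jones", "Jones", "Szapocznik"]
secret_key = b"known only to the data owner"  # hypothetical
encoded = [pseudonymize(n, secret_key) for n in names]

# The attacker knows roughly which names exist and sees only `encoded`.
# The one unique name must correspond to the one unique code.
recovered = dict(zip(singletons(names), singletons(encoded)))
print(recovered)
```

With a weak algorithm, such as a simple multiplication cipher, one known pair like this is enough to work out the key itself. With a stronger algorithm the key stays hidden, but the identity behind that code is still exposed.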

  15. A Dirty Little Secret.. • Even without the key, you can determine my grade • Some computational or physical wall between decrypted and encrypted data

  16. One-to-One Encryption • Identifiers are encrypted into a unique value [Diagram: “Craig” and Encryption Key 93812….2431 produce H3~f9(-d]

  17. One-to-Many Encryption • Identifiers are encrypted into one of multiple values • Lack of uniqueness increases challenge of decryption [Diagram: “Craig” and Encryption Key 93812….2431 produce 9Dj1D[d or H3~f9(-d or dfR1”d/G]
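One common way to get one-to-many behaviour is to mix a fresh random value (a nonce) into every encryption, so the same identifier and key give a different result each time. The slides do not say how their algorithm works; the toy below is an assumption for illustration only and should not be used to protect real data.

```python
import hashlib
import secrets

NONCE_BYTES = 8


def _keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes from the key and nonce."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(plaintext, key):
    """One-to-many: a new random nonce makes every ciphertext different."""
    nonce = secrets.token_bytes(NONCE_BYTES)
    data = plaintext.encode()
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(ciphertext, key):
    """Read the nonce back from the front of the ciphertext and undo the XOR."""
    nonce, body = ciphertext[:NONCE_BYTES], ciphertext[NONCE_BYTES:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode()


key = b"hypothetical key"
first = encrypt("Craig", key)
second = encrypt("Craig", key)
print(first != second, decrypt(first, key), decrypt(second, key))
```

Because equal names no longer produce equal codes, the frequency matching that works against a 1:1 encoding has nothing to match on. The same property also means the codes cannot be compared directly for linkage.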

  18. That’s nice, but how can this help with data linkage? • All right. But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh water system, and public health… What have the Romans ever done for us? --- Reg, spokesman for the People’s Front of Judea Monty Python Life of Brian (and Martin White, UC Berkeley)

  19. The Politics of Linkage • Two data systems contain information on the same individuals • Would like to link data for public health research [Diagram: Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  20. The Politics of Linkage • I may not want schools to know about health services I have received [Diagram: Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  21. The Politics of Linkage • What solution may allow data to be linked, yet prevent sources from seeing each other’s identifying data? [Diagram: Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  22. Quake • QUAdruple Key and Encryption [Diagram: Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  23. Quake • Requires algorithms to be reversible • You can “undo” a process to come back to original value

  24. Quake • Requires algorithms to be commutative • You get the same answer regardless of the order in which the keys are applied
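The two requirements, reversible and commutative, can be illustrated with multiplication modulo a prime. This is a sketch under the assumption that keys are multiplied into a numeric value; the modulus, function names, and key values are hypothetical. The last check shows that mixing two different operations breaks commutativity.

```python
P = 2 ** 31 - 1  # a prime modulus, assumed for this sketch


def lock(value, key):
    """Apply a key by modular multiplication."""
    return (value * key) % P


def unlock(value, key):
    """Reverse `lock` using the key's modular inverse (Python 3.8+)."""
    return (value * pow(key, -1, P)) % P


def shift_lock(value, key):
    """A different operation (addition), used only as a counterexample."""
    return (value + key) % P


x = 1234
key_a, key_b = 7, 3

a_then_b = lock(lock(x, key_a), key_b)
b_then_a = lock(lock(x, key_b), key_a)
round_trip = unlock(unlock(a_then_b, key_a), key_b)

mixed_1 = lock(shift_lock(x, key_b), key_a)   # (x + 3) * 7
mixed_2 = shift_lock(lock(x, key_a), key_b)   # x * 7 + 3
print(a_then_b == b_then_a, round_trip == x, mixed_1 == mixed_2)
```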

  25. Quake • Each provider selects their own unique encryption key that is used to encrypt identifiers prior to linkage [Diagram: provider keys 052385043…9471 and 757260024…2512; Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  26. Quake • Community members representing individuals in each dataset also select their own unique encryption keys [Diagram: community representative keys 420504763….8372 and 850258434…3435; provider keys 052385043…9471 and 757260024…2512; Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  27. Quake • The encryption keys for the community representatives and the providers are entered separately, and the combined keys are hidden from the users [Diagram: community representative keys 420504763….8372 and 850258434…3435; Hidden Key: 342002330…2852; Hidden Key: 147742268…0042; provider keys 052385043…9471 and 757260024…2512; Service Data: Craig A. Mason….; School Data: Craig A. Mason….]

  28. Quake • These combined encryption keys are used to encrypt identifiers in each file prior to linkage [Diagram: community representative keys 420504763….8372 and 850258434…3435; Hidden Key: 342002330…2852; Hidden Key: 147742268…0042; provider keys 052385043…9471 and 757260024…2512; Service Data: *Bj&!33t….; School Data: yy#K66….]

  29. Quake • Symmetric key with 1:many encryption [Diagram: same as slide 28]

  30. Quake • The combined encryption keys are not stored so neither party can decrypt on their own [Diagram: same as slide 28]

  31. Illustration of Security • To see why, consider the following simple keys • Service provider key: 7 • Community representative key: 3 • Combined key: 3 x 7 = 21 • Simple message to encrypt, “A” • Simple encryption algorithm • Each letter has a value 1-26, repeating • “A”=1, “Z”=26, “A”=27… • Multiply that value by the encryption key in order to obtain the new value [Diagram: Rep Key: 3; Hidden Combined Key: 21; Provider Key: 7]

  32. Illustration of Security • Once encrypted, “A” becomes “U” • 1 x 21 = 21 • “U” is the letter with value 21 [Diagram: Rep Key: 3; Original Message: A; Hidden Combined Key: 21; Provider Key: 7; Encrypted Message: U]

  33. Illustration of Security • If the community representative applied their key to the encrypted message, they would see “G” • 21 ÷ 3 = 7 • “G” is the letter with value 7 [Diagram: Rep Key: 3; Encrypted Message: U; Hidden Combined Key: 21; Provider Key: 7; Decrypted Message: G]

  34. Illustration of Security • If the service provider applied their key to the encrypted message, they would see “C” • 21 ÷ 7 = 3 • “C” is the letter with value 3 [Diagram: Rep Key: 3; Encrypted Message: U; Hidden Combined Key: 21; Service Provider Key: 7; Decrypted Message: C]

  35. Illustration of Security • Only by working together can the message be decrypted • 21 ÷ 3 ÷ 7 = 1 • “A” is the letter with value 1 [Diagram: Encrypted Message: U; Rep Key: 3; Partially Decrypted Message: G; Hidden Combined Key: 21; Service Provider Key: 7; Fully Decrypted Message: A]
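The letter example on these slides can be run as written. The sketch below follows the slide's rule (letter value times key, wrapping after 26). The slides undo a key by plain division; this sketch instead multiplies by the key's inverse modulo 26, an assumption made so that decryption also works after the values wrap around. It requires keys that share no factor with 26, which holds for 3, 7, and 21. Function names are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase  # "A" = 1 ... "Z" = 26, then repeating


def apply_key(letter, key):
    """Multiply the letter's value by the key, wrapping around after 26."""
    value = LETTERS.index(letter) + 1
    return LETTERS[(value * key - 1) % 26]


def remove_key(letter, key):
    """Undo apply_key by multiplying with the key's inverse modulo 26."""
    return apply_key(letter, pow(key, -1, 26))  # needs Python 3.8+


rep_key, provider_key = 3, 7
combined_key = rep_key * provider_key  # 21, never shown to either party

encrypted = apply_key("A", combined_key)
rep_alone = remove_key(encrypted, rep_key)
provider_alone = remove_key(encrypted, provider_key)
together = remove_key(rep_alone, provider_key)

print(encrypted, rep_alone, provider_alone, together)  # U G C A
```

Each party working alone reaches a different wrong letter, and only applying both keys, in either order, returns the original message.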

  36. Quake • Once each dataset is encrypted, there are several possible methods for linking [Diagram: same as slide 28]

  37. Linking Encrypted Files • Simple approach • Bring both encrypted files together on independent, non-networked machine • Each of the four parties enters their own key • Respective files internally decrypted and linked • New, de-identified linked file containing fields of interest created • Record of identifiers and keys electronically or physically erased • DoD 5220.22-M protocol

  38. Linking Encrypted Files • Benefits • Flexible linkage strategies (partial names, etc.) • Easiest to perform • Once completed no identifiers to enable plaintext attack • Issues • Process of encryption/decryption can be computationally demanding • Potential record of encrypted data and all keys • Can be destroyed, but time consuming

  39. Variation of Quake • Each provider selects own unique encryption key used to encrypt identifiers prior to linkage [Diagram: Key: 052385043…9471; Key: 757260024…2512; Service Data: Craig A. Mason; School Data: Craig A. Mason]

  40. Variation • Identifiers in their file encrypted with a 1:1 symmetric key [Diagram: Key: 052385043…9471; Key: 757260024…2512; Service Data: *Bj&!33t….; School Data: yy#K66….]

  41. Variation • Parties then switch encrypted files • If the identifying fields in both files are all equal, the exchange may be prone to variations of a plaintext attack • Inclusion of additional records whose identifiers contain random noise can nearly eliminate this risk [Diagram: Key: 052385043…9471; Key: 757260024…2512; School Data: yy#K66….; Service Data: *Bj&!33t….]

  42. Variation • Each party then applies their own key to the other party’s already-encrypted file • Identifiers in each file will have the same value • Cannot determine key used by other source [Diagram: Key: 052385043…9471; Key: 757260024…2512; School Data: Jf*72Coo….; Service Data: Jf*72Coo….]

  43. Variation • If files are brought together by one of the parties • They may be able to conduct a plaintext attack • May then be able to determine key used by other party • Instead, both files are linked by a trusted third party [Diagram: Key: 052385043…9471; Key: 757260024…2512; School Data: Jf*72Coo….; Service Data: Jf*72Coo….]

  44. Variation • Again, may bring in community representatives [Diagram: Key: 052385043…9471; Key: 757260024…2512; School Data: Jf*72Coo….; Service Data: Jf*72Coo….; Linked Data: Jf*72Coo, Services, Grades; Final Linked Data: Services, Grades]

  45. Variation • Link based upon the encrypted identifier fields • No need to decrypt files when linking • Apply deterministic and probabilistic algorithms to encrypted data • No machine ever sees all keys • Final file contains no identifiers and only a limited number of fields of interest
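The exchange in this variation (encrypt with own key, swap files, apply own key to the other file, link on equal values) can be sketched with any 1:1 commutative operation. The slides do not name one; the sketch below assumes modular exponentiation over a prime, a commonly used commutative choice. Record contents, helper names, and the choice of prime are hypothetical.

```python
import hashlib
import math
import secrets

P = 2 ** 127 - 1  # a Mersenne prime, assumed for this sketch


def make_key():
    """Pick a random exponent that keeps the encryption 1:1."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def to_number(identifier):
    """Map an identifier string to a number in 1..P-1."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(value, key):
    """Commutative: encrypting with key a then b equals b then a."""
    return pow(value, key, P)


service_key, school_key = make_key(), make_key()
service = {"Craig A. Mason": "hearing screening", "Pat Example": "audiology"}
school = {"Craig A. Mason": "grade A", "Sam Example": "grade B"}

# Step 1: each party encrypts its own identifiers with its own key.
service_once = {encrypt(to_number(n), service_key): v for n, v in service.items()}
school_once = {encrypt(to_number(n), school_key): v for n, v in school.items()}

# Step 2: files are switched and each party applies its key to the other file.
service_twice = {encrypt(c, school_key): v for c, v in service_once.items()}
school_twice = {encrypt(c, service_key): v for c, v in school_once.items()}

# Step 3: a third party links on equal double-encrypted values, then drops them.
common = sorted(service_twice.keys() & school_twice.keys())
linked = [(service_twice[c], school_twice[c]) for c in common]
print(linked)
```

The linking step sees only double-encrypted values and the fields of interest, and neither party's key is present on the machine that performs the link.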

  46. Variation of Quake • Issues • Requires 1:1 encryption algorithm • Can be addressed, but adds level of complexity • Can not examine partial strings • Specific partial strings can be generated prior to encryption • Month of birth, day of birth • First letter of first name
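Generating partial strings before encryption might look like the sketch below. Each derived field is encoded separately, so agreement on a partial field can still be counted after encryption. A single keyed hash stands in for the 1:1 algorithm, which is a simplification of the multi-key scheme in the slides; the field names, key, and date of birth are hypothetical.

```python
import hashlib
import hmac
from datetime import date


def token(value, key):
    """Stand-in 1:1 encoding of a single field value."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def linkage_tokens(first, last, dob, key):
    """Derive the partial fields first, then encode each one separately."""
    parts = {
        "first": first.lower(),
        "last": last.lower(),
        "first_initial": first[:1].lower(),
        "birth_month": f"{dob.month:02d}",
        "birth_day": f"{dob.day:02d}",
    }
    # The field name is mixed in so month "03" and day "03" encode differently.
    return {name: token(f"{name}:{value}", key) for name, value in parts.items()}


key = b"hypothetical linkage key"
a = linkage_tokens("John", "Szapocznik", date(1998, 3, 14), key)
b = linkage_tokens("Jon", "Szapocznik", date(1998, 3, 14), key)

agreeing = sorted(name for name in a if a[name] == b[name])
print(agreeing)
```

Fields with few possible values, such as month of birth, are the ones most exposed to the frequency matching described on the earlier plaintext attack slide, so they add linkage power at some cost in protection.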

  47. Advanced Linkage Protocols for Addressing Confidentiality Concerns • Encrypted Linkage Protocols • Unique encryption keys administered by each database administrator and community liaisons • No one at any time sees the other person’s identifiers • Person conducting the linkage never sees any identifiers • Resulting linked set includes no decrypted identifiers • Resulting file can not be decoded, expanded, or relinked without agreement and cooperation of all parties • The community participates in the process • Technology that creates confidentiality concerns may provide means for reducing those concerns