Presentation Transcript

  1. Private Keyword Search on Streaming Data. Rafail Ostrovsky, William Skeith, UCLA (patent pending)

  2. Motivating Example • The intelligence community collects data from multiple sources that might be “useful” for future analysis. • Network traffic • Chat rooms • Web sites, etc… • However, what is “useful” is often classified.

  3. Current Practice • Continuously transfer all data to a secure environment. • After the data is transferred, filter it in the classified environment, keeping only a small fraction of the documents.

  4. [Diagram: streams of documents D(1,1), D(1,2), D(1,3), …, D(2,1), …, D(3,1), … all flow into the Classified Environment, where a Filter decides what goes into Storage. Filter rules are written by an analyst and are classified!]

  5. Current Practice • Drawbacks: • Communication • Processing

  6. How to improve performance? • Distribute work to many locations on a network • Seemingly ideal solution, but… • Major problem: not clear how to maintain privacy, which is the focus of this talk

  7. [Diagram: each stream D(i,1), D(i,2), D(i,3), … is filtered at its own location; only encrypted matches, e.g. E(D(1,2)), E(D(1,3)), E(D(2,2)), are stored and forwarded to the Classified Environment, which decrypts them to D(1,2), D(1,3), D(2,2).]

  8. Example Filter: • Look for all documents that contain special classified keywords, selected by an analyst • Perhaps an alias of a dangerous criminal • Privacy • Must hide what words are used to create the filter • Output must be encrypted

  9. More generally: • We define the notion of Public Key Program Obfuscation • An encrypted version of a program • Performs the same functionality as the un-obfuscated program, but: • Produces encrypted output • Impossible to reverse engineer • A little more formally:

  10. Public Key Program Obfuscation

  11. Privacy
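
  Paraphrasing the two requirements sketched on slides 9 through 11 (this is my informal restatement of the bullets above, not necessarily the paper's exact formulation):

```latex
% Informal paraphrase: an obfuscator O maps a program P and a public key pk
% to an obfuscated program O(P); sk denotes the matching secret key.
\text{Correctness: } \forall x,\quad \mathrm{Dec}_{sk}\bigl(O(P)(x)\bigr) = P(x)
\text{Privacy: for all equal-size } P_0, P_1,\quad O(P_0) \approx_c O(P_1)
\quad \text{(computationally indistinguishable given only } pk\text{)}
```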

  12. Related Notions • PIR (Private Information Retrieval) [CGKS], [KO], [CMS]… • Keyword PIR [KO], [CGN], [FIPR] • Program Obfuscation [BGIRSVY]… • There the output is identical to that of the un-obfuscated program; in our case it is encrypted. • Public Key Program Obfuscation • A more general notion than PIR, with lots of applications

  13. What we want: [Diagram: the stream D(1,1), D(1,2), D(1,3), … passes through a Filter directly into Storage.]

  14. [Diagram: a stream mixing matching documents (“This is matching document #1”, #2, #3) with non-matching documents; only the matching ones should reach the output.]

  15. How to accomplish this?

  16. Several Solutions based on Homomorphic Encryptions • For this talk: Paillier Encryption • Properties: • Plaintext set = Z_n • Ciphertext set = Z*_{n^2} • Homomorphic, i.e., E(x)·E(y) = E(x+y)
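
  To make the homomorphic property concrete, here is a minimal Paillier sketch in Python (my illustration, not from the slides; the primes are toy values and nothing here is hardened for real use):

```python
# Minimal Paillier sketch (toy primes, illustration only, not secure) showing
# the additive homomorphism E(x)*E(y) mod n^2 = E(x+y) that the filter relies on.
import math, random

def keygen(p=10007, q=10009):                  # toy primes, far too small for real use
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                       # valid because we take g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    g, n2 = n + 1, n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:                 # r must be a unit mod n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    L = (pow(c, lam, n * n) - 1) // n          # L(u) = (u - 1) / n
    return (L * mu) % n

n, sk = keygen()
cx, cy = encrypt(n, 17), encrypt(n, 25)
print(decrypt(n, sk, (cx * cy) % (n * n)))     # 42: the product of ciphertexts encrypts the sum
```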

  17. Simplifying Assumptions for this Talk • All keywords come from some poly-size dictionary • Truncate documents beyond a certain length

  18. [Diagram: a document D is looked up against the Dictionary; the corresponding ciphertexts of the form (g, g^D) are combined by multiplication and accumulated into the Output Buffer.]
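
  A rough reconstruction of how such a filter can work, reusing keygen/encrypt/decrypt from the Paillier sketch after slide 16 (the document encoding as a small integer doc_id, the buffer size, and the number of copies are my simplifications, not the paper's exact construction): each dictionary word is stored with an encryption of 1 if it is a keyword and of 0 otherwise, so the compiled filter reveals nothing about the keywords.

```python
# Sketch of the slide-18 idea, built on the Paillier sketch above
# (keygen, encrypt, decrypt are assumed to be the functions defined there).
import random

def make_filter(n, dictionary, keywords):
    # One ciphertext per dictionary word; which words encrypt 1 stays hidden.
    return {w: encrypt(n, 1 if w in keywords else 0) for w in dictionary}

def process(n, filt, doc_id, words, buffer, copies=2):
    n2 = n * n
    c = encrypt(n, 0)                          # running E(#keywords present)
    for w in set(words):
        if w in filt:
            c = (c * filt[w]) % n2             # homomorphic addition of 0/1
    val = pow(c, doc_id, n2)                   # E(doc_id * #keywords present)
    for i in random.sample(range(len(buffer)), copies):
        buffer[i] = (buffer[i] * val) % n2     # scatter into random buffer slots

n, sk = keygen()
filt = make_filter(n, ["alias", "boat", "cargo"], keywords={"alias"})
buf = [encrypt(n, 0) for _ in range(8)]        # encrypted output buffer, all zeros
process(n, filt, doc_id=7, words=["the", "alias", "cargo"], buffer=buf)
print([decrypt(n, sk, slot) for slot in buf])  # two slots decrypt to 7, the rest to 0
```

  A non-matching document contributes E(0), which re-randomizes the buffer slots without changing their contents; a matching document deposits an encryption of itself, which is why collisions (slide 19) become the main concern.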

  19. Here’s another matching document… • Collisions cause two problems: 1. Good documents are destroyed 2. Non-existent documents could be fabricated • [Diagram: matching documents #1, #2, and #3 colliding in the same buffer positions.]

  20. We’ll make use of two combinatorial lemmas…

  21. How to detect collisions? • Append a highly structured (yet random) k-bit string to the message • The sum of two or more such strings will be another such string with only negligible probability in k • Specifically, partition the k bits into triples, and set exactly one bit in each triple to 1

  22.   100|001|100|010|010|100|001|010|010
        010|001|010|001|100|001|100|001|010
        010|100|100|100|010|001|010|001|010
      = 100|100|010|111|100|100|111|010|010
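
  The slide-22 example can be checked directly: each of the three strings has exactly one bit set per triple, while their combination does not, which is what makes a collision detectable. A small Python check (assuming, as the example suggests, that the strings combine bitwise):

```python
# Reproduce the slide-22 example: three valid tags whose bitwise combination
# fails the "exactly one bit per triple" check, exposing the collision.
def triples(s):
    return s.split("|")

def is_valid(s):
    return all(t in ("100", "010", "001") for t in triples(s))

a = "100|001|100|010|010|100|001|010|010"
b = "010|001|010|001|100|001|100|001|010"
c = "010|100|100|100|010|001|010|001|010"

combined = "|".join(format(int(x, 2) ^ int(y, 2) ^ int(z, 2), "03b")
                    for x, y, z in zip(triples(a), triples(b), triples(c)))

print(is_valid(a), is_valid(b), is_valid(c))   # True True True
print(combined)                                # 100|100|010|111|100|100|111|010|010
print(is_valid(combined))                      # False: the '111' triples expose the collision
```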

  23. Detecting Overflow (more than m matching documents) • Double the buffer size from m to 2m • If m &lt; #matching documents &lt; 2m, output “overflow” • If #matching documents &gt; 2m, the expected number of collisions is large, so output “overflow” in this case as well • Not yet in the eprint version; will appear soon, along with some other extensions.

  24. More from the paper that we don’t have time to discuss… • Reducing program size below dictionary size (using Φ-hiding from [CMS]) • Queries containing AND (using [BGN] machinery) • Eliminating negligible error (using perfect hashing) • A scheme based on an arbitrary homomorphic encryption

  25. Conclusions • Private searching on streaming data • Public key program obfuscation, more general than PIR • Practical, efficient protocols • Many open problems

  26. Thanks For Listening!