
Private Keyword Search on Streaming Data

Rafail Ostrovsky William Skeith

UCLA

(patent pending)



Motivating Example

  • The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis.

    • Network traffic

    • Chat rooms

    • Web sites, etc…

  • However, what is “useful” is often classified.



Current Practice

  • Continuously transfer all data to a secure environment.

  • After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.


[Diagram: three document streams ··· → D(i,3) → D(i,2) → D(i,1), i = 1, 2, 3, are transferred in their entirety into storage inside the classified environment, where the filter is applied.]

Filter rules are written by an analyst and are classified!



Current Practice

  • Drawbacks:

    • Communication

    • Processing



How to improve performance?

  • Distribute work to many locations on a network

  • Seemingly ideal solution, but…

  • Major problem:

    • It is not clear how to maintain privacy; this is the focus of this talk


[Diagram: each document stream ··· → D(i,3) → D(i,2) → D(i,1) is now processed by a local filter, which stores only encrypted matches such as E(D(1,2)), E(D(1,3)), E(D(2,2)). The encrypted results are shipped to the classified environment and decrypted there into D(1,2), D(1,3), D(2,2).]



  • Example Filter:

    • Look for all documents that contain special classified keywords, selected by an analyst

    • Perhaps an alias of a dangerous criminal

  • Privacy

    • Must hide what words are used to create the filter

    • Output must be encrypted



More generally:

  • We define the notion of Public Key Program Obfuscation

  • Encrypted version of a program

    • Performs the same functionality as the un-obfuscated program, but:

    • Produces encrypted output

    • Impossible to reverse engineer

  • A little more formally:



Public Key Program Obfuscation



Privacy



Related Notions

  • PIR (Private Information Retrieval) [CGKS],[KO],[CMS]…

  • Keyword PIR [KO],[CGN],[FIPR]

  • Program Obfuscation [BGIRSVY]…

    • There the output is identical to that of the un-obfuscated program; in our case it is encrypted.

  • Public Key Program Obfuscation

    • A more general notion than PIR, with lots of applications



What we want

[Diagram: the document stream ··· → D(1,3) → D(1,2) → D(1,1) passes through the filter, and only matching documents are kept in storage, in encrypted form.]


[Diagram: an example stream interleaves matching documents #1, #2, and #3 with non-matching documents; the filter should retain only the matching ones.]


How to accomplish this?



Several Solutions based on Homomorphic Encryptions

  • For this talk: Paillier Encryption

  • Properties:

    • Plaintext set = Z_N

    • Ciphertext set = Z*_{N²}

    • Homomorphic, i.e., E(x)·E(y) = E(x+y) (see the sketch below)
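
The additive homomorphism is the only property the construction needs from Paillier. Below is a minimal, toy-parameter sketch (not from the paper; the tiny primes and names are illustrative only) showing that multiplying ciphertexts adds the underlying plaintexts.

```python
# Toy Paillier sketch: ciphertext multiplication = plaintext addition.
# Illustrative only; the tiny primes are nowhere near secure.
import math
import random

def keygen(p=1789, q=1931):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                      # valid because we fix g = n + 1
    return (n,), (n, lam, mu)                 # (public key), (secret key)

def encrypt(pk, m):
    (n,) = pk
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    n2 = n * n
    ell = (pow(c, lam, n2) - 1) // n          # the L function: L(u) = (u - 1) / n
    return (ell * mu) % n

pk, sk = keygen()
cx, cy = encrypt(pk, 17), encrypt(pk, 25)
assert decrypt(sk, (cx * cy) % (pk[0] ** 2)) == 42    # E(17)·E(25) decrypts to 17 + 25
```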



Simplifying Assumptions for this Talk

  • All keywords come from some poly-size dictionary

  • Truncate documents beyond a certain length


[Diagram: a document D is matched word by word against the Dictionary; each dictionary word carries a ciphertext, and the resulting ciphertext for the document is homomorphically multiplied (*=) into several positions of the Output Buffer.]
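
A rough sketch of this buffer-write step follows. It assumes the keygen/encrypt helpers from the Paillier sketch above, and it simplifies the paper's construction: every dictionary word is published as E(1) if it is a classified keyword and E(0) otherwise, so the filter program itself reveals nothing about the keyword set.

```python
# Sketch of the filter's buffer-write step (simplified; assumes keygen/encrypt
# from the Paillier sketch above; the constants are illustrative).
import random

GAMMA = 5          # number of buffer positions each document is written to
BUFFER_SLOTS = 40  # buffer size (2m in the talk's terminology)

def build_dictionary(pk, dictionary_words, classified_keywords):
    # E(1) for keywords, E(0) for everything else: indistinguishable to the host.
    return {w: encrypt(pk, 1 if w in classified_keywords else 0)
            for w in dictionary_words}

def process_document(pk, enc_dict, buf, doc_words, doc_as_int):
    n2 = pk[0] ** 2
    # Product over the document's words gives E(c), c = number of keyword hits.
    e_c = encrypt(pk, 0)
    for w in set(doc_words):
        if w in enc_dict:
            e_c = (e_c * enc_dict[w]) % n2
    # Homomorphic scalar multiplication: E(c)^doc = E(c * doc),
    # which is E(0) for non-matching documents and a multiple of doc otherwise.
    e_c_doc = pow(e_c, doc_as_int, n2)
    # Multiply the result into GAMMA random buffer positions.
    for pos in random.sample(range(len(buf)), GAMMA):
        buf[pos] = (buf[pos] * e_c_doc) % n2

pk, sk = keygen()
buf = [encrypt(pk, 0) for _ in range(BUFFER_SLOTS)]   # buffer starts as all E(0)
d = build_dictionary(pk, ["alias", "weather", "soccer"], classified_keywords={"alias"})
process_document(pk, d, buf, ["the", "alias", "was", "seen"], doc_as_int=123456)
```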


[Diagram: matching documents #1, #2, #3 and another matching document are written into the output buffer; some of them land in the same position and collide.]

  • Collisions cause two problems:

    • Good documents are destroyed

    • Non-existent documents could be fabricated



  • We’ll make use of two combinatorial lemmas…



How to detect collisions?

  • Append a highly structured (yet random) k-bit string to the message

  • The sum of two or more such strings will be another such string with probability negligible in k

  • Specifically, partition the k bits into triples and set exactly one bit from each triple to 1 (see the sketch after the example below)



100|001|100|010|010|100|001|010|010

010|001|010|001|100|001|100|001|010

010|100|100|100|010|001|010|001|010

=

100|100|010|111|100|100|111|010|010
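
A small sketch of these tags in Python (names and the triple count are illustrative, not from the paper): one function samples a structured tag, another checks the structure, and adding two tags as integers, which is what a buffer collision does to the plaintexts, almost never yields a valid tag.

```python
# Collision-detection tags: k bits split into triples, one bit set per triple.
# Illustrative sketch; TRIPLES and the encoding are assumptions, not the paper's code.
import random

TRIPLES = 9                       # k = 27 bits, as in the example above

def random_tag():
    triples = []
    for _ in range(TRIPLES):
        t = ["0", "0", "0"]
        t[random.randrange(3)] = "1"
        triples.append("".join(t))
    return "|".join(triples)

def is_valid_tag(tag):
    return all(len(t) == 3 and t.count("1") == 1 and set(t) <= {"0", "1"}
               for t in tag.split("|"))

def tag_to_int(tag):
    return int(tag.replace("|", ""), 2)

def int_to_tag(x):
    bits = bin(x)[2:].zfill(3 * TRIPLES)[-3 * TRIPLES:]   # keep the low k bits
    return "|".join(bits[i:i + 3] for i in range(0, len(bits), 3))

a, b = random_tag(), random_tag()
collided = int_to_tag(tag_to_int(a) + tag_to_int(b))       # effect of a collision
print(a, b, collided, is_valid_tag(collided), sep="\n")     # last line: almost surely False
```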



Detecting Overflow > m

  • Double the buffer size from m to 2m

  • If m < #documents < 2m, output “overflow”

  • If #documents > 2m, the expected number of collisions is large, so output “overflow” in this case as well (a decoding sketch follows this list)

  • Not yet in the eprint version; it will appear soon, along with some other extensions.
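
A hypothetical decoding routine illustrating this rule is sketched below. It assumes the is_valid_tag/int_to_tag helpers and TRIPLES constant from the tag sketch above, operates on already-decrypted buffer slots, and uses an illustrative collision threshold; the paper's actual decoding procedure may differ.

```python
# Hedged sketch of buffer decoding with overflow detection.
# Assumes is_valid_tag / int_to_tag / TRIPLES from the tag sketch above.

def decode_buffer(plaintext_slots, m):
    """plaintext_slots: decrypted buffer values; each is 0 (empty),
    a single (document || tag) value, or a sum of several such values (a collision)."""
    k = 3 * TRIPLES
    recovered, collisions = set(), 0
    for value in plaintext_slots:
        if value == 0:
            continue                                  # empty slot
        doc, tag = value >> k, int_to_tag(value & ((1 << k) - 1))
        if is_valid_tag(tag):
            recovered.add(doc)                        # exactly one document landed here
        else:
            collisions += 1                           # two or more documents collided
    # More than m distinct documents, or a large fraction of collided slots,
    # signals #documents > m, so report overflow (the threshold is illustrative).
    if len(recovered) > m or collisions > len(plaintext_slots) // 4:
        return None, "overflow"
    return sorted(recovered), "ok"
```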



More from the paper that we don’t have time to discuss…

  • Reducing program size below dictionary size (using Φ-Hiding from [CMS])

  • Queries containing AND (using [BGN] machinery)

  • Eliminating negligible error (using perfect hashing)

  • Scheme based on arbitrary homomorphic encryption



Conclusions

  • Private searching on streaming data

  • Public key program obfuscation, more general than PIR

  • Practical, efficient protocols

  • Many open problems



Thanks For Listening!

