How to compile searching software so that it is impossible to reverse engineer
(Private Keyword Search on Streaming Data)

Rafail Ostrovsky William Skeith

UCLA

(patent pending)


MOTIVATION: Problem 1.

[Figure: passenger lists at Airport 1, Airport 2, and Airport 3; mobile code (with state).]

  • Each hour, we wish to find out whether any of hundreds of passenger lists contains a name from the “Possible Terrorists” list and, if so, that person’s itinerary.

  • The “Possible Terrorists” list is classified and should not be revealed to airports.

  • Tantalizing question: can the airports help (and do all the search work) if they are not allowed to get the “Possible Terrorists” list?

PROBLEM 1: Is it possible to design mobile software that can be transmitted to all airports (including potentially revealing this software to the adversary due to leaks) so that this software collects ONLY the information needed, without revealing what it is collecting at each node?

Non-triviality requirement: must send back only needed information, not everything!


MOTIVATION: Problem 2.

  • Looking for malicious insiders and/or terrorist communication:

    • (I) First, we must identify some “signature” criteria (rules) for suspicious behavior; typically, this is done by analysts.

    • (II) Second, we must detect which nodes/stations transmit these signatures.

  • Here, we want to tackle part (II).

[Figure: public networks.]

PROBLEM 2: Is it possible to design software that can capture all messages (and network locations) that match a secret/classified set of “rules”? Key challenge: the software must not reveal the secret “rules”.

Non-triviality requirement: the software must send back only locations and messages that match given “rules”, not everything it sees.


What we want

  • Search software that has a set of “rules” to choose which documents and/or packets to keep and which to toss.

  • Input: various data streams, consisting of flows of documents/packets.

  • Output: a small storage that collects the selected documents and/or packets, i.e. those that match the secret “rules”.

  • Our “compiler” outputs straight-line executable code (with program state) and a decryption key “D”.

  • The straight-line executable code does not reveal the search “rules”.

  • The program state is small and fixed-size; it is encrypted in a special way, and our code modifies it for each document processed.

  • The collected result is decrypted using D.

Punch line: we can send the executable code publicly (it won’t reveal its secrets!).


Current Practice

  • Continuously transfer all data to a secure environment.

  • After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.


Current practice:

[Figure: the streams D(1,1) D(1,2) D(1,3), D(2,1) D(2,2) D(2,3), and D(3,1) D(3,2) D(3,3) are all sent into the classified environment, where a filter selects documents for storage.]

Filter rules are written by an analyst and are classified!

The amount of data that must be transferred to a classified environment is enormous!


Current Practice

  • Drawbacks:

    • Communication

    • Processing

    • Cost and timeliness


How to improve performance?

  • Distribute work to many locations on a network, where you decide “on the fly” which data is useful

  • Seemingly ideal solution, but…

  • Major problem:

    • Not clear how to maintain security, which is the focus of this technology.


[Figure: on an open network, a filter sits at each stream (… D(1,3) D(1,2) D(1,1); … D(2,3) D(2,2) D(2,1); … D(3,3) D(3,2) D(3,1)) and writes encrypted matches such as E(D(1,2)), E(D(1,3)), and E(D(2,2)) to local storage. The classified environment decrypts them and stores D(1,2), D(1,3), and D(2,2).]


  • Example Filters:

    • Look for all documents that contain special classified keywords (or string or data-item and/or do not contain some other data), selected by an analyst.

  • Privacy

    • Must hide what rules are used to create the filter

    • Output must be encrypted


More generally:

  • We define the notion of Public Key Program Obfuscation

  • Encrypted version of a program

    • Performs same functionality as un-obfuscated program, but:

    • Produces encrypted output

    • Impossible to reverse engineer

    • A little more formally:


Public Key Program Obfuscation

  • Can compile any code into an “obfuscated code with small storage”.

  • Think of the Compiler as a mapping:

    • Source code → “Smart Public-Key Encryption” with initial Encrypted Storage + Decryption Key.

  • Non-triviality: the sizes of the compiled program, encrypted storage, and encrypted output are not much bigger than the uncompiled code.

  • Nothing about the program is revealed, given compiled code + storage.

  • Yet someone who has the decryption key can recover the “original” output.


Privacy


Related Notions

  • PIR (Private Information Retrieval) [CGKS],[KO],[CMS]…

  • Keyword PIR [KO],[CGN],[FIPR]

  • Cryptographic counters [KMO]

  • Program Obfuscation [BGIRSVY]…

    • Here output is identical to un-obfuscated program, but in our case it is encrypted.

  • Public Key Program Obfuscation:

    • A more general notion than PIR, with lots of applications


What do we want?

[Figure: a filter reads the stream … D(1,3) D(1,2) D(1,1) and writes E(D(1,2)) and E(D(1,3)) to storage.]

2 requirements:

  • correctness: only matching documents are saved, nothing else.

  • efficiency: the decoding time is proportional to the length of the buffer, not the size of the entire stream.

Conundrum: compiled filter code is not allowed to have ANY branches (i.e. any “if then else” constructs). Only straight-line code is allowed!


Simplifying Assumptions for this Talk

  • All keywords come from some poly-size dictionary

  • Truncate documents beyond a certain length


Sneak peek: the compiled code

  • Suppose we are looking for all documents that contain some secret word from Webster’s dictionary.

  • Here is how it looks to the adversary: for each document, execute the same code as follows:


Look up the encryptions of all words appearing in the document and multiply them together. Take this value and apply a fixed formula to it to get the value g.

[Figure: the words of document D are looked up in the dictionary, and the resulting g goes into a small output buffer.]


How should a solution look?

[Figure: a stream of documents: matching document #1, non-matching, non-matching, matching document #2, non-matching, matching document #3.]


How do we accomplish this?


Reminder: PKE

  • Key-generation(1^k) → (PK, SK)

  • E(PK, m, r) → c

  • D(c, SK) → m

  • We will use PKE with additional properties.


Several Solutions based on Homomorphic Public-Key Encryptions

  • For this talk: Paillier Encryption

  • Properties:

    • E(x) is probabilistic; in particular, a single bit can be encrypted in many different ways, such that any instance of E(0) and any instance of E(1) cannot be distinguished.

    • Homomorphic: i.e., E(x)*E(y) = E(x+y)
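
To see why multiplying ciphertexts adds plaintexts, here is a short derivation. It assumes the common textbook form of Paillier encryption with generator n + 1 (an assumption; the slides do not fix a variant), where n = pq is the public modulus and r is the encryption randomness.

```latex
\begin{align*}
E(x; r) &= (1+n)^{x}\, r^{n} \bmod n^{2},\\
E(x; r_1)\cdot E(y; r_2) &= (1+n)^{x+y}\,(r_1 r_2)^{n} \bmod n^{2}
  = E(x+y;\ r_1 r_2).
\end{align*}
```

The plaintext sum x + y is taken modulo n, and the product of two ciphertexts is itself a fresh-looking encryption under randomness r_1 r_2.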


Using Paillier Encryption

  • E(x)*E(y) = E(x+y)

  • Important to note:

    • E(0)^c = E(0)*…*E(0) = E(0+0+…+0) = E(0)

    • E(1)^c = E(1)*…*E(1) = E(1+1+…+1) = E(c)

  • Assume we can somehow compute an encrypted value v, where we don’t know what v stands for, but v = E(0) for “un-interesting” documents and v = E(1) for “interesting” documents.

  • What’s v^c? It is either E(0) or E(c), and we don’t know which one it is.
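
The exponentiation trick can be checked with a toy Paillier implementation. This is a minimal sketch, not the authors' code: the function names, the tiny primes 1009 and 1013, and the choice of generator n + 1 are all assumptions made for illustration, and the key size is far too small for real use.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys (hypothetical parameters, insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E(m) = (1+n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

rng = random.Random(0)
n, sk = keygen()
c = 7

# Raise an encrypted bit v to the power c without knowing the bit.
results = {}
for bit in (0, 1):
    v = encrypt(n, bit, rng)
    results[bit] = decrypt(n, sk, pow(v, c, n * n))

print(results)  # {0: 0, 1: 7}
```

Whoever computes v^c never learns whether the result encrypts 0 or c; only the holder of the secret key can tell.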


[Figure: the words of document D are looked up in the dictionary, e.g. g = E(0) * E(1) * E(0), and the pair (g, g^D) is written to the output buffer.]

  • g = E(0) if there are no matching words

  • g = E(c) if there are c matching words

  • g^D = E(0) if there are no matching words

  • g^D = E(c*D) if there are c matching words

Thus: if we keep g = E(c) and g^D = E(c*D), we can calculate D exactly. After decryption, dividing c*D by c yields D.
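
The per-document step and the recovery of D can be sketched end to end with a toy Paillier scheme. Everything named here is hypothetical: the four-word dictionary, the secret keywords, the helper names, the tiny primes, and the encoding of a document as a small integer are assumptions for illustration only.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys (insecure sizes, generator n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m, rng):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

rng = random.Random(0)
n, sk = keygen()
n2 = n * n

# Compiler side: the public table hides which words are keywords.
secret_keywords = {"falcon", "stone"}
dictionary = ["apple", "falcon", "river", "stone"]
table = {w: encrypt(n, int(w in secret_keywords), rng) for w in dictionary}

def process(doc_words, doc_value):
    """Branch-free per-document step: returns (g, g^D)."""
    g = 1
    for w in set(doc_words):
        g = g * table[w] % n2       # E(a) * E(b) = E(a + b)
    return g, pow(g, doc_value, n2)  # E(c)^D = E(c * D)

def decode(g, g_d):
    """Key-holder side: recover D, or None for a non-matching document."""
    c = decrypt(n, sk, g)
    if c == 0:
        return None
    return decrypt(n, sk, g_d) * pow(c, -1, n) % n  # (c * D) / c mod n

match = decode(*process(["apple", "falcon", "stone"], 4242))
miss = decode(*process(["apple", "river"], 4242))
print(match, miss)  # 4242 None
```

Only `decode`, which runs in the classified environment, branches on the plaintext; `process` performs the same operations for every document.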


Collisions cause two problems:

  • 1. Good documents are destroyed

  • 2. Non-existent documents could be fabricated

[Figure: matching documents #1, #2, and #3 in the buffer, and an incoming “Here’s another matching document”.]


  • We’ll make use of two combinatorial lemmas…


Combinatorial Lemma 1

  • Claim: the color survival game succeeds with probability > 1 - neg(γ)


How to detect collisions?

  • Idea: append a highly structured (yet random) short combinatorial object to the message, with the property that if 2 or more of them “collide”, the combinatorial property is destroyed.

  • ⇒ collisions can be detected (with high probability)!


    100|001|100|010|010|100|001|010|010
  ⊕ 010|001|010|001|100|001|100|001|010
  ⊕ 010|100|100|100|010|001|010|001|010
  = 100|100|010|111|100|100|111|010|010
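
The example above can be reproduced in a few lines. This sketch assumes what the numbers on the slide suggest: each tag consists of 3-bit blocks with exactly one bit set, and colliding tags combine by bitwise XOR. The helper names are hypothetical.

```python
rows = [
    "100|001|100|010|010|100|001|010|010",
    "010|001|010|001|100|001|100|001|010",
    "010|100|100|100|010|001|010|001|010",
]

def parse(tag):
    """Turn '100|001|...' into a list of 3-bit integers."""
    return [int(block, 2) for block in tag.split("|")]

def combine(tags):
    """XOR the tags block by block, as happens when they collide."""
    blocks = [0] * len(parse(tags[0]))
    for tag in tags:
        blocks = [a ^ b for a, b in zip(blocks, parse(tag))]
    return blocks

def is_valid(blocks):
    """A tag is well formed only if every block has exactly one bit set."""
    return all(block in (0b001, 0b010, 0b100) for block in blocks)

def show(blocks):
    return "|".join(format(block, "03b") for block in blocks)

combined = combine(rows)
print(show(combined))      # 100|100|010|111|100|100|111|010|010
print(is_valid(combined))  # False
```

The two `111` blocks break the one-bit-per-block structure, so the decoder can tell that this buffer slot holds a collision rather than a single document.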


Combinatorial Lemma 2

  • Claim: collisions are detected with probability > 1 - exp(-k/3)


We do the same for all documents!


For every document in the stream do the same:

  • Look up the encryptions of all words of the document D in the dictionary and multiply them together (= g).

  • Compute g^D and f(g).

  • Multiply (g, g^D, f(g)) into γ randomly chosen locations of the small output buffer.
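
Why several copies per document help can be illustrated with a plaintext model of the buffer. This is a simplified sketch, not the encrypted scheme: it tracks only matching documents (per the slides, non-matching ones contribute encryptions of 0), and the function names and parameter values are assumptions.

```python
import random

def write_copies(num_docs, gamma, buffer_size, rng):
    """Place gamma copies of each matching document into random slots."""
    slots = [set() for _ in range(buffer_size)]
    for doc_id in range(num_docs):
        for _ in range(gamma):
            slots[rng.randrange(buffer_size)].add(doc_id)
    return slots

def survivors(slots):
    """A document survives if some slot holds it and nothing else."""
    return {next(iter(slot)) for slot in slots if len(slot) == 1}

rng = random.Random(1)
slots = write_copies(num_docs=5, gamma=4, buffer_size=40, rng=rng)
recovered = survivors(slots)
print(sorted(recovered))
```

A slot that received two or more distinct documents is a collision and is discarded at decoding time; a document is lost only if every one of its copies collided.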


Overflow: how to always collect at least m items (with arbitrary overflow of matching documents)

  • Idea: create a logarithmic (in stream size) number of original buffers.

    • First buffer is processed for every stream item

    • Second buffer takes every item with probability ½

    • Third buffer takes every item with (independent) probability ¼

    • i’th buffer with probability 1/2^i

  • Key point: If the number of matching documents > M, at least one buffer will get O(M) matching documents!
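
The sampling schedule can be sketched as follows. Buffers are indexed from 0 here, so buffer i keeps an item with probability 1/2^i and buffer 0 sees every item; the function name and the choice of ceil(log2(stream length)) buffers are assumptions for illustration.

```python
import math
import random

def assign_to_buffers(stream_len, rng):
    """Return, for each buffer, the list of stream positions it processes."""
    num_buffers = max(1, math.ceil(math.log2(stream_len)))
    buffers = [[] for _ in range(num_buffers)]
    for item in range(stream_len):
        for i, buf in enumerate(buffers):
            # Independent coin per buffer; for i == 0 this is always true.
            if rng.random() < 1 / 2**i:
                buf.append(item)
    return buffers

rng = random.Random(0)
buffers = assign_to_buffers(1024, rng)
sizes = [len(buf) for buf in buffers]
print(sizes)
```

The loads shrink roughly geometrically, so when too many documents match for the first buffer, some later buffer should receive a load it can hold.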


Comparison of our work to [Bethencourt, Song, Waters 06]

[OS-05]

  • Buffer size to store m items: O(m log m)

  • Efficiency: decoding time is proportional to the buffer size.

[BSW-06]

  • Buffer size to store m items: O(m)

  • Efficiency: decoding time is proportional to the length of the entire stream.


More from the paper that we don’t have time to discuss…

  • Reducing program size below dictionary size (using Φ-Hiding from [CMS])

  • Queries containing AND (using [BGN] machinery)

  • Eliminating negligible error (using perfect hashing)

  • Scheme based on arbitrary homomorphic encryption

  • Extending to words not from dictionary (with small error prob.)


Conclusions

  • We introduced Private searching on streaming data

  • More generally: Public key program obfuscation -- more general than PIR, or cryptographic counters

  • Practical, efficient protocols

  • Eat your cake and have it too: ensure that only “useful” documents are collected.

  • Many possible extensions and lots of open problems

    • THANK YOU!

