- 49 Views
- Uploaded on
- Presentation posted in: General

Privacy Preserving Learning of Decision Trees

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Privacy Preserving Learning of Decision Trees

Benny Pinkas

HP Labs

Joint work with Yehuda Lindell

(done while at the Weizmann Institute)

Cryptographic methods

perturbation

methods

overhead

This work…

inaccuracy

lack of

privacy

We’re experiencing

a lot of fraud lately…

Here too..

I can’t find a pattern

to recognize fraud in advance..

Neither can I..

- But, what about
- Patients’ privacy
- Business secrets

Maybe we should share information..

Have you heard of “Secure

function evaluation” ?

This is all “theory”.

It can’t be efficient.

Huge

P2

P1

Confidential databaseD1

Confidential databaseD2

Wish to “mine”D1D2 without revealing more info

- Examples:
- Medical databases protected by law
- Competing businesses
- Government agencies (privacy, “need to know”)

y

x

Input:

nothing

C(x,y) and nothing else

Output:

One Exp per log “OT”s [NP]

Secure Function Evaluation [Yao ‘86]

- F(x,y) – A public function.
- Represented as a Boolean circuit C(x,y).

- Implementation:
- Two passes
- O(|X|) “oblivious transfers”. O(|C|) communication.
- Pretty efficient for small circuits!

- An efficient sub-linear protocol for secure computation of a complex well-known data-mining alg (ID3), for “semi-honest” parties.

- A different approach offered by the data-mining community [AS’00]:
- Perturb each entry (add random noise).
- Analyze accuracy of using perturbed data as input to data mining algorithms.
- How much privacy?

ID3: Choose attribute A that

minimizes the conditional

entropy of the attribute class

history

[0,4] years

[4,9] years

> 10 years

Age > 30

Claim > $500

No

No

Yes

No

Yes

Yes

Yes

No

No

- Core of the problem:Comparing entropies while preserving privacy.(entropy = x logx)
- Privacy: for each party, all intermediate values are random.
- Efficiency: most computation done independently by parties.
- Basic task: compute x log x.
x = e.g. # of patients with (age > 30) and (fraud = yes)

- Computing x log x:
- x =x1+ x2known to P1 and P2 respectively (independently computed from databases).
- Might as well compute x lnx lnx.
- First run a protocol to compute random shares, y1+ y2= ln x

- ln x is Real. Crypto works over finite fields. Must do numerical analysis.

Cryptographic Tools

- Secure Function Evaluation (SFE) [Yao]
- Oblivious Polynomial Evaluation [NP]

Q( . )

x

Input:

Q(x) and nothing else

nothing

Output:

Implementation:

Two passes, O(degree) (or O( log|F|) ) exponentiations.

Use Taylor approximation for lnx

- x = x1 +x2 = 2 n (1+) -½< < ½
- lnx = ln(2 n (1+)) = ln 2 n + ln(1+)
ln 2 n + i=1..k(-1) i-1 i / i

= ln 2 n + T()

- T()is a polynomial of degree k. Error is exponentially small in k.
- We only know how to work over finite fields
- Work in F, where |F| sufficiently large.
- Compute c·lnx, where c compensates for fractions.

- Step 1 of the protocol – Find n,
- Apply Yao’s protocol to the following small circuit
- Input: x1andx2
- Output (random shares):
- randoma1 and a2 s.t. a1 + a2 = x-2 n = ·2 n
- randomb1 and b2 s.t. b1 + b2 = ln 2 n

- Operation: The protocol finds 2 n closest to x1+x2, computes 2 n = x1+x2- 2 n.
x =x1 +x2 = 2 n + 2 n

- Apply Yao’s protocol to the following small circuit

Step 2 of the protocol

- Compute random shares of T() (Taylor approx.)

- P1 chooses a randomw1 F and defines a polynomial Q(x), s.t. w1+Q(a2) = T() (recall a1 + a2 = ·2 n)
- Namely, Q(x) = T( (a1+x)/2 n) – w1.

- Run an oblivious poly evaluation in which P2 computes
- w2= Q(a2) = T() – w1.

- Now the parties have randomw1 and w2 s.t.
- w1 + w2 = T() ln(1+)
- (b1 + w1) + (b2+ w2) ln 2 n + ln(1+) = ln x

- Tool: Multiply(c1,c2)
- Input: c1, c2
- Output: d1, d2s.t. d1+d2 = c1 *c2
- How? OPE of Q(z) = c1*z -d1
- d2 = Q(c2) = c1 *c2 - d1

- Actual task: x lnx
- Input: x1 +x2 =x,c1 +c2 = ln x
- Output: x lnx = (x1 +x2 )*(c1 +c2)
- Run Multiply(x1 ,c2), Multiply (c1 ,x2)

- Each party computes a share of the entropy by summing shares of x lnx
- A small circuit finds the attribute giving the minimal conditional entropy
- The attribute is assigned to the node
- The databases are divided according to the value of this attribute

- lnx protocol:
- secure computation of a small circuit
- one oblivious polynomial evaluation

- ID3 for a database with:
- 1,000,000 transactions
- 15 attributes
- 10 values per attribute
- 4 class values
- Communication per node takes seconds (T1)
- Computation per node takes minutes (P3)

- Only two participants
- “Curious but honest” participants
- Approximating ln x gives an approximation of ID3
- The participants learn the decision tree, which reveals some information

- A cryptographic protocol where the bulk of the operations is done independently.
- Data mining
- Rigorous model for secure data-mining.
- Efficient, secure protocol for ID3.

- Cryptography
- Sub-linear complexity - secure computation for large data sets.
- An efficient protocol for a complex known algorithm.
- Secure computation of logarithms(real function - numerical analysis).