Privacy Preserving Learning of Decision Trees

Privacy Preserving Learning of Decision Trees. Benny Pinkas, HP Labs. Joint work with Yehuda Lindell (done while at the Weizmann Institute).


Presentation Transcript


Privacy Preserving Learning of Decision Trees

Benny Pinkas

HP Labs

Joint work with Yehuda Lindell

(done while at the Weizmann Institute)


Cryptographic methods vs. perturbation methods

[Diagram: cryptographic methods (drawback: overhead) contrasted with perturbation methods (drawbacks: inaccuracy, lack of privacy); a “This work…” label marks where this work falls.]


A story

We’re experiencing a lot of fraud lately…

Here too..

I can’t find a pattern to recognize fraud in advance..

Neither can I..

Maybe we should share information..

  • But, what about

    • Patients’ privacy

    • Business secrets

Have you heard of “Secure function evaluation”?

This is all “theory”. It can’t be efficient.


Privacy preserving data mining

[Diagram: P1 with confidential database D1, P2 with confidential database D2; label: Huge]

  • Wish to “mine” D1 ∪ D2 without revealing more info

  • Examples:

    • Medical databases protected by law

    • Competing businesses

    • Government agencies (privacy, “need to know”)


Secure Function Evaluation [Yao ‘86]

  • Input: one party holds x, the other party holds y.

  • Output: one party learns C(x,y) and nothing else; the other party learns nothing.

  • F(x,y) – A public function.

  • Represented as a Boolean circuit C(x,y).

  • Implementation:

    • Two passes

    • O(|X|) “oblivious transfers”. O(|C|) communication.

    • One Exp per log “OT”s [NP]

    • Pretty efficient for small circuits!


Our Contribution

  • An efficient sub-linear protocol for secure computation of a complex, well-known data-mining algorithm (ID3), for “semi-honest” parties.

  • A different approach offered by the data-mining community [AS’00]:

    • Perturb each entry (add random noise).

    • Analyze accuracy of using perturbed data as input to data mining algorithms.

    • How much privacy?


The classification problem


Classification using Decision Trees

ID3: Choose the attribute A that minimizes the conditional entropy of the class attribute given A.

[Figure: example decision tree. Node labels: history (branches [0,4] years, [4,9] years, > 10 years), Age > 30, Claim > $500; the remaining branch and leaf labels are Yes/No.]
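The attribute-selection rule can be sketched in ordinary, non-private Python. This is only an illustration of the ID3 criterion named on the slide. The record layout, the attribute names and the toy data are hypothetical and do not come from the presentation.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, attr, class_attr):
    """H(class | attr) in nats, computed from a list of dict records."""
    n = len(rows)
    groups = defaultdict(Counter)          # attr value -> class counts
    for r in rows:
        groups[r[attr]][r[class_attr]] += 1
    h = 0.0
    for counts in groups.values():
        n_a = sum(counts.values())         # records with this attr value
        for n_ac in counts.values():       # ... and this class value
            h -= (n_ac / n) * math.log(n_ac / n_a)
    return h

def best_attribute(rows, attrs, class_attr):
    """ID3 choice: the attribute with minimal conditional entropy."""
    return min(attrs, key=lambda a: conditional_entropy(rows, a, class_attr))

# Hypothetical toy records (not from the presentation).
rows = [
    {"age_over_30": "yes", "claim_over_500": "yes", "fraud": "yes"},
    {"age_over_30": "yes", "claim_over_500": "no",  "fraud": "yes"},
    {"age_over_30": "no",  "claim_over_500": "yes", "fraud": "no"},
    {"age_over_30": "no",  "claim_over_500": "no",  "fraud": "no"},
]
chosen = best_attribute(rows, ["age_over_30", "claim_over_500"], "fraud")
```

In this toy data the first attribute determines the class completely, so its conditional entropy is zero and it is selected.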


Privacy Preserving ID3

  • Core of the problem: Comparing entropies while preserving privacy. (Entropy is built from terms of the form x log x.)

  • Privacy: for each party, all intermediate values are random.

  • Efficiency: most computation done independently by parties.

  • Basic task: compute x log x.

    e.g., x = # of patients with (age > 30) and (fraud = yes)
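One way to see why x log x is the basic task: for counts n_a (records with attribute value a) and n_ac (records with attribute value a and class c), the standard expansion gives N · H(C|A) = Σ_a n_a ln n_a − Σ_{a,c} n_ac ln n_ac. The sketch below checks this identity numerically. The counts are made up, and the assumption that the protocol uses exactly this decomposition is ours, not stated on the slide.

```python
import math

def xlnx(x):
    """x * ln x, with the usual convention 0 * ln 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def scaled_cond_entropy(counts):
    """N * H(C|A) written purely as sums and differences of x ln x terms.

    counts[a][c] = number of records with attribute value a and class c.
    """
    total = 0.0
    for per_class in counts.values():
        total += xlnx(sum(per_class.values()))
        total -= sum(xlnx(v) for v in per_class.values())
    return total

def direct_cond_entropy(counts):
    """H(C|A) from the textbook definition, for comparison."""
    n = sum(sum(pc.values()) for pc in counts.values())
    h = 0.0
    for pc in counts.values():
        n_a = sum(pc.values())
        for v in pc.values():
            if v:
                h -= (v / n) * math.log(v / n_a)
    return h

# Hypothetical counts for 100 records.
counts = {"age>30": {"fraud": 30, "ok": 10}, "age<=30": {"fraud": 5, "ok": 55}}
gap = abs(scaled_cond_entropy(counts) / 100 - direct_cond_entropy(counts))
```

Since N is the same for every candidate attribute, comparing the x ln x sums is enough to rank attributes.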


Privacy Preserving ID3

  • Computing x log x:

    • x = x1 + x2, where x1 and x2 are known to P1 and P2 respectively (independently computed from their databases).

    • Might as well compute x ln x (changing the base of the logarithm only rescales every entropy by the same constant).

    • First run a protocol to compute random shares y1 + y2 = ln x.

  • ln x is Real. Crypto works over finite fields. Must do numerical analysis.


Cryptographic Tools

  • Secure Function Evaluation (SFE) [Yao]

  • Oblivious Polynomial Evaluation [NP]

    • Input: one party holds a polynomial Q(·); the other party holds a value x.

    • Output: the party holding x learns Q(x) and nothing else; the party holding Q(·) learns nothing.

    • Implementation: Two passes, O(degree) (or O(log |F|)) exponentiations.


Computing random shares of ln x = ln(x1 + x2)

Use Taylor approximation for ln x

  • x = x1 + x2 = 2^n (1 + ε), where −½ < ε < ½

  • ln x = ln(2^n (1 + ε)) = ln 2^n + ln(1 + ε)

    ≈ ln 2^n + Σ_{i=1..k} (−1)^(i−1) ε^i / i

    = ln 2^n + T(ε)

  • T(ε) is a polynomial of degree k. Error is exponentially small in k.

  • We only know how to work over finite fields

    • Work in F, where |F| sufficiently large.

    • Compute c·lnx, where c compensates for fractions.
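The approximation itself can be checked in the clear with floating point. The sketch below is only a numerical illustration: it ignores the finite field and the scaling constant c, assumes x ≥ 1, and breaks ties toward the larger power of two so that ε stays below ½. The function names are ours.

```python
import math

def split_power_of_two(x):
    """Return (n, eps) with x = 2**n * (1 + eps), 2**n closest to x (x >= 1)."""
    n = x.bit_length() - 1
    if 2 ** (n + 1) - x <= x - 2 ** n:   # tie goes to the larger power
        n += 1
    eps = (x - 2 ** n) / 2 ** n
    return n, eps

def taylor_ln1p(eps, k):
    """T(eps) = sum_{i=1..k} (-1)**(i-1) * eps**i / i."""
    return sum((-1) ** (i - 1) * eps ** i / i for i in range(1, k + 1))

def approx_ln(x, k):
    """ln x approximated as ln 2**n + T(eps)."""
    n, eps = split_power_of_two(x)
    return n * math.log(2) + taylor_ln1p(eps, k)

# Error for x = 5 (eps = 0.25) with two polynomial degrees.
err_k4 = abs(approx_ln(5, 4) - math.log(5))
err_k8 = abs(approx_ln(5, 8) - math.log(5))
```

Because |ε| < ½, each extra term shrinks the error by at least a factor of two, which is the sense in which the error is exponentially small in k.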


ln(x1+x2) Protocol (Cont.)

  • Step 1 of the protocol – Find n, ε

    • Apply Yao’s protocol to the following small circuit

      • Input: x1 and x2

      • Output (random shares):

        • random a1 and a2 s.t. a1 + a2 = x − 2^n = ε·2^n

        • random b1 and b2 s.t. b1 + b2 = ln 2^n

      • Operation: The protocol finds the 2^n closest to x1 + x2, and computes ε·2^n = x1 + x2 − 2^n.

        x = x1 + x2 = 2^n + ε·2^n
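What the small circuit computes can be sketched as an ideal functionality, that is, a trusted function evaluated in the clear rather than Yao's protocol itself. The modulus P, the fixed-point factor SCALE used to encode ln 2^n as an integer, and the choice to return n for inspection are all assumptions made for this illustration.

```python
import math
import secrets

P = 2**61 - 1      # hypothetical prime modulus standing in for the field F
SCALE = 10**9      # hypothetical fixed-point scale for encoding real values

def share(value, p=P):
    """Split value into two additive shares mod p; the first is uniform."""
    s1 = secrets.randbelow(p)
    return s1, (value - s1) % p

def to_signed(v, p=P):
    """Map a field element back to a signed integer in (-p/2, p/2]."""
    return v - p if v > p // 2 else v

def step1_ideal(x1, x2, p=P):
    """Ideal version of Step 1: shares of eps * 2**n and of ln 2**n."""
    x = x1 + x2                          # assumes x >= 1
    n = x.bit_length() - 1
    if 2 ** (n + 1) - x <= x - 2 ** n:   # pick the closest power of two
        n += 1
    a = x - 2 ** n                       # eps * 2**n, may be negative
    b = round(SCALE * n * math.log(2))   # scaled ln 2**n
    return n, share(a, p), share(b, p)

n, (a1, a2), (b1, b2) = step1_ideal(600, 400)
```

Each individual share is a uniformly random field element, so neither a1 nor a2 alone says anything about x; only their sum does.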


ln(x1+x2) Protocol (Cont.)

Step 2 of the protocol

  • Compute random shares of T(ε) (Taylor approx.)

  • P1 chooses a random w1 ∈ F and defines a polynomial Q(x), s.t. w1 + Q(a2) = T(ε) (recall a1 + a2 = ε·2^n)

  • Namely, Q(x) = T((a1 + x)/2^n) − w1.

  • Run an oblivious poly evaluation in which P2 computes

    • w2 = Q(a2) = T(ε) − w1.

  • Now the parties have random w1 and w2 s.t.

    • w1 + w2 = T(ε) ≈ ln(1 + ε)

    • (b1 + w1) + (b2 + w2) ≈ ln 2^n + ln(1 + ε) = ln x
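The share algebra of Step 2 can be checked with exact rational arithmetic. This sketch deliberately replaces the finite field by Python fractions and replaces the oblivious polynomial evaluation by a plain function call (`ideal_ope`, a hypothetical stand-in), so it demonstrates correctness of w1 + w2 = T(ε) only, not privacy.

```python
import math
import random
from fractions import Fraction

def taylor(eps, k):
    """T(eps) = sum_{i=1..k} (-1)**(i-1) * eps**i / i, computed exactly."""
    return sum(Fraction((-1) ** (i - 1), i) * eps ** i for i in range(1, k + 1))

def make_q(a1, n, k, w1):
    """P1's polynomial Q(x) = T((a1 + x) / 2**n) - w1."""
    return lambda x: taylor(Fraction(a1 + x, 2 ** n), k) - w1

def ideal_ope(q, x):
    """Stand-in for OPE: the receiver learns q(x); no privacy is modeled."""
    return q(x)

def step2(a1, a2, n, k, rng):
    """Return (w1, w2) with w1 + w2 = T(eps), where a1 + a2 = eps * 2**n."""
    w1 = Fraction(rng.randrange(1, 10**6))     # P1's random share
    w2 = ideal_ope(make_q(a1, n, k, w1), a2)   # P2's share
    return w1, w2

# Example: x = 1000 = 2**10 - 24, so a1 + a2 = -24 and n = 10.
a1 = 12345
a2 = -24 - a1
w1, w2 = step2(a1, a2, n=10, k=8, rng=random.Random(0))
error = abs(float(w1 + w2) - math.log1p(-24 / 1024))
```

In the actual protocol the same identity has to hold over a finite field, which is why the slides multiply by a constant c to clear the fractions.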


Computing x ln x

  • Tool: Multiply(c1, c2)

    • Input: c1 (held by P1), c2 (held by P2)

    • Output: d1, d2 s.t. d1 + d2 = c1 * c2

    • How? P1 picks a random d1; OPE of Q(z) = c1 * z − d1

    • d2 = Q(c2) = c1 * c2 − d1

  • Actual task: x ln x

    • Input: x1 + x2 = x, c1 + c2 = ln x

    • Output: shares of x ln x = (x1 + x2) * (c1 + c2)

    • Run Multiply(x1, c2), Multiply(c1, x2); the terms x1 * c1 and x2 * c2 are computed locally.
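The Multiply tool and its use for x ln x can be sketched over a prime field as follows. The modulus P is a hypothetical choice, and the oblivious polynomial evaluation is again replaced by a direct call, so the sketch shows only that the shares add up correctly.

```python
import secrets

P = 2**61 - 1   # hypothetical prime modulus standing in for the field F

def multiply(c1, c2, p=P):
    """P1 holds c1, P2 holds c2; returns (d1, d2) with d1 + d2 = c1*c2 mod p."""
    d1 = secrets.randbelow(p)                # P1's random output share
    q = lambda z: (c1 * z - d1) % p          # P1's linear polynomial Q(z)
    d2 = q(c2)                               # P2 would obtain this via OPE
    return d1, d2

def xlnx_shares(x1, c1, x2, c2, p=P):
    """Shares of (x1 + x2) * (c1 + c2) mod p; x1, c1 belong to P1."""
    u1, u2 = multiply(x1, c2, p)             # cross term x1 * c2
    v1, v2 = multiply(c1, x2, p)             # cross term c1 * x2
    s1 = (x1 * c1 + u1 + v1) % p             # P1 adds its local product
    s2 = (x2 * c2 + u2 + v2) % p             # P2 adds its local product
    return s1, s2

# Example with a made-up scaled value standing in for ln x.
ln_x_scaled = 6907755279
c1 = 123456789
c2 = (ln_x_scaled - c1) % P
s1, s2 = xlnx_shares(600, c1, 400, c2)
```

Only the two cross terms need interaction; everything else is local arithmetic on values a party already holds.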


The rest of the work..

  • Each party computes a share of the entropy by summing shares of x lnx

  • A small circuit finds the attribute giving the minimal conditional entropy

  • The attribute is assigned to the node

  • The databases are divided according to the value of this attribute


Efficiency

  • lnx protocol:

    • secure computation of a small circuit

    • one oblivious polynomial evaluation

  • ID3 for a database with:

    • 1,000,000 transactions

    • 15 attributes

    • 10 values per attribute

    • 4 class values

    • Communication per node takes seconds (T1)

    • Computation per node takes minutes (P3)


Issues

  • Only two participants

  • “Curious but honest” participants

  • Approximating ln x gives an approximation of ID3

  • The participants learn the decision tree, which reveals some information


Contributions

  • A cryptographic protocol where the bulk of the operations is done independently.

  • Data mining

    • Rigorous model for secure data-mining.

    • Efficient, secure protocol for ID3.

  • Cryptography

    • Sub-linear complexity - secure computation for large data sets.

    • An efficient protocol for a complex known algorithm.

    • Secure computation of logarithms (a real-valued function, requiring numerical analysis).

