
The Price of Privacy and the Limits of LP decoding

Kunal Talwar

MSR SVC

[Dwork, McSherry, Talwar, STOC 2007]



Teaser

Compressed Sensing:

If x ∈ R^N is k-sparse

Take M ≈ C·k·log(N/k) random Gaussian measurements

Then L1 minimization recovers x.

For what k does this make sense (i.e. M < N)?

How small can C be?
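
As a concrete rendering of the teaser, the sketch below recovers a k-sparse x from M random Gaussian measurements by L1 minimization, written as a linear program via the standard split x = x⁺ − x⁻. The sizes N, k and the constant C = 4 are arbitrary illustrative choices, not values from the talk.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, k = 200, 10                       # ambient dimension, sparsity (arbitrary choices)
M = int(4 * k * np.log2(N / k))      # ~ C k log(N/k) measurements, with C = 4 here

x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)   # k-sparse signal
B = rng.standard_normal((M, N))      # random Gaussian measurement matrix
y = B @ x

# min ||x'||_1  s.t.  B x' = y, written as an LP over (x_plus, x_minus) >= 0
c = np.ones(2 * N)
A_eq = np.hstack([B, -B])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N), method="highs")
x_hat = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_hat - x))
```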


Outline

  • Privacy motivation

  • Coding setting

  • Results

  • Proof Sketch


Setting

  • Database of information about individuals

    • E.g. Medical history, Census data, Customer info.

  • Need to guarantee confidentiality of individual entries

  • Want to make deductions about the database and learn large-scale trends.

    • E.g. Learn that drug V increases likelihood of heart disease

    • Do not leak info about individual patients

[Figure: the analyst sends queries to the curator, who holds the database.]


Dinur and Nissim [2003]

  • Simple Model (easily justifiable)

    • Database: n-bit binary vector x

    • Query: vector a

    • True answer: the dot product a·x

    • Response: a·x + e = true answer + noise

  • Blatant Non-Privacy: Attacker learns n − o(n) bits of x.

  • Theorem: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private, even against a polynomial-time adversary asking O(n log² n) random questions.
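
A small simulation of the reconstruction attack (a sketch of the idea, not code from the paper): ask random 0/1 queries, use a linear program to find any fractional database consistent with every answer up to the noise bound, then round. The sizes n, m and the noise bound E below are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n = 128                                   # database size (illustrative)
m = 8 * n                                 # number of random queries (illustrative)
E = 1                                     # per-answer noise bound, o(sqrt(n)) in spirit

x = rng.integers(0, 2, n)                 # secret 0/1 database
A = rng.integers(0, 2, (m, n))            # random 0/1 query vectors
y = A @ x + rng.integers(-E, E + 1, m)    # each answer within E of the truth

# Feasibility LP: find z in [0,1]^n with |A z - y| <= E (constant objective).
A_ub = np.vstack([A, -A])
b_ub = np.concatenate([y + E, E - y])
res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n, method="highs")
x_hat = (res.x > 0.5).astype(int)         # round the fractional database
print("bits recovered:", int((x_hat == x).sum()), "out of", n)
```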


Implications

Privacy has a Price

  • There is no safe way to avoid increasing the noise as the number of queries increases

    Applies to Non-Interactive Setting

  • Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to the DiNi attack.

    This work: what if most responses have small error, but some can be arbitrarily far off?


Error correcting codes: Model

  • Real vector x ∈ R^n

  • Matrix A ∈ R^{m×n} with i.i.d. Gaussian entries

  • Transmit codeword Ax ∈ R^m

  • Channel corrupts the message. Receive y = Ax + e

  • Decoder must reconstruct x, assuming e has small support

    • small support: at most ρm entries of e are non-zero.

[Figure: x → Encoder → Ax → Channel → y = Ax + e → Decoder.]


The Decoding problem

min support(e')

such that

y = Ax' + e'

x' ∈ R^n

Solving this would give the original message x.

min |e'|_1

such that

y = Ax' + e'

x' ∈ R^n

This is a linear program, solvable in polynomial time.
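
A runnable sketch of this LP (illustrative parameters, not from the talk): encode with a random Gaussian A, corrupt a ρ fraction of the coordinates arbitrarily, and decode by minimizing |e'|_1, using slack variables t ≥ |y − Ax'| to express the objective linearly. With ρ well below the threshold discussed later, the recovered x should match the message up to solver tolerance.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, c, rho = 50, 8, 0.1               # message length, aspect ratio, error rate (illustrative)
m = c * n

x = rng.standard_normal(n)           # message
A = rng.standard_normal((m, n))      # Gaussian encoder
e = np.zeros(m)
bad = rng.choice(m, int(rho * m), replace=False)
e[bad] = 100 * rng.standard_normal(len(bad))   # arbitrary corruption on a rho fraction
y = A @ x + e

# min sum(t)  s.t.  -t <= y - A x' <= t   (variables: x' free, t >= 0)
I = np.eye(m)
A_ub = np.block([[A, -I], [-A, -I]])
b_ub = np.concatenate([y, -y])
cost = np.concatenate([np.zeros(n), np.ones(m)])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_hat = res.x[:n]
print("||x_hat - x||_2 =", np.linalg.norm(x_hat - x))
```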


LP decoding works

  • Theorem [Donoho/ Candes-Rudelson-Tao-Vershynin]

    For an error rate ρ < 1/2000, LP decoding succeeds in recovering x (for m = 4n).

  • This talk: How large an error rate can LP decoding tolerate?


Results

  • Let ρ* = 0.2390318914495168038956510438285657…

  • Theorem 1: For any ρ < ρ*, there exists c such that if A has i.i.d. Gaussian entries, and if

    • A has m = cn rows

    • for k = ρm, the best support-k approximation e_k of e satisfies |e − e_k|_1 < α

      then LP decoding reconstructs x' where |x' − x|_2 is O(α/√n).

  • Theorem 2: For any ρ > ρ*, LP decoding can be made to fail, even if m grows arbitrarily.


Results

  • In the privacy setting:

    Suppose, for ρ < ρ*, the curator

    • answers a (1 − ρ) fraction of the questions within error o(√n)

    • answers a ρ fraction of the questions arbitrarily.

      Then the curator is blatantly non-private.

  • Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1.

  • Attack works in non-interactive setting as well.

  • Also leads to error correcting codes over finite alphabets.


In Compressed sensing lingo

  • Theorem 1: For any ρ < ρ*, there exists c such that if B has i.i.d. Gaussian entries, and if

    • B has M = (1 − c)N rows

    • k = ρm, and x ∈ R^N is any vector

      then, given Bx, LP decoding reconstructs an x' whose ℓ2 distance from x is bounded in terms of the ℓ1 distance from x to its best k-sparse approximation (the analogue of Theorem 1's O(α/√n) bound).


Rest of Talk

  • Let ρ* = 0.2390318914495168038956510438285657…

  • Theorem 1 (α = 0): For any ρ < ρ*, there exists c such that if A has i.i.d. Gaussian entries with m = cn rows, and if the error vector e has support at most ρm, then LP decoding accurately reconstructs x.

  • Proof sketch…


Scale and translation invariance

  • LP decoding is scale and translation invariant

  • Thus, without loss of generality,

    transmit x = 0

  • Thus we receive y = Ax + e = e

  • If the LP reconstructs some z ≠ 0,

    then we may take |z|_2 = 1 (by scale invariance)

  • Call such a z bad for A.
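
The "without loss of generality" step can be spelled out in one line (my paraphrase of the invariance argument):

```latex
% LP decoding solves  min_{x'} ||y - A x'||_1  with  y = A x + e.
% Substituting z = x' - x:
\min_{x'} \| Ax + e - Ax' \|_1 \;=\; \min_{z} \| e - Az \|_1 ,
% so the decoder's output error x' - x depends only on (A, e), not on x: WLOG x = 0.
% Rescaling e -> t e rescales the minimizer z -> t z, so any nonzero bad
% reconstruction may be normalized to ||z||_2 = 1.
```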



Proof Outline

Proof:

  • Any fixed z is very unlikely to be bad for A:

    Pr[z bad] ≤ exp(−cm)

  • Net argument to extend to all of R^n:

    Pr[∃ bad z] ≤ exp(−c'm)

    Thus, with high probability, A is such that LP decoding never fails.
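
Schematically, the net step is a union bound over a δ-net N_δ of the unit sphere in R^n, using the standard bound |N_δ| ≤ (3/δ)^n (my sketch; the constants and the continuity argument extending from net points to arbitrary z are suppressed):

```latex
\Pr[\exists\,\text{bad } z]
  \;\lesssim\; |N_\delta| \cdot \max_{z \in N_\delta} \Pr[z \text{ bad}]
  \;\le\; \left(\tfrac{3}{\delta}\right)^{\!n} e^{-c_0 m}
  \;=\; e^{\,n \ln(3/\delta) \,-\, c_0 m}
  \;\le\; e^{-c' m}
  \qquad \text{once } m = cn \text{ with } c \ge 2\ln(3/\delta)/c_0 .
```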


Suppose z is bad…

  • z bad:

    |Az − e|_1 < |A·0 − e|_1

    ⇒ |Az − e|_1 < |e|_1

  • Let e have support T.

  • Without loss of generality,

    e_T = (Az)_T

  • Thus z bad:

    |(Az)_{T^c}|_1 < |(Az)_T|_1

    ⇒ |(Az)_T|_1 > ½ |Az|_1

[Figure: coordinate-by-coordinate comparison of y = e (supported on T) with Az = (a_1·z, …, a_m·z), split into the blocks T and T^c.]


Suppose z is bad…

A i.i.d. Gaussian ⇒ each entry of Az is an i.i.d. Gaussian

Let W = Az; its entries W_1, …, W_m are i.i.d. Gaussians

z bad ⇒ Σ_{i ∈ T} |W_i| > ½ Σ_i |W_i|

Recall: |T| ≤ ρm

Define S_ρ(W) to be the sum of the magnitudes of the top ρ fraction of the entries of W

Thus z bad ⇒ S_ρ(W) > ½ S_1(W)

Few Gaussians with a lot of mass!
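
A quick Monte Carlo illustration (mine, not from the talk) of the statistic S_ρ(W)/S_1(W): for i.i.d. standard Gaussians the ratio concentrates, sitting below ½ for ρ < ρ* ≈ 0.239 and above ½ for ρ > ρ*.

```python
import numpy as np

rng = np.random.default_rng(3)
m, trials = 20000, 20

def s_ratio(rho):
    """Average of S_rho(W) / S_1(W) over a few draws of W ~ N(0, I_m)."""
    ratios = []
    for _ in range(trials):
        w = np.abs(rng.standard_normal(m))
        top = np.sort(w)[::-1][: int(rho * m)]    # top rho-fraction in magnitude
        ratios.append(top.sum() / w.sum())
    return np.mean(ratios)

for rho in (0.20, 0.239, 0.28):
    print(f"rho = {rho:5.3f}   E[S_rho]/E[S_1] ~ {s_ratio(rho):.3f}")
# Expect roughly 0.44, 0.50, 0.56: below 1/2 for rho < rho*, above it for rho > rho*.
```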



Defining ρ*

  • Let us look at E[S_ρ]

  • Let w* be such that E[ |W| · 1{|W| ≥ w*} ] = ½ E[|W|]

  • Let ρ* = Pr[|W| ≥ w*]

  • Then E[S_{ρ*}] = ½ E[S_1]

  • Moreover, for any ρ < ρ*, E[S_ρ] ≤ (½ − ε) E[S_1] for some ε = ε(ρ) > 0

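
The two conditions above pin ρ* down in closed form (a verification sketch for standard Gaussian entries): E[|W|·1{|W| ≥ w}] = √(2/π)·e^{−w²/2}, so the half-mass condition gives w* = √(2 ln 2), and then ρ* = Pr[|W| ≥ w*].

```python
import numpy as np
from scipy.stats import norm

# Half-mass threshold: E[|W| ; |W| >= w] = sqrt(2/pi) * exp(-w^2/2) for W ~ N(0,1),
# so E[|W| ; |W| >= w*] = (1/2) E[|W|]  forces  exp(-w*^2 / 2) = 1/2.
w_star = np.sqrt(2 * np.log(2))       # ~ 1.1774
rho_star = 2 * norm.sf(w_star)        # Pr[|W| >= w*] ~ 0.2390318914...
print(w_star, rho_star)
```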


Concentration of measure

  • Sdepends on many independent Gaussians.

  • Gaussian Isoperimetric inequality implies:

    With high probability, S(W)close toE[S].

    S1similarly concentrated.

  • Thus Pr[z is bad] · exp(-cm)



Beyond ρ*

Now E[S_ρ] > (½ + ε) E[S_1]

A similar measure-concentration argument shows that any z is bad with high probability.

Thus LP decoding fails w.h.p. beyond ρ*.

The Donoho/CRTV experiments used a random error model.



Teaser

Compressed Sensing:

If x ∈ R^N is k-sparse

Take M ≈ C·k·log(N/k) random Gaussian measurements

Then L1 minimization recovers x.

For what k does this make sense (i.e. M < N)?

How small can C be?

k < ρ*N ≈ 0.239 N

C > (ρ* log₂(1/ρ*))^{−1} ≈ 2.02
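
These two figures follow from ρ* by a one-line computation (a check, reading the log in M ≈ C·k·log(N/k) as base 2, which is what makes the ≈ 2.02 figure come out): at k = ρ*N the required M already reaches N, so C cannot be pushed below 1/(ρ*·log₂(1/ρ*)).

```python
import numpy as np

rho_star = 0.2390318914495168
# M ~ C k log2(N/k) must reach N at k = rho* N, so C >= 1 / (rho* log2(1/rho*)).
print(1 / (rho_star * np.log2(1 / rho_star)))   # ~ 2.026, matching the "~ 2.02" above
```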


Summary

  • Tight threshold for Gaussian LP decoding

  • To preserve privacy: lots of error in lots of answers.

  • Similar results hold for +1/-1 queries.

  • Inefficient attacks can go much further:

    • Correct a (½ − ε) fraction of wild errors.

    • Correct a (1 − ε) fraction of wild errors in the list-decoding sense.

  • Efficient versions of these attacks?

    • Dwork-Yekhanin: (½ − ε) using AG codes.

