1 / 50

Replication is not n eeded : Single Database, Computationally-Private Information Retrieval

Replication is not n eeded : Single Database, Computationally-Private Information Retrieval. Mahmudur Rahman ( Ratul ). AOL search data scandal (2006). #4417749: clothes for age 60 60 single men best retirement city jarrett arnold jack t. arnold jaylene and jarrett arnold

larue
Download Presentation

Replication is not n eeded : Single Database, Computationally-Private Information Retrieval

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Replication is not needed: Single Database, Computationally-Private Information Retrieval MahmudurRahman (Ratul)

  2. AOL search data scandal (2006) #4417749: • clothes for age 60 • 60 single men • best retirement city • jarrett arnold • jack t. arnold • jaylene and jarrett arnold • gwinnett county yellow pages • rescue of older dogs • movies for dogs • sinus infection Thelma Arnold 62-year-old widow Lilburn, Georgia

  3. Observation The owners of databases know a lot about the users! This poses a risk to users’ privacy. E.g. consider database with stock prices… Can we do something about it? Yes, we can: • trust them that they will protect our secrecy, or • use cryptography! problematic! Disclaimer: Not yet practical...

  4. How can crypto help? Note: this problem has nothing to do with secure communication! database D user U

  5. Our settings secure link database D user U A new primitive: Private Information Retrieval (PIR)

  6. Plan • Definition of PIR • Quadratic Residuosity Assumption(QRA) • Construction of a computational PIR • The Basic cPIR scheme • The Recursive cPIR scheme • Extensions & Generalizations • Open problems • Literature: • B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan,PrivateInformation Retrieval, Journal of ACM, 1998 • E. Kushilevitz and R. OstrovskyReplication Is NOT Needed: SINGLE Database, • Computationally-Private Information Retrieval, FOCS 1997

  7. Question How to protect privacy of queries? database D user U wants to retrieve some data from D shouldn’t learn what U retrieved

  8. Let’s make things simple! ? databaseB: index i = 1,…,w the user should learn Bi each Bi є {0,1} (he may also learn otherBi’s)

  9. Trivial solution The database simply sends everything to the user!

  10. Non-triviality The previous solution has a drawback: the communication complexity is huge! Therefore we introduce the following requirement: “Non-triviality”: the number of bits communicated betweenUand D has to be smaller than w.

  11. correctness privacy (of the user) non-triviality Private Information Retrieval (PIR) polynomial time randomized interactive algorithms This property needs to be defined more formally! input: input: index i = 1,…,w • at the end the user learns Bi • the database does not learn i • the total communication is < w

  12. Reduce Communication Complexity(CC) • Various PIR schemes with significantly smaller CC than the obvious n-bit solution • Schemes for a constant number, k, of servers with CC O(n1/k) where k>1 • Then comes new notion of Computational PIR (cPIR) schemes!!!

  13. Computational PIR (cPIR) • Databases are restricted to perform only polynomial-time computations • The identity of i is only computationally hidden from the DBs. • Assumption was that the data is represented in several databases, that do not communicate with each other. For every 0<c<1 there is a cPIR scheme for k=2 databases with CC O(nc) based on the existence of one-way functions

  14. Main Theorem • For every c > 0 there exists a single database cPIR scheme with communication complexity O(nc) , assuming the hardness of deciding quadratic residuosity.

  15. Some properties of proposed Scheme • The data is stored in the database in its plain form, no need for processing between different users • The scheme is single-rounded query-answer protocol (common communication pattern in context of databases)

  16. Oblivious RAM model • When data is stored in a single database and is encrypted, with a key that is held by the user, then the model becomes the oblivious RAM model. • Differences between proposed cPIR scheme and oblivious RAM model: • Data stored in special encrypted form • Requires pre-processing initialization step • Requires the user to keep some state information • If several users, it requires them to coordinate their acts • Still the key must be hidden from the database This technique do not provide a solution to the single database cPIR problem

  17. Hardness assumptions? A great source of hard problems is the number theory. [KO97] – construct PIR based on the Quadratic Residuosity Assumption We describe it on the next slides.

  18. Quadratic Residues Def. x isquadratic residue modulomifthere exists a єZm* such that x = a2 mod m QR(m) := the set of all quadratic residues modulo m. QNR(m) := Zm* \ QR(n) Z13*: a x: QR(13): Observation: every quadratic residue modulo 13 has exactly2 square roots, and hence |QR(13)| = |Z13*| / 2.

  19. QRs modulo pq Z15*: a a2 QR(15): Observation: every quadratic residue modulo 15 has exactly4 square roots, and hence |QR(15)| = |Z15*| / 4.

  20. Algorithmic questions about QR • So the problem is considered hardest when N is a product of two distinct primes of equal length k/2 • The hard set,Hk = {NIN = p1.p2 where p1, p2 are k/2-bit primes} • Suppose n=pq • Is it easy to test membership in QR(n)? • Fact: if one knows p and q – yes! in O(|N|3) time • What if one doesn’t know p and q?

  21. Quadratic Residuosity Assumption (QRA) ? Note: Zn+is a group! n=pq, where p and q are large primes of k/2 bits a є Zn+ ↓ Zn*: QR(p) QNR(p) • Zn+: • all aє Zn*: such that • a mod p є QR(p) • iff • a mod qє QR(q) QR(q) QR(n) QNR(q) Quadratic Residuosity Assumption (QRA): Let p1 and p2 be distinct prime numbers such that | p1 | = | p2 | (large enough) and let N = p1 p2. If the factorization of N is not known, there is no efficient procedure for solving the Quadratic Residuosity Problem modulo N i.e determining if a number is a QR or a QNR is computationally hard if p1 and p2 are not given

  22. Useful Properties of QR problem The Quadratic Residuosity Problem has some useful properties: • QR ×QR = QR • QR ×QNR = QNR • Picking a random element in QR(n) is easy: pick r ϵ ZN* and compute r2 mod N (for doing it, we do not need to know the factorization of N). • Under QRA, the Quadratic Residuosity Problem modulo N is hard not only for some special x ϵ ZN*but it is hard on average. For this property this problem is random self-reducible.

  23. Single DB Computational PIR • Protocol for two players: U and DB • Limited to probabilistic polynomial-time computations • Each player is a interactive turingmachine with following type of tapes: input, random, work, output and communication • Communication complexity: CC(n,k) where n is length of data string x and security parameter 1kand |q(i)|+ |a(x, q(i) ) |≤CC(n, k) • Must satisfy both correctness and privacy

  24. Single DB Computational PIR(Cont.) The scheme proceeds as follows: • U performs a polynomial time computation and writes single message q(i) on write-only communication tape • DB reads its read-only communication tape, performs a polynomial time computation and writes a single message a(x,q(i)) on its write-only communication tape • U reads the answer sent by DB and then performs a polynomial time computation using DB answer and its work tape U outputs a single bit on its output tape(should be xi)

  25. Single DB Computational PIR(Cont.) • Communication complexity is CC(n,k) where n is length of data string x and security parameter 1kand |q(i)|+ |a(x, q(i) )| ≤ CC(n, k) • Must satisfy both correctness and privacy • Correctness: The output of U must be • Privacy: The DB cannot distinguish, with non-negligible probability,t he difference between any two distributions of requests q(i) and q(j)

  26. for every j = 1,...,wthe database sets Yj = { Xj2 ifBi = 0 Xjotherwise M First (wrong) idea i i ↓ Yi is a QR iffBj=0 M is a QR iffBj=0 the user checks if M is a QR SetM = Y1· Y2 · ... · Yw

  27. PIR from the previous slide: correctness√ security? To learn i the database would need to distinguish NQR from QR. √ Problems! • non-triviality? doesn’t hold! communication: user → database: |B|· |Zn| database→ user: |Zn| Call it: (|B|, 1) - PIR

  28. How to fix it? consider each row as a separate database Idea:Given: construct Suppose that |B|= v2 and present Bas av×v-matrix:

  29. execute v (v,1)- PIRsin parallel Idea that works v Looks even worse: communication: user → database: v2· |Zn| database→ user: v·|Zn| v The method Let j be the column where Biis. In every “row” the user asks for the jth element So, instead of sending v queries the user can send one! Observe: in this way the user learns all the elements in the jth column! j ↓ Bi

  30. Putting things together i kth row Bj=0 iff Mkis QR multiply elements in each row

  31. So we are done! PIR from the previous slide: • correctness√ • non-triviality:communication complexity = 2√|B|· |Zn| √ • security? The to learn i the database would need to distinguish NQR from QR. Formally: fromany adversary that breaks our schemewe can constructan algorithm that breaks QRA simulates:

  32. The Basic Scheme • Single DB cPIR scheme whose communication complexity O(n0.5+c) for any c>0 • DB x as a s*t matrix of bits, denoted M • User interested in retrieving privately the bit xi [(a,b) entry of M] • Steps: • U Picks at random a k-bit number N(multiplication of 2 k/2 bit primes) and sends N to DB • User sends chosen random t numbers(total t*k bits) to DB (uniform including QNR and QR)

  33. The Basic Scheme(Cont.) • 3. DB computes for every row r a number zrϵz*N as follows: • And then computes: • 4. DB sends z1………………….zs to the user (total of s.k bits) • 5. User only considers zawhere this number is QR iffMa,b=0and vice versa. U can efficiently check whether za is a QR and by this retrieve the bit Ma,b

  34. The Basic Scheme(Cont.) • correctness√ • non-triviality: • Total s+t+1 k-bit numbers • Assumption: s=t=√n and security parameter,k=nc • Socommunication complexity = (2√n+1)· k =O(n0.5+c) √ • privacy? Then to learn i the database would need to distinguish QNR from QR. Formally: fromany adversary that breaks our schemewe can constructan algorithm that breaks QRA simulates:

  35. (X1,…,Xv) (M1,…,Mv) Improvements database D user U the user is interested just in oneMi. Idea: apply cPIR recursively which ensures lower communication complexity although more complicated!!

  36. The Recursive Scheme • Consider and replace Step 4 of the basic scheme in following way: • Basic scheme S1 and recursive scheme SL • nL= Number of bits users have when executing SL • tL= Number of columns in the matrix to represent the string • 4*: U and DB execute k times the scheme SL-1 .In each execution, U gets 1 of the bits of zaout of a string of length nL-1=k. SL. After k executions, U has the number zathat U can construct as in step 5 of the basic scheme • Scope of further improvement??

  37. The Recursive Scheme(Cont.) • In first use of SL-1for retrieving za,U retrieves most significant bit of zaand so on.. • So in each of these executions the DB can be smaller in a factor of k • U and DB execute k times the scheme SL-1 • In d-th execution, U gets d-th most significant bit of zaout of a string of length nL-1=SL held by DB .After k executions, U has the number za that U can construct as in step 5 of the basic scheme

  38. The Recursive Scheme(Cont.) • correctness√ • non-triviality: • Communication complexity: n1/(L+1).(kL+L.k) • Assumption: L=O(√logn/logk) so CC=2O(√logn.logk) • If k= ncthen CC= nO(√c) and if k=logcn • then CC=2O(√logn.loglogn) • privacy? The calculation proceeds exactly as the proof for the basic scheme and leads to following theorem:

  39. Complexity of PIRs – overview of the results Communication: • “recursive” PIR of [KO97]: for every c: O(nc) • [Cachin, Micali, Stadler, 1999]: poly-logarithmic in n • [Lipmaa, 2005]: O(log2n) For practical analysis see: • [Sion, Carbunar] On the Computational Practicality of Private Information Retrieval. their conclusion: It is the time-complexity that matters. In real-life: it is still more practical to transmit the entire database.

  40. Extensions • Symmetric PIR (also protect privacy of the database). [Gertner, Ishai, Kushilevitz, Malkin. 1998] • Searching by key-words [Chor, Gilboa, Naor, 1997] • Public-key encryption with key-word search [Boneh, Di Crescenzo, Ostrovsky, Persiano]

  41. Additional Security to User • Experimentation Attack: • Replace the j-th patent/entry in its database with “garbage” • After executing the PIR scheme, see if the user “protests” • How to prevent? • Obtain a certificate • DB entries augmented with authentication readability • Communication-efficient certificate for the user from root of commitment tree (DB signs all the messages)

  42. Additional Security for the Database • Issue: The user does not get multiple entries for the price of one. • Solution: • Ensure zero-knowledge protocol • User’s query is well-formed (correct number of QRs and QNRs) • Communication Complexity: for arbitrary ϵ > 0 • which is proportional to the original query size plus the communication complexity of a zero-knowledge proof

  43. Membership Queries • Problem Statement: • The databases inserts n l-bit strings into a data structure, which supports search operations on strings. The user conducts an oblivious walk on the data structure until either the word w is found, or U is assured of the fact that w is not one of s1;:::;sn • Solution: Combination of • Perfect Hashing • PIR Schemes

  44. Conclusions • Result has been established • With the additional security for the database, the PIR model can be formulated as one-out-of-n Oblivious Transfer protocol, a generalization of the well-known 1-out-of-2 OT primitive, which plays important role in cryptographic design.

  45. Open problems: • Improve efficiency. • Construct new extensions. • How to replace the QR problem in the protocol by shortest lattice vector problem? • For current security parameter which is poly-logarithmic in n, one can achieve CC=2O(√logn.loglogn),can one achieve poly-logarithmic communication complexity? • (Solved in 1999 by Cachin, Silvio Micali and Stadler)

  46. Advances in Computational PIR • The current paper in 1997 by Kushilevitz and Ostrovsky • In 1999, Christian Cachin, Silvio Micali and Markus Stadler achieved poly-logarithmic CC (based on Phi-hiding assumption) • In 2004, HelgerLipmaa achieved log-squared communication complexity O(llogn+Klog2n), where l is the length of the strings and k is the security parameter • In 2005 Craig Gentry and ZulfikarRamzan achieved log-squared communication complexity which retrieves log-square (consecutive) bits of the database

  47. Advances in Computational PIR • In 2009, HelgerLipmaadesigned a computational PIR protocol with communication complexity O(llogn+Klog2n) and worst-case computation of O(n / log n) public-key operations. • Amortization techniques that retrieve non-consecutive bits have been considered by Yuval Ishai, EyalKushilevitz, RafailOstrovsky and AmitSahai.

  48. Advances in Information theoretic PIR Achieving information theoretic security requires the assumption that • there are multiple non-cooperating servers, each having a copy of the database. Without this assumption, any information-theoretically secure PIR protocol requires an amount of communication that is at least the size of the database n.

  49. Relation to other cryptographic primitives • One-way functions are necessary, but not known to be sufficient, for nontrivial (i.e., with sub-linear communication) single database computationally private information retrieval. • Oblivious transfer, also called symmetric PIR, is PIR with the additional restriction that the user may not learn any item other than the one she requested.

  50. Thank you! Questions?

More Related