1 / 63

StrangerDB -- Safe Data Management with Untrusted Servers

StrangerDB -- Safe Data Management with Untrusted Servers. Dennis Shasha ( shasha@cs.nyu.edu ) Joint work (past and present): David Mazieres and Radu Sion. Goals. Store private data in a public database: backup, concurrency control, and query processing

afram
Download Presentation

StrangerDB -- Safe Data Management with Untrusted Servers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. StrangerDB --Safe Data Managementwith Untrusted Servers Dennis Shasha (shasha@cs.nyu.edu) Joint work (past and present): David Mazieres and Radu Sion

  2. Goals • Store private data in a public database: backup, concurrency control, and query processing • Make unauthorized modifications evident (safety) • Protect data from being observed (privacy) • Force server to deliver a consistent picture to all honest users or be discovered (consistency/serializability).

  3. Methods • Signatures for tamper-evidenceClients/users share encryption for privacy among themselves • SUNDR-style [1] maintenance to detect inconsistent transaction orders. • Checksums for query efficiency. 1. "Building secure file systems out of Byzantine storage", David Mazieres and Dennis Shasha, Principles of Distributed Computing, 2002. pp. 108-117.

  4. Tamper Evidence (safety)-- first possibility • Every modifier signs a collision-resistant hash of the database after modification. • Note: Need public signature keys provided as certificates from a trusted certificate authority. • Simple model: one signature for whole database. Can achieve much finer granularity as we see later.

  5. Collision-Resistant Hash Setup:Inspired by Merkle Trees sgn_user(HASH (root), ptr) ENCRYPTED DATA BLOB 1 HASH1, ptrs ENCRYPTED DATA BLOB 2 HASH2, ptrs …. ENCRYPTED DATA BLOB n

  6. Privacy – for now • Every blob is encrypted using private key encryption. A blob can be data or an index page having pointers and hashes like the root. • Descending the Merkle tree requires decryption as one goes.

  7. Assumption: Clients are Good • Note that if a client/user is malicious, that client can download or modify any information to which it has decryption keys and has access. It could share this information with the server. • So, if a client is not trustworthy, we can guarantee nothing about data it can access. • So, assume for now clients do not modify data maliciously or reveal data to the server. • Server could be bad.

  8. Malicious servers and forks • If a user u accesses a database at time t, the user wants to be sure that the database is current as of time t. • A malicious server might give u a database state reflecting only some previous updates and v another state. • Users cannot prevent such “forking attacks” but would like to discover them quickly.

  9. Underlying Strategy to ensure inter-user consistency • Periodically, every pair of users exchange their ideas of global history. If one member of the pair has missed an update done by the other, the histories won’t be consistent. • For this to work in practice, we need some encoding of that global history.

  10. Strawman Implementation:Global Log of Operations • Imagine that we have a sequential log consisting of every transaction that ever hit the database and that this sequential log is signed by the last transaction. • Ex: order of transactions is T1 T2 T3 (done by users u1 u2 u3). Log after T3: • (((((T1) sgn_u1) T2) sgn_u2) T3) sgn_u3

  11. Ensuring Individual Consistency • Log is held by the untrusted server. • Every time a user appends to and signs the log, the user first checks that the log he/she previously signed is a prefix of log to be signed. • Ex: if u is about to commit transaction T’ and has previously committed T, then u makes sure that the log now contains as a prefix the log from the time T was committed.

  12. Individual Log Consistency Bob: “In my previous update, the log had (in left to right order): Talice1, Tbob1 Now it has Talice1, Tbob1, Tmary1. So my previous view of the log was a prefix of the current one. Ok, so I’ll append my new transaction: Talice1, Tbob1, Tmary1, Tbob2 Alice hears nothing of all this.

  13. Ensuring Global Consistencyby detecting forking attacks • Periodically, users exchange their views of the log of all transactions. Each user verifies: • The signatures of all the users • Whether one global history is a prefix of the other or not. • If there has been a forking attack and u1 has not seen a transaction of u2, then neither user’s log will be a prefix of the other.

  14. Global Log Consistency Bob: “Alice, here is the log of all transactions as I see it: Talice1, Tbob1, Tmary1, Tbob2 Alice: “That’s funny. Here is my log Talice1, Tbob1, Tjill1. It is not a prefix of yours because I have Jill’s transaction, but yours is not a prefix of mine because you have Mary’s transaction.” Bob: “Are all the signatures good?” Alice: “Absolutely. See for yourself. Server is being naughty again.”

  15. Semantic Objection • Server might fail to update the data but assert that the transactions are executed in the same global order. • Fix: associate with each transaction a collision-resistant hash of the state of the whole database (root of Merkle). Call these h1, h2, h3 • (((((T1 h1) sgn_u1) T2 h2) sgn_u2) T3 h3) sgn_u3 • Transactions verify global hash upon data access.

  16. Practical objection: verification time/space grows with transactions • In this log-based (strawman) implementation, each user must sign log of all transactions since beginning of history (last database dump) and must exchange them. • Alternative is to have each user update his/her version for every transaction (even read-only transactions).

  17. Version Structure Detail • Suppose user u creates the last version structure. Then u increments his/her version number (and no other) and signs the structure, which contains:sgn_u(hash_u, (u1, n1), (u2, n2), …)where (ui, ni) means: ui is at version niand hash_u is the hash of the root of the Merkle after u is done.

  18. Three incrementally related version structures Bob: sgn_Bob(hash_Bob, (Bob, 6), (Alice, 12), (Bill, 4)) Alice: sgn_Alice(hash_Alice, (Bob,6), (Alice,13), (Bill,4)) Bob: sgn_Bob(hash_Bob, (Bob,7), (Alice, 13), (Bill,4))

  19. Ordering Properties of version structures • Define a partial order on version structures:vs1 < vs2 if the users in vs1 are a subset of the users in vs2 and for every user u in vs1, the version of u in vs1 (denoted vs1[u]) is less than or equal to vs2[u] and for at least one user v, vs1[v] < vs2[v]. • We say vs1 is “incrementally less than” vs2 if there exists a u such that u signs vs2, vs2[u] = vs1[u] + 1 and for all v, if v != u then vs2[v] = vs1[v].

  20. Forking creates incomparable version structures Bob: sgn_Bob(hash_Bob), (Bob, 6), (Alice, 12), (Bill, 4)) Alice: sgn_Alice(hash_Alice, (Bob,6), (Alice,13), (Bill,4))Server forks and doesn’t show Bob this. Bob: sgn_Bob(hash_Bob, (Bob,7), (Alice, 12), (Bill,4)) Now, Bob and Alice’s last version structures are incomparable, i.e. unordered by <.

  21. Once Incompatible, No Turning Back • Once Alice produces:Alice: sgn_Alice(hash_Alice, (Bob,6), (Alice,13), (Bill,4)) • And Bob produces:Bob: sgn_Bob(hash_Bob, (Bob,7), (Alice, 12), (Bill,4)) • if server shows Bob any of Alice’s updates after Alice, 12, Bob will see incomparability. Symmetric for Alice.

  22. Net Effect: Fork Leaves Indelible Mark • If server forks Bob and Alice even once, then server can never show either the other’s updates. • Human analogy: to catch a thief, put indelible ink on dollar bills.

  23. Signing Verification Protocol • When a user u is ready to sign the version structure vs_u constructed as above, u asks for all version structures since u’s last signed version structure. • User u verifies that they are incrementally related to one another. • User u then enters its new version structure.

  24. Detecting forks:Global Version Exchange Protocol Bob: “Alice, from time to time, a global version exchange protocol begins. Let’s say an instance of the protocol starts at time t. Every user sends its latest version structure (a relatively small structure) preceding t to all other users. Sending is done without mediation by server.” Alice: “Then what?” Bob: “Each user checks that the version structures are ordered by <.” Alice: “What if some user does not send?” Bob: “No problem. Validate the ones that do send.”

  25. Key database assurance:Version Structures and Serializability • Serializability will be based on version structure order. • That is, transactions will serialize in the < order of version structures. • Obvious now that there is only one root of the Merkle, but could generalize to having several roots, each locked separately.

  26. Serializability Lemma • Lemma: If T1  T2 (conflict edge from T1 to T2), vs1 is the version structure signed by user u1 for T1, vs2 is the version structure signed by user u2 for T2, all version structures among some set of virtuous users U including u1 and u2 are ordered, then vs1 < vs2. • Proof Idea: Suppose user u1 issues T1 and user u2 issues T2. For any conflict, there is some data item x such that op1(x) precedes op2(x). Will be reflected in Merkle…

  27. Issue: Server could be framed Bob (good): sgn_Bob(hash(Bobdata), (Bob, 6), (Alice, 12), (Bill, 4)) Alice (good): sgn_Alice(hash(Alicedata), (Bob,6), (Alice,13), (Bill,4))Server shows this to Bob. No fork. Bob (bad): sgn_Bob(hash(Bobdata), (Bob,7), (Alice, 12), (Bill,4)) Bob pretends server has forked.Upon global exchange, server looks guilty.

  28. Server can avoid being framed • Server signs the version structures from users if it agrees they are legitimate. In the case of previous figure, server will refuse to sign Bob’s second version structure. • Bob and the server can present their evidence before security officer.

  29. Server Proves Innocence Bob (good): sgn_server(sgn_Bob(hash(Bobdata), (Bob, 6), (Alice, 12), (Bill, 4))) Alice (good): sgn_server( sgn_Alice(hash(Alicedata), (Bob,6), (Alice,13), (Bill,4)))Server shows this to Bob. No fork. Bob (bad): sgn_Bob(hash(Bobdata), (Bob,7), (Alice, 12), (Bill,4)) Server refuses to sign. Shows innocence.

  30. Getting More from the Server • At this point the server gives us blocks of data. This means the client in fact does all the database work. • Is there any way the server could actually execute basic sql queries but we could verify either at the same time or later that the server has not lied?

  31. New Privacy Setup I • A record is a sequence of cleartext field values plus an encrypted part that may encompass one or more fields and a nonce (the record’s “encrypted blob”). • Nonce ensures that if several rows share the same encrypted blob, the server won’t know. • Encryption remains private key encryption.

  32. Now Server Can Use Indexes • Consider R(A, B, C, D, E, F), where the first three fields are in clear text and last three constitute the encrypted blob of each record. • Selections on first three fields can use indexes, followed by decrypting and scanning last three fields.

  33. Sweeney Attack • Voter registration gives: town, gender, zip, birth date. Enough to identify Governor William Weld in Massachusetts. • Health records contain: zip code, birth date, gender, ethnicity and diagnosis. • Hence possible to infer health status of governor. • L. Sweeney, Uniqueness of Simple Demographics in the U.S. Population, LIDAPWP4. Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA: 2000

  34. Privacy Setup Revised:avoid associations • Sometimes, A, B, C individually may not be secret but their association is secret. In that case, could split R into three tables:Ra(A, encrypt(B,C,D,E,F, nonce))Rb(B, encrypt(A,C,D,E,F, nonce))Rc(C, encrypt(A,B,D,E,F, nonce)) • Any selection on multiple fields will use only one of Ra, Rb, Rc • Avoid “Sweeney attack”.

  35. New Privacy Guarantees • Reveal cleartext fields to server but no dangerous associations. • Never reveal encrypted blob because nonce hides duplicates. • Extra-database associations, e.g. the phone call a database user makes after accessing the database, still possible, but we don’t handle.

  36. New, More Efficient, Approach • Version structure still used to prevent forking attack. • Discard Merkle tree. • Introduce a checksum data structure for one or more attributes to detect tampering.

  37. Checksum data structurefor some attribute A • Partition the domain of cleartext field A. • Associate with each partition (e.g. A between v1 and v2) the rows having those A values (e.g. R_v1,v2 = {r | v1 <= r.A <= v2} • Compute the checksum of R_v1,v2 • Do this for every partition.

  38. Properties of the checksum • Checksum should be an associative and commutative function of the rows. • Example: Use a one-way hash function of each row and then the checksum is the product of the hash values modulo a large prime. • M. Bellare and D. Micciancio. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Proceedings of EuroCrypt, 1997.

  39. Maintaining the checksum data structure • If a row r is inserted into some partition p having checksum value v mod m, then compute the one-way hash of r, H(r) and the new checksum value becomes (H(r) * v) mod m. • If r were deleted, then the new checksum value becomes (v/H(r)) mod m • Updates are treated as deletes followed by inserts.

  40. New Approach to Modification • A transaction that does one or more modifications simply records updates to each partition (e.g. add x to partition p1, delete y from p2, etc.). • The transaction signs all of the above and puts in an “operation log” stored on server along with version structures. • Sgn_u(insert/deletes, query text, version structure)

  41. Client Interaction with Operation Log Bob: Operation log will record my changes to checksums and the inserts and deletes I do. Associate a version structure entry with an op log entry. Alice: But do you write a new checksum value? Bob: No. I write changes only. New checksums can be calculated later.

  42. Query Processing • Consider a simple selection query: select * from R where A = 15 and B> 5 • Client sends that query to the server and the server returns some rows based on A. Client decrypts and selects based on B values. • Normally, client doesn’t check any further. • Vast improvement in performance – server is doing most of the database work.

  43. Client verifies a query immediately after server responds Client asks the server for the rows in R from the partition containing R.A = 15 and verifies: (i) client received the correct subset of those rows; (ii) those rows are consistent with the checksum for that partition. • Client may have to compute checksum from the changes to this partition’s checksum in the operation log. Client updates checksum in operation log.

  44. Verifying a Query – At the moment Client: Ok, server, you have just sent me a result, but I want to check up on you. So, please send me: (1) all rows in the partition including this value; (2) all of the updates to the checksum for this partition since the last recalculation of the checksum value Server: why all of them? Client: So I can verify, based on version structures,that you are sending me all the modifications. I will then recalculate the checksum for this partition.

  45. Verifying a Query Q-- well after the fact • Client obtains rows of the partition that contains the query response as of now. Verifies against the current checksum. • Client next obtains all modifications to the partition that have occurred since Q (uses version structures to verify that it has received all mods). • Client undoes those modifications and verifies against Q.

  46. Verifying a Query – After delay Client: Ok, server, five minutes ago, you answered a query Q. I want to check on you. So first I will verify that the partition containing that query is consistent with the current checksum. Server: So you’ll want the checksum for that partition, all the rows for that partition,just as before. Client: Right. Once I verify that current rows in partitionare correct, I will want you to send me all modificationsback to the time when you executed Q. I will compute thepartition as it was then and verify Q.

  47. Verifying a Dump Client: Ok, server, I want to do a dump of the database. So I will verify that every partition is consistent with the current checksum. Server: So you’ll want a lot of history. Client: That’s right. But it’s only once a day.

  48. What About Inserts/Deletes? • These will have to be batched to hide the associations among different fields of rows. • How to do this? For each table R having cleartext attributes A, B, C, we already have Ra, Rb, Rc to avoid Sweeny attack. • In addition, we will have a table Rbatch that will contain entirely encrypted blobs.

  49. Using Rbatch • Queries on say R.A = 14 will access (and decrypt) Rbatch as well as Ra. • When Rbatch becomes large, some client with time on his hands (“de-batcher-at-home”) will spread the encrypted blobs from Rbatch to Ra, Rb, Rc, scrambling the order of course.

  50. Benefits/Costs of Query Approach • Can use database engine as more than a block server but have to support an operation log and version structures. • Can check on the server at any time. • Checking and recalculation of checksums can be done off-line and altruistically (“checking-at-home”)

More Related