Price Optimal Querying with Data APIs

Price Optimal Querying with Data APIs. Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu. VLDB 2017. Presenter: Shunit Agmon. Outline: Motivation; New solution: Refunds; Extensions and Optimizations; Experimental Results.


Presentation Transcript


  1. Price Optimal Querying with Data APIs • Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu • VLDB 2017 • Presenter: Shunit Agmon

  2. Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results

  3. Motivational Example • Bob sells data about users’ check-ins to businesses. • API: (lat, long, r, time) -> list of users • Alice makes an API call. • Alice makes another call, and then another. • [Figure: map areas labeled 1 to 4 with the regions covered by Alice’s calls; some people visited area 3] • Impossible to avoid overpaying!

  4. Objective • Define a new pricing method such that: • Clients pay for the data they buy (the seller is happy) • Clients don’t pay too much for the data (clients are happy, more clients come, and the seller is happy again) • We will formalize this later.

  5. Problem Setting • Trusted seller, untrusted buyer • Alice runs an app that acquires data from Bob • Alice is charged separately for each output tuple • Database D with schemas of the form (tid, ver, …) • Pricing by full selection queries: • Assume D has a single relation • SELECT * FROM D WHERE [condition] • All tuples have the same price (but the solution can be generalized)

  6. Existing Solutions • Count (no history) • Bob charges Alice for each tuple she gets • Alice will pay for some tuples more than once • Block/Stream • History • Bob tracks Alice’s purchases • Bob needs to store all the purchases, and buyer purchases can’t be anonymous. • Block/Stream
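The gap between the two existing schemes can be made concrete with a small sketch. It assumes a uniform price of 1 per tuple and represents each API call by the list of tuple ids it returns; the function names and the sample calls are hypothetical and not taken from the slides.

```python
PRICE_PER_TUPLE = 1  # assumed uniform price, as on the Problem Setting slide


def count_pricing(queries):
    """Charge for every returned tuple, with no memory of past calls."""
    return sum(len(result) * PRICE_PER_TUPLE for result in queries)


def history_pricing(queries):
    """Charge only for tuples the buyer has not been charged for before."""
    seen = set()  # state Bob must store for each buyer
    total = 0
    for result in queries:
        new = set(result) - seen
        total += len(new) * PRICE_PER_TUPLE
        seen |= new
    return total


# Hypothetical tuple ids returned by three overlapping API calls.
calls = [[1, 2, 3], [3, 4], [2, 3, 5]]
overpaid = count_pricing(calls) - history_pricing(calls)
```

Under count pricing Alice pays for 8 tuples although only 5 are distinct. History pricing removes the overpayment, but only because Bob keeps the `seen` set for every buyer.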

  7. Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results

  8. Solution: Refunds • Alice makes multiple API calls, as before • Bob computes refund coupons and returns them with the answers • If Alice notices a repeated purchase of the same tuple, she sends the coupons back to Bob • Bob refunds Alice for the tuples she bought twice.

  9. Protocol: BasicRefunds • Alice sends: Query Q • Bob returns: Result Q(D), Refunds(Q,D): […] • Alice sends: Refund please, […] • Bob checks three conditions: […] • If they hold, then give Alice her money back.
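A minimal sketch of how such a coupon exchange could work, under stated assumptions: the coupon layout `(qid, tid, tag)`, the use of HMAC, and the three checks (authentic coupons, same tuple id, different query ids) are illustrative guesses, since the slide's formulas are not legible. SHA1 is chosen only because the evaluation setup mentions it.

```python
import hashlib
import hmac

SECRET = b"bob-secret-key"  # hypothetical key known only to Bob
PRICE = 1                   # assumed uniform price per tuple


def make_coupon(qid, tid):
    """Bob: sign (qid, tid) so that Alice cannot forge coupons."""
    msg = f"{qid}:{tid}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha1).hexdigest()
    return (qid, tid, tag)


def is_authentic(coupon):
    """Bob: recompute the tag and compare it in constant time."""
    qid, tid, tag = coupon
    return hmac.compare_digest(make_coupon(qid, tid)[2], tag)


def basic_refund(c1, c2):
    """Bob: refund one tuple price if two coupons prove a repeated purchase."""
    ok = (is_authentic(c1) and is_authentic(c2)
          and c1[1] == c2[1]    # same tuple id
          and c1[0] != c2[0])   # issued by different queries
    return PRICE if ok else 0
```

With coupons `make_coupon(1, 42)` and `make_coupon(2, 42)`, `basic_refund` returns one tuple price; Bob needs no per-buyer purchase history to verify the request.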

  10. Annotations • Let W be a sequence of messages from Alice to Bob • […] is the set of all tuples Alice purchased and their counts • […] is the amount Alice pays for queries in W • […] is the amount Bob refunds to Alice after processing W • […] is the net payment by Alice with message sequence W

  11. Properties of a Refunds Protocol • Safety: A refund protocol is safe if Alice must pay at least once for each tuple she purchased. • Optimality: A refund protocol is optimal if there is a way to ask for refunds so that Alice never pays more than once for each tuple she purchased. • Is BasicRefunds optimal? Is it safe?
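The two properties can be written as inequalities. The symbols below are placeholders introduced here, not the slide's notation: p is the uniform tuple price, T(W) the set of distinct tuples Alice purchased in W, and pay, ref, net the amounts paid, refunded, and paid net of refunds.

```latex
\begin{align*}
\mathrm{net}(W) &= \mathrm{pay}(W) - \mathrm{ref}(W) \\
\text{Safe:}\quad & \mathrm{net}(W) \ge p \cdot |T(W)|
  \quad \text{for every message sequence } W \\
\text{Optimal:}\quad & \text{every query sequence extends to some } W
  \text{ with } \mathrm{net}(W) \le p \cdot |T(W)|
\end{align*}
```

A protocol that is both safe and optimal therefore lets Alice reach a net payment of exactly p times the number of distinct tuples, and never less.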

  12. BasicRefunds is Optimal • Proof by induction on the number of queries in a sequence of messages W. • Base case: with no queries, Alice pays nothing. • Inductive assumption: for n-1 queries there is an optimal sequence of queries and refund messages. • Step: append to it the n’th query and a sequence of refund messages, one for each tuple from that query that Alice has seen before. • Then show that under the new sequence Alice still pays only once for each tuple she purchased.

  13. BasicRefunds is not Safe • Alice sends: Query Q • Bob returns: Result Q(D), Refunds(Q,D): […] • Alice sends: Refund please, Refund please, Refund please, […] • Bob checks three conditions: […] • If they hold, then give Alice her money back.
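One way to see the problem is to replay a single valid refund request against a verifier that keeps no state. The sketch below is an illustration under assumptions: coupons are reduced to `(qid, tid)` pairs, signature checking is omitted, and the check itself is a hypothetical stand-in for the slide's unreadable conditions.

```python
PRICE = 1  # assumed uniform price per tuple


def stateless_refund(c1, c2):
    """Hypothetical stateless check: same tuple id, different query ids.
    Coupons are (qid, tid) pairs; signature checking is omitted here."""
    (qid1, tid1), (qid2, tid2) = c1, c2
    return PRICE if tid1 == tid2 and qid1 != qid2 else 0


# Alice bought tuple 42 in queries 1 and 2, paying 2 * PRICE in total.
paid = 2 * PRICE
c1, c2 = (1, 42), (2, 42)

# She replays the same valid request three times.
refunded = sum(stateless_refund(c1, c2) for _ in range(3))
net = paid - refunded
```

Because nothing records that the pair was already redeemed, the net payment drops below one tuple price, which violates safety.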

  14. Protocol: MonotoneRefunds • Alice sends: Query Q • Bob returns: Result Q(D), Refunds(Q,D): […] • Alice sends: BEGIN REFUND […], then all refund requests for tuples in query […], then END REFUND • Bob checks: • Only one refund request for each tuple • qid of the second coupon is the same as in BEGIN REFUND • And […] • Then give Alice her money back and update […] • Otherwise, reject all coupons between BEGIN, END.
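The round-based checks can be sketched as follows. This is one reading of the slide, not the authors' definition: it assumes Bob stores a single integer per buyer (the highest query id already used in a refund round), that rounds must use strictly increasing query ids, and that the first coupon of each request comes from an earlier query. Coupons are `(qid, tid)` pairs and signature checks are omitted.

```python
PRICE = 1  # assumed uniform price per tuple


class MonotoneBob:
    """Hypothetical seller state: one integer per buyer."""

    def __init__(self):
        # Highest query id already used in an accepted refund round.
        self.last_refund_qid = 0

    def refund_round(self, begin_qid, requests):
        """Process one BEGIN REFUND ... END REFUND round.

        requests is a list of (coupon1, coupon2) pairs. Returns the
        refunded amount; any violation rejects the whole round.
        """
        if begin_qid <= self.last_refund_qid:
            return 0  # rounds must move forward in query id
        seen_tids = set()
        for (qid1, tid1), (qid2, tid2) in requests:
            if tid1 != tid2 or qid2 != begin_qid or qid1 >= qid2:
                return 0  # malformed request
            if tid1 in seen_tids:
                return 0  # only one request per tuple in a round
            seen_tids.add(tid1)
        self.last_refund_qid = begin_qid
        return PRICE * len(requests)
```

Replaying a round fails because its query id is no longer greater than the stored value, so the replay attack on the stateless variant no longer works, while Bob's storage stays constant per buyer.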

  15. Monotone Refunds is Safe and Optimal • Optimality: as in BasicRefunds (same construction of refunds). • Safety: Stronger claim: for each tuple t, define: • k: number of queries by Alice that contain t • r: number of valid refund requests for tuple t • Then in MonotoneRefunds, k − r ≥ 1 at all times once t has been queried. • Proof by induction on the length of the sequence of messages (W).

  16. Monotone Refunds Safety Proof • Base case: A single query for tuple t was executed, so no valid refund messages can be constructed (query ids in the request must be different). • Given a message sequence of length n − 1 such that t appears in k queries and r refund messages, and k − r ≥ 1, observe the n’th message: • If it’s a query, it can only increase k • If it’s a BEGIN/END REFUND message, or a refund message for another tuple, k and r stay the same • If it’s a valid refund message for t: • If k − r > 1, then k − (r + 1) ≥ 1 (we’re OK) • Otherwise, k − r = 1. The r refund messages had to use r + 1 distinct query ids. Then […] is at least one more than the query id of the k’th query with t in it. Then it is not a valid refund message, a contradiction.

  17. Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results

  18. Extension: Multiple Buyers • Observation: the safety and optimality of the protocol rely on tuple ids being different iff the tuples are different • To enable multiple buyers, the tuple id will include the user id. • Coupons will look like: […] • Different users will be assigned different tuple ids for the same tuple, so a buyer can’t use another buyer’s coupon.

  19. Extension: Updates • Assuming an update of a tuple has the same price as a brand new tuple • Bob maintains a version number for each tuple, incremented when the tuple updates. • Coupons with version numbers will look like: […] • Storage overhead, but some systems already have them (SDSS, SciDB).
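Both extensions amount to widening what a coupon identifies. The sketch below is an assumption-laden illustration: the field order `(uid, qid, tid, ver, tag)`, the HMAC construction, and the helper names are hypothetical, since the slides' coupon formulas are not legible.

```python
import hashlib
import hmac

SECRET = b"bob-secret-key"  # hypothetical key known only to Bob


def make_coupon(uid, qid, tid, ver):
    """Sign the buyer id and tuple version along with query and tuple ids."""
    msg = f"{uid}:{qid}:{tid}:{ver}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha1).hexdigest()
    return (uid, qid, tid, ver, tag)


def same_item(c1, c2):
    """Two coupons name the same purchased item only if the buyer,
    the tuple id, and the tuple version all match."""
    return (c1[0], c1[2], c1[3]) == (c2[0], c2[2], c2[3])
```

A coupon issued to another buyer, or one for an older version of the tuple, then fails the `same_item` test and cannot be paired into a refund request.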

  20. Optimization: Group Coupons • Computing one coupon per tuple causes a lot of refund messages (API calls) • Solution: Bob can compute a coupon for a group of tuples • A group coupon can only be used to ask for a refund on all the tuples in the group • In a refund round, each tuple can only be included in one coupon • A coupon now contains a group id and group version number: […] • Bob has to give Alice a way to check if a tuple belongs to a group: Contains(tid, gid) → {True, False} • How should Bob group the tuples?

  21. Tree Structured Group Coupons • [Figure: binary tree of group ids (h, n) with leaves (0,0) to (0,7), internal nodes (1,0) to (1,3), (2,0) and (2,1), and root (3,0); node (h+1, n) has children (h, 2n) and (h, 2n+1)] • Leaves are tuple ids, padded to a power of 2 • Group id (h,n) represents the tuples group […] • Possibility: Larger fan outs
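The parent and child labels in the tree suggest a simple arithmetic for group membership. The sketch below assumes that group (h, n) covers tuple ids n·2^h through (n+1)·2^h − 1, which follows from node (h+1, n) having children (h, 2n) and (h, 2n+1) but is not stated on the slide; `contains` and `cover` are hypothetical helper names.

```python
def contains(tid, gid):
    """True if leaf `tid` lies under group (h, n), assumed to cover
    tuple ids n * 2**h through (n + 1) * 2**h - 1."""
    h, n = gid
    return (tid >> h) == n


def cover(lo, hi):
    """Greedily split the tuple id range lo..hi (inclusive) into
    aligned tree groups, taking the largest block that fits each time."""
    groups = []
    while lo <= hi:
        h = 0
        # Grow the block while it stays aligned at lo and inside the range.
        while lo % (2 ** (h + 1)) == 0 and lo + 2 ** (h + 1) - 1 <= hi:
            h += 1
        groups.append((h, lo >> h))
        lo += 2 ** h
    return groups
```

For a range query over tuple ids 1 to 6, `cover(1, 6)` yields four group coupons instead of six single-tuple coupons, and a query over 0 to 7 needs only the root group.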

  22. Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results

  23. Experimental Evaluation Setup • Single server running PostgreSQL 9.4 over OS X 10.11.5 • 2.7 GHz Intel Core i7, 16 GB DDR3 RAM • Hashing - SHA1, pgcrypto module implementation • Client is on the same machine as the server

  24. Experimental Evaluation Setup • Data: one table (test) with two integer columns (tid, val) and […] rows • tid is a primary key starting with […] • val is a permutation of […] where N=|test| • Queries: • pkey.simple: SELECT * FROM test WHERE tid>=l AND tid<=u • other.simple: SELECT * FROM test WHERE val>=l AND val<=u • join: SELECT * FROM test a, test b WHERE a.val=b.tid AND a.tid>=l AND a.tid<=u
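The workload can be reproduced in miniature to see what the three query shapes return. This sketch uses an in-memory SQLite database instead of the PostgreSQL server from the setup, and it assumes a table of 8 rows with tid starting at 0 and val permuting 0 to N−1; those values, the bounds l and u, and the random seed are all hypothetical because the slide's numbers are not legible.

```python
import random
import sqlite3

N = 8  # hypothetical table size; the slide's row count is not legible
random.seed(0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (tid INTEGER PRIMARY KEY, val INTEGER)")
vals = list(range(N))
random.shuffle(vals)  # val is a permutation (assumed here to be of 0..N-1)
conn.executemany("INSERT INTO test VALUES (?, ?)", list(enumerate(vals)))

QUERIES = {
    "pkey.simple": "SELECT * FROM test WHERE tid >= ? AND tid <= ?",
    "other.simple": "SELECT * FROM test WHERE val >= ? AND val <= ?",
    "join": ("SELECT * FROM test a, test b "
             "WHERE a.val = b.tid AND a.tid >= ? AND a.tid <= ?"),
}

l, u = 2, 5  # hypothetical range bounds
sizes = {name: len(conn.execute(sql, (l, u)).fetchall())
         for name, sql in QUERIES.items()}
```

Because val is a permutation of the tid range, each of the three queries returns u − l + 1 rows for the same bounds; they differ in whether the range predicate hits the primary key, a non-key column, or one side of a self-join.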

  25. Cost Savings (for pkey.simple) • [Figure: cost savings plotted against query answer cardinalities and query parameter distribution; annotations: x1.4, 10 times cheaper, 100 times cheaper]

  26. Single Coupons vs. Group Coupons (pkey.simple) • [Figure: two panels, "Refunds time" and "Query, pricing and coupons time"]

  27. Time Overhead for pkey.simple (No refund requests)

  28. Time Overhead for other.simple (No refund requests)

  29. Time Overhead for join (No refund requests)

  30. Summary • The paper shows a method to support history-aware pricing for data APIs. • A buyer is only charged once for each data item she purchases. • The buyer is responsible for tracking the data items and asking for refunds. • The paper shows a concise and tamper-proof protocol that is both optimal and safe. • Experimental evaluation shows that the method has a reasonably low time overhead while enabling significant cost savings for clients.

  31. Questions?
