1 / 27

Toward Practical Query Pricing With QueryMarket

Toward Practical Query Pricing With QueryMarket. Paraschos Koutris Prasang Upadhyaya Magdalena Balazinska Bill Howe Dan Suciu. University of Washington SIGMOD 2013. Motivation. Data is increasingly sold and bought on the web Websites that sell data: Xignite (financial)

Download Presentation

Toward Practical Query Pricing With QueryMarket

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Toward Practical Query Pricing With QueryMarket ParaschosKoutris PrasangUpadhyaya Magdalena Balazinska Bill Howe Dan Suciu University of Washington SIGMOD 2013

  2. Motivation • Data is increasingly sold and bought on the web • Websites that sell data: • Xignite (financial) • Gnip (social) • Data marketplace services: • Windows Azure Marketplace • Infochimps • Factual • DataMarket

  3. A Pricing Scenario (1) English-German dictionaryT • PRICING SCHEMES • Sell the whole tableTfor a fixed price • Q: translate only the word “thanks” • The user pays for redundant information • Price per output tuple • Q: Does the word “thanks” translate to “Auto” ? • An empty result still carries information

  4. A Pricing Scenario (2) English-German dictionary T Word Frequency Stats UF Q1: Return all translations to German of top 10 words in the genre “music” Q2: Return all translations to German of top 20words in the genre “music” • Current systems do not sell queries that combine datasets • Queries issued by a user may have overlapping content

  5. How To Price Data English-German dictionary T p(σT.english=‘thanks’)=$0.1 p(σT.german=‘Auto’)=$0.5 p(σT.english=‘car’)=$0.1 p(σT.english=‘day’)=$0.1 … p(σT.english=‘road’)=$0.15 p(σT.english=‘cat’)=$0.05 • Price points • selectionqueries on single table • exhaust the possible values (ColA) of some attribute A • may select on values not in the active domain

  6. QueryMarket: Contributions • A formal pricing framework where: • sellers specify a set of price points as selection queries • buyers can purchase any query on the database • the system automatically computes the priceof the query • Support efficient computationof prices for a large class of SQL queries • Support the necessary functionality for a marketplace: • Pricing queries with overlapping informationcontent • Database updates • Revenue sharing among different sellers?

  7. Outline • The Pricing Framework • Computing the Price • Query History • Revenue Sharing

  8. The Pricing Framework [Koutris et al., PODS 2012] • The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), … } • A buyer can buy any query Q • The system will compute priceDS(Q) Buyer Q(D) ? Seller priceDS(Q) Pricing System + Database D Price points

  9. Properties of Prices We say that the views V1,…, Vkdetermine Q if one can compute Q(D) from V1(D),…, Vk(D) without access to D • Arbitrage-free: Given D, priceD(Q) is arbitrage-free if for all views V1, …, Vkthat determine Q: • priceD(Q) ≤ priceD(V1) + … + priceD(Vk) • Discount-free: priceD(Q) must not offer additional discounts except for the explicit price points defined by the seller

  10. The Pricing Formula • Arbitrage-Price: • The price of the cheapest set of views from price points S that determine the query Q • unique+arbitrage-free +discount-free + agrees with price points ColA = { a1, a2, a3 } ColB= { b } Table S Table R Q(y) = R(x),S(x,y) price = $1 price = $2 price = $3 • {σ[R.A=a1], σ[S.B=b] } determines Q • cost = 1 + 3 = 4 • {σ[R.A=a1], σ[S.A=a1] } also determines Q • cost = 1 + 2 = 3 (cheapest possible) 10

  11. Outline • The Pricing Framework • Computing the Price • Query History • Revenue Sharing

  12. Computing The Price • The problem of computing the arbitrage price even for SELECT-PROJECT-JOIN queries is coNP-complete • For some queries, the price can be computed fast: • Selections, joins w/o projection • We describe pricing as an Integer Linear Program (ILP) and then use fast ILP solvers (e.g. GLPK, CPLEX) • Classes of queries supported: • Selections/Projections/Joins • Unions • User-Defined Functions (UDF) • Bundles of queries 12

  13. ILP Construction (1) ColA = { a1, a2, a3 } ColB= { b } Table S Table R price = $1 price = $2 price = $3 • Price the queryQ(x,y) = R(x), S(x,y) • Introduce a {0/1} variable x[attribute,value] for each price point: • x[R.A, a2], x[S.A, a1], x[S.B, b], … 13

  14. ILP Construction (2) ColA = { a1, a2, a3 } ColB= { b } Table S Table R Q(x,y) = R(x), S(x,y) • Minimize(independent of the query): • price = x[R.A,a1] + x[R.A,a2] + x[R.A,a3] +2x[S.A,a1] + • 2x[S.A,a2] + 2x[S.A,a3] +3x[S.B,b] • Constraints: • (a1,b) in Q: x[R.A,a1] ≥ 1 • x[S.A,a1] + x[S.B,b] ≥ 1 • (a2,b) not in Q: x[R.A,a2] ≥ 1 • (a3,b) not in Q: x[R.A,a3] + x[S.A,a3] + x[S.B,b] ≥ 1 14

  15. ILP Construction (3) ColA = { a1, a2, a3} ColB= { b} Table S Table R New variable for each tuple in Qfull • Projection: Q(y) = R(x), S(x,y) • Constraints: • (a1,b) in Qfull: x[R.A,a1] ≥ z1 • x[S.A,a1] + x[S.B,b] ≥ z1 • (a2,b) in Qfull: x[R.A,a2] ≥ z2 • x[S.A,a2] + x[S.B,b] ≥ z2 • (b) in Q : z1 + z2 ≥ 1 15

  16. QueryMarket System • Runs on top of any SQL database • Information stored in the database: • Price points are stored in the database in price tables • Keeping track of price tables with an index table • The dataset: • English-german translation: Ten,gr(w, w’) • English-french translation : Ten,fr(w, w’) • UDF to find hashtags : IsHashtag(w) • Word frequency stats : WF(w, genre, frequency, rank)

  17. Price Computation (1) • Small dataset where columns have size ~ 102 3-way join 2-way joins with projections 2-way joins w/o projections selections

  18. Price Computation (2) • Larger dataset where columns have size ~ 103 3-way join 2-way joins with projections 2-way joins w/o projections selections

  19. Outline • The Pricing Framework • Computing the Price • Query History • Revenue Sharing

  20. Query History • A user asks a sequence of queries over time of varying information overlapQ = Q1, Q2, …, Qk • Experiment with 30 selection/join queries Oblivious pricing: each query priced independently Bundle pricing: each query Qi priced p(Q1,…,Qi)- p(Q1,…,Qi-1) View pricing: when a query is purchased, the purchased views are free for later queries

  21. Query History (2)

  22. View Pricing • View Pricing is our proposed strategy: • Computationally efficient • Low storage overhead • Close to optimal (bundle) price • View Pricing can be used for dynamic databases: if view V is purchased at some point and then updated, the user pays only an update price

  23. Outline • The Pricing Framework • Computing the Price • Query History • Revenue Sharing

  24. Revenue Sharing • How is the revenuesharedbetween sellers if several datasets contribute to the answer? • What if the cheapest set of views to determine a query is not unique ? • Example: • Q(‘sigmod13’) = isHashtag(‘sigmod13’), isNoun(‘sigmod13’) • Seller 1 prices $1 per entry for isHashtag, so does seller 2 • If both isHashtag, isNoun are false and each costs $1, purchasing either of the entries answers Q

  25. Revenue Sharing: Solution • For a seller s, share(s, Q) is the maximum revenue of s over all minimum-cost set of price points that determine Q • share(s, Q) can be computed in our framework • Solution: split price(Q) among sellers proportionally to their shares • Example: • Both shares are $1 • The revenue of each seller will be $0.5, since their shares are equal

  26. Conclusions • QueryMarket: the first system that supports pricing a large class of SQL queries within a formal framework • We presented solutions to address the requirements of a real-world marketplace • Future work includes: • Scaling the price computation (bucketization) • Full SQL Support (aggregates, negation) • Query answering under limited budget

  27. Thank you !

More Related