Query-Based Data Pricing

Presentation Transcript

  1. Query-Based Data Pricing. Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu. University of Washington. PODS 2012

  2. Motivation • Data is increasingly sold and bought on the web • Websites that sell data: • AggData [www.aggdata.com] • Xignite (financial data) [www.xignite.com] • Gnip (social media) [www.gnip.com] • Data marketplace services: • Windows Azure Marketplace (100+ datasets) [datamarket.azure.com] • Infochimps (15,000 datasets) [www.infochimps.com] • Query-based pricing: customized for buyers

  3. Current Pricing (1) • A fixed price for the whole dataset or for a specific set of views • Example: CustomLists • USA Business Database for $399 • Email addresses for $299 • Businesses in WA for $199 • Limitations: what if the buyer wants • Restaurants in WA? • Businesses in cities with population >100,000?

  4. Current Pricing (2) • API Subscriptions (Azure Marketplace, Infochimps) • Allow queries over the data • Pay by number of transactions (page of results)

  5. Issues With Pricing • Buyers today must buy a superset of the data they are interested in • Sellers can’t easily anticipate all the queries that buyers might ask • Solution: we need a more flexible pricing scheme, parameterized by queries

  6. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  7. The Pricing Framework • The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), … } • A buyer can buy any query Q • The system will compute price_D^S(Q) [Figure: Buyer, Seller, and Pricing System + Database D; labels Q(D), price_D^S(Q), and price points (V1,p1), (V2,p2), …]

  8. Instance-Based Determinacy Definition. V = V1,…,Vk determine Q given D, denoted D ⊢ V ↠ Q, if: for all D’, if V(D) = V(D’), then Q(D) = Q(D’). Intuitively, “V1,…,Vk determine Q” means that Q(D) can be answered from V1(D),…,Vk(D) alone, without accessing the database instance D
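
The definition on slide 8 can be checked by brute force when the domain is tiny. The sketch below is an illustration only: the relation S(shape, color), its two-value domains, the instance D, and all function names are assumptions made for this example, not part of the presentation. It enumerates every possible instance D’ and looks for one that agrees with D on the views but disagrees on the query.

```python
from itertools import combinations, product

# Hypothetical toy setup: one relation S(shape, color) over small, known domains.
SHAPES = ["dragon", "crane"]
COLORS = ["red", "blue"]
UNIVERSE = list(product(SHAPES, COLORS))  # every tuple that could appear in S


def all_instances(universe):
    """Yield every possible database instance (every subset of the universe)."""
    for r in range(len(universe) + 1):
        for combo in combinations(universe, r):
            yield frozenset(combo)


def determines(views, query, db, universe=UNIVERSE):
    """Instance-based determinacy: for all D', V(D) = V(D') implies Q(D) = Q(D')."""
    target_views = [v(db) for v in views]
    target_answer = query(db)
    for other in all_instances(universe):
        same_views = [v(other) for v in views] == target_views
        if same_views and query(other) != target_answer:
            return False  # D' agrees on the views but answers Q differently
    return True


def select_shape(s):
    """Selection view: all tuples with the given shape."""
    return lambda db: frozenset(t for t in db if t[0] == s)


def identity(db):
    """The query that returns the whole relation."""
    return frozenset(db)


def red_dragons(db):
    """A narrower query: only the tuple (dragon, red), if present."""
    return frozenset(t for t in db if t == ("dragon", "red"))


D = frozenset({("dragon", "red"), ("crane", "blue")})

# Selections on every shape reveal every tuple, so they determine the full relation.
print(determines([select_shape(s) for s in SHAPES], identity, D))  # True
# One selection leaves the other shape's tuples unconstrained.
print(determines([select_shape("dragon")], identity, D))  # False
# The same single selection does determine the narrower query.
print(determines([select_shape("dragon")], red_dragons, D))  # True
```

The enumeration is exponential in the size of the universe, so this is only usable on toy inputs.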

  9. Arbitrage-Free • Axiom 1. • Given D, the pricing function price_D(Q) is arbitrage-free if for all views V1, …, Vk and query Q where D ⊢ V1, …, Vk ↠ Q: • price_D(Q) ≤ price_D(V1) + … + price_D(Vk) • Why: suppose V determines Q and price_D(Q) > price_D(V). Then, we can • buy V(D) for price_D(V) • compute Q(D) from V(D) • now we have answered Q at the price price_D(V) < price_D(Q)
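
Axiom 1 can be read as a simple check over known determinacy facts. In the minimal sketch below, the two prices are the CustomLists figures from slide 3; the function name, the view labels, and the assumption that the USA database determines the WA listing are illustrative only.

```python
def arbitrage_violations(prices, determinacy_facts):
    """Return every (views, query) fact where the query costs more than its views."""
    return [
        (views, query)
        for views, query in determinacy_facts
        if prices[query] > sum(prices[v] for v in views)
    ]


# Prices from the CustomLists example; the labels are made up for this sketch.
prices = {"USA businesses": 399, "WA businesses": 199}
# Assumed fact: the USA database determines the WA listing.
facts = [(("USA businesses",), "WA businesses")]

print(arbitrage_violations(prices, facts))  # []

# Hypothetical overpricing: a buyer would purchase the USA database instead.
overpriced = {**prices, "WA businesses": 450}
print(arbitrage_violations(overpriced, facts))
# [(('USA businesses',), 'WA businesses')]
```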

  10. Discount-Free Axiom 2. The pricing function price_D(Q) should offer no discounts beyond the explicit price points defined by the seller. • The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database • A pricing function is discount-free if it is maximal

  11. Example: Origami Database

  12. Example: Origami Database • Database: S • Price points: • Get all dragon origami for $2 • Get all red origami for $3 • What is the price of the entire database? Q(x,y,z) :- S(x,y,z) • Exhausts the active domain • V1, V2, V3, V4 determine Q: price(Q) ≤ $8 • W1, W2, W3 determine Q: price(Q) ≤ $9 • price(Q) = $8

  13. Example: Origami Database • Relations: R, S, T • Price points: p(σcolor)=$50, p(σshape)=$99, p(σshape)=$2, p(σcolor)=$5 • What is the price of the full join? Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v)

  14. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  15. The Query Pricing Formula • Given: • Price points S = {(V1,p1),…,(Vk,pk)} • Database instance D • Query Q • Compute: price_D^S(Q) • Properties: (a) arbitrage-free, (b) discount-free, (c) price_D^S(Vi) = pi • If such a pricing function exists, we say that the price points are consistent • Method: • Consider all subsets of V = {V1,…,Vk} that determine Q • Let C be the subset with the minimum price, Σi pi, for Vi in C • Define p_D(Q) = Σi pi Theorem. (a) The price points are consistent iff p_D(Vi) = pi for every price point i=1,…,k (b) price_D^S(Q) = p_D(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points
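
The method on slide 15 can be sketched as an exhaustive search over subsets of price points. Everything below is illustrative: the function names, the view labels, and the per-view prices are assumptions, chosen so that four shape views at $2 and three color views at $3 reproduce the $8 and $9 bounds of the origami example. Determinacy is passed in as an oracle rather than computed.

```python
from itertools import combinations


def price(price_points, determines):
    """Return (total, views) for the cheapest subset of views that determines Q.

    price_points: dict mapping a view name to its price.
    determines:   hypothetical oracle; takes a frozenset of view names and
                  returns True if those views determine Q on the database.
    Returns None if no subset determines Q.
    """
    names = sorted(price_points)
    best = None
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if determines(frozenset(subset)):
                total = sum(price_points[v] for v in subset)
                if best is None or total < best[0]:
                    best = (total, subset)
    return best


# Assumed price points: selections on shape cost $2, selections on color cost $3.
shape_views = {"shape=dragon", "shape=crane", "shape=swan", "shape=frog"}
color_views = {"color=red", "color=blue", "color=white"}
price_points = {**{v: 2 for v in shape_views}, **{v: 3 for v in color_views}}


def determines_whole_relation(chosen):
    """Assumed oracle: Q is the whole relation, so every (shape, color) pair
    must be covered, which requires all shape views or all color views."""
    return shape_views <= chosen or color_views <= chosen


total, chosen = price(price_points, determines_whole_relation)
print(total)  # 8
```

The search visits all 2^k subsets, which is in line with the hardness results later in the talk; it is meant to make the formula concrete, not to be efficient.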

  16. Discussion • If the result of Q1 is always a subset of the result of Q2, should Q1 be priced less than Q2? No! Example: • V(x,y) :- Fortune500(x,y) • Q(x,y) :- Fortune500(x,y), StrongBuyRec(x) • price(Q) >> price(V) • We ignore computation costs in our framework • Cost of computing query Q • Q(D)=f(V(D)), but f can be hard to compute

  17. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  18. Determinacy Definition. [Instance-dependent] V determines Q given D, denoted as D ⊢ V ↠ Q, if: for all D’, if V(D’) = V(D), then Q(D) = Q(D’). Definition. [Instance-independent] [Nash, Segoufin, Vianu ‘07] V determines Q, denoted as V ↠ Q, if: for all D, D’, if V(D) = V(D’), then Q(D) = Q(D’). V ↠ Q iff there exists a function f such that Q(D) = f(V(D)) for all D, iff for every D, we have that D ⊢ V ↠ Q

  19. Complexity Of Determinacy Open Question: is the bound on the combined complexity tight?

  20. Complexity Of Pricing • Corollary. • Deciding whether price_D^S(Q) ≤ k is: • Combined complexity [input S, D]: Σ^p_2 • Data complexity [input D]: coNP-hard • Proposition. Pricing is at least as hard as determinacy • How do we deal with the hardness of computation?

  21. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  22. Restricting Price Points to Selections • A seller can specify only the prices of selection queries of the form σR.X=a: prices on columns • The domain of each column is finite and known to buyers and sellers • Price points on selections are how prices are set in most cases today

  23. Dichotomy Theorem Theorem. Assuming selection views only, for any Conjunctive Query w/o self-joins Q, one of the following holds (data complexity): • priceQS(D) is in PTIME • checking whether priceQS(D) ≤ k is NP-complete • PTIME: • Q(x,y,z,u,v) :- R(x,u),S(x,y,z),T(y,v) [Chains] • Q(x1,…,xk) :- R1(x1,x2),…,Rk(xk,x1) [Cycles] • NP-complete: • Q(x) :- R(x,y) [Projections] • Q(x,y,z) :- R(x,y,z),S(x),T(y),U(z)

  24. Algorithm For PTIME Cases • The algorithm uses a reduction to maximum flow • Edges of finite capacity represent price points • A set of edges of finite cost is a cut iff they determine the query • Example: • Chain query Q(x,y) :- R(x),S(x,y),T(y) over relations R, S, T • Dom(X) = {a1,a2,a3,a4} • Dom(Y) = {b1,b2,b3}

  25. Flow Graph [Figure: flow graph for the chain query over R, S, T, with nodes a1, a2, a3, a4 and b1, b2, b3] A set of edges of finite cost is a cut iff they determine the query
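
The cut idea can be tried out on a deliberately simpler case than the one on the slides. The sketch below is not the talk's construction for the chain query over R, S, T; it assumes a single-relation query Q(x,y) :- S(x,y) with selection price points on X and on Y. A tuple (a,b) is pinned down only if σX=a or σY=b is bought, so each source-to-sink path s → a → b → t must lose its a-edge or its b-edge, and the cheapest finite cut is the price. The Edmonds-Karp routine, the node names, and the $2 and $3 prices are assumptions for illustration.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow; by max-flow/min-cut this equals the cheapest cut."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = INF, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck


def selection_price(x_prices, y_prices):
    """Price of Q(x,y) :- S(x,y) given prices for each sigma_X=a and sigma_Y=b."""
    capacity = {"s": {}}
    for a, p in x_prices.items():
        capacity["s"][("X", a)] = p  # finite edge: price point on X=a
        capacity[("X", a)] = {("Y", b): INF for b in y_prices}  # never cut
    for b, p in y_prices.items():
        capacity[("Y", b)] = {"t": p}  # finite edge: price point on Y=b
    return max_flow(capacity, "s", "t")


# Domains from slide 24; the prices are made up for this sketch.
x_prices = {"a1": 2, "a2": 2, "a3": 2, "a4": 2}
y_prices = {"b1": 3, "b2": 3, "b3": 3}
print(selection_price(x_prices, y_prices))  # 8
```

Because every a-node connects to every b-node here, the minimum cut buys either all X selections or all Y selections, whichever is cheaper. Graphs for chain queries over several relations have more structure than this sketch shows.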

  26. Conclusions • Summary: • The seller sets prices for some views, while the system computes the price of any query • Interesting application of query determinacy • Complexity: dichotomy for CQs w/o self-joins • Future Work: • Pricing in the presence of updates • How do we handle pricing for intractable queries? • Connection between pricing and privacy

  27. Thank you!