
Answering Top-k Queries Using Views By Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis

Presented by: Yesha Gupta. Reference: aitrc.kaist.ac.kr/~vldb06/slides/R13-1.ppt


Presentation Transcript


  1. Answering Top-k Queries Using Views By Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis • Presented by: Yesha Gupta • Reference: aitrc.kaist.ac.kr/~vldb06/slides/R13-1.ppt

  2. Top-k query • Returns the k highest-ranked tuples of a relation • Query: return the top-2 tuples for the function 3x1 + 10x2 + 5x3 [Figure: relation R with m attributes and n tuples]
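Without views, a top-k query can always be answered by scoring every tuple and keeping the k best. The Python sketch below does this for the slide's function 3x1 + 10x2 + 5x3. The relation R, its rows, and the function names are made up for illustration.

```python
import heapq
from typing import Callable, Sequence

Row = tuple[float, ...]

def top_k(rows: Sequence[Row], score: Callable[[Row], float], k: int) -> list[tuple[float, int]]:
    """Brute-force top-k: (score, row index) of the k highest-scoring rows, best first."""
    scored = ((score(r), i) for i, r in enumerate(rows))
    return heapq.nlargest(k, scored)

def f(row: Row) -> float:
    # Scoring function from the slide: 3*x1 + 10*x2 + 5*x3
    x1, x2, x3 = row
    return 3 * x1 + 10 * x2 + 5 * x3

R = [(10, 20, 30), (50, 5, 1), (1, 90, 2), (70, 60, 10)]  # made-up rows
print(top_k(R, f, 2))  # [(913, 2), (860, 3)]
```

This baseline touches all n tuples. The rest of the talk is about avoiding that scan by using views.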

  3. Views (1/2) • Base table R with three attributes • Top-5 query using function f1 = 2x1 + 5x2 • Base view VX1: the tuples of R ordered by their X1 score [Figure: relation R with m attributes and n tuples]

  4. Views (2/2) • Advantages of views • faster query response time • a space-performance tradeoff for improving efficiency • Can views be used to answer a top-k query? Yes • Challenges in using views • guaranteeing that the correct top-k answer is returned • selecting which views to use for a given query

  5. Ranking Queries • Top-k ranking query: SELECT TOP[k] FROM R WHERE <selection condition> ORDER BY <scoring function> • Expressed as a triple Q = (fQ, k, σQ) • fQ assigns a numeric score to any tuple t • σQ is the selection condition on the tuples (* when there is none)
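The triple form of a ranking query maps naturally onto a small record type. The sketch below assumes tuples are Python tuples and the selection condition is an arbitrary predicate. `RankingQuery` and `evaluate` are hypothetical names, and the evaluation is a brute-force scan, not anything taken from the paper.

```python
import heapq
from dataclasses import dataclass
from typing import Callable

Row = tuple[float, ...]

@dataclass(frozen=True)
class RankingQuery:
    """Hypothetical container for a ranking query Q = (fQ, k, sigmaQ)."""
    score: Callable[[Row], float]                  # fQ: numeric score of a tuple t
    k: int                                         # number of results to return
    where: Callable[[Row], bool] = lambda t: True  # sigmaQ; always-true plays the role of '*'

    def evaluate(self, rows: list[Row]) -> list[Row]:
        # SELECT TOP[k] FROM R WHERE sigmaQ ORDER BY fQ (descending)
        candidates = (t for t in rows if self.where(t))
        return heapq.nlargest(self.k, candidates, key=self.score)

def f(t: Row) -> float:
    return 3 * t[0] + 10 * t[1] + 5 * t[2]

R = [(10, 20, 30), (50, 5, 1), (1, 90, 2), (70, 60, 10)]  # made-up rows
q_all = RankingQuery(f, 2)                                # Q = (fQ, 2, *)
q_sel = RankingQuery(f, 2, where=lambda t: t[0] >= 10)    # Q = (fQ, 2, x1 >= 10)
print(q_all.evaluate(R))  # [(1, 90, 2), (70, 60, 10)]
print(q_sel.evaluate(R))  # [(70, 60, 10), (10, 20, 30)]
```

Applying the selection before ranking matches the SQL shape on the slide. The tuples that fail σQ never compete for the k slots.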

  6. Ranking Views • Materialized ranking view: the stored result of a previously executed ranking query • A set of k (tid, score) pairs, ordered by decreasing score • A relation R with m attributes has m base views VX1 to VXm, each listing all n tuples ordered by one attribute
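The sketch below shows one way to materialize ranking views and base views. It assumes each view is stored as a Python list of (tid, score) pairs in decreasing score order. `ranking_view` and `base_views` are hypothetical helpers, and the sample data is made up.

```python
from typing import Callable

Row = tuple[float, ...]
View = list[tuple[int, float]]  # (tuple id, score) pairs in decreasing score order

def ranking_view(rows: list[Row], f_v: Callable[[Row], float], k: int) -> View:
    """Materialize the top-k (tid, score) pairs of a ranking query with function f_v."""
    pairs = [(tid, f_v(t)) for tid, t in enumerate(rows)]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:k]

def base_views(rows: list[Row]) -> list[View]:
    """Base view VXi: every tuple ordered by attribute x_i (a ranking view with k = n)."""
    m = len(rows[0])
    return [ranking_view(rows, lambda t, i=i: t[i], len(rows)) for i in range(m)]

R = [(3, 7, 1), (9, 1, 4), (5, 5, 2)]                     # made-up rows
V1 = ranking_view(R, lambda t: 2 * t[0] + 5 * t[1], k=2)  # f1 = 2x1 + 5x2
print(V1)                 # [(0, 41), (2, 35)]
print(base_views(R)[0])   # [(1, 9), (2, 5), (0, 3)]
```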

  7. Related work on top-k queries (1/2) • TA [Fagin et al. ‘96] • deterministic stopping condition • always returns the correct top-k set • [Figure: relation R (n tuples, m attributes) with sorted lists List1 (X1) and List2 (X2)] • Number of sorted accesses = number of random accesses
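For comparison with LPTA, here is a compact sketch of TA over base views for a linear scoring function with non-negative weights. It illustrates the algorithm under those assumptions and is not a reference implementation. Base views are built by sorting R in memory. A tuple that has already been fetched is not fetched again by random access, whereas the slide's count assumes one random access per sorted access.

```python
import heapq

def threshold_algorithm(R, weights, k):
    """TA over base views for a linear score with non-negative weights.
    Returns (top-k list of (score, tid) best first, number of sorted accesses)."""
    n, m = len(R), len(weights)
    # Base views: tuple ids sorted by decreasing value of each attribute.
    lists = [sorted(range(n), key=lambda t, i=i: R[t][i], reverse=True) for i in range(m)]

    def score(tid):
        return sum(w * x for w, x in zip(weights, R[tid]))

    topk, seen, sorted_accesses = [], set(), 0   # topk is a min-heap of (score, tid)
    for depth in range(n):
        for i in range(m):                       # one sorted access per list, in lock step
            tid = lists[i][depth]
            sorted_accesses += 1
            if tid in seen:                      # already fetched earlier
                continue
            seen.add(tid)
            s = score(tid)                       # random access to R
            if len(topk) < k:
                heapq.heappush(topk, (s, tid))
            elif s > topk[0][0]:
                heapq.heapreplace(topk, (s, tid))
        # Threshold: the best score any unseen tuple could still have.
        threshold = sum(w * R[lists[i][depth]][i] for i, w in enumerate(weights))
        if len(topk) == k and topk[0][0] >= threshold:
            break
    return sorted(topk, reverse=True), sorted_accesses

R = [(9, 1), (8, 8), (1, 10), (5, 5)]          # made-up 2-attribute relation
print(threshold_algorithm(R, (1, 1), k=1))     # ([(16, 1)], 4)
```

On this toy relation, TA stops after two lock-step rounds because the best score found, 16, reaches the threshold 8 + 8.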

  8. Related work on top-k queries (2/2) • PREFER [Hristidis et al. ‘01] • A prototype system, available at http://db.ucsd.edu/PREFER/ • Stores multiple copies of base relation R as materialized views • Uses only one of these views to answer a given query • This paper complements both approaches

  9. Problems for views • Problem 1 (answering top-k queries using views) • Input: a set U of views {v1,v2,…}, query Q • Output: the top-k set • Algorithm: LPTA • Problem 2 (view selection) • Input: a set V of views (materialized ranking views and base views), query Q • Output: the most efficient subset of V on which to execute Q • Algorithm: SelectViews

  10. Outline • LPTA • View selection problem • Conceptual discussion • Cost estimation problem • View selection algorithms • General queries and views • Experiments • Conclusion

  11. LPTA Setting • Linear additive scoring functions, e.g. fQ = 3x1 + 10x2 + 5x3 • Output: the top-2 set; Q = (fQ, 2, *) • Set of views: • materialized views V1 (f1 = 2x1 + 5x2) and V2 (f2 = x2 + 2x3) • sorted access on (tid, score) pairs • random access on the base table R

  12. LPTA execution [Figure: R, V1, V2] • Top-k buffer for Q = (fQ, 2, *): tuple 7 (score 1248), tuple 6 (score 996) • topkmin = 996 • Unseenmax = ?

  13. Calculate Unseenmax • Any tuple not yet seen in the views must satisfy the following inequalities: • The domain of each attribute of R is [1,100] • 1 ≤ x1, x2, x3 ≤ 100 (1) • 2x1 + 5x2 ≤ 527 (2) • x2 + 2x3 ≤ 219 (3) • Unseenmax = the optimum of the linear program that maximizes fQ = 3x1 + 10x2 + 5x3 subject to these inequalities • Unseenmax = 1338
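The LP on this slide is small enough to check with a toy solver. The sketch below maximizes fQ = 3x1 + 10x2 + 5x3 (the query function from slide 2) by enumerating the vertices of the feasible region with exact fractions. This brute force stands in for a general LP solver and is not the paper's implementation. The bounds 527 and 219 are read as the scores of the last tuples seen so far in V1 and V2. `solve` and `lp_max` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import combinations

def solve(A, b):
    """Solve the square system A x = b exactly (Gauss-Jordan); None if singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lp_max(c, A, b):
    """Maximize c.x subject to A x <= b by enumerating vertices.
    Brute force: only sensible for tiny, bounded LPs like this one."""
    best = None
    for rows in combinations(range(len(A)), len(c)):
        x = solve([A[r] for r in rows], [b[r] for r in rows])
        if x is None:
            continue
        if all(sum(a * v for a, v in zip(A[r], x)) <= b[r] for r in range(len(A))):
            value = sum(ci * v for ci, v in zip(c, x))
            if best is None or value > best:
                best = value
    return best

fQ = [3, 10, 5]
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1],      # x_i <= 100
     [-1, 0, 0], [0, -1, 0], [0, 0, -1],   # -x_i <= -1, i.e. x_i >= 1
     [2, 5, 0],                            # (2) V1: 2x1 + 5x2 <= 527
     [0, 1, 2]]                            # (3) V2: x2 + 2x3 <= 219
b = [100, 100, 100, -1, -1, -1, 527, 219]
unseen_max = lp_max(fQ, A, b)
print(unseen_max)  # 1338
```

The value agrees with the slide. Adding constraints (2) and (3) with multipliers 1.5 and 2.5 gives exactly 3x1 + 10x2 + 5x3 ≤ 1.5·527 + 2.5·219 = 1338, and the vertex x = (100, 65.4, 76.8) attains it.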

  14. LPTA – General Example (1/3) [Figure: top-1 query Q over R(X1, X2); views V1 and V2 drawn as vectors in the (X1, X2) plane; stopping condition marked]

  15. LPTA – General Example (2/3) • Linear programming adaptation of TA [Figure: query Q with views V1 and V2 at iteration d]

  16. LPTA – General Example (3/3) [Figure: top-1 query Q over R(X1, X2) with views V1 and V2; stopping condition reached]

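  17. LPTA execution [Figure: R, V1, V2] • Top-k buffer for Q = (fQ, 2, *): topkmin = 996 • Is Unseenmax ≤ topkmin? • Unseenmax = 1338: No, so LPTA performs another round of sorted accesses • Unseenmax = 953.5: Yes, so LPTA stops and returns the top-k buffer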

  18. LPTA • LPTA reduces to TA when the set of views U is the set of base views • Execution cost: both perform sequential (sorted) and random accesses • Every sequential access incurs a random access • So the number of sequential accesses can be taken as the running cost • Running cost = O(dr), where d = number of lock-step iterations and r = number of views
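The LPTA loop from slides 11 to 18 can be sketched as follows. This is a minimal Python sketch, not the paper's implementation. It assumes every view lists all tuples of R by decreasing linear score and that all attributes lie in a known range [lo, hi]. It computes Unseenmax with the same brute-force vertex-enumeration LP as the toy solver for slide 13 (the hypothetical helpers `solve` and `lp_max`) instead of a real LP solver.

```python
import heapq
from fractions import Fraction
from itertools import combinations

def solve(A, b):
    """Exact Gauss-Jordan solve of a square system; None if singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lp_max(c, A, b):
    """max c.x s.t. A x <= b by vertex enumeration (tiny bounded LPs only)."""
    best = None
    for rows in combinations(range(len(A)), len(c)):
        x = solve([A[r] for r in rows], [b[r] for r in rows])
        if x is not None and all(sum(a * v for a, v in zip(A[r], x)) <= b[r]
                                 for r in range(len(A))):
            value = sum(ci * v for ci, v in zip(c, x))
            best = value if best is None or value > best else best
    return best

def lpta(R, view_weights, fq, k, lo, hi):
    """LPTA sketch: returns (top-k list of (score, tid), lock-step iterations d)."""
    m = len(fq)

    def dot(w, t):
        return sum(wi * ti for wi, ti in zip(w, t))

    # One sorted list per view: (view score, tid), highest score first.
    views = [sorted(((dot(w, t), tid) for tid, t in enumerate(R)), reverse=True)
             for w in view_weights]
    # Domain constraints lo <= x_i <= hi, written as rows of A x <= b.
    A_dom, b_dom = [], []
    for i in range(m):
        e = [1 if j == i else 0 for j in range(m)]
        A_dom += [e, [-v for v in e]]
        b_dom += [hi, -lo]
    topk, seen, d = [], set(), 0          # topk: min-heap of (fQ score, tid)
    for d in range(1, len(R) + 1):        # lock-step iteration d
        last = []                         # score of the last tuple seen in each view
        for view in views:
            s, tid = view[d - 1]          # sorted access
            last.append(s)
            if tid not in seen:           # random access to R
                seen.add(tid)
                item = (dot(fq, R[tid]), tid)
                if len(topk) < k:
                    heapq.heappush(topk, item)
                elif item[0] > topk[0][0]:
                    heapq.heapreplace(topk, item)
        # Every unseen tuple scores at most last[v] in view v: solve the LP.
        unseen_max = lp_max(fq, A_dom + [list(w) for w in view_weights], b_dom + last)
        if len(topk) == k and unseen_max <= topk[0][0]:   # Unseenmax <= topkmin
            break
    return sorted(topk, reverse=True), d

R = [(9, 1), (8, 8), (1, 10), (5, 5), (2, 3)]   # made-up data, domain [0, 10]
print(lpta(R, [(2, 1), (1, 2)], (1, 1), k=2, lo=0, hi=10))  # ([(16, 1), (11, 2)], 3)
```

In this toy run LPTA stops after d = 3 iterations with r = 2 views, so it makes d·r = 6 sorted accesses. This is the quantity the O(dr) cost model counts.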

  19. View Selection Problem (1/5) • Given a collection of views and a query Q, determine the most efficient subset to execute Q on. • Conceptual discussion • Two dimensions • Higher dimensions

  20. View Selection – 2d (2/5) [Figure: query Q and views V1, V2 in two dimensions; min top-k tuple M; points A, B, B1, B2]

  21. View Selection - 2d (3/5) [Figure: query Q and views V1, V2 in two dimensions; min top-k tuple M; points A, A1, B, B2]

  22. View Selection - 2d (4/5) • Theorem: Let U be a set of views for a two-dimensional dataset and Q a query. Let Va and Vc be the view vectors closest to Q in anticlockwise and clockwise order, respectively. Then the optimal execution of LPTA requires a subset of views from {Va, Vc}.
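In two dimensions, the theorem lets LPTA restrict itself to the two view vectors that bracket Q. The sketch below finds them by polar angle. It assumes views and the query are given as 2-d weight vectors, for instance (2, 5) for 2x1 + 5x2. `closest_views_2d` is a hypothetical helper.

```python
import math

def closest_views_2d(q, views):
    """Indices of the views nearest to query vector q in anticlockwise and
    clockwise direction (None if there is no view on that side)."""
    qa = math.atan2(q[1], q[0])
    ccw = cw = None                                # (angular distance, index) per side
    for i, (x, y) in enumerate(views):
        d = math.atan2(y, x) - qa
        d = math.atan2(math.sin(d), math.cos(d))   # wrap into (-pi, pi]
        if d >= 0:                                 # anticlockwise (a view aligned with q counts here)
            if ccw is None or d < ccw[0]:
                ccw = (d, i)
        elif cw is None or -d < cw[0]:
            cw = (-d, i)
    return (ccw[1] if ccw else None, cw[1] if cw else None)

views = [(1, 0), (2, 1), (1, 2), (0, 1)]   # made-up view vectors
print(closest_views_2d((1, 1), views))     # (2, 1)
```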

  23. View Selection - Higher d (5/5) • Theorem: If U is a set of views for an m-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views U′ ⊆ U such that |U′| ≤ m. • Question: How do we select the optimal subset of views?

  24. Cost Estimation Framework • What is the cost of running LPTA when a specific set of views is used to answer a query? • Cost = number of sequential accesses • [Figure: query Q with views V1 and V2, the min top-k tuple, and points A and B; cost = 6 sequential accesses] • Can we find that cost without actually running LPTA?

  25. Simulation of LPTA on Histograms • Equi-depth histogram HVX1 built on base view VX1 (ordered by score) • Example: the view holds n = 10 tuples and the histogram has b = 5 buckets • Each bucket represents n/b = 10/5 = 2 data points (attribute values)
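The slide's equi-depth example (n = 10, b = 5, n/b = 2) can be reproduced with a short sketch. It assumes n is a multiple of b and stores only the highest and lowest score of each bucket. The helper name and the scores are made up.

```python
def equi_depth_histogram(scores, b):
    """Equi-depth histogram of a view's scores: b buckets of n/b scores each,
    returned as (highest, lowest) score per bucket, best bucket first."""
    s = sorted(scores, reverse=True)
    n = len(s)
    if n % b:
        raise ValueError("this sketch assumes n is a multiple of b")
    size = n // b
    return [(s[j * size], s[(j + 1) * size - 1]) for j in range(b)]

scores = [60, 95, 33, 81, 5, 90, 52, 12, 77, 40]   # n = 10 made-up scores
print(equi_depth_histogram(scores, 5))
# [(95, 90), (81, 77), (60, 52), (40, 33), (12, 5)]
```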

  26. Simulation of LPTA on Histograms • Estimate the query score distribution by convolving the histograms (domain [0,1])
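Convolution can be illustrated with discrete distributions. If the attributes are treated as independent, the distribution of a linear score w1·x1 + w2·x2 is the convolution of the scaled attribute distributions. The minimal sketch below works under that independence assumption and collapses each histogram bucket to one representative value. The slide does not give these details, so treat them as assumptions.

```python
from collections import defaultdict

def weighted_sum_distribution(dists, weights):
    """Distribution of w1*X1 + ... + wm*Xm for independent discrete X_i.
    Each dist maps a value (e.g. a bucket's representative score) to its probability."""
    result = {0: 1.0}                      # the empty sum is 0 with probability 1
    for dist, w in zip(dists, weights):
        nxt = defaultdict(float)
        for s, p in result.items():        # one discrete convolution step
            for v, pv in dist.items():
                nxt[s + w * v] += p * pv
        result = dict(nxt)
    return result

uniform01 = {0: 0.5, 1: 0.5}
print(weighted_sum_distribution([uniform01, uniform01], [1, 1]))
# {0: 0.25, 1: 0.5, 2: 0.25}
```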

  27. Simulation of LPTA on Histograms • HQ approximates the score distribution of query Q • Use HQ to estimate the score of the k-th highest tuple (topkmin) • Simulate LPTA bucket by bucket, in lock step, to estimate the cost • [Figure: histograms HQ, HV1, HV2, each with b buckets of n/b tuples, with topkmin and the cost marked]
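One simple way to read topkmin off HQ is to walk the buckets from the highest scores down until k tuples are covered. The sketch below assumes HQ is a list of (high, low, count) buckets and interpolates linearly inside the bucket that contains the k-th tuple. Both the representation and the interpolation are assumptions, not taken from the slides, and `estimate_topk_min` is a hypothetical name.

```python
def estimate_topk_min(buckets, k):
    """buckets: HQ as (high, low, count) triples in decreasing score order.
    Estimates the k-th highest query score, assuming scores are spread
    uniformly inside each bucket; None if HQ holds fewer than k tuples."""
    remaining = k
    for high, low, count in buckets:
        if remaining <= count:
            # Linear interpolation inside the bucket holding the k-th tuple.
            return high - (high - low) * remaining / count
        remaining -= count
    return None

HQ = [(100, 80, 10), (80, 50, 20)]   # made-up query histogram
print(estimate_topk_min(HQ, 20))     # 65.0
```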

  28. Estimate cost • Number of buckets visited (d) = 3 • Number of views (r′) = 2 • Number of tuples per bucket (n/b) = 56/7 = 8 • Number of tuples in the last scanned bucket (n′) = 3 • Number of sorted accesses = ((d-1) n/b + n′) r′ = ((3-1) × 8 + 3) × 2 = 38 • Running time = O((d-1) + log n′)
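The sorted-access estimate is plain arithmetic. A one-function sketch reproduces the slide's number. The function name is hypothetical, and n is assumed to be a multiple of b.

```python
def estimated_sorted_accesses(d, n, b, n_last, r):
    """((d - 1) * n/b + n') * r' from the slide; assumes n is a multiple of b."""
    return ((d - 1) * (n // b) + n_last) * r

print(estimated_sorted_accesses(d=3, n=56, b=7, n_last=3, r=2))  # 38
```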

  29. View Selection Algorithms • Exhaustive (E): check all possible subsets U′ ⊆ U with |U′| ≤ m and select the subset of views with the smallest estimated cost • Greedy (SV): keep expanding the set of views to use until the estimated cost stops decreasing
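Both strategies only need a way to estimate the cost of a candidate set of views. In the sketch below, `cost` is a callable standing in for the histogram-based estimate (EstimateCost() on slide 32). The greedy variant adds, at each step, the view with the lowest resulting cost, which is one reasonable reading of "keep expanding the set". The function names and the cost table are made up.

```python
from itertools import combinations

def select_exhaustive(views, cost, max_size):
    """E: evaluate every non-empty subset of at most max_size views."""
    best_cost, best = float("inf"), ()
    for size in range(1, max_size + 1):
        for subset in combinations(views, size):
            c = cost(subset)
            if c < best_cost:
                best_cost, best = c, subset
    return best

def select_greedy(views, cost):
    """SV-style greedy: repeatedly add the view giving the lowest estimated
    cost; stop as soon as adding a view no longer reduces the cost."""
    chosen, current = (), float("inf")
    while len(chosen) < len(views):
        c, v = min(((cost(chosen + (v,)), v) for v in views if v not in chosen),
                   key=lambda pair: pair[0])
        if c >= current:
            break
        chosen, current = chosen + (v,), c
    return chosen

# Toy cost table standing in for EstimateCost(); the numbers are made up.
table = {frozenset(s): c for s, c in [
    (("V1",), 50), (("V2",), 60), (("V3",), 80),
    (("V1", "V2"), 30), (("V1", "V3"), 45), (("V2", "V3"), 40),
    (("V1", "V2", "V3"), 35)]}

def cost(subset):
    return table[frozenset(subset)]

print(select_exhaustive(["V1", "V2", "V3"], cost, max_size=3))  # ('V1', 'V2')
print(select_greedy(["V1", "V2", "V3"], cost))                  # ('V1', 'V2')
```

Exhaustive search costs one estimate per subset. Greedy needs at most r estimates per step, which is why it scales to larger view sets.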

  30. Select Views Spherical (SVS) • Requires the solution of a single linear program [Figure: query Q and the selected views]

  31. Select Views By Angle (SVA) • Sort the views by increasing angle with respect to Q [Figure: views V1, V2, V3, V4 around query Q, with the selected views marked]
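The SVA ordering depends only on the angle between each view's weight vector and the query's. The sketch below assumes linear functions are given as weight vectors. V1 = (2, 5, 0) and V2 = (0, 1, 2) correspond to 2x1 + 5x2 and x2 + 2x3 from the earlier example, the query (3, 10, 5) is 3x1 + 10x2 + 5x3, and V3 and V4 are illustrative. The slide does not say how many views to take from the sorted list, so the sketch only produces the ordering.

```python
import math

def angle(u, v):
    """Angle in radians between two weight vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))   # clamp against rounding error

def sort_views_by_angle(q, views):
    """SVA ordering: view names sorted by increasing angle to the query vector q."""
    return sorted(views, key=lambda name: angle(q, views[name]))

views = {"V1": (2, 5, 0), "V2": (0, 1, 2), "V3": (1, 0, 0), "V4": (3, 10, 4)}
print(sort_views_by_angle((3, 10, 5), views))  # ['V4', 'V1', 'V2', 'V3']
```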

  32. General Queries and Views • Views that materialize their top-k tuples: • convolve the view histograms • truncate the view histograms • run EstimateCost() • Accommodating range conditions: • select the views that cover the range conditions • truncate each attribute's histogram • convolve the histograms

  33. Experiments • Datasets (Uniform, Zipf, Real) • Experiments: • Performance comparison of LPTA, PREFER and TA • Accuracy of the cost estimation framework • Performance of LPTA using each of the view selection algorithms • Scalability of the LPTA algorithm

  34. Performance comparison of LPTA, PREFER and TA [Figure panels: real dataset, 2d; uniform dataset, 3d]

  35. Cost Estimation Accuracy [Figure panels, 2d: buckets = 0.5% of n; buckets = 1% of n]

  36. Performance of LPTA using View Selection Algorithms [Figure panels: 500K tuples, top-100; 3d and 2d]

  37. Scalability Experiments on LPTA [Figure: 2d, uniform dataset; 500K tuples, top-100]

  38. Conclusions • Using views for top-k query answering • LPTA: linear programming adaptation of TA • View selection problem, cost estimation framework, view selection algorithms • Experimental evaluation

  39. References • aitrc.kaist.ac.kr/~vldb06/slides/R13-1.ppt • Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis: Answering Top-k Queries Using Views

  40. Thank you
