Answering Top-k Queries Using Views

Presentation Transcript


  1. Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris Tsirogiannis (Univ. of Toronto)

  2. Introduction • Preferences expressed as scoring functions on the attributes of a relation, e.g. R • Top-k: the k tuples with the highest score VLDB '06
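A minimal sketch of the top-k definition above, assuming a linear additive scoring function. The relation contents, the weights, and the helper names `score` and `top_k` are hypothetical and chosen only for illustration.

```python
import heapq

# Hypothetical relation R(X1, X2): tuple id -> attribute values.
R = {
    "t1": (9, 2),
    "t2": (4, 8),
    "t3": (7, 7),
    "t4": (1, 3),
}

def score(weights, values):
    """Linear additive score: sum of weight * attribute value."""
    return sum(w * x for w, x in zip(weights, values))

def top_k(relation, weights, k):
    """Return the k (tid, score) pairs with the highest score, best first."""
    scored = ((tid, score(weights, vals)) for tid, vals in relation.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

result = top_k(R, weights=(1, 1), k=2)
print(result)
```

This version scans all of R; the algorithms in the following slides aim to stop before reading every tuple.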

  3. Related Work • TA [Fagin et al. '96] • Deterministic stopping condition • Always the correct top-k set • PREFER [Hristidis et al. '01] • Stores multiple copies of base relation R • Utilizes only one • We complement existing approaches
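For readers unfamiliar with TA, the sketch below shows its general shape under some assumptions: one sorted list per attribute, non-negative weights (so the score is monotone), and a relation held in memory as a dictionary. The function name and the sample data are hypothetical.

```python
import heapq

def threshold_algorithm(relation, weights, k):
    """Top-k by TA for a linear score with non-negative weights.

    Returns (top-k list, depth reached in each sorted list).
    """
    m = len(weights)
    # One list of tuple ids per attribute, sorted by that attribute, descending.
    lists = [sorted(relation, key=lambda t, i=i: relation[t][i], reverse=True)
             for i in range(m)]

    def score(tid):
        return sum(w * x for w, x in zip(weights, relation[tid]))

    seen = {}
    for depth in range(len(relation)):
        last = []
        for i in range(m):
            tid = lists[i][depth]        # sorted access on list i
            seen[tid] = score(tid)       # random access for the full score
            last.append(relation[tid][i])
        # No unseen tuple can score higher than the threshold.
        threshold = sum(w * x for w, x in zip(weights, last))
        best = heapq.nlargest(k, seen.items(), key=lambda p: p[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best, depth + 1       # deterministic stopping condition
    return heapq.nlargest(k, seen.items(), key=lambda p: p[1]), len(relation)

R = {"t1": (9, 2), "t2": (4, 8), "t3": (7, 7), "t4": (1, 3)}
result, depth = threshold_algorithm(R, weights=(1, 1), k=1)
print(result, depth)
```

On this sample data the threshold drops from 17 to 14 after the second round, at which point t3 (score 14) is guaranteed to be the top-1 answer.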

  4. Motivation • Query answering using views • Space-Performance tradeoff • Improved efficiency • Can we exploit the same tradeoffs for top-k query answering?

  5. Problem Statement • Ranking Views: Materialized results of previously asked top-k queries • Problem: Can we answer new ad-hoc top-k queries efficiently using ranking views?

  6. Outline • LPTA Algorithm • View Selection Problem • Cost Estimation Framework • View Selection Algorithms • Experimental Evaluation • Conclusions

  7. LPTA - Setting • Linear additive scoring functions • Set of Views: • Materialized result of a previously executed top-k query • Arbitrary subset of attributes • Sorted access on pairs • Random access on the base table R

  8. LPTA - Example [Figure: top-1 query Q on R(X1, X2) answered using views V1 and V2, drawn on axes X1 and X2, with the stopping condition marked]

  9. LPTA • Linear programming adaptation of TA [Figure: query Q and views V1, V2 at iteration d]

  10. LPTA - Example (cont'd) [Figure: top-1 query Q on R(X1, X2) answered using views V1 and V2, drawn on axes X1 and X2, with the stopping condition marked]
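The LPTA slides can be illustrated with a small two-dimensional sketch. It makes several assumptions that are not in the slides: each view materializes the full ranking of R as (tid, score) pairs, attribute values and weights are non-negative integers in a common domain [0, hi], and the linear program is solved by enumerating vertices, which only works in 2-d. The names `max_unseen_score` and `lpta` and the sample data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
import heapq

def max_unseen_score(q, views, bounds, hi):
    """Maximize q.x subject to v.x <= bound per view and 0 <= x <= hi (2-d only).

    A bounded 2-d linear program attains its optimum at a vertex, so it is
    enough to intersect every pair of constraint lines and keep feasible points.
    """
    cons = [(v[0], v[1], b) for v, b in zip(views, bounds)]
    cons += [(1, 0, hi), (0, 1, hi), (-1, 0, 0), (0, -1, 0)]
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel constraint lines never intersect
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in cons):
            value = q[0] * x + q[1] * y
            best = value if best is None else max(best, value)
    return best

def lpta(relation, q, views, k, hi):
    """Return (top-k list, number of sorted accesses) for query weights q."""
    def score(w, tid):
        return sum(wi * xi for wi, xi in zip(w, relation[tid]))
    # Each ranking view: (tid, view score) pairs in descending score order.
    ranked = [sorted(((t, score(v, t)) for t in relation), key=lambda p: -p[1])
              for v in views]
    seen, accesses, top = {}, 0, []
    for d in range(len(relation)):
        bounds = []
        for view in ranked:
            tid, s = view[d]             # sorted access on the view
            accesses += 1
            seen[tid] = score(q, tid)    # random access on the base table R
            bounds.append(s)             # unseen tuples score at most s in this view
        top = heapq.nlargest(k, seen.items(), key=lambda p: p[1])
        if len(top) == k and top[-1][1] >= max_unseen_score(q, views, bounds, hi):
            break                        # stopping condition
    return top, accesses

R = {"t1": (9, 2), "t2": (4, 8), "t3": (7, 7), "t4": (1, 3), "t5": (6, 5)}
top, accesses = lpta(R, q=(1, 1), views=[(2, 1), (1, 2)], k=2, hi=10)
print(top, accesses)
```

In this sample run the bound on unseen tuples falls from 14 to 40/3 to 11 over three iterations, so the scan stops once the second-best seen score (12) is at least 11.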

  11. Outline • LPTA Algorithm • View Selection Problem • Cost Estimation Framework • View Selection Algorithms • Experimental Evaluation • Conclusions

  12. View Selection Problem • Given a collection of views and a query Q, determine the most efficient subset to execute Q on. • Conceptual discussion • Two dimensions • Higher dimensions

  13. View Selection - 2d [Figure: query Q and views V1, V2 with the min top-k tuple; labels M, A, A1, B, B1]

  14. View Selection - Higher d • Theorem: If V is a set of views for an m-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views U ⊆ V such that |U| ≤ m. • Question: How do we select the optimal subset of views?

  15. Outline • LPTA Algorithm • View Selection Problem • Cost Estimation Framework • View Selection Algorithms • Experimental Evaluation • Conclusions

  16. Cost Estimation Framework • What is the cost of running LPTA when a specific set of views is used to answer a query? • Cost = number of sequential accesses • Can we find that cost without actually running LPTA? [Figure: views V1 and V2, query Q, the min top-k tuple, and points A and B; cost = 6 sequential accesses]

  17. Cost Simulation of LPTA on Histograms • HQ: approximates the score distribution of the query Q • Use HQ to estimate the score of the k-th highest tuple (topkmin). • Simulate LPTA bucket by bucket, in lock step, to estimate the cost. [Figure: histograms HQ, HV1 and HV2 with topkmin marked; b buckets, n/b tuples per bucket]
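The two steps on this slide can be sketched roughly as follows. Everything here is an assumption layered on the slide: histograms are given as b + 1 descending bucket boundaries with n/b tuples per bucket, scores are taken to be uniform within a bucket, and the bound on unseen tuples is passed in as a hypothetical callable `unseen_bound` (in LPTA it would come from a linear program).

```python
def estimate_topkmin(boundaries, n, k):
    """Estimate the k-th highest score from an equi-depth histogram.

    boundaries: b + 1 score values in descending order; bucket i covers scores
    from boundaries[i] down to boundaries[i + 1] and holds n / b tuples.
    """
    b = len(boundaries) - 1
    per_bucket = n / b
    i = min(int((k - 1) // per_bucket), b - 1)     # bucket containing rank k
    high, low = boundaries[i], boundaries[i + 1]
    fraction = (k - i * per_bucket) / per_bucket   # depth of rank k in the bucket
    return high - fraction * (high - low)

def simulate_cost(view_boundaries, n, topkmin, unseen_bound):
    """Walk all view histograms one bucket at a time, in lock step.

    Returns the estimated number of sequential accesses over all views.
    """
    b = len(view_boundaries[0]) - 1
    per_bucket = n / b
    for step in range(1, b + 1):
        # After `step` buckets, unseen tuples score at most this in each view.
        lows = [bounds[step] for bounds in view_boundaries]
        if unseen_bound(lows) <= topkmin:
            return step * per_bucket * len(view_boundaries)
    return n * len(view_boundaries)

# Hypothetical histograms: Q = X1 + X2, with views V1 = X1 and V2 = X2,
# so the bound on unseen tuples is simply the sum of the view bounds.
HQ = [20, 16, 12, 8, 0]
HV1 = [10, 9, 7, 4, 0]
HV2 = [10, 9, 6, 3, 0]
topkmin = estimate_topkmin(HQ, n=100, k=25)
cost = simulate_cost([HV1, HV2], n=100, topkmin=topkmin, unseen_bound=sum)
print(topkmin, cost)
```

The estimate is only as fine as a bucket, so more buckets give a tighter cost estimate at the price of more simulation steps.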

  18. Outline • LPTA Algorithm • View Selection Problem • Cost Estimation Framework • View Selection Algorithms • Experimental Evaluation • Conclusions

  19. View Selection Algorithms • Exhaustive (E): Check all possible subsets of views whose size is at most the dimensionality of the dataset. • Greedy (SV): Keep expanding the set of views to use until the estimated cost stops decreasing.
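A rough sketch of the two strategies, assuming the estimated cost of a candidate subset is available as a callable. The function names, the `max_size` parameter, and the toy cost table are hypothetical; in practice the cost would come from the cost estimation framework.

```python
from itertools import combinations

def select_exhaustive(views, max_size, estimated_cost):
    """Exhaustive (E): try every non-empty subset of at most max_size views."""
    candidates = (subset
                  for size in range(1, max_size + 1)
                  for subset in combinations(views, size))
    return min(candidates, key=estimated_cost)

def select_greedy(views, estimated_cost):
    """Greedy (SV): add the view that lowers the cost most; stop when none does."""
    chosen, best_cost = (), float("inf")
    remaining = list(views)
    while remaining:
        cost, view = min(((estimated_cost(chosen + (v,)), v) for v in remaining),
                         key=lambda pair: pair[0])
        if cost >= best_cost:
            break                        # estimated cost stopped decreasing
        chosen, best_cost = chosen + (view,), cost
        remaining.remove(view)
    return chosen

# Toy cost table standing in for the histogram-based estimate.
COSTS = {
    frozenset({"V1"}): 10, frozenset({"V2"}): 8, frozenset({"V3"}): 9,
    frozenset({"V1", "V2"}): 7, frozenset({"V1", "V3"}): 4,
    frozenset({"V2", "V3"}): 6, frozenset({"V1", "V2", "V3"}): 6,
}

def toy_cost(subset):
    return COSTS[frozenset(subset)]

exhaustive = select_exhaustive(["V1", "V2", "V3"], 2, toy_cost)
greedy = select_greedy(["V1", "V2", "V3"], toy_cost)
print(exhaustive, greedy)
```

With this toy table the greedy choice (cost 6) is worse than the exhaustive one (cost 4), which illustrates why greedy trades optimality for fewer cost evaluations.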

  20. Select Views Spherical (SVS) • Requires the solution of a single linear program. [Figure: query Q, label T, and the selected views]

  21. Select Views By Angle (SVA) • Sort the views by increasing angle with respect to Q. [Figure: views V1, V2, V3, V4 around Q, with the selected views marked]
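Sorting by angle can be sketched as below, treating the query and each view as weight vectors. The slide does not say how many views are kept after sorting, so the `count` parameter is an assumption, as are the function names and the sample vectors.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def select_views_by_angle(query, views, count):
    """Return the names of the `count` views closest in angle to the query."""
    ranked = sorted(views, key=lambda name: angle(query, views[name]))
    return ranked[:count]

views = {"V1": (3, 1), "V2": (1, 2), "V3": (1, 4), "V4": (0, 1)}
selected = select_views_by_angle((1, 1), views, count=2)
print(selected)
```

Unlike the exhaustive and greedy strategies, this needs no cost estimate at all, only the weight vectors.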

  22. General Queries and Views • Views that materialize their top-k tuples. • Truncate the view histograms. • Accommodating range conditions • Select the views that cover the range conditions. • Truncate each attribute's histogram.

  23. Outline • LPTA Algorithm • View Selection Problem • Cost Estimation Framework • View Selection Algorithms • Experimental Evaluation • Conclusions

  24. Experiments • Datasets (Uniform, Zipf, Real) • Experiments: • Performance comparison of LPTA, PREFER and TA • Accuracy of the cost estimation framework • Performance of LPTA using each of the view selection algorithms • Scalability of the LPTA algorithm

  25. Performance comparison of LPTA, PREFER and TA [Figure panels: real dataset, 2d; uniform dataset, 3d]

  26. Cost Estimation Accuracy [Figure, 2d; panels: buckets = 0.5% of n; buckets = 1% of n]

  27. Performance of LPTA using View Selection Algorithms [Figure, 500K tuples, top-100; panels: 3d; 2d]

  28. Scalability Experiments on LPTA [Figure: 2d, uniform dataset; 500K tuples, top-100]

  29. Conclusions • Using views for top-k query answering • LPTA: linear programming adaptation of TA • View selection problem, cost estimation framework, view selection algorithms • Experimental evaluation

  30. (Thank You!) Questions?