
PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries






Presentation Transcript


  1. PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries • Vagelis Hristidis University of California, San Diego • Nick Koudas AT&T Research • Yannis Papakonstantinou University of California, San Diego

  2. Example

  3. Example ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

  4. Example ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

  5. Example Problem: retrieve the WHOLE relation ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

  6. Example PREFER retrieves only part of the relation Problem: retrieve the WHOLE relation ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

  7. Applications Such preference queries are used on Web sites such as: • www.Zagat.com (restaurants) • www.personallogic.com (online retailer)

  8. Definitions - Problem statement • A preference query orders the tuples of a relation according to a function of their attribute values, e.g. 0.01·Mileage + 0.6·Year + 0.03·Price • The goal is to produce the top-K answers of a preference query while retrieving the minimum number of tuples
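As a concrete baseline for the problem statement, here is a minimal sketch in Python; the car tuples and weight values are made-up illustrative data, not from the paper. The naive evaluation scores and sorts the whole relation, which is exactly the cost PREFER is designed to avoid.

```python
def preference_score(t, weights):
    """Linear preference function, e.g. 0.01*Mileage + 0.6*Year + 0.03*Price."""
    return sum(w * t[attr] for attr, w in weights.items())

def naive_top_k(relation, weights, k):
    """Baseline: score every tuple, sort the WHOLE relation, keep the top K."""
    return sorted(relation, key=lambda t: preference_score(t, weights),
                  reverse=True)[:k]

# Made-up car tuples for illustration.
cars = [
    {"Mileage": 30000, "Year": 1999, "Price": 9000},
    {"Mileage": 80000, "Year": 1995, "Price": 4000},
    {"Mileage": 10000, "Year": 2001, "Price": 14000},
]
weights = {"Mileage": 0.01, "Year": 0.6, "Price": 0.03}
top = naive_top_k(cars, weights, 2)
```

The cost of this baseline is a full scan plus a full sort regardless of K, which motivates answering from a ranked materialized view instead.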

  9. Our Approach PREFER materializes a number of ranked views of the relation and uses them to efficiently answer preference queries.

  10. Our Approach [figure: the weight vectors of two materialized ranked views, 0.075*Price + 0.8*Year and 0.08*Price + 0.2*Year, plotted in (Price, Year) weight space]

  11. Our Approach [figure: the preference query 0.07*Price + 0.35*Year plotted in the same weight space, close to the ranked view 0.08*Price + 0.2*Year]

  12. PREFER Architecture [diagram] Preprocessing stage: the relation, space constraints, and a discretization of the ranked views’ vectors feed Views Creation, which decides which ranked views we should materialize.

  13. PREFER Architecture [diagram] Preprocessing stage: Views Creation (inputs: relation, space constraints, discretization of ranked views’ vectors; question: which ranked views should we materialize?) produces the materialized views and an index of them. Runtime process: a query goes to View Selection (which ranked view should we use to answer a specific preference query?), which returns a ranked view id; the query and the chosen view feed the Pipelining Algorithm (how do we use a ranked view to answer a preference query?), which outputs the results.

  14. PREFER Architecture [same diagram repeated]

  15. Watermark [figure: Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price, beside the Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price; t1 is the last tuple output; Watermark = 14.26]

  16. Watermark Calculating the Watermark [formula figure lost in transcript]
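The watermark formula itself did not survive the transcript. A simple sufficient watermark can be derived from the ratio bound f_q(t) ≤ max_i(q_i/v_i) · f_v(t), which holds when attribute values are non-negative and view weights are positive; the paper derives a tighter bound, so treat this as an illustrative sketch only.

```python
def score(t, w):
    """f(t) = sum over attributes of weight * attribute value."""
    return sum(w[a] * t[a] for a in w)

def simple_watermark(t1, q_weights, v_weights):
    """Watermark for the tuple t1 about to be output.

    Assumes non-negative attribute values and positive view weights.
    Since f_q(t) <= max_i(q_i / v_i) * f_v(t), any view tuple t with
    f_v(t) below the returned value T satisfies f_q(t) < f_q(t1), so
    the scan of the view can safely stop once f_v drops below T."""
    ratio = max(q_weights[a] / v_weights[a] for a in q_weights)
    return score(t1, q_weights) / ratio
```

For the slide's vectors (view 0.02*Mileage+0.4*Year+0.04*Price, query 0.01*Mileage+0.6*Year+0.03*Price) the ratio is max(0.5, 1.5, 0.75) = 1.5, so T = f_q(t1)/1.5; whether this reproduces the slide's 14.26 depends on t1's attribute values, which the transcript does not give.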

  17. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 14.26 t1

  18. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 14.26 • Find prefix of view with fv greater than watermark value and sort them by fq t1

  19. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 14.26 • Find prefix of view with fv greater than watermark value and sort them by fq t1

  20. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 14.26 • Find prefix of view with fv greater than watermark value and sort them by fq • Output tuples up to t1 t1

  21. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 14.26 • Find prefix of view with fv greater than watermark value and sort them by fq • Output tuples up to t1 • Repeat using first unprocessed as t1 t1

  22. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 13.1 • Find prefix of view with fv greater than watermark value and sort them by fq • Output tuples up to t1 • Repeat using first unprocessed as t1 t1

  23. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 13.1 • Find prefix of view with fv greater than watermark value and sort them by fq • Output tuples up to t1 • Repeat using first unprocessed as t1 t1

  24. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm • Calculate Watermark for t1, which is 8.3 • Find prefix of view with fv greater than watermark value and sort them by fq • Output tuples up to t1 • Repeat using first unprocessed as t1 t1

  25. How to use a ranked view to answer a preference query (cont’d) PipelineResults Algorithm Result, ordered by 0.01*Mileage+0.6*Year+0.03*Price Ranked View, ordered by 0.02*Mileage+0.4*Year+0.04*Price • Calculate Watermark for t1, which is 8.3 • Find prefix of view with fv greater than watermark value and sort them by fq • Output tuples up to t1 • Repeat using first unprocessed as t1 t1
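The loop walked through on slides 17-25 can be sketched as runnable code, under simplifying assumptions: the view fits in memory as a list of distinct tuples sorted by descending fv, and a simple ratio watermark (f_q(t1) divided by max_i(q_i/v_i), valid for non-negative attributes) stands in for the paper's tighter formula.

```python
def score(t, w):
    return sum(w[a] * t[a] for a in w)

def pipeline_results(view, q_w, v_w, k):
    """view: distinct tuples sorted by descending f_v = score(., v_w).
    Returns the top-k tuples under f_q = score(., q_w)."""
    out = []
    remaining = list(view)                 # still in descending-f_v order
    while remaining and len(out) < k:
        t1 = remaining[0]
        # Ratio watermark: no tuple with f_v below wm can beat t1 on f_q.
        wm = score(t1, q_w) / max(q_w[a] / v_w[a] for a in q_w)
        # Prefix of the view still above the watermark, sorted by f_q.
        prefix = [t for t in remaining if score(t, v_w) >= wm]
        prefix.sort(key=lambda t: score(t, q_w), reverse=True)
        # Output tuples up to and including t1, then repeat with the
        # first unprocessed view tuple as the new t1.
        batch = prefix[:prefix.index(t1) + 1]
        out.extend(batch)
        remaining = [t for t in remaining if t not in batch]
    return out[:k]

q = {"Mileage": 0.01, "Year": 0.6, "Price": 0.03}
v = {"Mileage": 0.02, "Year": 0.4, "Price": 0.04}
cars = [
    {"Mileage": 200, "Year": 5,  "Price": 100},
    {"Mileage": 100, "Year": 10, "Price": 50},
    {"Mileage": 10,  "Year": 15, "Price": 10},
    {"Mileage": 0,   "Year": 12, "Price": 0},
]
view = sorted(cars, key=lambda t: score(t, v), reverse=True)
top = pipeline_results(view, q, v, 4)
```

The point of the algorithm is that each batch is provably correct to emit before the rest of the view has been read, which is what makes the output pipelined.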

  26. PREFER Architecture [architecture diagram repeated: preprocessing (Views Creation → materialized views + index) and runtime (View Selection → Pipelining Algorithm → output results)]

  27. Define coverage V1 covers q1: at most k tuples are retrieved from V1 in order to output the first result of q1. [figure: query q1 = 0.7*Price + 0.35*Year and view V1 = 0.8*Price + 0.2*Year in (Price, Year) weight space]
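One way to make this coverage definition operational is an empirical check: run the first round of the pipelining scan and count how many view tuples must be read before the first query result can be output. The sketch below is illustrative, not the paper's analytic coverage condition, and it reuses the simple ratio watermark (an assumption).

```python
def score(t, w):
    return sum(w[a] * t[a] for a in w)

def covers(view, q_w, v_w, k):
    """Does the view (sorted by descending f_v) yield query q_w's first
    result after retrieving at most k tuples?"""
    t1 = view[0]
    # Ratio watermark: tuples with f_v below wm cannot beat t1 under f_q.
    wm = score(t1, q_w) / max(q_w[a] / v_w[a] for a in q_w)
    retrieved = 0
    for t in view:
        if score(t, v_w) < wm:      # the scan can stop here
            break
        retrieved += 1
    return retrieved <= k
```

Intuitively, the closer the query vector is to the view vector, the shorter the watermark prefix, which is why View Selection picks the nearest materialized view in weight space.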

  28. Which ranked view should we use to answer a specific preference query? [figure: ranked views 0.75*Price + 0.8*Year and 0.8*Price + 0.2*Year in (Price, Year) weight space]

  29. Which ranked view should we use to answer a specific preference query? [same figure]

  30. Which ranked view should we use to answer a specific preference query? [figure: the query q1 = 0.7*Price + 0.35*Year falls in the region of view V1 = 0.8*Price + 0.2*Year] V1 covers q1

  31. PREFER Architecture [architecture diagram repeated]

  32. Which ranked views should we materialize? ViewSelection Algorithm:
      while (not all preference vectors in [0,1]^n covered)
          randomly pick v ∈ [0,1]^n and add it to the list of views L
      VIEWS ← ∅
      for i = 1 to C do
          select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
          VIEWS ← VIEWS ∪ {v}

  33. Which ranked views should we materialize? (cont’d) ViewSelection Algorithm [same pseudocode; figure: random candidate views in L progressively covering the preference-vector space]

  34. Which ranked views should we materialize? (cont’d) ViewSelection Algorithm [same pseudocode; figure: the greedily selected views for C = 3]
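The ViewSelection pseudocode above can be sketched as runnable code. The coverage test is left as a parameter: in PREFER it asks whether a view answers a query within the tuple-retrieval constraint, while the demo below substitutes a toy closeness test in weight space, purely for illustration.

```python
import itertools
import random

def view_selection(vectors, covers, C):
    """Two-phase heuristic from the ViewSelection slides.
    Phase 1: randomly add candidate views to L until every preference
    vector is covered by some candidate.
    Phase 2: greedy max coverage -- keep the C candidates from L that
    cover the most still-uncovered vectors."""
    L, uncovered = [], set(vectors)
    while uncovered:
        v = random.choice(vectors)
        L.append(v)
        uncovered = {q for q in uncovered if not covers(v, q)}
    VIEWS, uncovered = [], set(vectors)
    for _ in range(C):
        if not L or not uncovered:
            break
        best = max(L, key=lambda v: sum(covers(v, q) for q in uncovered))
        VIEWS.append(best)
        L.remove(best)
        uncovered = {q for q in uncovered if not covers(best, q)}
    return VIEWS

# Toy stand-in for PREFER's coverage test (an assumption for this demo):
# a view covers a query vector if they are close in L-infinity distance.
def close(v, q, eps=0.25):
    return max(abs(a - b) for a, b in zip(v, q)) <= eps

# Discretized grid of 2-attribute preference vectors over [0,1]^2.
grid = [(round(x, 1), round(y, 1))
        for x, y in itertools.product([i / 10 for i in range(11)], repeat=2)]
```

Phase 1 always terminates because every vector covers at least itself; phase 2 is the standard greedy heuristic for maximum coverage.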

  35. Constraint on # of views Covering all preference vectors with the minimum number of materialized views is NP-hard. The greedy heuristic is an approximation algorithm for maximum coverage (it achieves a (1 − 1/e) factor).

  36. Related Work • Preference Query Framework [AW00] • Top-k queries • Joins • Fagin [F99,F96,F01], equijoins of ordered data • Selections [reduce top-k selection to range query] • Histograms to estimate cutoff [Chaudhuri&Gravano 99] • Probabilistic model [Donjerkovic&Ramakrishnan 99] • Partitioning [Carey & Kossman 97,98]

  37. Related Work The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space. Drawbacks of Onion: • Does not scale • Computing the convex hull is computationally intensive • Not efficient if the domain of an attribute has a small cardinality • Not efficient for more than the top-1 result

  38. Experiments Measured parameters • # attributes • size of relation • # views • constraint on max # tuples retrieved

  39. Parameters of Experiments • synthetic datasets • 3 to 5 attributes • 10,000 to 500,000 tuples • random & correlated data • discretization of 0.1 or 0.05

  40. Experiments (cont’d) Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views

  41. Experiments (cont’d) 4 attr, constraint = 500 tuples, discretization = 0.1

  42. Experiments (cont’d) 500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1

  43. Experiments (cont’d) 4 attr, discretization = 0.1

  44. Experiments (cont’d) 4 attr, discretization = 0.1

  45. Experiments (cont’d) 50,000 tuples, 3 attr, discretization = 0.05

  46. More Resources www.db.ucsd.edu/PREFER • PREFER demo • PREFER application • Construct materialized views • Issue preference queries Runs on MS Windows, on top of an Oracle DBMS

  47. Conclusions • Methodology to efficiently answer top-K linearly weighted queries • Algorithm that uses a ranked view to answer a preference query • Ranked materialized views were used • Experimental evaluation
