Supporting Top- k join Queries in Relational Databases

1 / 13

# Supporting Top- k join Queries in Relational Databases - PowerPoint PPT Presentation

##### Supporting Top- k join Queries in Relational Databases

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington

2. Introduction • Often searches are done on multiple features • Each feature produces a different ranking for the query • Must thus join and aggregate rankings on different features

3. Example • Find location for a house such that the combination of the cost of the house and 5 years tuition at a nearby school is minimal. • Exact location is not predefined in query, per location the house and school features would have to be analyzed.

4. Motivation • Current techniques decouple join and sorting (ranking) of results. • Sorting is expensive and is a blocking operation. • More apparent if ranking and the joining features are different.

5. Rank-Join Algorithm • Generate new valid join combinations • Compute score for each combination • For each incoming input, calculate the total score of: • The last seen feature value and the top ranked feature value for all other features in the query. • Store the maximum of these as T (threshold) • Store top k in priority queue. • Halt when lowest value of queue ≥ T

6. Optimality • Is Instance Optimal over all correct top-K join algorithms. • Guarantees that cost of Rank-Join is O (cost of any other algorithm). • Mathematically: • Cost(Rank-Join) ≤ c*Cost(Any Other Algorithm) + c’ • c is the optimality ratio • c, c’ > 0

7. Rank-Join Continued … • Join strategy crucial • Recommended: Ripple Join • Alternates between tuples • Flexible in the way it sweeps out (rectangular, etc) • Retains ordering in considering samples • Variant of Rank-Join • Hash Rank Join (HRJN) • Block Ripple Join

8. Hash Rank Join (HRJN) Operator • Built on idea of hash ripple join • Inputs are as two hash tables • Maintains highest (first) and lowest (last selected) objects from each relation. • Results are added to a priority queue • Advantages: • Smaller space requirement • Can be pipelined

9. Hash Rank Join (HRJN) Operator: Problems • Local Ranking Problem • Results from three or more input streams • Larger queue sizes • More database accesses • Buffer Problem • Cannot predict how many partial joins will result

10. HRJN Solutions? • Block Ripple Joins • Do comparisons as blocks • Score-Guided Strategy • If thresholds are very different, then this may be because of the way one of the rankings is larger and descends at a slower rate • Can then take more inputs from the slower growing ranking so that the threshold goes closer to the other thresholds

11. Optimal Join-Order • Try to have the least number of input records in order to get a correct ranking • No clear way of estimating the order of joins • Have a heuristic – Footrule Distance • Simple measure of similarity among two rankings. • First join the most similar rankings • This would quickly yield a join by accessing fewer records

12. Rank-Join Algorithm: Benefits • What can it do? • Integrates well with query plans • Produces results as fast as possible • Provides performance guarantees • Minimizes space requirements • Offers a mechanism to determine the best order of joining to execute query optimally. • Can be improved further if random access is available • Can eliminate on-the-fly duplicate elimination

13. References • “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004) • Jing Chen : DBIR Spring 2005, CSE-UT Arlington http://ranger.uta.edu/~gdas/Courses/ Spring2005/DBIR/slides/top-k_join.ppt