Loading in 2 Seconds...

Download Presentation

Supporting Top- k join Queries in Relational Databases

Loading in 2 Seconds...

90 Views

Download Presentation
##### Supporting Top- k join Queries in Relational Databases

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Supporting Top-k join Queries in Relational Databases**Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Z. Joseph, CSE-UT Arlington**Introduction**• Often searches are done on multiple features • Each feature produces a different ranking for the query • Must thus join and aggregate rankings on different features**Example**• Find location for a house such that the combination of the cost of the house and 5 years tuition at a nearby school is minimal. • Exact location is not predefined in query, per location the house and school features would have to be analyzed.**Motivation**• Current techniques decouple join and sorting (ranking) of results. • Sorting is expensive and is a blocking operation. • More apparent if ranking and the joining features are different.**Rank-Join Algorithm**• Generate new valid join combinations • Compute score for each combination • For each incoming input, calculate the total score of: • The last seen feature value and the top ranked feature value for all other features in the query. • Store the maximum of these as T (threshold) • Store top k in priority queue. • Halt when lowest value of queue ≥ T**Optimality**• Is Instance Optimal over all correct top-K join algorithms. • Guarantees that cost of Rank-Join is O (cost of any other algorithm). • Mathematically: • Cost(Rank-Join) ≤ c*Cost(Any Other Algorithm) + c’ • c is the optimality ratio • c, c’ > 0**Rank-Join Continued …**• Join strategy crucial • Recommended: Ripple Join • Alternates between tuples • Flexible in the way it sweeps out (rectangular, etc) • Retains ordering in considering samples • Variant of Rank-Join • Hash Rank Join (HRJN) • Block Ripple Join**Hash Rank Join (HRJN) Operator**• Built on idea of hash ripple join • Inputs are as two hash tables • Maintains highest (first) and lowest (last selected) objects from each relation. • Results are added to a priority queue • Advantages: • Smaller space requirement • Can be pipelined**Hash Rank Join (HRJN) Operator: Problems**• Local Ranking Problem • Results from three or more input streams • Larger queue sizes • More database accesses • Buffer Problem • Cannot predict how many partial joins will result**HRJN Solutions?**• Block Ripple Joins • Do comparisons as blocks • Score-Guided Strategy • If thresholds are very different, then this may be because of the way one of the rankings is larger and descends at a slower rate • Can then take more inputs from the slower growing ranking so that the threshold goes closer to the other thresholds**Optimal Join-Order**• Try to have the least number of input records in order to get a correct ranking • No clear way of estimating the order of joins • Have a heuristic – Footrule Distance • Simple measure of similarity among two rankings. • First join the most similar rankings • This would quickly yield a join by accessing fewer records**Rank-Join Algorithm: Benefits**• What can it do? • Integrates well with query plans • Produces results as fast as possible • Provides performance guarantees • Minimizes space requirements • Offers a mechanism to determine the best order of joining to execute query optimally. • Can be improved further if random access is available • Can eliminate on-the-fly duplicate elimination**References**• “Supporting top-k join queries in relational databases” - Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004) • Jing Chen : DBIR Spring 2005, CSE-UT Arlington http://ranger.uta.edu/~gdas/Courses/ Spring2005/DBIR/slides/top-k_join.ppt