1 / 29

Sowmya Muniraju

Proposed By, Ihab F. Ilyas Walid G. Aref Ahmed K. Elmagarmid. Supporting top-k queries in Relational databases. Proceedings of the 29th international conference on Very large databases, March 2004. Sowmya Muniraju. Presented By:. Outline. Introduction

eljah
Download Presentation

Sowmya Muniraju

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Proposed By, Ihab F. Ilyas Walid G. Aref Ahmed K. Elmagarmid Supporting top-k queries in Relational databases.Proceedings of the 29th international conference on Very large databases, March 2004 Sowmya Muniraju Presented By:

  2. Outline • Introduction • Existing join strategies • Contributions • Related Work • Introduction to New Rank join algorithm • Overview of Ripple Joins • New Rank join algorithm • Physical Rank Join Operators • HRJN • HRJN* • Performance Evaluation • Conclusion

  3. Introduction • Need for support of ranking in Relational Databases. • Attributes in Relational databases spread across multiple relations, hence need for ranking on join queries. • User mostly interested in top few results. • Resultset should be ordered based on certain conditions (scoring functions).

  4. Existing Join strategies • Sort-Merge join • Relations sorted on join columns. • Nested loop join • Tuples of outer relation are joined with tuples of the inner relation. • Hash join • 2 phases: Build, Probe • Build hash table for smaller of the two relations. • Probe this hash table with hash value for each tuple in the other relation.

  5. Top-k using existing join strategies • Given a query, how do we get the top-k results? SELECT A.1, B.2 FROM A, B, C WHERE A.1 = B.1 AND B.2 = C.2 ORDER BY ( 0.3 * A.1 + 0.7 * B.2 ) STOP AFTER 5; • Problems ? • Sorting is a blocking operation. • Sorting is expensive and has been done thrice.

  6. Order limitations on existing joins • Sort-merge join • Sorting is done on joining columns, NOT on columns that participate in scoring function. • Nested-loop join • Orders of only the outer loop is maintained. • Hash join • Orders on both inputs are lost after the join, when hash tables do not fit in memory. Common characteristic in these joins: Decouple join from sort.

  7. Contributions • Proposed a new rank join algorithm. • Implemented this algorithm in practical pipelined rank-join operators based on ripple join. • Proposed a scoring guide function that reduces the number of tuples to be evaluated to get the desired resutls.

  8. Desired Result SELECT A.1, B.2 FROM A, B, C WHERE A.1 = B.1 AND B.2 = C.2 ORDER BY ( 0.3 * A.1 + 0.7 * B.2 ) STOP AFTER 5; DESIRED: Using rank join Using existing join strategies

  9. Related Work • This problem is closely related to top-k selection queries. • Here, scoring function is applied on multiple attributes m of the same relation. • Related algorithms: Threshold Algorithm(TA), No-Random Access Algorithm(NRA), J*, A*

  10. Introduction: New Rank Join Algorithm • Tuples are retrieved in order to preserve ranking. • Produces first ranked join results as quickly as possible. • Uses a monotonicranking function. • Based on the idea of ripple join. • Integration with existing physical query engines. • Variations: HRJN, HRJN*

  11. Overview of Ripple Joins • Previously unseen random tuple from one relation is joined with previously seen tuples from another relation. • Variations of Ripple Joins • Block • Hash

  12. Rank Join Algorithm

  13. Example L.A = R.A Scoring Function: L.B+ R.B K = 2 L_top R_top L_bottom R_bottom L R K = 1 K = 2 K = 0 Threshold (T): 10 8 9 7 Right_threshold =f( R_top, L_bottom ) Left_threshold = f( L_top, R_bottom ) T = Max(Left_threshold, Right_threshold ) 10 10 10 8 8 8 7 7 7 9 9 9 [ (1,1,5) (2,1,4) ] = 9 [ (2,2,4) (3,2,3) ] = 7 [ (3,2,3) (3,2,3) ] = 6 [ (4,3,2) (1,3,5) ] = 7 [ (2,2,4) (4,2,2) ] = 6 LI, RI not a valid join L4,R1 | L2,R4 | L3,R4 are valid joins L3, R3 | L2, R3 are valid joins L1, R2 is a valid join. [ (3,2,3) (4,2,2) ] = 5

  14. Hash Rank Join Operator (HRJN) • Variant of Symmetrical hash join algorithm. • Data Structures • Hash table for each input. • Priority Queue - holds valid join combinations along with their scores. • Methods implemented • Open: initializes its operator and prepares its internal state. • Get Next: returns next ranked join result upon each call. • Close: terminates the operator and performs the necessary clean up.

  15. Open(L, R, C, f) L = Left Input R = Right Input C = Join condition f = Monotonic scoring function

  16. GetNext() Output: Next ranked join result

  17. Local Ranking Problem Solving • Unbalance retrieval rate of left and right inputs. • Use concept of Block Ripple Join.

  18. Example 2 L.A = R.A Scoring Function: L.B+ R.B K = 2 Scoring Function: L.B+ R.B K = 2 L_top R_top L_bottom R_bottom L R Threshold (T): 10 Right_threshold =f( R_top, L_bottom ) Left_threshold = f( L_top, R_bottom ) T = Max(Left_threshold, Right_threshold ) 9 10 10 8 10 10 7 10 10 10 10 10 [ (4,3,2) (1,3,5) ] = 7 No valid joins. L4, R1 is a valid join

  19. HRJN*: Score-Guided Join Strategy Retrieve tuple from input T1 = f( L_top, R_bottom) T2 = f( R_top, L_bottom) Yes If T1 > T2 No Input = L Input = R

  20. Exploiting available indexes • Generalize Rank-join to use random access if available. • Two cases: • An index on join attribute(s) of one input. • An index on join attribute(s) for each input. • Problem: Duplicatescan be produced as indexes will contain all data seen and not yet seen.

  21. Exploiting Indexes: On-the-fly duplicate elimination Scoring Function: L.B+ R.B Index available on R L R f( L_bottom, R_bottom) = 59 Any join result, not yet produced, cannot have a combined score greater than f( L_bottom, R_bottom) [ (1,1,100) (2,1,9) ] = 109 [ (2,2,5) (3,2,8) ] = 58 [ (2,2,50) (4,2,5) ] = 55

  22. Exploiting Indexes: Faster Termination Previously, T = ( 109, 60 ) = 109 After reducing L_top, T = ( 59, 60 ) = 60 L.A = R.A Scoring Function: L.B+ R.B Index available on R L R L_top = L_bottom Reduce L_top to L_bottom, i.e

  23. Performance Evaluation Top-k join operators M = 4 Selectivity = 0.2%

  24. Effect of selectivity M = 4 K = 50

  25. Effect of pipelining Selectivity = 0.2% K = 50

  26. Conclusion • Supported top-k join queries using the new rank join algorithm. • Algorithm uses ranking on the input relations to produce ranked join results on a combined score. • The ranking is performed progressively during the join operation. • HRJN, HRJN* operators implement the new algorithm. • Generalization of this algorithm utilized available indexes for faster termination.

  27. References • “Supporting Top-k Join Queries in Relational Databases.”, Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid, March 2004 • Jing Chen: DIBR Spring 2005, CSE - UT Arlington

  28. THANK YOU

More Related