1 / 35

Supporting top-k join queries in relational databases

Supporting top-k join queries in relational databases. Ihab Ilyas, Walid Aref, Ahmed Elmagarmid Presented by Jing Chen. 1. Motivation. Select A.1, B.2 From A, B, C Where A.1=B.1 and B.2 = C.2 Order By (0.3*A.1 + 0.7*B.2) Stop After 5. Traditional Joins. Nested Loop Merge Join

dexter
Download Presentation

Supporting top-k join queries in relational databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Supporting top-k join queries in relational databases Ihab Ilyas, Walid Aref, Ahmed Elmagarmid Presented by Jing Chen

  2. 1. Motivation • Select A.1, B.2 From A, B, C Where A.1=B.1 and B.2 = C.2 Order By (0.3*A.1 + 0.7*B.2) Stop After 5

  3. Traditional Joins • Nested Loop • Merge Join • Hash Join

  4. 2. Goals/Contribution • Propose a pipe lined Rank-Join algorithm • Analyze the I/O cost of the algorithm • Implement the algorithm • Propose a optimal join strategy • Evaluate performance

  5. 3. Ripple Join JOIN : L.A = R.A L and R are descending ordered by B (L1(1,1,5) R1(1,3,5)) L R

  6. 3. Ripple Join --contd JOIN: L.A = R.A (L2,R2) {(2,2,4),(2,1,4)} L R

  7. Ripple Join --contd JOIN: L.A = R.A (L2,R2) {(2,2,4),(2,1,4)} (L2,R1) {2,2,4), (1,3,5)} (L1,R2) {(1,1,5), (2,1,4)} L R

  8. 3. Variation Of Ripple Join Rectangle Block Hash

  9. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (1). Get a valid combination using any certain algorithm Ripple Select (L1, R1) => No Result

  10. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (1). Get a valid combination using any certain algorithm Select (L2, R2) (L2, R2), (L2, R1), (L1, R2) => (L1, R2)

  11. 4. Proposed Algorithm -- Contd Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (2). Compute the score (J) for the result J1(L1, R2) => L.B + R.B = 5 + 4 = 9

  12. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (3). Compute a water mark score (T) by Max {(Last L).B + (First R.B), (First L).B + (Last R).B}

  13. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 (3). Compute a water mark (T) score by Max {(Last L).B + (First R.B), (First L).B + (Last R).B} Selection (L1, R1) , (L2, R2) => T = Max (L2.B + R1.B, L1.B + R2.B) =Max (4+5, 5+4) = 9

  14. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 J1= 9 T = 9 J1 >= T Report J1 Since we need top 2 (k=2), continue until k=2 and Min(J1, J2, …Jk) > T

  15. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Select (L3, R3) (L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) => (L3, R3), (L2, R3) J2(L2, R3) = 4 + 3 = 7 J3(L3, R3) = 3 + 3= 6

  16. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Calculate T T = Max { (Last L).B + (First R).B, (First L).B+ (Last R).B} = Max {L3.B + R1.B , L1.B + R3.B} = Max(3 + 5, 5 + 3) = 8

  17. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 T = 8 J1(L1,R2) = 9 reported J2( L2, R3) = 7 J3(L3, R3) = 6 Note, J’s are in descending order Min (J) = 6 < T Continue Comment: Calculate T before J is more efficient. Can stop after find first Jk >= T

  18. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4) J(L4, R4) = 7, J(L2, R4) = 6, J(L3, R4) = 5 T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7

  19. 4. Proposed Algorithm Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 2 T= 7 J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J3(L3, R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5 Min(J1, J2) = 7 >= T (k = 2) Comment: When reach all records, T does not need to be calculated, unless, calculate T first, and compare each J(i) with T immediately

  20. 4. The Algorithm

  21. 5. Implementation • Hash rank join operator (HRJN) - Use Hash Ripple Join - Two hash table contain the two inputs - A queue holds ordered join results - Ltop, Rtop, Lbottom, Rbottom are used to calculate T

  22. 5. Implementation contd • Issues with HRJN Buffer problem Local Ranking Problem

  23. 5. Implementation contd • Use Block Ripple Join to Solve Local Ranking Problem. (block size = 2)

  24. 5. Implementation contd • HRJN* score-guided join strategy - How to select next (block) tuple T1 = Ltop + Rbottom , T2 = Lbottom + Rtop T = Max(T1, T2) If T1 > T2, need to reduce T1. How?

  25. 5. Implementation contd • HRJN* score-guided join strategy - How to select next (block) tuple T1 = Ltop + Rbottom , T2 = Lbottom + Rtop T = Max(T1, T2) If T1 > T2, need to reduce T1. How? Reduce Rbottom and not reduce Lbottom (descending ordered), thus more tuples should be retrieved from R to reduce T1

  26. 6. Join Order • When more than two tables join, the join order matters. (A and C have high similarity)

  27. 6. Join Order -- contd • Rank-Join order heuristic - Get a ranked sample, top S ranked list from L and R - Calculate the similarity using footrule Where L(i) and R(i) are the rank of object i in L and R

  28. 6. Join Order -- contd

  29. 7. Generalizing the rank-join • Using indexes • Eliminate duplications • Faster termination

  30. 8. Performance Evaluation Selectivity = 0.2 and m= 4

  31. 8. Performance Evaluation Selectivity = 0.2 and m= 4

  32. 8. Performance Evaluation Selectivity = 0.2 and m= 4

  33. m = 4 and K=50

  34. m = 4 and K=50

  35. m = 4 and K=50

More Related