1 / 35

Supporting Top- k join Queries in Relational Databases

Supporting Top- k join Queries in Relational Databases. Ihab F. Ilyas , Walid G. Aref , Ahmed K. Elmagarmid. Presented by: Richa Varshney. Introduction. O rdered set of join results according to some provided function. Often searches are done on multiple features.

Download Presentation

Supporting Top- k join Queries in Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: RichaVarshney

  2. Introduction • Ordered set of join results according to some provided function. • Often searches are done on multiple features. • Each feature produces a different ranking for the query. • Joining the individual feature rankings to produce a global ranking.

  3. Example 1: Ranking in Multimedia Retrieval Query Color Histogram Edge Histogram Texture Video Database Color Histogram Edge Histogram Texture

  4. Example 2 SELECT h.id , s.name FROM houses h , schools s WHERE h.location = s.location ORDER BY h.price+10 x s.tuition STOP AFTER 4 4

  5. Example 2 (Cont’d) Houses Schools

  6. Motivation SELECT A.1,B.2 FROM A,B,C WHERE A.1 = B.1 and B.2 = C.2 ORDER BY (0.3*A.1+0.7*B.2) STOP AFTER 5; • Problems:- • Sorting is an expensive operation. • Sorting is a blocking operator. 6

  7. Contribution • Propose a new Rank-Join algorithm • Analyze the I/O cost of the algorithm • Implement the algorithm • Propose a score-guided and adaptive join strategy • Evaluate performance 7

  8. Ripple Join Cartesian product L x R L R (L1(1,1,5) R1(1,3,5)) (L2,R2) {(2,2,4),(2,1,4)} (L2,R1) {2,2,4), (1,3,5)} (L1,R2) {(1,1,5), (2,1,4)} L R 8

  9. Variation Of Ripple Join Block Rectangle Hash Ripple Join: where all the sampled tuples are kept in hash tables in memory 9

  10. Query Model: Top-k Join • m Relations R1, ….., Rm | Ri has: • n attributes • score attribute, si (can be an expression over other attributes) • A global score for a join result is computed as F(s1,…., sm) • A top-k join query is an ordered set of join results according to some provided function that combines the orders on each input. • An example template: SELECT some_attributes FROM R1,…..,Rm WHERE join_condition ORDER BY F(s1,…..,sm) STOP AFTER k 10

  11. Rank-Join Algorithm • Generate new valid join combinations • Compute score for each combination • For each incoming input, calculate the threshold score: • The last seen feature value and the top ranked feature value for all other features in the query. • Store the maximum of these as T (threshold) • Store top k(maximum combined score) results in priority queue. • Halt when lowest value of queue ≥ T 11

  12. Example Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3 Compute a Threshold (T) by Max {(Last L).B + (First R.B), (First L).B + (Last R).B} (1). Get a valid combination using any certain algorithm Ripple Select (L1, R1) => No Result 12

  13. Example--Cont. Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3 (1) Get a valid combination using any certain algorithm Select (L2, R2) (L2, R2), (L2, R1), (L1, R2) => (L1, R2) (2) Compute the score (J) for the result J1(L1, R2) => L.B + R.B = 5 + 4 = 9 13

  14. Example--Cont. Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3 • (3) Compute a Threshold (T) score by Max {(Last L).B + (First R.B), (First L).B + (Last R).B} Selection (L1, R1) , (L2, R2) => T = Max (L2.B + R1.B, L1.B + R2.B) =Max (4+5, 5+4) = 9 • (4) J1= 9 ,T = 9,J1 >= T,Report J1 Since we need top 3 (k=3), continue until k=3 and Min(J1, J2, …Jk) > T 14

  15. Example--Cont. Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3 (1) Select (L3, R3) (L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) => (L3, R3), (L2, R3) (2) J2(L2, R3) = 4 + 3 = 7 J3(L3, R3) = 3 + 3= 6 15

  16. Example--Cont. Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3 • (3) Calculate T= Max { (Last L).B + (First R).B,(First L).B+ (Last R).B} = Max {L3.B + R1.B , L1.B + R3.B}= Max(3 + 5, 5 + 3) = 8 • (4) J1(L1,R2) = 9(reported),J2( L2, R3) = 7 ,J3(L3, R3) = 6 (Note, J’s are in descending order) Min (J) = 6 < T Continue 16

  17. Example--Cont. Select * From L, R Where L.A = R.A Order By L.B + R.B Stop After 3 (1)Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4) (2) J(L4, R1) = 7, J(L2, R4) = 6, J(L3, R4) = 5 (3) T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7 (4) J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7,J3(L3,R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5 Min(J1, J2) = 7 >= T (k = 3) 17

  18. Hash Rank Join (HRJN) Operator • Built on idea of hash ripple join • Initialized by specifying four parameters: • Two inputs(Can be HRJN operator) • Join condition(general equality condition/computes valid join) • Combining function(monotone/computes global scores) • Maintains highest (first) and lowest (last selected) objects from each relation. • Results are added to a priority queue 18

  19. Hash Rank Join (HRJN) Operator: Problems • Buffer Problem • Cannot predict how many partial joins will result • Local Ranking Problem 19

  20. HRJN Solutions • Use Block Ripple Join to solve Local Ranking Problem. (e.g. block size = 2) 20

  21. HRJN Solutions—Cont. • HRJN* score-guided join strategy • How to select next (block) tuple T1 = f(Ltop,Rbottom) and T2 = f(Lbottom,Rtop), where f is the ranking function Case 1: T1 >T2 , more inputs should be retrieved from R Case 2:T1 <T2 , more inputs should be retrieved from L 21

  22. An adaptive join strategy • Use input availability as a guide instead of the aforementioned score-guided strategy • If both inputs are available, choose the next input to process. • Otherwise, the available input is processed. • e.g., a mediator over Web-accessible sources and distributed multimedia repositories 22

  23. Join Order • When more than two tables join, the join order matters. (A and C have high similarity) 23

  24. Join Order Algorithm • Rank-Join order heuristic - Get a ranked sample, top S ranked list from L and R - Calculate the similarity using Footrule where (i, j ) is a valid join result that joins object i from L with object j from R

  25. Rank Join Order Heuristic 25

  26. Performance Evaluation Changing the number of required answers: Selectivity = 0.2 % and m= 4 26

  27. Performance Evaluation--Cont. Changing the number of required answers: Selectivity = 0.2 % and m= 4 27

  28. Performance Evaluation--Cont. Changing the number of required answers: Selectivity = 0.2 % and m= 4 28

  29. Performance Evaluation--Cont. Changing the join selectivity: m =4 and K =50 29

  30. Performance Evaluation--Cont. Changing the join selectivity: m =4 and K =50 30

  31. Performance Evaluation--Cont. Changing the join selectivity: m =4 and K =50 31

  32. Performance Evaluation--Cont. Effect of pipelining: selectivity = 0 . 2% and K =50 32

  33. Performance Evaluation--Cont. Effect of pipelining: selectivity = 0 . 2% and K =50 33

  34. Performance Evaluation--Cont. Effect of pipelining: selectivity = 0 . 2% and K =50 34

  35. Thank You 35

More Related