1 / 16

Query Processing II

Query Processing II. Join Operation. Several different algorithms to implement joins Block nested-loop join Indexed nested-loop join Merge-join Hash-join. Block Nested-Loop Join.

tymon
Download Presentation

Query Processing II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query Processing II Yan Huang - CSCI5330 Database Implementation –Access Methods

  2. Join Operation • Several different algorithms to implement joins • Block nested-loop join • Indexed nested-loop join • Merge-join • Hash-join Yan Huang - CSCI5330 Database Implementation –Access Methods

  3. Block Nested-Loop Join for each block Brofr do beginfor each block Bsof s do begin for each tuple trin Br do begin for each tuple tsin Bsdo beginCheck if (tr,ts) satisfy the join condition if they do, add tr• tsto the result.end end end end • r :outerrelation • s:inner relation • cost = br bs + br, • Assuming one buffer for each Yan Huang - CSCI5330 Database Implementation –Access Methods

  4. Block Nested-Loop Join s r … Yan Huang - CSCI5330 Database Implementation –Access Methods

  5. Block Nested-Loop Join • Requires no indices • Can be used with any join condition • cost = br bs + br • Which relation shall be out relations? Yan Huang - CSCI5330 Database Implementation –Access Methods

  6. Allocate More Buffers to Outer Loop • Cost? s r … Yan Huang - CSCI5330 Database Implementation –Access Methods

  7. Allocate More Buffers to Inner Loop • Cost? r s How to allocate M buffers to inner loop/out loop/result? … … Yan Huang - CSCI5330 Database Implementation –Access Methods

  8. Indexed Nested-Loop Join • Index available on the inner relation’s join attribute • Cost = br + nr c • Where c is the cost of traversing index and fetching all matching s tuples for one tuple or r • Index available on the outer relation’s join attribute • We do not use it, can we? r … s … … Yan Huang - CSCI5330 Database Implementation –Access Methods

  9. Indices available on join attributes of both r and s Which one should be the out loop? r … s … … Yan Huang - CSCI5330 Database Implementation –Access Methods

  10. Exercise • Compute depositor customer, with depositor as the outer relation. • Customer has a secondary B+-tree index on customer-name • Blocking factor 20 keys • #customer = 400b/10,000t #depositor =100b/5,000t • Block nested loops • 400*100 + 100 = 40,100 • Indexed nested loops • 100 + 5000 * 5 = 25,100 disk accesses. • CPU cost likely to be less than that for block nested loops join Yan Huang - CSCI5330 Database Implementation –Access Methods

  11. Merge-Join • Sort both relations on their join attribute (if not already sorted on the join attributes). • Merge the sorted relations to join them • Join step is similar to the merge stage of the sort-merge algorithm. • Every pair with same value on join attribute must be matched Cost = br + bs + the cost of sorting if relations are unsorted Yan Huang - CSCI5330 Database Implementation –Access Methods

  12. Merge-Join • But there is a problem! s r A, 1 A, 2 A, 3 B, 1 B, 2 C,1 C, 2 D, 1 A, 5 A, 3 A, 1 A, 7 A, 2 C,1 C, 2 C, 3 ? We assume records with same value are in the same block! Yan Huang - CSCI5330 Database Implementation –Access Methods

  13. Merge-Join • Can be used only for equi-joins and natural joins • Why not on inequality, great than, less than joins? Yan Huang - CSCI5330 Database Implementation –Access Methods

  14. Exercise • Compute depositor customer, with depositor as the outer relation. • Customer has a secondary B+-tree index on customer-name • Blocking factor 20 keys • #customer = 400b/10,000t #depositor =100b/5,000t • Merge join • log(400) + log(100) + 400 + 100 = 516 Yan Huang - CSCI5330 Database Implementation –Access Methods

  15. Hybrid Merge-join • If one relation is sorted, and the other has a secondary B+-tree index on the join attribute • Merge the sorted relation with the leaf entries of the B+-tree . • Sort the result on the addresses of the unsorted relation’s tuples • Scan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuples • Sequential scan more efficient than random lookup Yan Huang - CSCI5330 Database Implementation –Access Methods

  16. Hybrid Merge-join r … s … … Yan Huang - CSCI5330 Database Implementation –Access Methods

More Related