1 / 11

Sort-Merge Join Implementation Details for Minibase

University of California – Riverside Department of Computer Science & Engineering cs179G – Database Project Phase #4. Sort-Merge Join Implementation Details for Minibase. by Demetris Zeinalipour http://www.cs.ucr.edu/~cs179g-t/. Sort-Merge Join Review. Sailors S. Reserves R.

Download Presentation

Sort-Merge Join Implementation Details for Minibase

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. University of California – Riverside Department of Computer Science & Engineering cs179G – Database Project Phase #4 Sort-Merge Join Implementation Details for Minibase by Demetris Zeinalipour http://www.cs.ucr.edu/~cs179g-t/ 1

  2. Sort-Merge Join Review Sailors S Reserves R Query with JOIN SELECT S.name FROM Sailors S, Reserves R WHERE S.sid = R.id • How to implement the JOIN operator? • Using Nested Loop Joins. tuple-at-a-time: For each tuple in S check with every tuple in R variation: page-at-a-time (every page contains several records) • Using Block Nested Loop Join. Idea: Load in memory smaller relation e.g. S and then scan relation S on a page-at-a-time basis. Notice: The idea can be generalized even if the smaller relation doesn’t fit in memory • Using Sort-Merge Join Idea: Sort both relations using an external sort algorithm and then merge the relations. 2

  3. External-Sorting Review 1/3 • When? If the data to be sorted is too big to fit in main memory then we need an external sort algorithm. • Simple approach: 2-way Merge-Sort } Inpage1:[ 11,7, 2,1] Inpage2:[19, 7,5,4 ] Use quicksort internally Memory 4 • Steps • Fetch a page to memory • Sort Page in memory • Write page back to disk • Merge pages levelwise • (see next page) HeapFile 1 inpage1 1-3 outpage 4 inpage2 3

  4. External-Sorting Review 2/3 2-way mergesort => • Passes: log2N+1=4 • Cost: 2N(log2N+1)=64 I/O • Expensive In project we use External Sort (sort.C) which utilizes all available buffer pages (10) and reduces the number of Passes and the I/O cost 4

  5. External-Sorting Review 3/3 Idea similar with 2-way Mergesort with the difference that we utilize B-1 buffer pages (B>3) Don’t worry about this implementation as it is already implemented in sort.C Use quicksort internally Memory Heapf 1-3 0 1 2 In this project you should call: Sort() from within the sortMerge constructor before proceeding to the merge phase outpage 3 4 5 4 6 7 5

  6. The Merging Phase of SMJoin • Now that the two relations R and S are sorted we must merge them. • Merge using 2 iterators to move from page to page and from record to record • Works fine ONLY if both R and S have NO duplicates. (e.g if sid is a foreign key in a 1:1 relation Sailor, Address) Page Tr R.sid Heapfile S.sid Gs 6 Forward the iterator (Tr or Gs) that has the smallest value until Tr=Gs then output value

  7. The Merging Phase of SMJoin • What if relations have duplicates? (check example) (either both of them or just one of them) • Therefore we need to use 3 iterators (1 for Marking) Tr R.sid S.sid Ts Gs • The one extra Ts iterator will be used as soon Tr=Gs at which point we will move Ts to Gs position and use Ts to iterate S • The full version of the algorithm is shown in 2 slides 7

  8. HeapFiles (heapfile.h, scan.C) • Database File: Organization of various pages into a logical. • In a heapfile pages are unordered within the file • In order to Scan the pages (records) of a heapfile we will use scan.C Example: // Sorting a heapfile (after : heapR stored in Catalog) Sort(“unsortheapR”, “heapR”, ..)rest of params from SortMerge() // Creating a scan on a heapfile HeapFile heapR(“heapR”, status); Scan * Rscan = heapR.openScan(Rstatus); // Scan until DONE Rstatus = Rscan->getNext(RID rid,char* RecR, int lenR); // Inserting results in Out Heapfile HeapFile heapOut(“heapOut”, status); memmove(char *RecO, char* RecR, int lenR);//do same for S heapOut.insertRecord(char *recptr, lenR+lenS, RID&outRID); All the work of Pinning/Unpinning pages is done from within Scan since it locates the directoryheader Page from the catalog and proceeds from there on with getNext() 8

  9. The Merging Phase of SMJoin } Sort R & S The full algorithm (this is all you need to implement) } Init Iterators } Fast forward R } Fast forward S Consider Scan::Position() 9

  10. Implementation of the SMJoin Algorithm The Big Picture main.C (or smjoin_main.C same thing) => SMJTester.C (runTests()) => test1() (this runs all 6 tests actually) => createFiles(); // creates 5 Heapfiles using // the data of same constant integer arrays (data0,…data4) => test(i) { // inside sortMerge Constructor } //sortMerge heapfileR No_of_cols joinColumn sortMerge sm(“file0”, 2, attrType, attrsize, 0, “file1”, 2, attrType, attrsize, 0, “test1”, 10, Ascending, s); { // inside sortMerge Constructor heapfileS Out_heapfile No_of_pages available minirel.h => Sort( (infile) “file0”, (outfile) “BRfile1”, (no_of_cols)2, attrType, attrsize, (joincolumn)0, (no_of_pages_available)10, (status)s ) 10

  11. Where to start from? • Start out by reading Sort-Merge Join from book (Chap 12.5.2 2nd edition, 14.4.2 3rd edition) • Study SMJTester.C which contains the tests and understand the test scenarios. • Start Implementing sortMerge.C by invoking the Sort() constructor etc. • Have a closer look at the new classes that you have: Sort.C, Heapfile.C and Scan.C 11

More Related