1. External Sorting Chapter 13 (Sec. 13.1-13.5): Ramakrishnan & Gehrke and
Chapter 11 (Sec. 11.4-11.5): G-M et al. (R2)
OR
Chapter 2 (Sec. 2.4-2.5): Garcia-Molina et al. (R1)
The slides for this text are organized into chapters. This lecture covers Chapter 11.
Chapter 1: Introduction to Database Systems
Chapter 2: The Entity-Relationship Model
Chapter 3: The Relational Model
Chapter 4 (Part A): Relational Algebra
Chapter 4 (Part B): Relational Calculus
Chapter 5: SQL: Queries, Programming, Triggers
Chapter 6: Query-by-Example (QBE)
Chapter 7: Storing Data: Disks and Files
Chapter 8: File Organizations and Indexing
Chapter 9: Tree-Structured Indexing
Chapter 10: Hash-Based Indexing
Chapter 11: External Sorting
Chapter 12 (Part A): Evaluation of Relational Operators
Chapter 12 (Part B): Evaluation of Relational Operators: Other Techniques
Chapter 13: Introduction to Query Optimization
Chapter 14: A Typical Relational Optimizer
Chapter 15: Schema Refinement and Normal Forms
Chapter 16 (Part A): Physical Database Design
Chapter 16 (Part B): Database Tuning
Chapter 17: Security
Chapter 18: Transaction Management Overview
Chapter 19: Concurrency Control
Chapter 20: Crash Recovery
Chapter 21: Parallel and Distributed Databases
Chapter 22: Internet Databases
Chapter 23: Decision Support
Chapter 24: Data Mining
Chapter 25: Object-Database Systems
Chapter 26: Spatial Data Management
Chapter 27: Deductive Databases
Chapter 28: Additional Topics
2. What you will learn from this set of lectures How do you efficiently sort a file so large that it won’t all fit in memory?
What knobs are available for tuning the performance of such a sort algorithm, and how do you tune them?
3. Motivation A classic problem in computer science!
Data requested in sorted order
e.g., arrange movie records in increasing order of their ratings.
Why is it important?
It is the first step in bulk loading (i.e., creating) a B+ tree index.
It is also useful for eliminating duplicate copies in a collection of records and for aggregation (why?). [Can you think of any other strategy for these tasks?]
(Later) It is the basis for efficient joins of relations, i.e., the sort-merge join algorithm.
Problem: sort 1GB of data with 50MB of RAM.
4. I/O Computation Model Disk I/O (reading/writing a block) is very expensive compared to processing the data once it is in memory.
Random block accesses are very common.
A reasonable model for computation using secondary storage: count only disk I/Os.
5. Desiderata for good DBMS algorithms Overlap I/O and CPU processing as much as possible.
Use as much of the data from each block read as possible; this depends on record clustering.
Cluster records that are accessed together in consecutive blocks/pages.
Buffer frequently accessed blocks in memory.
6. Merge Sort Overview Idea: given B buffer pages of main memory
(Sort phase)
Read in B pages of data at a time and sort them in memory.
Suppose the entire file occupies m * B pages.
The sort phase then produces m sorted runs (sublists) of B pages each, as sketched below.
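A minimal Python sketch of this sort (run-generation) phase, assuming records are stored one per line in a text file and that the B-page memory budget translates into a records_per_run record count; the file names and helper names here are illustrative, not from the lecture:

```python
import os
import tempfile
from itertools import islice

def create_sorted_runs(input_path, records_per_run):
    """Sort phase: read the input in memory-sized chunks, sort each chunk
    in memory, and write it out as its own sorted run (sublist) file."""
    run_paths = []
    with open(input_path) as infile:
        while True:
            chunk = list(islice(infile, records_per_run))  # one "B pages" worth
            if not chunk:
                break
            chunk.sort()                                   # any in-memory sort
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run_file:
                run_file.writelines(chunk)
            run_paths.append(path)
    return run_paths                                       # m sorted runs
```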
7. Merge Sort Overview (Merge phase)
repeatedly merge two sorted runs
(Pass 0: the sort phase)
Pass 1 produces m/2 sorted runs of 2B pages each
Pass 2 produces m/4 sorted runs of 4B pages each
continue until one sorted run of m * B pages produced
A 2-way merge can be generalized to a k-way merge, where k can be as large as B-1 (see Ch. 13.2 and the corresponding R1/R2 chapters); the resulting pass count and I/O cost are sketched below.
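A rough sketch of how the pass count and total I/O cost fall out of this scheme, loosely following the analysis in Ch. 13 of Ramakrishnan & Gehrke; N pages of data and B buffer pages are assumed:

```python
import math

def external_merge_sort_cost(n_pages, n_buffers):
    """Pass 0 builds ceil(N/B) sorted runs; each later pass does a
    (B-1)-way merge. Every pass reads and writes the whole file once."""
    runs = math.ceil(n_pages / n_buffers)
    if runs > 1:
        passes = 1 + math.ceil(math.log(runs, n_buffers - 1))
    else:
        passes = 1
    total_ios = 2 * n_pages * passes
    return passes, total_ios

# For the running example below (250,000 blocks, 12,500 buffer blocks),
# this gives 2 passes and 1,000,000 I/Os, i.e., 2PMMS.
print(external_merge_sort_cost(250_000, 12_500))
```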
8. Sorting Example Setup:
10M records of 100 bytes = 1GB file.
Stored on Megatron 747 disk, with 4KB blocks, each holding 40 records + header information.
Suppose each cylinder holds 1MB; then the entire file takes up 1000 cylinders, i.e., 250,000 blocks.
50MB of available main memory = 50 x 10^6 / 4000 = 12,500 blocks = 1/20th of the file.
Sort by the primary key field.
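Checking the setup arithmetic (a sketch; the slide's round numbers treat a block as holding 4,000 bytes of record data, i.e., 40 x 100 bytes, and a cylinder as 10^6 bytes):

```python
file_bytes    = 10_000_000 * 100        # 10M records x 100 bytes = 1 GB
blocks        = file_bytes // 4_000     # 250,000 blocks
cylinders     = file_bytes // 1_000_000 # 1,000 cylinders of 1MB
memory_blocks = 50_000_000 // 4_000     # 12,500 blocks = 1/20th of the file
print(blocks, cylinders, memory_blocks)
```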
9. Merge Sort Common main-memory sorting algorithms don't optimize disk I/Os. Variants of Merge Sort do better.
Merge = take two sorted lists and repeatedly choose the smaller of the “heads” of the lists (head = first among the unchosen).
Example: merge 1,3,4,8 with 2,5,7,9 = 1,2,3,4,5,7,8,9.
Merge Sort is based on a recursive algorithm: divide the list of records into two parts, recursively mergesort the parts, and merge the resulting lists.
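The basic merge step on the slide's example, as a small Python sketch:

```python
def merge(xs, ys):
    """Repeatedly choose the smaller of the two list 'heads'."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i]); i += 1
        else:
            out.append(ys[j]); j += 1
    out.extend(xs[i:])   # at most one of these two extends adds anything
    out.extend(ys[j:])
    return out

print(merge([1, 3, 4, 8], [2, 5, 7, 9]))   # [1, 2, 3, 4, 5, 7, 8, 9]
```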
10. Two-Phase, Multiway Merge Sort “Vanilla” Merge Sort is still not very good in the disk I/O model:
it makes log2 n passes, so each record is read from and written to disk log2 n times.
2PMMS: 2 reads + 2 writes per block.
Phase 1
1. Fill buffer with records.
2. Sort using your favorite main-memory sort.
3. Write sorted sublist to disk.
4. Repeat until all records have been put into one of the sorted sublists.
Sorted sublist = SSL = sorted run.
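Plugging the running example into Phase 1 (a sketch of the arithmetic, using the slide's round numbers):

```python
blocks_per_sublist  = 12_500                         # one memory-load of blocks
records_per_sublist = blocks_per_sublist * 40        # 500,000 records per sublist
num_sublists        = 250_000 // blocks_per_sublist  # 20 sorted sublists of 50MB each
```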
11.-14. Two-Phase, Multiway Merge Sort (figure slides illustrating the algorithm)
15. 2PMMS (contd.) Phase 2
Use one buffer for each of the sorted sublists and one buffer for output.
i.e., split the available buffer pool into several parts (logical buffers) and allocate them for different purposes.
16.-17. 2PMMS (contd.) Phase 2 (figure slides illustrating the buffer layout: one buffer per sorted sublist plus one output buffer)
18. 2PMMS (contd.) Initially load input buffers with the first blocks of their respective sorted lists.
Repeatedly run a competition among the first unchosen records of each of the buffered blocks.
Move the record with the least key to the output block; it is now “chosen.”
Manage the buffers as needed:
If an input block is exhausted, get the next block from the same list (i.e., file).
If the output block is full, write it to disk.
(This assumes each buffer holds just one block. In practice, you design each buffer to hold as many blocks as possible.)
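A minimal Python sketch of this competition, assuming the sorted sublists are text files with one record per line (as produced by the Phase 1 sketch earlier); Python's buffered file reads stand in for the one-block input buffers, and a heap plays the role of the tournament among the heads:

```python
import heapq

def merge_runs(run_paths, output_path):
    """Phase 2: merge all sorted sublists into one sorted output file."""
    files = [open(path) for path in run_paths]
    heap = []
    for idx, f in enumerate(files):
        head = f.readline()
        if head:
            heap.append((head, idx))            # (head record, which sublist)
    heapq.heapify(heap)
    with open(output_path, "w") as out:
        while heap:
            record, idx = heapq.heappop(heap)   # record with the least key wins
            out.write(record)                   # move it to the output
            nxt = files[idx].readline()         # refill from the same sublist
            if nxt:
                heapq.heappush(heap, (nxt, idx))
    for f in files:
        f.close()
```

A real DBMS would manage explicit block buffers (and, as discussed later, double-buffer the I/O), but the control flow is the same.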
19. Analysis of Naïve Implementation Recall the problem parameters.
Assume blocks are stored at random, so average access time includes seek + rotational delay + transfer, and is about 15 ms.
The file occupies 250,000 blocks, each read and written once in each phase.
1,000,000 disk I/Os x 0.015 seconds = 15,000 seconds = 250 minutes, i.e., over 4 hours.
Problem and Interlude:
How many records can you sort with 2PMMS, under our assumptions about records, main memory, and the disk? What would you do if there were more?
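One hedged way to bound the answer to the first question, under the slide's assumptions: Phase 2 needs one block buffer per sublist plus one output buffer, so there can be at most 12,499 sublists, and each sublist is at most one memory-load (12,500 blocks):

```python
max_sublists = 12_500 - 1             # one buffer block reserved for output
max_blocks   = max_sublists * 12_500  # ~156 million blocks
max_records  = max_blocks * 40        # ~6.2 billion records (~0.6 TB)
```

Beyond that, one option is to add a third merge phase over the outputs of several 2PMMS runs, or simply to use more memory.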
20. Improving the Running Time of 2PMMS Here are some techniques that sometimes make secondary-memory algorithms more efficient:
1. Group blocks by cylinder (“cylindrification”).
2. One big disk → several smaller disks.
3. Mirror disks = multiple copies of the same data.
4. “Prefetching” or “double buffering.”
5. Disk scheduling; the “elevator” algorithm.
21. Cylindrification If we are going to read or write blocks in a known order, place them by cylinder, so once we reach that cylinder we can read block after block, with no seek time or rotational latency.
22. Prefetching If we have extra space for main-memory buffers, consider loading buffers in advance of need. Similarly, keep output blocks buffered in main memory until it is convenient to write them to disk.
Example: Phase 2 of 2PMMS
With 50MB of main memory, we can afford to buffer two cylinders for each sublist and for the output (recall: one cylinder = 1MB).
Consume one cylinder for each sublist while the other is being loaded from disk.
Similarly, write one output cylinder while the other is being constructed (via merge).
23. Prefetching Thus, seek time and rotational latency are made negligible for Phase 1 (thanks to cylindrification) and are minimized for Phase 2 (thanks to prefetching).
So the total time for Phase 2 is approximately one read plus one write of the whole file, but one cylinderful at a time.
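A quick sanity check of the Phase 2 buffer budget in this example (a sketch, using the slide's numbers):

```python
sublists         = 1_000 // 50          # 1GB file / 50MB per sublist = 20 sublists
cylinders_needed = 2 * (sublists + 1)   # two 1MB cylinders each, plus the output
assert cylinders_needed <= 50           # 42MB of buffers fits in 50MB of memory
```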
24. Prefetching/Double buffering Illustrated
25. Using B+ Trees for Sorting Scenario: Table to be sorted has B+ tree index on sorting column(s).
Idea: Can retrieve records in order by traversing leaf pages.
Is this a good idea?
Cases to consider:
If the B+ tree is clustered: good idea!
If the B+ tree is not clustered: could be a very bad idea!
26. Clustered B+ Tree Used for Sorting
Cost: traverse from the root to the left-most leaf, then retrieve all leaf pages (Alternative 1).
If Alternative 2 is used, there is the additional cost of retrieving the data records, but each page is fetched just once.
27. Unclustered B+ Tree Used for Sorting Alternative 2 is used for data entries; each data entry contains the rid of a data record. In general, one I/O per data record!
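A back-of-the-envelope comparison for the running example (10M records, 40 records per block, 250,000 data pages), ignoring non-leaf pages; the numbers sketch why the unclustered case can be so bad:

```python
data_pages        = 250_000
records           = 10_000_000
clustered_ios     = data_pages      # scan the leaves in order: ~250,000 I/Os
external_sort_ios = 4 * data_pages  # 2PMMS: 2 reads + 2 writes per block = 1,000,000 I/Os
unclustered_ios   = records         # worst case: one I/O per data record = 10,000,000 I/Os
```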
28. Conclusion: Sorting Records! Sorting has become a blood sport!
Parallel sorting is the name of the game ...
Datamation: Sort 1M records of size 100 bytes
Typical DBMS: 15 minutes
World record: 3.5 seconds
12-CPU SGI machine, 96 disks, 2GB of RAM
These “records” were beaten big time in 2010:
UCSD: sort > 1 TB in 60 s! Sort 1 trillion records in 172 min!
New benchmarks proposed:
Minute Sort: How many can you sort in 1 minute?
Dollar Sort: How many can you sort for $1.00?