
Lecture 6 : External Sorting

Lecture 6 : External Sorting. Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University. External Sorting. Sorting algorithm that can handle massive amounts of data (using external memory) Required when data does not fit into main memory


Lecture 6 : External Sorting

Bong-Soo Sohn

Assistant Professor

School of Computer Science and Engineering

Chung-Ang University

• Sorting algorithm that can handle massive amounts of data (using external memory)

• Required when data does not fit into main memory

• out-of-core algorithm vs in-core algorithm

• Sometimes the data to sort are too large to fit in memory (Why not virtual memory?)

• Use external memory (disk)

• Disk performance

• seek time (major factor)

• rotational latency

• transfer time

• Primary rule for disk access

• Minimize the number of disk accesses

• Assume external (secondary) memory is divided into equal-sized blocks (e.g. 1 KB, 4 KB, …)

• Block : unit in which data is stored and retrieved
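
To see why minimizing the number of disk accesses is the primary rule, here is a rough cost model. It is only a sketch: the function name `access_time_ms` and the timing values (8 ms seek, 4 ms rotational latency, 0.05 ms transfer per block) are illustrative assumptions, not figures from the lecture.

```python
def access_time_ms(num_accesses, blocks_per_access,
                   seek_ms=8.0, rotation_ms=4.0, transfer_ms_per_block=0.05):
    """Estimate total disk time in milliseconds.

    Assumed model: every access pays one seek and one rotational latency,
    then transfers `blocks_per_access` consecutive blocks.
    """
    per_access = seek_ms + rotation_ms + blocks_per_access * transfer_ms_per_block
    return num_accesses * per_access


# Same 10,000 blocks, read either one block per access or 1,000 blocks per access.
scattered = access_time_ms(num_accesses=10_000, blocks_per_access=1)
batched = access_time_ms(num_accesses=10, blocks_per_access=1_000)
print(f"one block per access : {scattered:.0f} ms")
print(f"1000 blocks per access: {batched:.0f} ms")
```

Under these assumed numbers the transfer cost is identical in both cases, so the difference comes from the seek and rotational latency paid on every access.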

• Ex) Sorting 900 MB of data using only 100 MB of RAM:

1. Read 100 MB of the data into main memory and sort it by some conventional method (usually quicksort).

2. Write the sorted data to disk.

3. Repeat steps 1 and 2 until all of the data is sorted in 100 MB chunks (9 chunks in total), which now need to be merged into one single output file.

4. Read the first 10 MB of each sorted chunk (call them input buffers) into main memory (90 MB total) and allocate the remaining 10 MB for the output buffer.

5. Perform a 9-way merge and store the result in the output buffer. When the output buffer is full, write it to the final sorted file. When any of the 9 input buffers becomes empty, refill it with the next 10 MB of its associated 100 MB sorted chunk; if no data remains in that chunk, mark it as exhausted and do not use it for merging.
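
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the function name `external_sort`, the one-integer-per-line file format, and counting the memory limit in values (`chunk_size`) rather than megabytes are all hypothetical choices. The in-memory sort is Python's built-in `list.sort` (not quicksort), and the explicit 10 MB input and output buffers are left to Python's own file buffering.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(input_path, output_path, chunk_size):
    """Sort a text file of one integer per line (no blank lines assumed).

    At most `chunk_size` values are held in memory while creating runs.
    Returns the number of sorted runs that were created.
    """
    run_paths = []
    # Phase 1: read a chunk, sort it in memory, write it to disk as a sorted run.
    with open(input_path) as src:
        while True:
            chunk = [int(line) for line in islice(src, chunk_size)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{v}\n" for v in chunk)
            run_paths.append(path)

    # Phase 2: k-way merge of all runs, streaming each run line by line.
    run_files = [open(p) for p in run_paths]
    try:
        streams = [map(int, f) for f in run_files]
        with open(output_path, "w") as out:
            out.writelines(f"{v}\n" for v in heapq.merge(*streams))
    finally:
        for f in run_files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return len(run_paths)
```

Calling `external_sort("data.txt", "sorted.txt", chunk_size=100)` on a file of 900 values mirrors the 900 MB / 100 MB example at small scale: it creates 9 runs and merges them in a single 9-way pass (the file names here are placeholders).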

[Figure: 2-way merge of 20 initial runs (R1 to R20): pairs of runs merge into S1 to S10, then T1 to T5, U1 to U3, V1 to V2, and finally the single sorted run W1]

2-way merge sort

• # of passes : 5

[Figure: 5-way merge of 20 initial runs (R1 to R20): groups of five runs merge into S1 to S4, which then merge into the single sorted run T1]

5-way merge sort

• By merging more runs at once (a higher-order merge), we can reduce the # of passes (2 passes here instead of 5)
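
The pass counts for the two merge orders can be checked with a short calculation. The sketch below assumes 20 initial runs, as suggested by the run labels up to R20 in the diagrams; the function name `merge_passes` is hypothetical.

```python
def merge_passes(num_runs, k):
    """Count the merge passes needed to reduce `num_runs` sorted runs to one,
    when each pass merges groups of up to `k` runs."""
    if k < 2:
        raise ValueError("merge order k must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // k)  # ceiling division: runs left after this pass
        passes += 1
    return passes


print(merge_passes(20, 2))  # 5  (20 -> 10 -> 5 -> 3 -> 2 -> 1)
print(merge_passes(20, 5))  # 2  (20 -> 4 -> 1)
```

Each pass reads and writes every record once, so fewer passes means fewer disk accesses. The trade-off is that a k-way merge needs k input buffers in memory at the same time.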