Sorting Really Big Files

Sorting Part 3



Using K Temporary Files

  • Given

    • N records in file F

    • M records will fit into internal memory

    • Use K temp files, where K = ⌈N / M⌉ (one temp file per memory-sized run)

  • Create K sorted files from F, then merge them

  • Problems

    • a comparison involves only 2 values at a time, not K values

    • merging only 2 of the K runs at a time creates LOTS of temp files

    • in the illustration on the next slide, notice that we soon begin merging small runs with big temp files

      • too many comparisons
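
The K-temp-file approach above can be sketched as follows. This is a minimal illustration, not the presenter's code: Python lists stand in for the file F and the temp files, and the names `make_runs`, `merge_two`, and `chain_merge` are hypothetical.

```python
from itertools import islice

def make_runs(records, m):
    """Split records into sorted runs of at most m records each."""
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, m))   # read the next m records "into memory"
        if not chunk:
            break
        chunk.sort()                  # internal sort of one memory-sized chunk
        runs.append(chunk)            # each run would be one temp file
    return runs

def merge_two(a, b):
    """Standard 2-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def chain_merge(runs):
    """Merge the runs one at a time into a single growing temp file."""
    merged = []
    for run in runs:
        # After the first few steps, a small run is merged with a big file.
        merged = merge_two(merged, run)
    return merged

runs = make_runs([5, 3, 8, 1, 9, 2, 7], 3)
print(len(runs), chain_merge(runs))
```

Each call to `merge_two` inside `chain_merge` rescans everything merged so far, which is where the excess comparisons come from.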


Alternative Merging Strategy

What would these trees look like with 8 runs?

[Figure: two merge trees built from file F and runs R1 through R5, using intermediate temp files T1, T2, T3 and S1, S2, with one empty input. Legend: R1 = Run 1, R2 = Run 2, etc.]



N-Way Merge

  • We can create that tree using just 4 temp files

    • 2 are input and 2 are output, the pairs alternate being input and output files

  • Algorithm

    Write Run 1 into T1

    Write Run 2 into T2

    Write Run 3 into T1

    Write Run 4 into T2

    ...

    Merge first runs in T1 and T2 into T3

    Merge second runs in T1 and T2 into T4

    Merge third runs in T1 and T2 into T3

    ...

    Merge first runs in T3 and T4 into T1

    Merge second runs in T3 and T4 into T2

    ...
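
The four-file algorithm above might look like this in code. It is a sketch under simplifying assumptions: each temp file is modelled as a Python list of runs, `heapq.merge` from the standard library does the 2-way merge, and the function name `balanced_merge` is hypothetical.

```python
import heapq

def balanced_merge(runs):
    """Sort by alternating two input files and two output files.

    Each "file" is a list of sorted runs. The first pass reads T1 and T2
    and writes T3 and T4; the next pass swaps the roles, and so on.
    """
    if not runs:
        return []
    # Write Run 1 into T1, Run 2 into T2, Run 3 into T1, ...
    inputs = [runs[0::2], runs[1::2]]
    while len(inputs[0]) + len(inputs[1]) > 1:
        outputs = [[], []]
        left, right = inputs
        for i in range(len(left)):
            # With an odd number of runs, the last run has no partner.
            partner = right[i] if i < len(right) else []
            merged = list(heapq.merge(left[i], partner))
            # Merged runs go alternately to the two output files.
            outputs[i % 2].append(merged)
        inputs = outputs   # output pair becomes the input pair
    return inputs[0][0]

print(balanced_merge([[3, 5, 8], [1, 2, 9], [7], [0, 4], [6]]))
```

Every pass roughly halves the number of runs, so the runs being merged stay close to each other in size.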


N-Way Merge

[Figure: N-way merge diagram starting from file F, with temp files T1, T2, T3, T4 shown twice.]



Analysis

  • Number of Comparisons:

    • N-Way Merge -- O(n log₂ n)

    • K Temp Files -- O(n²)

  • Disk Space

  • Could the run size be one record?

    • In other words, is the internal sort necessary?
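
To get a feel for the two comparison counts, the sketch below adds up worst-case comparisons for each strategy, using the fact that merging sorted sequences of lengths a and b needs at most a + b - 1 comparisons. The function names are hypothetical, all runs are assumed to hold exactly m records, and the comparisons spent in the internal sort are not counted.

```python
def chain_cost(k, m):
    """Worst-case comparisons when k runs of m records are merged one at a time."""
    total, merged = 0, m
    for _ in range(k - 1):
        total += merged + m - 1   # big merged file against one small run
        merged += m
    return total

def balanced_cost(k, m):
    """Worst-case comparisons when runs are merged pairwise, pass by pass."""
    sizes, total = [m] * k, 0
    while len(sizes) > 1:
        nxt = []
        for i in range(0, len(sizes) - 1, 2):
            total += sizes[i] + sizes[i + 1] - 1
            nxt.append(sizes[i] + sizes[i + 1])
        if len(sizes) % 2:
            nxt.append(sizes[-1])   # odd run is carried forward unmerged
        sizes = nxt
    return total

# Run size of one record (m = 1), so no internal sort at all.
for k in (8, 1024):
    print(k, chain_cost(k, 1), balanced_cost(k, 1))
```

With m = 1 the pairwise scheme still sorts correctly; it behaves like an ordinary merge sort carried out on files. Larger runs reduce the number of merge passes, which matters because each pass reads and writes every record.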

