8 external sorting
8. External Sorting

Suppose that a file is so large that the whole file cannot be accommodated in the internal memory of a computer.

What shall we do?

We need to use an EXTERNAL STORAGE DEVICE !!!

External Sorting

- Disk Sort

- Tape Sort

What is the major difference between the two external sorts?



Sorting with Disk

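k-way merging (“mergesort”)

[Figure: internal sort, then merge. Memory-sized pieces of the file are sorted by an internal sort, and the resulting sorted pieces are merged.]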
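The two phases in the figure (internal sort, then repeated merging) can be sketched in Python, with in-memory lists standing in for the disks. This is a minimal sketch under assumptions: `external_sort`, `memory` (records sorted internally at once) and `k` (merge order) are hypothetical names, and the standard library's `heapq.merge` stands in for the k-way merge; a real disk sort would stream blocks from files.

```python
import heapq


def external_sort(records, memory, k):
    """Return (sorted records, number of passes); disks are simulated with lists.

    Hypothetical helper: `memory` records are sorted internally at a time,
    and runs are merged `k` at a time.
    """
    if k < 2:
        raise ValueError("k-way merging needs k >= 2")
    # Pass 1: internal sort of memory-sized pieces -> initial runs
    runs = [sorted(records[i:i + memory]) for i in range(0, len(records), memory)]
    passes = 1
    # Merge passes: merge k runs at a time until a single run is left
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + k])) for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With the numbers of the example that follows (4500 records, 750 of them in memory), `external_sort(data, 750, 2)` reports 4 passes: 1 internal-sort pass plus 3 merge passes (6, then 3, 2 and 1 runs).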



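Example

4500 records, 250 records/block, available memory = 3 blocks

Def’n : A segment of a file is said to be a run if all the records in the segment are sorted.

[Figure: the input file I is sorted in memory-sized pieces into runs 1, 2, 3, 4, 5, 6; runs 1, 3, 5 are written to disk D1 and runs 2, 4, 6 to disk D2.]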
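The six runs in the figure follow from the example's numbers, assuming each run is produced by filling all 3 blocks of memory and sorting them internally:

$$m = 3 \times 250 = 750 \text{ records per run}, \qquad r = \left\lceil \frac{4500}{750} \right\rceil = 6 \text{ runs}.$$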


[Figure: runs on disks D1 and D2 are merged in pairs onto disks D3 and D4; the labels 3 and 6 give the size of a run.]


[Figure: 2-way merging of 8 initial runs.
  run size 3:   1 3 5 7 | 2 4 6 8
  run size 6:   12 56 | 34 78
  run size 12:  1234 | 5678
  run size 24:  12345678]

How many passes?

$1 + \lceil \log_2 r \rceil$   (r = # of initial runs)


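k-way merging

[Figure: a merge tree in which each node merges k runs; it has $\lceil \log_k r \rceil$ levels.]

# of passes: $1 + \lceil \log_k r \rceil$

# of I/O operations? $O(n \log_k r)$, since every pass reads and writes all n records

Better than 2-way merging !!!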
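A worked comparison, assuming each pass reads and writes all n records exactly once (2n record transfers per pass) and r = 64 initial runs:

$$k = 2:\; 1 + \lceil \log_2 64 \rceil = 7 \text{ passes} \;\Rightarrow\; 14n \text{ transfers}, \qquad k = 8:\; 1 + \lceil \log_8 64 \rceil = 3 \text{ passes} \;\Rightarrow\; 6n \text{ transfers}.$$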



How about # of comparisons?

Is k-way merging always better than 2-way merging?



Replacement Selection

[Figure: k-way merge tree.]

# of passes = $1 + \lceil \log_k r \rceil$ = #(P)

#(P) decreases as k increases or as r decreases.

r decreases as the run size increases.



# of comparisons (k-way merge)

[Figure: a selection tree (tree of winners) merging k = 8 runs. The leaves (nodes 8-15) hold the run heads 10, 9, 20, 15, 8, 9, 90, 17; nodes 4-7 hold 9, 15, 8, 17; nodes 2-3 hold 9, 8; the root (node 1) holds 8, the smallest key.]

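How many comparisons in a pass?

$n \lceil \log_2 k \rceil$. Why? The selection tree over k runs has $\lceil \log_2 k \rceil$ levels, and outputting each of the n records costs one comparison per level.

Total # of comparisons?

(# of passes) (# of comparisons in a pass)

$= (\log_k r)(n \log_2 k)$

$= n \log_2 r$: independent of k !!!

#(C) decreases only when r decreases.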



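How to increase run size (initial run size)

$x_1, x_2, x_3, \dots, x_m \mid x_{m+1}, x_{m+2}, x_{m+3}, \dots, x_{2m} \mid x_{2m+1}, x_{2m+2}, x_{2m+3}, \dots$   (m keys each)

Cutting the input into consecutive groups of m keys and sorting each group gives r = # of runs = $\lceil n/m \rceil$. Any improvement?

Observation: see p. 94 in the textbook !!!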
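The straightforward scheme above can be written in a few lines, as a baseline for the improvement that follows; `naive_runs` is a hypothetical name, and the input is assumed to be an in-memory list.

```python
import math


def naive_runs(records, m):
    """Cut `records` into consecutive groups of m keys and sort each group."""
    return [sorted(records[i:i + m]) for i in range(0, len(records), m)]


data = [5, 1, 4, 2, 3, 9, 7]
runs = naive_runs(data, 2)
print(runs)                                   # [[1, 5], [2, 4], [3, 9], [7]]
print(len(runs) == math.ceil(len(data) / 2))  # True: r = ceil(n / m)
```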


[Figure: a tree of losers built over the keys 4, 2, 32, 12, 18, 24, 91, 11.]

(record size >> the size of a pointer)

Why do we need this?


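A tree of losers

[Figure: the tree of losers over the keys 4, 2, 32, 12, 18, 24, 91, 11; each node has a parent pointer and a loser pointer, and the overall winner is kept above the root.]

Updating pointers (after a new record replaces the winner at its leaf):

    ptr := winner.parent;
    while ptr <> nil do
      if (ptr.loser.key < winner.key) then
        interchange(ptr.loser, winner);
      end {if}
      ptr := ptr.parent;
    end {while}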
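The pointer-update loop can be sketched in Python with an array-based tree of losers. Assumptions not taken from the slides: node i's children are 2i and 2i+1, the leaves sit at positions k..2k-1, `loser[0]` holds the overall winner, an exhausted run is padded with an infinite sentinel key (so real keys must be finite numbers), and `LoserTree`/`kway_merge` are hypothetical names. A comparison counter is included to check the $n \lceil \log_2 k \rceil$ count from the comparison slide.

```python
import math

INF = math.inf  # sentinel key for an exhausted run (assumes real keys are finite)


class LoserTree:
    """Array-based tree of losers over k >= 1 leaves (hypothetical helper)."""

    def __init__(self, keys):
        self.k = len(keys)
        self.keys = list(keys)        # keys[j] = current head of run j (leaf j)
        self.loser = [0] * self.k     # loser[i] = leaf index stored at internal node i
        self.comparisons = 0
        winner = [0] * (2 * self.k)   # winner of the match played at each position
        for j in range(self.k):
            winner[self.k + j] = j
        for node in range(self.k - 1, 0, -1):
            a, b = winner[2 * node], winner[2 * node + 1]
            self.comparisons += 1
            if self.keys[a] <= self.keys[b]:
                winner[node], self.loser[node] = a, b
            else:
                winner[node], self.loser[node] = b, a
        self.loser[0] = winner[1]     # overall winner kept above the root

    def replace_winner(self, new_key):
        """Give the winner's leaf a new key and replay the matches up to the root."""
        w = self.loser[0]
        self.keys[w] = new_key
        node = (w + self.k) // 2      # ptr := winner.parent
        while node >= 1:              # while ptr <> nil
            self.comparisons += 1
            if self.keys[self.loser[node]] < self.keys[w]:
                self.loser[node], w = w, self.loser[node]  # interchange(ptr.loser, winner)
            node //= 2                # ptr := ptr.parent
        self.loser[0] = w


def kway_merge(runs):
    """Merge sorted lists with a tree of losers; return (output, comparisons)."""
    iters = [iter(run) for run in runs]
    tree = LoserTree([next(it, INF) for it in iters])
    out = []
    while tree.keys[tree.loser[0]] != INF:
        w = tree.loser[0]
        out.append(tree.keys[w])
        tree.replace_winner(next(iters[w], INF))
    return out, tree.comparisons
```

With k = 8 runs every replay climbs 3 levels, so merging n = 80 records costs 3 comparisons per record plus the 7 comparisons that build the tree.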


Explain pp. 97-101 of the textbook !!!

Exercise :

In a complete 2-tree (T) with n leaf nodes,

show that

total # of nodes in T = 2n - 1


Performance Analysis

(Average size of runs)

m0 = # of records in (real) memory.

H. Seward (M.S. Thesis, MIT, 1954)

gave a good reason to believe that a run contains more than 1.5m0 records

(no proof)

E. Friend (JACM, 3, (1956))

experiment: ≈ 2m0

E. Moore (1961)

proved that 2m0 is the expected run length.
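Replacement selection itself can be sketched with a min-heap keyed on (run number, key). This swaps Python's standard `heapq` module in for the tree of losers described earlier; `replacement_selection` is a hypothetical name and its parameter `m` plays the role of m0. A key smaller than the one just output cannot extend the current run, so it is tagged for the next run. On uniformly random keys the measured average run length should come out near 2m0, in line with Moore's result; the exact figure depends on the data.

```python
import heapq
from itertools import islice

_END = object()  # marks the end of the input stream


def replacement_selection(records, m):
    """Form initial runs with a memory of m >= 1 records (hypothetical helper).

    The heap holds (run number, key) pairs, so every key tagged for the current
    run is output before any key tagged for the next one.
    """
    it = iter(records)
    heap = [(0, key) for key in islice(it, m)]  # fill memory with the first m keys
    heapq.heapify(heap)
    runs = []
    while heap:
        run, key = heapq.heappop(heap)
        if run == len(runs):                    # first key of a new run
            runs.append([])
        runs[run].append(key)                   # output the smallest eligible key
        nxt = next(it, _END)
        if nxt is not _END:
            # a key smaller than the one just output must wait for the next run
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    return runs
```

For example, `replacement_selection([5, 1, 4, 2, 3], 2)` returns `[[1, 4, 5], [2, 3]]`: 2 runs, where cutting the input into sorted groups of 2 would give 3.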



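Sketch of Moore’s Proof

Snowplow

[Figure: snow (incoming keys) falls on a circular track while a snowplow (the output) goes around it; the track holds m0 of snow at any time (the records in memory), and the plow clears 2m0 in one circuit (one run).]

Uniform distribution (of keys) ⇒ expected run length 2m0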



Tape Sorting

  • Balanced k-way merging

    (similar to disk sorting)

  • Polyphase merging

  • Cascade merging


Polyphase Merging (Motivation)

  • (R1, R2, …, R5000)

  • length(Ri) = 20 bytes

  • Only 1000 records fit in the internal memory at one time.

    (≈ 20K bytes)

  • 4 tapes available

    Balanced 2-way merge

    T1            T2            T3            T4
    R1,1000       R1001,2000
    R2001,3000    R3001,4000
    R4001,5000
                                R1,2000       R2001,4000
                                R4001,5000
    R1,4000       R4001,5000
                                R1,5000

    Total # of operations = 15000


Tape 1        Tape 2        Tape 3        Tape 4
R1,1000       R1001,2000    R2001,3000
R3001,4000    R4001,5000
(rewind)
R3001,4000    R4001,5000                  R1,3000
                            R1,5000

  • Total # of I/O operations

    3000 + 5000 = 8000

    Balanced Merge is not always best !!!


What if only 3 tapes are available?

Tape 1    Tape 2    Tape 3

R1,1000 R1001,2000

R2001,3000 R3001,4000 

R4001,5000

R1,2000

R2001,4000

R4001,5000

R1,2000 R2001,4000

R4001,5000

R1,4000

R4001,5000  

R4001,5000 R1,4000

R1,5000 

Total # of I/O Operations

5000 + 2000 + 5000 + 4000 + 5000 = 21,000 !!!


Tape 1    Tape 2    Tape 3

R1,1000 R1001,2000

R2001,3000 R3001,4000 

R4001,5000

R1,2000

R4001,5000 R2001,4000

(rewind)

R1,2000; 4001,5000

(rewind)

R1,5000  

Total # of I/O Operations

4000 + 3000 + 5000 = 11,000 !!!


Polyphase merge

T1      T2      T3      T4      T5      T6
1^31    1^30    1^28    1^24    1^16    -
1^15    1^14    1^12    1^8     -       5^16
1^7     1^6     1^4     -       9^8     5^8
1^3     1^2     -       17^4    9^4     5^4
1^1     -       33^2    17^2    9^2     5^2
-       65^1    33^1    17^1    9^1     5^1
129^1

(a^b = b runs, each a initial runs long)

How to assign initial runs?


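Cascade Merge

          T1      T2      T3      T4      T5      T6
          1^55    1^50    1^41    1^29    1^15    -
Pass 1    1^40    1^35    1^26    1^14    -       5^15
          1^26    1^21    1^12    -       4^14    5^15
          1^14    1^9     -       3^12    4^14    5^15
          1^5     -       2^9     3^12    4^14    5^15
         (-       1^5     2^9     3^12    4^14    5^15)
Pass 2    15^5    -       2^4     3^7     4^9     5^10
          15^5    14^4    -       3^3     4^5     5^6
          15^5    14^4    12^3    -       4^2     5^3
          15^5    14^4    12^3    9^2     -       5^1
         (15^5    14^4    12^3    9^2     5^1     -)
Pass 3    15^4    14^3    12^2    9^1     -       55^1
          15^3    14^2    12^1    -       50^1    55^1
          15^2    14^1    -       41^1    50^1    55^1
          15^1    -       29^1    41^1    50^1    55^1
         (-       15^1    29^1    41^1    50^1    55^1)
Pass 4    190^1

(a^b = b runs, each a initial runs long; the rows in parentheses are copies, i.e., one-way merges)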
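The cascade table can be reproduced by a small simulation. This is a sketch under stated assumptions: `cascade` and `describe` are hypothetical names, exactly one tape starts empty, every initial run has size 1, and run sizes are tracked as counts of initial runs rather than actual records.

```python
from collections import deque
from itertools import groupby


def describe(tape):
    """Render a tape as 'a^b' groups (b runs, each a initial runs long); '-' if empty."""
    if not tape:
        return "-"
    return " ".join(f"{size}^{len(list(group))}" for size, group in groupby(tape))


def cascade(counts):
    """Simulate cascade merging; counts[i] = initial runs on tape i (exactly one 0).

    In each pass the tapes that held runs at the start of the pass are merged onto
    the empty tape until the shortest one runs out; that tape becomes the next
    output, and the merge order drops by one each time (a 1-way merge is a copy).
    Returns the tape contents at the start and after every merge or copy.
    """
    tapes = [deque([1] * c) for c in counts]   # run sizes, in units of initial runs
    history = [[describe(t) for t in tapes]]
    while sum(len(t) for t in tapes) > 1:
        old = [i for i, t in enumerate(tapes) if t]          # tapes read in this pass
        out = next(i for i, t in enumerate(tapes) if not t)  # the empty tape
        while old:
            for _ in range(min(len(tapes[i]) for i in old)):
                tapes[out].append(sum(tapes[i].popleft() for i in old))
            history.append([describe(t) for t in tapes])
            emptied = [i for i in old if not tapes[i]]
            old = [i for i in old if tapes[i]]
            out = emptied[0]                                 # next output tape
    return history
```

For the distribution above, `cascade([55, 50, 41, 29, 15, 0])` yields 17 rows, the last being `['190^1', '-', '-', '-', '-', '-']`.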


Polyphase Merge (Gilstad, 1960)

phase   T1      T2      T3      T4      T5      T6
1       1^31    1^30    1^28    1^24    1^16    -
2       1^15    1^14    1^12    1^8     -       5^16
3       1^7     1^6     1^4     -       9^8     5^8
4       1^3     1^2     -       17^4    9^4     5^4
5       1^1     -       33^2    17^2    9^2     5^2
6       -       65^1    33^1    17^1    9^1     5^1
7       129^1

Run distributions on T1-T5, level 0 to level 6:

{{1,0,0,0,0},{1,1,1,1,1},{2,2,2,2,1},{4,4,4,3,2},{8,8,7,6,4},

{16,15,14,12,8},{31,30,28,24,16}}

Perfect Fibonacci Distribution !!!

What is the underlying rule?
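Before deriving the rule, the phase table can be checked by simulation. In this sketch (hypothetical names `polyphase` and `describe`; one tape assumed empty at the start, run sizes counted in initial runs), each phase merges one run from every non-empty tape onto the empty tape until some input tape runs out.

```python
from collections import deque
from itertools import groupby


def describe(tape):
    """Render a tape as 'a^b' groups (b runs, each a initial runs long); '-' if empty."""
    if not tape:
        return "-"
    return " ".join(f"{size}^{len(list(group))}" for size, group in groupby(tape))


def polyphase(counts):
    """Simulate polyphase merging; counts[i] = initial runs on tape i (one tape empty).

    Returns the tape contents before the first merge and after each phase.
    """
    tapes = [deque([1] * c) for c in counts]   # run sizes, in units of initial runs
    history = [[describe(t) for t in tapes]]
    while sum(len(t) for t in tapes) > 1:
        out = next(i for i, t in enumerate(tapes) if not t)   # the empty tape
        inputs = [i for i, t in enumerate(tapes) if t]
        for _ in range(min(len(tapes[i]) for i in inputs)):  # until one input runs out
            tapes[out].append(sum(tapes[i].popleft() for i in inputs))
        history.append([describe(t) for t in tapes])
    return history


for row in polyphase([31, 30, 28, 24, 16, 0]):
    print("  ".join(f"{cell:>6}" for cell in row))
```

The seven printed rows correspond to phases 1 to 7 of the table.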


i    a_i   b_i   c_i   d_i   e_i
0    1     0     0     0     0
1    1     1     1     1     1
2    2     2     2     2     1
3    4     4     4     3     2
4    8     8     7     6     4
5    16    15    14    12    8
6    31    30    28    24    16


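$(a_0 + b_0)\;\; (a_0 + c_0)\;\; (a_0 + d_0)\;\; (a_0 + e_0)\;\; a_0$

$(a_1 + b_1)\;\; (a_1 + c_1)\;\; (a_1 + d_1)\;\; (a_1 + e_1)\;\; a_1$

$(a_2 + b_2)\;\; (a_2 + c_2)\;\; (a_2 + d_2)\;\; (a_2 + e_2)\;\; a_2$

n:    $a_n\;\; b_n\;\; c_n\;\; d_n\;\; e_n$

n+1:  $a_n + b_n\;\; a_n + c_n\;\; a_n + d_n\;\; a_n + e_n\;\; a_n$

$a_n \ge b_n \ge c_n \ge d_n \ge e_n$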


i    a_i   b_i   c_i   d_i   e_i   output
0    1     0     0     0     0     T6
1    1     1     1     1     1     T1
2    2     2     2     2     1     T2
3    4     4     4     3     2     T3
    (2     2     2     1     0     2)
    (1     1     1     0     1     1)
4    8     8     7     6     4     T4
5    16    15    14    12    8     T5
6    31    30    28    24    16    T6
7    61    59    55    47    31
     T1    T2    T3    T4    T5


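Level n-1:  $a_{n-1}, b_{n-1}, c_{n-1}, d_{n-1}, e_{n-1}$

Level n:  $a_{n-1}+b_{n-1},\; a_{n-1}+c_{n-1},\; a_{n-1}+d_{n-1},\; a_{n-1}+e_{n-1},\; a_{n-1}$

$= a_n, b_n, c_n, d_n, e_n$

$e_n = a_{n-1}$

$d_n = a_{n-1} + e_{n-1} = a_{n-1} + a_{n-2}$

$c_n = a_{n-1} + d_{n-1} = a_{n-1} + (a_{n-2} + e_{n-2}) = a_{n-1} + a_{n-2} + a_{n-3}$

………….

$e_n = a_{n-1}$

$d_n = a_{n-1} + a_{n-2}$

$c_n = a_{n-1} + a_{n-2} + a_{n-3}$

$b_n = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4}$

$a_n = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4} + a_{n-5}$

($a_0 = 1$, $a_i = 0$ for $i = -1, -2, -3, -4$)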
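The level-by-level rule can be turned into a short generator; `perfect_distributions` is a hypothetical name, and `tapes` (default 5) is the number of input tapes. It applies $a_{n+1} = a_n + b_n, \dots, e_{n+1} = a_n$ directly, so the closed forms derived above can be checked against it.

```python
def perfect_distributions(levels, tapes=5):
    """Perfect polyphase distributions (a, b, c, d, e, ...) for levels 0..levels-1."""
    dist = [1] + [0] * (tapes - 1)        # level 0: a single run
    table = []
    for _ in range(levels):
        table.append(dist)
        a = dist[0]
        # a_{n+1} = a_n + b_n, b_{n+1} = a_n + c_n, ..., last entry_{n+1} = a_n
        dist = [a + x for x in dist[1:]] + [a]
    return table


for level, dist in enumerate(perfect_distributions(8)):
    print(level, dist, sum(dist))
```

The totals per level come out as 1, 5, 9, 17, 33, 65, 129, 253.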


$e = a_{n-1}$

$d = a_{n-1} + a_{n-2}$

$c = a_{n-1} + a_{n-2} + a_{n-3}$

$b = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4}$

$a = a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4} + a_{n-5}$


i      -4   -3   -2   -1   0    1    2    3    4    5    6    7
a_i    0    0    0    0    1    1    2    4    8    16   31   61
b_i                        0    1    2    4    8    15   30   59
c_i                        0    1    2    4    7    14   28   55
d_i                        0    1    2    3    6    12   24   47
e_i                        0    1    1    2    4    8    16   31


$a_i = \langle 0, 0, 0, 0, 1, 1, 2, 4, 8, 16, 31, 61, \dots \rangle$, i = -4, -3, -2, -1, 0, 1, 2, …

“The kth order Fibonacci number”

$F_n^{(k)} = F_{n-1}^{(k)} + F_{n-2}^{(k)} + \dots + F_{n-k}^{(k)}$, with $F_n^{(k)} = 0$ for $0 \le n \le k-2$ and $F_{k-1}^{(k)} = 1$

e.g.) The second order Fibonacci number

0, 1, 1, 2, 3, 5, ……

$F_n^{(2)} = F_{n-1}^{(2)} + F_{n-2}^{(2)}$, with $F_0^{(2)} = 0$ and $F_1^{(2)} = 1$

Fibonacci number !!!

$a_n = F_{n+k-1}^{(k)}$ if k tapes (input) are used

Why?
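The definition translates directly into code; `kth_order_fib` is a hypothetical name. The printed list for k = 5 can be compared with $a_i$ above: shifted by k - 1 = 4 places it gives 1, 1, 2, 4, 8, 16, 31, 61, i.e. $a_n = F_{n+4}^{(5)}$.

```python
def kth_order_fib(k, count):
    """First `count` k-th order Fibonacci numbers F_0^(k), F_1^(k), ..."""
    f = [0] * (k - 1) + [1]          # F_0 .. F_{k-2} = 0, F_{k-1} = 1
    while len(f) < count:
        f.append(sum(f[-k:]))        # F_n = F_{n-1} + ... + F_{n-k}
    return f[:count]


print(kth_order_fib(2, 7))    # [0, 1, 1, 2, 3, 5, 8]
print(kth_order_fib(5, 12))   # [0, 0, 0, 0, 1, 1, 2, 4, 8, 16, 31, 61]
```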


What if not perfect Fib. Dist’n?

Use dummy runs !!!

5 input tapes and 53 initial runs.

Level   T1    T2    T3    T4    T5    Total
1       1     1     1     1     1     5
2       2     2     2     2     1     9
       (1     1     1     1     0)
3       4     4     4     3     2     17
       (2     2     2     1     1)
4       8     8     7     6     4     33
       (4     4     3     3     2)
5       16    15    14    12    8     65 > 53
       (8     7     7     6     4)

………………………………

Runs 1-33 are placed as in level 4; the remaining runs 34-53 fill the level-5 increments (8, 7, 7, 6, 4) row by row:

T1      T2      T3      T4      T5
(34)
(35)    (36)    (37)
(38)    (39)    (40)    (41)
(42)    (43)    (44)    (45)
(46)    (47)    (48)    (49)    (50)
(51)    (52)    (53)

           T1     T2     T3     T4     T5     T6
dummies:   (2)    (2)    (2)    (3)    (3)
           1^8    1^7    1^6    1^4    -      5^8

(2)(2)(2)(3)55

53

Not the best,

but simple and good !!!

For a better one, see Knuth !!!
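The row-by-row placement pictured above can be sketched as follows. Assumptions: `distribute` and `perfect_levels` are hypothetical names, r >= 2, and runs 1..sum(previous level) are laid out as the previous perfect level before the increments are filled; this mirrors the slide's picture rather than any particular textbook algorithm.

```python
def perfect_levels(tapes):
    """Yield perfect polyphase distributions for `tapes` input tapes, level 0 upward."""
    level = [1] + [0] * (tapes - 1)
    while True:
        yield level
        a = level[0]
        level = [a + x for x in level[1:]] + [a]


def distribute(r, tapes=5):
    """Place r >= 2 initial runs; return (real runs, dummy runs) per tape."""
    levels = perfect_levels(tapes)
    prev = next(levels)
    for target in levels:                           # first perfect level with >= r runs
        if sum(target) >= r:
            break
        prev = target
    real = list(prev)                               # the lower level is filled first
    inc = [t - p for t, p in zip(target, prev)]     # extra room per tape at the next level
    left = r - sum(prev)
    for height in range(max(inc), 0, -1):           # fill the increments row by row
        for j in range(tapes):
            if left and inc[j] >= height:
                real[j] += 1
                left -= 1
    dummies = [t - x for t, x in zip(target, real)]
    return real, dummies


print(distribute(53))   # ([14, 13, 12, 9, 5], [2, 2, 2, 3, 3])
```

For 53 runs this reproduces the dummy counts (2), (2), (2), (3), (3) shown on the slide.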


Example (3 tapes)

T1        T2         T3
(k)^8     (k)^5      -
(k)^3     -          (2k)^5
-         (3k)^3     (2k)^2
(5k)^2    (3k)^1     -
(5k)^1    -          (8k)^1
-         (13k)^1    -

0, 1, 1, 2, 3, 5, 8

Runs on two input tapes (run sizes in units of k)

# of runs   run size (k)   # of pairs   # of I/O’s
8, 5        1, 1           5            10
5, 3        2, 1           3            9
3, 2        3, 2           2            10
2, 1        5, 3           1            8
1, 1        8, 5           1            13
1           13

How many passes over the data?
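One way to answer is to add up the I/O column and divide by the file size (13 units of k). The sketch below recomputes the column; `three_tape_polyphase` is a hypothetical name, it assumes a perfect (Fibonacci) distribution such as 8 and 5, and it counts I/O as records written per phase, in units of k.

```python
def three_tape_polyphase(x, y):
    """I/O per phase of a 3-tape polyphase merge (hypothetical helper).

    x and y are the numbers of initial runs (each of size 1, i.e. k records) on
    the two input tapes; the third tape starts empty. Assumes a Fibonacci
    distribution, so the merge ends with a single run.
    """
    tapes = [(x, 1), (y, 1)]              # (number of runs, run size) per input tape
    ios = []
    while True:
        (n1, s1), (n2, s2) = sorted(tapes, reverse=True)
        if n2 == 0:                       # only one tape still holds runs
            break
        ios.append(n2 * (s1 + s2))        # n2 pairs merged onto the empty tape
        tapes = [(n2, s1 + s2), (n1 - n2, s1)]
    return ios


ios = three_tape_polyphase(8, 5)
print(ios, sum(ios))   # [10, 9, 10, 8, 13] 50
```

Under this counting, 50k I/Os on a 13k file is about 3.85 passes over the data, close to the 1.04 log2 r estimate for r = 13 stated in the theorem.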


Total number of initial runs = $F_s$ for some s ($F_s$: the s-th Fibonacci number).

$F_s = F_{s-1} + F_{s-2}$

T1           T2           T3
$F_{s-1}$    $F_{s-2}$    -
$F_{s-3}$    -            $F_{s-2}$
-            $F_{s-3}$    $F_{s-4}$
…………

See Fig. p.107, textbook !!!

Total # of I/O operations =

 # of passes =


Lemma :

[proof] (By induction on s)

(s=2)LHS =

RHS =

(s=3)LHS =

RHS =

(s=k)Suppose that

(s=k+1)

Exercise !!!

See pages 106-107 in the textbook !!!


From the previous lemma,

# of passes =

$F_s = r$   (1)

Why?

$\phi = (1 + \sqrt{5})/2 \approx 1.618$: Golden Ratio !!!

From (1),


Theorem:

[Figure: the $F_s$ initial runs split as $F_{s-1}$ and $F_{s-2}$ on two of the tapes.]

Polyphase merge with 3 tapes, $F_s = r$ = # of initial runs:

# of passes ≈ $1.04 \log_2 r$
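The constant 1.04 agrees with the 3-tape row of the polyphase table that follows (with S = r), once the natural logarithm is converted via $\ln r = (\ln 2)\log_2 r$ and $\ln 2 \approx 0.693$:

$$1.504 \ln r + 0.992 = (1.504 \times 0.693)\log_2 r + 0.992 \approx 1.04 \log_2 r + 0.99.$$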


APPROXIMATED BEHAVIOR OF POLYPHASE MERGE SORTING (S = # of initial runs)

Tapes   Phases               Passes               Pass/phase percent   Growth ratio
3       2.078 ln S + 0.672   1.504 ln S + 0.992   72                   1.6180340
4       1.641 ln S + 0.364   1.015 ln S + 0.965   62                   1.8392868
5       1.524 ln S + 0.078   0.863 ln S + 0.921   57                   1.9275620
6       1.479 ln S + 0.185   0.795 ln S + 0.864   54                   1.9659482
7       1.460 ln S + 0.424   0.762 ln S + 0.797   52                   1.9835828
8       1.451 ln S + 0.642   0.744 ln S + 0.723   51                   1.9919642
9       1.447 ln S + 0.838   0.734 ln S + 0.646   51                   1.9960312
10      1.445 ln S + 1.017   0.728 ln S + 0.568   50                   1.9980295
20      1.443 ln S + 2.170   0.721 ln S - 0.030   50                   1.9999981

APPROXIMATED BEHAVIOR OF CASCADE MERGE SORTING (S = # of initial runs)

Tapes   Phases               Passes               Growth ratio
3       2.078 ln S + 0.672   1.504 ln S + 0.992   1.6180340
4       1.235 ln S + 0.754   1.012 ln S + 0.820   2.2469796
5       0.946 ln S + 0.796   0.897 ln S + 0.800   2.8793852
6       0.796 ln S + 0.821   0.773 ln S + 0.808   3.5133371
7       0.703 ln S + 0.839   0.691 ln S + 0.822   4.1481149
8       0.639 ln S + 0.852   0.632 ln S + 0.834   4.7833861
9       0.592 ln S + 0.861   0.587 ln S + 0.845   5.4189757
10      0.555 ln S + 0.869   0.552 ln S + 0.854   6.0547828
20      0.397 ln S + 0.905   0.397 ln S + 0.901   12.4174426



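Cascade Merge

Level   a_i   b_i   c_i   d_i   e_i
0       1     0     0     0     0
1       1     1     1     1     1
2       5     4     3     2     1
3       15    14    12    9     5
4       55    50    41    29    15

n:    $a_n\;\; b_n\;\; c_n\;\; d_n\;\; e_n$

n+1:  $a_{n+1} = a_n + b_n + c_n + d_n + e_n$, $b_{n+1} = a_{n+1} - e_n$, $c_{n+1} = b_{n+1} - d_n$, $d_{n+1} = c_{n+1} - c_n$, $e_{n+1} = d_{n+1} - b_n = a_n$

Perfect dist’n

For details, see Knuth Vol. III !!!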
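The cascade recurrence can be generated the same way; `cascade_distributions` is a hypothetical name, and `tapes` (default 5) is the number of input tapes. Writing each new entry as a prefix sum of the previous level is equivalent to the difference form on the slide, as the comment spells out.

```python
def cascade_distributions(levels, tapes=5):
    """Perfect cascade distributions (a, b, c, d, e, ...) for levels 0..levels-1."""
    dist = [1] + [0] * (tapes - 1)     # level 0: a single run
    table = []
    for _ in range(levels):
        table.append(dist)
        # entry j of the next level = sum of the first (tapes - j) entries of this one,
        # i.e. a' = a+b+c+d+e, b' = a' - e, c' = b' - d, d' = c' - c, e' = d' - b = a
        dist = [sum(dist[:tapes - j]) for j in range(tapes)]
    return table


print(cascade_distributions(5))
# [[1, 0, 0, 0, 0], [1, 1, 1, 1, 1], [5, 4, 3, 2, 1], [15, 14, 12, 9, 5], [55, 50, 41, 29, 15]]
```

Level 4 (55, 50, 41, 29, 15; 190 runs in total) is the starting distribution of the cascade example with 6 tapes.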

