
2IL50 Data Structures




Spring 2014, Lecture 4: Sorting in linear time



Building a heap

One more time …



Building a heap

Build-Max-Heap(A)

  • heap-size = A.length

  • for i = A.length downto 1

  • do Max-Heapify(A,i)
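
As a rough illustration, the pseudocode above might look as follows in Python. This is a sketch, not the course's reference code: it uses 0-based indexing (the children of node i are 2i+1 and 2i+2), and the names `max_heapify`, `build_max_heap` and `is_max_heap` are chosen here for illustration. Like the slide, the loop starts at the last element, although the leaves need no work.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps.
    """
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Turn the list a into a max-heap in place, working bottom-up."""
    heap_size = len(a)
    # As on the slide, start at the last element and walk back to the root.
    for i in range(len(a) - 1, -1, -1):
        max_heapify(a, i, heap_size)


def is_max_heap(a):
    """Check the max-heap property: no node is larger than its parent."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))
```

After `build_max_heap(a)`, the largest element sits at `a[0]` and `is_max_heap(a)` holds.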



Building a heap

Build-Max-Heap2(A)

  • heap-size[A] = 1

  • for i = 2 to A.length do Max-Heap-Insert(A, A[i])

    Max-Heap-Insert(A, key)

  • heap-size[A] = heap-size[A] + 1

  • A[heap-size[A]] = −∞

  • Increase-Key(A, heap-size[A], key)

    Lower bound on the worst case running time?

Figure: example heap. A[i] moves up until it reaches the correct position.
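
The insertion-based construction can be sketched in Python as shown here. Several things are assumptions of this sketch rather than part of the slides: indexing is 0-based, the heap is returned as a new list instead of being built inside A, and the function names are illustrative.

```python
def increase_key(heap, i, key):
    """Set heap[i] = key (assumed >= heap[i]) and sift it up to its place."""
    heap[i] = key
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


def max_heap_insert(heap, key):
    """Append a -infinity placeholder, then raise it to key."""
    heap.append(float("-inf"))
    increase_key(heap, len(heap) - 1, key)


def build_max_heap2(a):
    """Build a max-heap by inserting the elements of a one by one."""
    heap = a[:1]              # heap of size 1: the first element, if any
    for key in a[1:]:
        max_heap_insert(heap, key)
    return heap
```

Each inserted key climbs towards the root only as long as it is larger than its parent, which is the "moves up until it reaches the correct position" step.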



    Running time: Θ(1) + ∑_{2≤i≤n} (time for Max-Heap-Insert(A, A[i]))

  • Max-Heap-Insert(A, A[i]) takes O(log i) = O(log n) time

    ➨ worst case running time is O(n log n)

  • If A is sorted in increasing order, then A[i] is always the largest element when Max-Heap-Insert(A, A[i]) is called and must move all the way up the tree

    ➨ Max-Heap-Insert(A, A[i]) takes Ω(log i) time.

    Worst case running time: Θ(1) + ∑_{2≤i≤n} Ω(log i) = Ω(1 + ∑_{2≤i≤n} log i) = Ω(n log n)

    since ∑_{2≤i≤n} log i ≥ ∑_{n/2≤i≤n} log(n/2) ≥ (n/2) log(n/2)
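
The lower-bound argument can be checked experimentally. The sketch below (0-based indexing, illustrative function names) counts the swaps made while building a heap by repeated insertion. For a strictly increasing input, the i-th inserted element climbs from depth ⌊log i⌋ to the root, so the total should equal ∑_{2≤i≤n} ⌊log i⌋, with logarithms taken base 2.

```python
def count_swaps_by_insertion(keys):
    """Build a max-heap by repeated insertion; return the number of swaps."""
    heap, swaps = [], 0
    for key in keys:
        heap.append(key)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] < heap[i]:
            parent = (i - 1) // 2
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
            swaps += 1
    return swaps


def sum_floor_log2(n):
    """Sum of floor(log2 i) for i = 2..n, using exact integer arithmetic."""
    return sum(i.bit_length() - 1 for i in range(2, n + 1))
```

For example, `count_swaps_by_insertion(range(1, n + 1))` can be compared with `sum_floor_log2(n)`, while a decreasing input needs no swaps at all.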


Quiz

log^2 n = Θ(log n^2) ? no

√n = Ω(log^4 n) ? yes

2^(lg n) = Ω(n^2) ? no

2^n = Ω(n^2) ? yes

log(√n) = Θ(log n) ? yes



Sorting in linear time



The sorting problem

Input: a sequence of n numbers ⟨a_1, a_2, …, a_n⟩

Output: a permutation ⟨a_{i_1}, a_{i_2}, …, a_{i_n}⟩ of the input such that a_{i_1} ≤ a_{i_2} ≤ … ≤ a_{i_n}

Why do we care so much about sorting?

  • sorting is used by many applications

  • (first) step of many algorithms

  • many techniques can be illustrated by studying sorting



Can we sort faster than Θ(n log n)?

Worst case running time of sorting algorithms:

InsertionSort: Θ(n^2)

MergeSort: Θ(n log n)

HeapSort: Θ(n log n)

Can we do this faster? Θ(n loglog n) ? Θ(n) ?



Upper and lower bounds

Upper bound

How do you show that a problem (for example sorting) can be solved in Θ(f(n)) time?

➨ give an algorithm that solves the problem in Θ(f(n)) time.

Lower bound

How do you show that a problem (for example sorting) cannot be solved faster than in Θ(f(n)) time?

➨ prove that every possible algorithm that solves the problem needs Ω(f(n)) time.



Lower bounds


Model of computation: which operations is the algorithm allowed to use?

Bit manipulations? Random access (array indexing) vs. pointer machines?



Comparison-based sorting

InsertionSort(A)

  • for j = 2 to A.length

  • do begin

  • key = A[j]; i = j − 1

  • while i > 0 and A[i] > key

  • do begin A[i+1] = A[i]; i = i − 1 end

  • A[i+1] = key

  • end

    Which steps the algorithm executes, and hence which element ends up where, depends only on the results of comparisons between the input elements.
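
To make the "only comparisons matter" point concrete, here is a small Python sketch. It is an illustration under its own assumptions, not code from the course: indexing is 0-based, and the helper `comparison_trace` is a hypothetical name. The sort sees the elements only through a `greater` callback, and the helper records the outcome of every comparison.

```python
def insertion_sort(items, greater):
    """Sort a copy of items; elements are inspected only via greater(x, y)."""
    a = list(items)
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and greater(a[i], key):
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a


def comparison_trace(items):
    """Return the sorted list and the sequence of comparison outcomes."""
    outcomes = []

    def greater(x, y):
        outcomes.append(x > y)
        return outcomes[-1]

    return insertion_sort(items, greater), outcomes
```

Two inputs with the same relative order, such as `[2, 3, 1]` and `[20, 30, 10]`, produce identical outcome sequences, so the algorithm performs exactly the same moves on both.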


Decision tree for comparison-based sorting

Figure: a binary decision tree. Every internal node is a comparison A[.] < A[.] (or ≤, =, >, ≥); between comparisons the algorithm performs exchanges of elements, assignments, etc.



Comparison-based sorting

  • every permutation of the input follows a different path in the decision tree

    ➨ the decision tree has at least n! leaves

  • the height of a binary tree with n! leaves is at least log(n!)

  • worst case running time

    ≥ longest path from root to leaf

    ≥ log(n!) = Ω(n log n)
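
The last step, log(n!) = Ω(n log n), holds because n! contains at least n/2 factors that are each at least n/2, so log(n!) ≥ (n/2) log(n/2). As a numerical illustration, the following sketch computes the smallest possible height of a binary tree with n! leaves, that is ⌈log2(n!)⌉, and prints it next to n log2 n. The function name is chosen here for illustration.

```python
from math import factorial, log2


def min_worst_case_comparisons(n):
    """Smallest height h of a binary tree with at least n! leaves.

    h is the least integer with 2**h >= n!, i.e. ceil(log2(n!)).
    """
    return (factorial(n) - 1).bit_length()


for n in (4, 16, 64, 256):
    print(n, min_worst_case_comparisons(n), round(n * log2(n)))
```

The two printed columns grow at the same rate, which is what the Ω(n log n) bound predicts.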



Lower bound for comparison-based sorting

Theorem: Any comparison-based sorting algorithm requires Ω(n log n) comparisons in the worst case.

➨ The worst case running time of MergeSort and HeapSort is optimal.



Sorting in linear time …

Three algorithms which are faster:

  • CountingSort

  • RadixSort

  • BucketSort

    (not comparison-based, make assumptions on the input)



CountingSort

Input: array A[1..n] of numbers

Assumption: the input elements are integers in the range 0 to k, for some k

Main idea: count for every A[i] the number of elements less than A[i] ➨ position of A[i] in the output array

Beware of elements that have the same value!

position(i) = number of elements less than A[i] in A[1..n] + number of elements equal to A[i] in A[1..i]


CountingSort

Example:

input: 5 3 10 5 4 5 7 7 9 3 10 8 5 3 3 8

sorted: 3 3 3 3 4 5 5 5 5 7 7 8 8 9 10 10

numbers < 5: 3 3 3 3 4

third 5 from left ➨ position: (# less than 5) + 3 = 5 + 3 = 8
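
The position formula can be evaluated directly, as in the Python sketch below. The function names are illustrative, positions are 1-based as on the slides, and the example list is the slide's input as read from the figure (its exact order is an assumption). This direct version takes quadratic time and only illustrates the formula; it is not the CountingSort algorithm itself.

```python
def positions(a):
    """Return the 1-based output position of every element of a.

    position(i) = (# elements less than a[i] in the whole list)
                + (# elements equal to a[i] up to and including index i)
    """
    result = []
    for i, x in enumerate(a):
        less = sum(1 for y in a if y < x)
        equal_so_far = sum(1 for y in a[: i + 1] if y == x)
        result.append(less + equal_so_far)
    return result


def place(a):
    """Put every element at its computed position; the result is sorted."""
    out = [None] * len(a)
    for x, p in zip(a, positions(a)):
        out[p - 1] = x
    return out


example = [5, 3, 10, 5, 4, 5, 7, 7, 9, 3, 10, 8, 5, 3, 3, 8]
```

Here `positions(example)[5]` is the position of the third 5 from the left, and `place(example)` returns the sorted list.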



Lemma: If every element A[i] is placed on position(i), then the array is sorted and the sorted order is stable.

Stable: numbers with the same value appear in the same order in the output array as they do in the input array.



CountingSort

C[i] will contain the number of elements ≤ i

CountingSort(A,k)

► Input: array A[1..n] of integers in the range 0..k

► Output: array B[1..n] which contains the elements of A, sorted

  • for i = 0 to k do C[i] = 0

  • for j = 1 to A.length do C[A[j]] = C[A[j]] + 1

  • ► C[i] now contains the number of elements equal to i

  • for i = 1 to k do C[i] = C[i] + C[i−1]

  • ► C[i] now contains the number of elements less than or equal to i

  • for j = A.length downto 1

  • do B[C[A[j]]] = A[j]; C[A[j]] = C[A[j]] − 1
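
A possible Python rendering of CountingSort is sketched below. It assumes 0-based lists, so the counter is decremented before it is used as an output index, and the variable names `count` and `out` stand in for the slide's C and B.

```python
def counting_sort(a, k):
    """Return a stably sorted copy of a, whose items are integers in 0..k."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1
    # count[v] now holds the number of elements equal to v
    for v in range(1, k + 1):
        count[v] += count[v - 1]
    # count[v] now holds the number of elements less than or equal to v
    out = [None] * len(a)
    for x in reversed(a):      # right to left keeps equal elements in order
        count[x] -= 1          # decrement first: Python lists are 0-based
        out[count[x]] = x
    return out
```

Scanning the input from right to left matters for stability: the last occurrence of a value is written to the last slot reserved for that value.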



    Correctness lines 6/7: Invariant

    Inv(m): for m ≤ i ≤ n: B[position(i)] contains A[i], and for 0 ≤ i ≤ k: C[i] = (# numbers smaller than i) + (# numbers equal to i in A[1..m-1])

    Inv(m+1) holds before the loop is executed with j = m, Inv(m) holds afterwards



CountingSort: running time


    line 1: ∑_{0≤i≤k} Θ(1) = Θ(k)

    line 2: ∑_{1≤j≤n} Θ(1) = Θ(n)

    line 4: ∑_{1≤i≤k} Θ(1) = Θ(k)

    lines 6/7: ∑_{1≤j≤n} Θ(1) = Θ(n)

Total: Θ(n+k) ➨ Θ(n) if k = O(n)



CountingSort

Theorem: CountingSort is a stable sorting algorithm that sorts an array of n integers in the range 0..k in Θ(n+k) time.



RadixSort

Input: array A[1..n] of numbers

Assumption: the input elements are integers with d digits

example(d = 4): 3288, 1193, 9999, 0654, 7243, 4321

RadixSort(A, d)

  • for i = 1 to d

  • do use a stable sort to sort array A on digit i

(the 1st digit is the rightmost, least significant digit; the dth digit is the leftmost, most significant one)


RadixSort: example

input   after sort on 1st digit   after sort on 2nd digit   after sort on 3rd digit
329     720                       720                       329
457     355                       329                       355
657     436                       436                       436
839     457                       839                       457
436     657                       355                       657
720     329                       457                       720
355     839                       657                       839

Correctness: Assignment 3
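
The passes of the example can be reproduced with the Python sketch below. Note the assumptions: instead of CountingSort it uses a simple stable distribution into one list per digit value, digits are counted from the least significant one, and the function name and the `base` parameter are illustrative.

```python
def radix_sort_passes(a, d, base=10):
    """Sort non-negative integers with at most d digits.

    Returns the list after each pass; the last entry is the sorted result.
    """
    passes = []
    for i in range(d):                 # i = 0 is the least significant digit
        buckets = [[] for _ in range(base)]
        for x in a:
            # Appending in input order keeps each pass stable.
            buckets[(x // base**i) % base].append(x)
        a = [x for bucket in buckets for x in bucket]
        passes.append(a)
    return passes
```

Calling `radix_sort_passes([329, 457, 657, 839, 436, 720, 355], 3)` yields one list per digit, and stability of each pass is what preserves the order established by the earlier passes.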



RadixSort

Running time: If we use CountingSort as the stable sorting algorithm and each digit is an integer in the range 0..k

➨ Θ(n + k) per digit

Theorem: Given n d-digit numbers in which each digit can take up to k possible values, RadixSort correctly sorts these numbers in Θ(d (n + k)) time.



BucketSort

Input: array A[1..n] of numbers

Assumption: the input elements lie in the interval [0..1) (no integers!)

BucketSort is fast if the elements are uniformly distributed in [0..1)


BucketSort

  • Throw input elements in “buckets”, sort buckets, concatenate …

input array A[1..n]; numbers in [0..1)

auxiliary array B[0..n-1]

bucket B[i] contains numbers in [i/n … (i+1)/n)

Figure: example with the input values 0.1, 0.13, 0.15, 0.256, 0.287, 0.346, 0.5, 0.53, 0.734, 0.792 distributed over the buckets B[0], B[1], …, B[n-1].





BucketSort

BucketSort(A)

► Input: array A[1..n] of numbers with 0 ≤ A[i ] < 1

► Output: sorted list, which contains the elements of A

  • n = A.length

  • initialize auxiliary array B[0..n-1]; each B[i] is a linked list of numbers

  • for i = 1 to n

  • do insert A[i] into list B[⌊n∙A[i]⌋]

  • for i = 0 to n-1

  • do sort list B[i], for example with InsertionSort

  • concatenate the lists B[0], B[1], …, B[n-1] together in order
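
The steps above can be sketched in Python as follows. Plain lists stand in for the linked lists of the pseudocode, each bucket is sorted with an inline insertion sort, and the function returns a new list; all of these are choices of this sketch rather than requirements of the algorithm.

```python
def bucket_sort(a):
    """Return a sorted copy of a; every element must satisfy 0 <= x < 1."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)      # bucket index is floor(n * x)
    result = []
    for bucket in buckets:
        # Insertion sort on one bucket, as suggested in the pseudocode.
        for j in range(1, len(bucket)):
            key, i = bucket[j], j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
        result.extend(bucket)              # concatenate in bucket order
    return result
```

Because bucket i only receives values from [i/n, (i+1)/n), concatenating the sorted buckets in index order gives a sorted list.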



BucketSort

Running time?

Define n_i = number of elements in bucket B[i]

➨ running time = Θ(n) + ∑_{0≤i≤n-1} Θ(n_i^2)

  • worst case: all numbers fall into the same bucket ➨ Θ(n^2)

  • best case: all numbers fall into different buckets ➨ Θ(n)

  • expected running time if the numbers are randomly distributed?


BucketSort: expected running time

Define n_i = number of elements in bucket B[i]

➨ running time = Θ(n) + ∑_{0≤i≤n-1} Θ(n_i^2)

Assumption: Pr { A[j] falls in bucket B[i] } = 1/n for each i

E[running time] = E[ Θ(n) + ∑_{0≤i≤n-1} Θ(n_i^2) ] = Θ( n + ∑_{0≤i≤n-1} E[n_i^2] )

What is E[n_i^2]?

We have E[n_i] = 1, but E[n_i^2] ≠ E[n_i]^2 …

(some math with indicator random variables – see book for details)

➨ E[n_i^2] = 2 − 1/n = Θ(1)

➨ expected running time = Θ(n)
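
The value of E[n_i^2] can be double-checked with exact arithmetic. The sketch below makes an additional assumption beyond the slide: the n elements fall into buckets independently, so n_i follows a binomial distribution with parameters n and 1/n. Its variance is then 1 − 1/n and its mean is 1, so E[n_i^2] = (1 − 1/n) + 1. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def expected_square_bucket_size(n):
    """Exact E[n_i^2] when each of n elements independently lands in
    bucket i with probability 1/n (binomial distribution)."""
    p = Fraction(1, n)
    return sum(
        k * k * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )
```

For any n ≥ 1 the result can be compared with `2 - Fraction(1, n)`.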

