Foundations of Data Structures

Foundations of Data Structures Practical Session #12 Linear Sorting

Sorting algorithms criteria
• Running time (worst/average/best cases).
• Additional memory required.
• “In place” algorithms require only O(1) additional memory.
• Stability: stable sorting algorithms maintain the relative order of records with equal keys.

Comparison sorting
• A comparison sort is a sorting algorithm that reads the list elements only through a single abstract comparison operation (e.g., ≤) that determines which of two elements should occur first in the final sorted list.
• Simple algorithms, for example insertion sort, selection sort, bubble sort: O(n^2).
• Recursive algorithms, for example merge sort and quicksort: O(n log n) (for quicksort, on average).
• Any comparison sort requires Ω(n log n) comparison operations in the worst case. This is a consequence of the limited information available through comparisons alone.

Linear sorting
• Linear sorting is the problem of sorting a collection of items with numeric keys.
• The ability to perform arithmetic operations on the keys allows faster algorithms than comparison-based algorithms in many cases.
• Classical examples include:
  • Counting sort
  • Radix sort
  • Bucket sort

Counting sort
• A sorting algorithm that is not based on comparisons and supports duplicate keys.
• A is an input array of length n.
• B is the output array.
• C is an auxiliary array of size k.
• Assumption: A consists of elements with integer keys in the range [1..k].

Counting sort cont’d
Counting-Sort (A, B, k)
    for i ← 1 to k
        C[i] ← 0
    // Calc histogram
    for j ← 1 to n
        C[A[j]] ← C[A[j]] + 1
    // Calc start index (backwards) in output for each key
    for i ← 2 to k
        C[i] ← C[i] + C[i-1]
    // Copy to output array
    for j ← n downto 1
        B[C[A[j]]] ← A[j]
        C[A[j]] ← C[A[j]] - 1
    return B
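
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference implementation: the function name counting_sort and the optional key parameter are assumptions added here, and the output array is 0-based, so the counter is decremented before an element is placed rather than after.

```python
def counting_sort(a, k, key=lambda x: x):
    """Stable counting sort of items whose key(x) is an integer in 1..k.

    `key` is a hypothetical convenience parameter, not part of the slides.
    """
    count = [0] * (k + 1)          # count[0] stays unused because keys start at 1
    for x in a:                    # histogram of the keys
        count[key(x)] += 1
    for i in range(2, k + 1):      # prefix sums: count[i] = number of keys <= i
        count[i] += count[i - 1]
    out = [None] * len(a)
    for x in reversed(a):          # backwards scan keeps equal keys in input order
        count[key(x)] -= 1         # 0-based output: decrement, then place
        out[count[key(x)]] = x
    return out
```

Sorting pairs by their first component, for instance with counting_sort(pairs, k, key=lambda p: p[0]), keeps pairs with equal first components in their input order, which is the stability property used later by radix sort.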

Counting sort analysis
• Run time complexity: O(n + k). When k = O(n) this is O(n), an improvement over comparison-based sorts, which require Ω(n log n) time.
• Counting sort is stable: two elements with the same key value will appear in the output in the same order as they appeared in the input. Stability is important when there is additional data besides the key.
• Additional space required: O(n + k).

Radix sort
• A stable sort algorithm for sorting n elements with d digits, where the digits are in base k, i.e., in the range [0..k-1].
• The algorithm uses a stable sort algorithm to sort the keys by each digit, starting with the least significant digit (the rightmost).
Radix-Sort(A[1..n], d)
    for i ← 1 to d
        Use a stable sort to sort A according to digit i
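
A minimal Python sketch of the loop above, for non-negative integers. It assumes the per-digit stable sort is done by appending to k buckets and concatenating them, which is stable and behaves like counting sort on a single digit; the function name radix_sort and the default k = 10 are choices made for this illustration only.

```python
def radix_sort(a, d, k=10):
    """LSD radix sort of non-negative integers with at most d base-k digits."""
    for i in range(d):                        # digit 0 is the least significant
        buckets = [[] for _ in range(k)]      # one bucket per digit value 0..k-1
        for x in a:                           # appending keeps input order (stable)
            buckets[(x // k**i) % k].append(x)
        a = [x for b in buckets for x in b]   # concatenate buckets 0..k-1
    return a
```

Each pass costs O(n + k), so d passes cost O(d(n + k)). The order of the passes matters: sorting by the most significant digit last only works because every pass is stable.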

Radix sort analysis
• For example, sort 7 numbers with 3 digits in decimal base.
[Figure: input array → sorted by 1st digit → sorted by 2nd (and 1st) digit → sorted]

Radix sort cont’d
• Assuming the stable sort runs in O(n + k) (such as counting sort), the running time is O(d(n + k)).
• If d is constant and k = O(n), the running time is O(n).

Bucket sort
• Assumption: the elements of the input array are uniformly distributed over the interval [0,1)*.
  * Can be extended to keys in any range.
• The idea
  • Divide the interval [0,1) into n equal-sized subintervals (buckets).
  • Distribute the n input numbers between the buckets.
  • To produce the output, sort the elements in each bucket and then output the concatenation of the buckets.

Bucket sort cont’d
Bucket-sort (A)
    n ← length(A)
    for i ← 1 to n do
        insert A[i] into list B[⌊n·A[i]⌋]
    for i ← 0 to n-1 do
        sort B[i] using insertion sort
    return B[0]^B[1]^…^B[n-1]
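
The pseudocode above might be written in Python along these lines. This is a sketch under the slide's assumption that every input value lies in [0,1); the helper insertion_sort and the min(...) guard against floating-point rounding are additions made for this illustration.

```python
def insertion_sort(lst):
    """Sort lst in place; quadratic in general, cheap for very short lists."""
    for j in range(1, len(lst)):
        x = lst[j]
        i = j - 1
        while i >= 0 and lst[i] > x:
            lst[i + 1] = lst[i]
            i -= 1
        lst[i + 1] = x


def bucket_sort(a):
    """Bucket sort for floats in [0, 1), using n equal-width buckets."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        # floor(n * x) is the bucket index; min() guards against rounding up to n
        buckets[min(int(n * x), n - 1)].append(x)
    for b in buckets:
        insertion_sort(b)
    return [x for b in buckets for x in b]    # concatenate B[0], ..., B[n-1]
```

If all inputs land in one bucket, the single insertion sort dominates and the running time degrades to O(n^2).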

Bucket sort analysis
• Assuming the input is uniformly distributed over [0,1), the expected size of each bucket is O(1).
• Thus, sorting each bucket requires O(1) expected time.
• Distributing elements between buckets is O(n), and concatenating the buckets (lists) is O(n).
• Total expected run time is O(n).
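
Because insertion sort is quadratic in the bucket size, the expected-time claim rests on the second moment of the bucket size rather than on its mean alone. The standard textbook argument, sketched here with n_i denoting the number of elements in bucket i (so n_i follows a Binomial(n, 1/n) distribution under the uniformity assumption), goes roughly as follows:

```latex
\begin{align*}
E[n_i^2] &= \operatorname{Var}[n_i] + \bigl(E[n_i]\bigr)^2
          = n \cdot \tfrac{1}{n}\Bigl(1 - \tfrac{1}{n}\Bigr) + 1
          = 2 - \tfrac{1}{n}, \\
E[T(n)]  &= \Theta(n) + \sum_{i=0}^{n-1} O\bigl(E[n_i^2]\bigr)
          = \Theta(n) + n \cdot O\Bigl(2 - \tfrac{1}{n}\Bigr)
          = \Theta(n).
\end{align*}
```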

Sort algorithms review

Question 1
• Describe an algorithm that, given n integers in the range [0..k], performs preprocessing in O(n + k) time so that it can answer in O(1) time the query “how many of the n numbers are in the range [a..b]?” for any parameters a and b.

Question 1 solution
• Counting sort gets an array A of n integers in the range [0..k] and, with C[0..k] initialized to zeros, performs the following lines:
    for (j = 0; j < length[A]; j++)
        C[A[j]] = C[A[j]] + 1;
    for (i = 1; i <= k; i++)
        C[i] = C[i] + C[i-1];
• After which C[i] stores the number of items smaller than or equal to i (requires O(n + k) time).
• Given a range [a..b], to answer the query simply return C[b] - C[a-1] (taking C[-1] as 0) in O(1) time, as required.
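
A small Python sketch of the preprocessing and the query, assuming keys in 0..k and an inclusive query range. The names preprocess, count_in_range, lo and hi are hypothetical; lo and hi play the roles of a and b.

```python
def preprocess(a, k):
    """Return c where c[i] = number of elements of a that are <= i (keys in 0..k)."""
    c = [0] * (k + 1)
    for x in a:                    # histogram: O(n)
        c[x] += 1
    for i in range(1, k + 1):      # prefix sums: O(k)
        c[i] += c[i - 1]
    return c


def count_in_range(c, lo, hi):
    """Number of elements with lo <= key <= hi, assuming 0 <= lo <= hi <= k."""
    below = c[lo - 1] if lo > 0 else 0    # elements strictly smaller than lo
    return c[hi] - below
```

For the array [2, 5, 3, 0, 2, 3, 0, 3] with k = 5, the prefix-sum array is [2, 2, 4, 7, 7, 8], so the query for the range 2..3 returns 7 - 2 = 5.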

Question 2
• Design an algorithm for sorting n elements with keys in the range [x, x+d] that runs in O(n) expected time if the items are uniformly distributed over [x, x+d], and in O(n log n) in the worst distribution case.
• Hint: what is the worst-case scenario of bucket sort, and why does it result in O(n^2) running time?

Question 2 solution
• Use bucket sort over the given key range with the following changes:
  • The elements in each bucket are stored in an AVL tree (instead of a linked list).
  • In the last stage, concatenate the inorder traversals of all the buckets one after another.

Question 2 solution cont’d
• Let n_i be the number of elements in bucket i (an AVL tree).
• Inserting the n elements into the buckets takes O(Σ_i n_i log n_i).
• When the keys are uniformly distributed, the expected size is n_i = O(1) for every i, resulting in O(n) expected running time.
• In the worst distribution case there is an i such that n_i = Θ(n); therefore the insertions take O(Σ_i n_i log n_i) ≤ O(log n · Σ_i n_i) = O(n log n).
• The inorder traversals of all the buckets take O(n) time, and their concatenation takes O(n) as well.
• In total: O(n) expected time for uniformly distributed keys and O(n log n) in the worst distribution case.
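
The idea can be illustrated in Python with one substitution, stated plainly: instead of an AVL tree per bucket, this sketch sorts each bucket with the built-in sorted(), which also costs O(n_i log n_i) in the worst case and therefore gives the same bounds. The function name robust_bucket_sort and the parameters lo and hi (the ends of the key range) are hypothetical.

```python
def robust_bucket_sort(a, lo, hi):
    """Sort numbers with lo <= x <= hi; each bucket costs O(m log m) to sort."""
    n = len(a)
    if n == 0:
        return []
    width = (hi - lo) / n or 1.0               # fall back to 1.0 when hi == lo
    buckets = [[] for _ in range(n)]
    for x in a:
        i = min(int((x - lo) / width), n - 1)  # x == hi goes into the last bucket
        buckets[i].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))                  # stands in for AVL inserts + inorder walk
    return out
```

With uniformly distributed keys the buckets stay small and the total work is linear in expectation; when every key falls into one bucket, the cost is that of a single O(n log n) sort.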

Question 2 solution cont’d
• Alternative solution
  • Execute the following two algorithms in parallel (for example, by interleaving their steps):
    • Original bucket sort.
    • Any sort algorithm that takes O(n log n) in the worst case.
  • Stop when either of the algorithms has stopped (the quickest) and return its output.

Question 3
• Given a set of n integers in the range [1..n^3], suggest an efficient sorting algorithm.
• Naïve attempts
  • Comparison-based algorithms take O(n log n).
  • Counting sort: k = n^3, so O(n + n^3) = O(n^3).
  • Bucket sort: uniform distribution isn’t given.
  • Radix sort: in decimal base, d = O(log n) digits, so O(d(n + k)) = O(n log n).
• Can we do any better?

Question 3 solution
• Use radix-sort after preprocessing:
  • Convert all the numbers to base n in O(n) total time: each input number x is converted to (x_3, x_2, x_1, x_0), where x = x_3·n^3 + x_2·n^2 + x_1·n + x_0.
  • Call radix-sort on the transformed numbers: O(n).
• All the numbers are in the range 1 to n^3; therefore, there are at most 4 digits for each number (n^3 in base n is 1000).
• The running time of the suggested algorithm is O(d(n + k)) = O(4(n + n)) = O(n).
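
A compact Python sketch of this approach. Rather than materializing the base-n representation first, it extracts base-n digit i on the fly with (x // n**i) % n, and it uses stable bucket distribution for each pass; both are choices made for this illustration, as is the function name sort_up_to_n_cubed.

```python
def sort_up_to_n_cubed(a):
    """Sort n integers in the range 1..n**3 with 4 passes of base-n radix sort."""
    n = len(a)
    if n < 2:
        return list(a)
    for i in range(4):                        # n**3 is 1000 in base n: 4 digits
        buckets = [[] for _ in range(n)]      # one bucket per base-n digit value
        for x in a:
            buckets[(x // n**i) % n].append(x)    # base-n digit i of x
        a = [x for b in buckets for x in b]   # stable concatenation
    return a
```

Each of the four passes touches n elements and n buckets, so the total is O(4(n + n)) = O(n).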

Question 4
• There are m sets S_1, …, S_m.
• Each set contains integers in the range [1..n], and m = o(n).
• n_i is the number of elements in S_i, and the total number of elements, n_1 + … + n_m, is O(n).
• Suggest an algorithm for sorting all the sets in O(n) time complexity and O(n) space (memory) complexity.
• Note: the output is m sorted sets and not a single merged sorted set.

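Question 4 solution
• Solution:
  • If we sort each set separately using an O(n_i log n_i) algorithm (such as merge sort), we get O(Σ_i n_i log n_i), which is O(n log n) in the worst case (e.g., when one set holds Θ(n) elements).
  • If we sort each set separately using counting sort: O(Σ_i (n_i + n)) = O(n + mn) = O(mn). Furthermore, the space complexity is O(n).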

Question 4 solution cont’d
• The following algorithm runs in O(n) time and space complexity:
  • Add a field set-num to each element: O(n).
  • Build an array A of all the elements in all of the sets: O(n).
  • Sort A using counting sort: O(n + n) = O(n).
  • Split A back into the original sets according to the set-num field: O(n).
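
The four steps can be sketched in Python as below. The sets are represented as Python lists, and the set-num field is represented by pairing each value with the index of its set; the function name sort_sets and this representation are assumptions made for the illustration.

```python
def sort_sets(sets, n):
    """Sort each list in `sets` (integers in 1..n) using one shared counting sort."""
    # Steps 1-2: tag every element with its set number and collect all of them.
    tagged = [(x, s) for s, items in enumerate(sets) for x in items]
    # Step 3: stable counting sort of the combined array by value.
    count = [0] * (n + 1)
    for x, _ in tagged:
        count[x] += 1
    for i in range(1, n + 1):
        count[i] += count[i - 1]
    out = [None] * len(tagged)
    for x, s in reversed(tagged):
        count[x] -= 1
        out[count[x]] = (x, s)
    # Step 4: split back by set number; scanning in sorted order keeps each set sorted.
    result = [[] for _ in sets]
    for x, s in out:
        result[s].append(x)
    return result
```

The counting array of size n is allocated once instead of once per set, which is what removes the O(mn) term from the running time.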
