G64ADS Advanced Data Structures Guoping Qiu, Room C34, CS Building
About this course • Study advanced data structures, algorithm design, analysis and implementation techniques • To obtain advanced knowledge and practical skills in the efficient implementation of algorithms on modern computers.
About this course • Pre-requisites • Good programming experience (this course does not teach programming) • Java, C++, or C
About this course • Timetable • Lecture Friday, 10:00 - 12:00, C60 • Labs Tuesday, 9:00 - 11:00, B52
About this course • Assessments • Final Exam 50% • Continuous assessment 50% • Homework assignments • Programming projects
About this course • References and learning materials • Textbooks • Mark Allen Weiss, Data Structures and Algorithm Analysis in Java, 2nd Edition, Addison Wesley, 2007 • Mark Allen Weiss, Data Structures and Problem Solving Using Java, 3rd Edition, Addison Wesley, 2006 • Robert Sedgewick, Algorithms in C, 3rd Edition, Addison Wesley, 1998 • Robert Sedgewick, Algorithms in C++, 3rd Edition, Addison Wesley, 1998 • Robert Sedgewick, Algorithms in Java, 3rd Edition, Addison Wesley, 2003 • Course web page http://www.cs.nott.ac.uk/~qiu/Teaching/G64ADS Slides, coursework, other materials
Course Overview • Advanced data structures • Trees, hash tables, heaps, disjoint sets, graphs • Algorithm development and analysis • Insert, delete, search, sort • Applications • Implementation in Java, C++, C
Advanced Data Structures • “Why not just use a big array?” • Example problem: search for a number k in a set of N numbers • Solution #1: store the numbers in an array of size N and iterate through the array until k is found • Example array: 29 20 65 80 21 39 25 • Number of checks: Best case: 1 (k=29); Worst case: N (k=25); Average case: N/2
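A minimal Java sketch of Solution #1 (the method name linearSearch and the example call are illustrative, not from the slides):

/** Linear scan: returns the index of key in a, or -1 if it is absent. */
public static int linearSearch( int [ ] a, int key )
{
    for( int i = 0; i < a.length; i++ )
        if( a[ i ] == key )
            return i;      // best case: 1 check (key at index 0)
    return -1;             // worst case: N checks (key last or absent)
}

// e.g., linearSearch( new int[]{ 29, 20, 65, 80, 21, 39, 25 }, 25 ) examines all 7 elements.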
Advanced Data Structures • Solution #2: store the numbers in a binary search tree and search the tree until k is found • Example tree contents: 29 20 65 21 80 25 39 • Number of checks: Best case: 1 (k=29); Worst case: log2 N (k=25); Average case: (log2 N)/2
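A minimal sketch of Solution #2, assuming a simple Node class with key, left and right fields (the class is not defined in the slides):

/** A minimal binary search tree node. */
class Node
{
    int key;
    Node left, right;
    Node( int key ) { this.key = key; }
}

/** Returns true if key is present in the BST rooted at root. */
static boolean bstSearch( Node root, int key )
{
    Node cur = root;
    while( cur != null )
    {
        if( key == cur.key )
            return true;                              // found: stop early
        cur = ( key < cur.key ) ? cur.left : cur.right;  // discard one subtree per step
    }
    return false;   // at most O(height) checks, i.e., O(log2 N) for a balanced tree
}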
Analysis • Does it matter? N vs. log2 N
Analysis • Does it matter? Assume: • N = 1,000,000,000 • 1 billion (Walmart transactions in 100 days) • 1 GHz processor = 10^9 cycles per second • Solution #1 (10 cycles per check) • Worst case: 1 billion checks = 10 seconds • Solution #2 (100 cycles per check) • Worst case: 30 checks = 0.000003 seconds
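Spelling out the arithmetic behind these figures (an illustrative check using the assumptions above):

Solution #1: 10^9 checks × 10 cycles/check ÷ 10^9 cycles/second = 10 seconds
Solution #2: log2(10^9) ≈ 30 checks; 30 checks × 100 cycles/check ÷ 10^9 cycles/second = 0.000003 seconds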
Advanced Data Structures • Does it matter? • The Message • Appropriate data structures ease design and improve performance • The Challenge • Design appropriate data structure and associated algorithms for a problem • Analyze to show improved performance
Purpose • Why bother analyzing code; isn’t getting it to work enough? • Estimate time and memory in the average case and worst case • Identify bottlenecks, i.e., where to reduce time • Speed up critical algorithms
Algorithm • Problem • Specifies the desired input-output relationship • Algorithm • Well-defined computational procedure for transforming inputs to outputs • Correct algorithm • Produces the correct output for every possible input in finite time • Solves the problem
Algorithm Analysis • Predict resource utilization of an algorithm • Running time • Memory • Dependent on architecture • Serial • Parallel • Quantum
What to Analyze • Main focus is on running time • Memory/time tradeoff • Simple serial computing model • Single processor, infinite memory
What to Analyze • Running time T(N) • N is typically the size of the input • Sorting? • Multiplying two integers? • Multiplying two matrices? • Traversing a graph? • T(N) measures number of primitive operations performed e.g., addition, multiplication, comparison, assignment
Example • General Rules • Rule 1 – for loop • The running time of a for loop is at most the running time of the statements inside the for loop times the number of iterations
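For instance, a single loop in the same style as the fragments on the following slides (assuming n and k are already declared):

for (i = 0; i < n; i++)
    k++

This fragment is O(N): a constant-time statement executed N times.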
Example • General Rules • Rule 2 – nested loops

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        k++

This fragment is O(N^2)
Example • General Rules • Rule 3 – Consecutive statements

for (i = 0; i < n; i++)
    a[i] = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[i] += a[j] + i + j;

This fragment is O(N) work followed by O(N^2) work → O(N^2)
Example • General Rules • Rule 4 – if/else

if (condition)
    S1
else
    S2

The running time is no more than the running time of the test plus max{S1, S2}
What to Analyze • Worst-case running time Tworst(N) • Average-case running time Tavg(N) • Tavg(N) <= Tworst(N) • Typically analyze worst-case behavior • Average case hard to compute • Worst-case gives guaranteed upper bound
Rate of Growth • Exact expressions for T(N) are meaningless and hard to compare • Rate of growth • Asymptotic behavior of T(N) as N gets big • Usually expressed as the fastest growing term in T(N), dropping constant coefficients e.g., T(N) = 3N^2 + N + 1 → Θ(N^2)
Rate of Growth • T(N) = O(f(N)) if there are positive constants c and n0 such that T(N) ≤ c·f(N) when N ≥ n0 • Asymptotic upper bound • “Big-Oh” notation • T(N) = Ω(g(N)) if there are positive constants c and n0 such that T(N) ≥ c·g(N) when N ≥ n0 • Asymptotic lower bound • “Big-Omega” notation
Rate of Growth • T(N) = Θ(h(N)) if and only if T(N) = O(h(N)) and T(N) = Ω(h(N)) • Asymptotic tight bound • T(N) = o(p(N)) if for all constants c there exists an n0 such that T(N) < c·p(N) when N > n0 • i.e., T(N) = o(p(N)) if T(N) = O(p(N)) and T(N) ≠ Θ(p(N)) • “Little-oh” notation
Rate of Growth • N^2 = O(N^2) = O(N^3) = O(2^N) • N^2 = Ω(1) = Ω(N) = Ω(N^2) • N^2 = Θ(N^2) • N^2 = o(N^3) • 2N^2 + 1 = Θ(?) • N^2 + N = Θ(?)
Rate of Growth • Rule 1: If T1(N) = O(f(N)) and T2(N) = O(g(N)), then • T1(N) + T2(N) = O(f(N) + g(N)) • T1(N) * T2(N) = O(f(N) * g(N)) • Rule 2: If T(N) is a polynomial of degree k, then T(N) = Θ(N^k) • Rule 3: log^k N = O(N) for any constant k (logarithms grow more slowly than any polynomial)
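A quick worked application of these rules (the concrete functions are illustrative, not from the slides):

T(N) = 3N^3 + 5N^2 + 7 is a polynomial of degree 3, so T(N) = Θ(N^3) (Rule 2)
If T1(N) = O(N^2) and T2(N) = O(N log N), then T1(N) + T2(N) = O(N^2) and T1(N) * T2(N) = O(N^3 log N) (Rule 1)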
Example • Maximum Subsequence Sum problem • Given N (possibly negative) integers A1, A2, …, AN, find the maximum value of Ai + Ai+1 + … + Aj (taken to be 0 if all the integers are negative) • e.g., for input -2, 11, -4, 13, -5, -2, the answer is 20 (A2 through A4)
Example • Algorithm 1 • Compute each possible subsequence sum independently

MaxSubSum1 (A)
    maxSum = 0
    for i = 1 to N
        for j = i to N
            sum = 0
            for k = i to j
                sum = sum + A[k]
            if (sum > maxSum) then
                maxSum = sum
    return maxSum
Example • Algorithm 1

/**
 * Cubic maximum contiguous subsequence sum algorithm.
 */
public static int maxSubSum1( int [ ] a )
{
    int maxSum = 0;

    for( int i = 0; i < a.length; i++ )
        for( int j = i; j < a.length; j++ )
        {
            int thisSum = 0;

            for( int k = i; k <= j; k++ )
                thisSum += a[ k ];

            if( thisSum > maxSum )
                maxSum = thisSum;
        }

    return maxSum;
}
Example • Algorithm 1: Analysis • Three nested loops, each running at most N iterations → T(N) = O(N^3)
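A worked count of how many times the innermost statement thisSum += a[k] executes (the standard textbook derivation, not text from the slide):

number of triples (i, j, k) with 1 ≤ i ≤ k ≤ j ≤ N
= (N^3 + 3N^2 + 2N) / 6
= Θ(N^3)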
Example • Algorithm 2 • Note that the sum of A[i..j] equals A[j] plus the sum of A[i..j-1] • No reason to re-compute the sum from scratch each time

MaxSubSum2 (A)
    maxSum = 0
    for i = 1 to N
        sum = 0
        for j = i to N
            sum = sum + A[j]
            if (sum > maxSum) then
                maxSum = sum
    return maxSum
Example • Algorithm 2

/**
 * Quadratic maximum contiguous subsequence sum algorithm.
 */
public static int maxSubSum2( int [ ] a )
{
    int maxSum = 0;

    for( int i = 0; i < a.length; i++ )
    {
        int thisSum = 0;

        for( int j = i; j < a.length; j++ )
        {
            thisSum += a[ j ];

            if( thisSum > maxSum )
                maxSum = thisSum;
        }
    }

    return maxSum;
}
Example • Algorithm 2: Analysis • Two nested loops, each running at most N iterations → T(N) = O(N^2)
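A worked count of the inner-loop iterations (again the standard derivation, not text from the slide):

number of pairs (i, j) with 1 ≤ i ≤ j ≤ N = N(N+1)/2 = Θ(N^2)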
Example • Algorithm 3 • Recursive, divide and conquer • Divide the sequence in half: A(1 ... center) and A(center+1 ... N) • Recursively compute MaxSubSum of the left half • Recursively compute MaxSubSum of the right half • Compute MaxSubSum of sequences crossing the middle, i.e., constrained to use both A(center) and A(center+1) • The answer is the maximum of these three values • e.g., <4, -3, 5, -2, -1, 2, 6, -2>
Example • Algorithm 3

MaxSubSum3 (A, i, j)
    maxSum = 0
    if (i = j) then
        if A[i] > 0 then
            maxSum = A[i]
    else
        k = floor((i+j)/2)
        maxSumLeft = MaxSubSum3(A, i, k)
        maxSumRight = MaxSubSum3(A, k+1, j)
        // compute maxSumThruCenter
        maxSum = maximum(maxSumLeft, maxSumRight, maxSumThruCenter)
    return maxSum
Example • Algorithm 3

/**
 * Recursive maximum contiguous subsequence sum algorithm.
 * Finds maximum sum in subarray spanning a[left..right].
 * Does not attempt to maintain actual best sequence.
 */
private static int maxSumRec( int [ ] a, int left, int right )
{
    if( left == right )  // Base case
        if( a[ left ] > 0 )
            return a[ left ];
        else
            return 0;
Example

    int center = ( left + right ) / 2;
    int maxLeftSum = maxSumRec( a, left, center );
    int maxRightSum = maxSumRec( a, center + 1, right );

    int maxLeftBorderSum = 0, leftBorderSum = 0;
    for( int i = center; i >= left; i-- )
    {
        leftBorderSum += a[ i ];
        if( leftBorderSum > maxLeftBorderSum )
            maxLeftBorderSum = leftBorderSum;
    }

    int maxRightBorderSum = 0, rightBorderSum = 0;
    for( int i = center + 1; i <= right; i++ )
    {
        rightBorderSum += a[ i ];
        if( rightBorderSum > maxRightBorderSum )
            maxRightBorderSum = rightBorderSum;
    }

    return max3( maxLeftSum, maxRightSum,
                 maxLeftBorderSum + maxRightBorderSum );
}
Example • Algorithm 3: Analysis • T(1) = O(1); T(N) = 2T(N/2) + O(N) → T(N) = O(N log N)
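A sketch of how the recurrence unrolls (a standard telescoping argument, assuming N is a power of 2):

T(N) = 2T(N/2) + cN
     = 4T(N/4) + 2cN
     = 8T(N/8) + 3cN
     = ...
     = N·T(1) + cN·log2 N
     = Θ(N log N)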
Example • Algorithm 4 • Observation • Any negative subsequence cannot be a prefix of the maximum subsequence • Or, a positive, contiguous subsequence is always worth adding • T(N) = ?

MaxSubSum4 (A)
    maxSum = 0
    sum = 0
    for j = 1 to N
        sum = sum + A[j]
        if (sum > maxSum) then
            maxSum = sum
        else if (sum < 0) then
            sum = 0
    return maxSum
Example • Algorithm 4

/**
 * Linear-time maximum contiguous subsequence sum algorithm.
 */
public static int maxSubSum4( int [ ] a )
{
    int maxSum = 0, thisSum = 0;

    for( int j = 0; j < a.length; j++ )
    {
        thisSum += a[ j ];

        if( thisSum > maxSum )
            maxSum = thisSum;
        else if( thisSum < 0 )
            thisSum = 0;
    }

    return maxSum;
}
Logarithmic Behaviour • T(N) = O(log2 N) usually occurs when • The problem can be halved in constant time • Solutions to sub-problems are combined in constant time • Examples • Binary search • Euclid’s algorithm • Exponentiation
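A minimal sketch of the exponentiation example (recursive squaring; the method name pow and the use of long are assumptions for illustration):

/** Computes x^n by recursive squaring: O(log2 n) multiplications. */
public static long pow( long x, int n )
{
    if( n == 0 )
        return 1;                       // base case: x^0 = 1
    long half = pow( x, n / 2 );        // halve the problem in constant extra work
    if( n % 2 == 0 )
        return half * half;             // n even: x^n = (x^(n/2))^2
    else
        return half * half * x;         // n odd:  x^n = (x^(n/2))^2 * x
}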
Binary Search • Given an integer X and integers A0, A1, …, AN-1, which are presorted and already in memory, find i such that Ai = X, or return i = -1 if X is not in the input • T(N) = O(log2 N) • T(N) = Θ(log2 N) ?
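A minimal sketch of iterative binary search in the same Java style as the earlier examples (the method name binarySearch is illustrative):

/** Returns an index i with a[i] == x in a sorted array a, or -1 if x is absent. */
public static int binarySearch( int [ ] a, int x )
{
    int low = 0, high = a.length - 1;

    while( low <= high )
    {
        int mid = ( low + high ) / 2;   // halve the remaining range each iteration

        if( a[ mid ] < x )
            low = mid + 1;
        else if( a[ mid ] > x )
            high = mid - 1;
        else
            return mid;                 // found
    }
    return -1;                          // not present: O(log2 N) iterations in the worst case
}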