
CSC 332 – Algorithms and Data Structures


Presentation Transcript


  1. CSC 332 – Algorithms and Data Structures Analysis Dr. Paige H. Meeker Computer Science Presbyterian College, Clinton, SC

  2. Why do we care? Every 18-24 months, manufacturers are introducing faster machines with larger memories. Why do we need to write efficient code?

  3. Well, just suppose… • Imagine we are defining a Java class “Huge” to represent long integers. • We want methods for our class that • add two large integers – two instances of our class • multiply two large integers. Suppose we have successfully implemented the add() method and are moving on to multiply().

  4. Well, just suppose… Suppose we have successfully implemented the add() method and are moving on to multiply(). Object-oriented coding is all about reusing what you already have, right? So, since multiplication is equivalent to repeated addition, to compute the product 7562 * 463 we could initialize a variable to 0 and then add 7562 to it 463 times – why not? add() works, right?

  5. Efficiency Example

  public class BigOEx {
      public static void main(String[] args) {
          long firstOp = 7562;
          long secondOp = 463;
          long product = 0;
          for (long i = secondOp; i > 0; i--)
              product += firstOp;
          System.out.println("Product of " + firstOp + " and " + secondOp + " = " + product);
      }
  }

  6. Efficiency Example • If we run the previous code, we should get a result in a reasonable amount of time. So, let’s run it, but replace the 463 with 100000000 (8 0’s). Will we still get a result in a reasonable time? How about with 1000000000 (9 0’s)? Is something wrong?
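One way to see the slowdown concretely is to time the loop itself (a rough sketch, not from the slides: the class and the multiplyByAdding helper are my own names, and exact timings vary by machine):

```java
public class TimedBigOEx {
    // Repeated addition, exactly as in the BigOEx loop above.
    static long multiplyByAdding(long firstOp, long secondOp) {
        long product = 0;
        for (long i = secondOp; i > 0; i--)
            product += firstOp;
        return product;
    }

    public static void main(String[] args) {
        // The loop body runs secondOp times, so the time grows
        // linearly with the size of the second operand.
        for (long secondOp : new long[] {463L, 100000000L}) {
            long start = System.nanoTime();
            long product = multiplyByAdding(7562, secondOp);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("7562 * " + secondOp + " = " + product
                               + " (" + elapsedMs + " ms)");
        }
    }
}
```

With nine zeros the same loop runs a billion iterations, which is where the noticeable delay comes from.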

  7. Efficiency Example Why a code delay? Can we do better?

  8. Efficiency Example • How might we rethink our code to produce a more efficient result?

  9. Efficiency Example Consider the product of the original numbers (463 * 7562). The secondOp has 3 digits – a 100’s digit, a 10’s digit, and a 1’s digit (463 = 400 + 60 + 3). So:

  7562 * 463 = 7562 * (400 + 60 + 3)
             = 7562 * 400 + 7562 * 60 + 7562 * 3
             = 756200 * 4 + 75620 * 6 + 7562 * 3
             = 756200 + 756200 + 756200 + 756200
               + 75620 + 75620 + 75620 + 75620 + 75620 + 75620
               + 7562 + 7562 + 7562

  That is only 4 + 6 + 3 = 13 additions instead of 463.

  10. Efficiency Example

  public class BetterBigOEx {
      public static void main(String[] args) {
          long firstOrig, secondOrig;
          long firstOp = firstOrig = 7562;
          long secondOp = secondOrig = 1000000000;
          int secOpLength = 10;
          long product = 0;
          for (int digitPosition = 0; digitPosition < secOpLength; digitPosition++) {
              int digit = (int) (secondOp - (secondOp / 10) * 10);
              for (int counter = digit; counter > 0; counter--)
                  product = product + firstOp;
              secondOp = secondOp / 10; // discard last digit
              firstOp = 10 * firstOp;   // tack a 0 onto the right
          }
          System.out.println("Product of " + firstOrig + " and " + secondOrig + " = " + product);
      }
  }

  11. Efficiency Example • Does efficiency matter? • How do we measure it? • Create 2 programs and measure the difference (not always possible) • Measure algorithm before implementation

  12. Efficiency Calculation • Let’s say you need to get downtown. You can walk, drive, ask a friend to take you, or take a bus. What’s the best way?

  13. Efficiency Measurement • Algorithms have measurable time and space requirements called complexity. • We are not considering how difficult it is to code, but rather the time it takes to execute and the memory it will need.

  14. Analysis of Algorithms • Usually measure time complexity • Compute an approximation, not actual time • Typically estimate the WORST (maximum) time the algorithm could take. Why? • Measurements also exist for best and average cases, but generally you look for the worst-case analysis.

  15. Analysis of Algorithms How do we compute worst case time? PROBLEM: Compute the sum 1+2+…+n for some positive integer “n”. Think about possible ways to solve this problem… (then look at the next 3 slides for suggestions!)

  16. Analysis of Algorithms AlgorithmA: computes 0+1+2…+n from left to right: sum = 0; for (i=1 to n) sum += i

  17. Analysis of Algorithms AlgorithmB: computes 0 + 1 + (1+1) + (1+1+1)+…(1+1+…+1): sum = 0; for (i=1 to n) for (j = 1 to i) sum++;

  18. Analysis of Algorithms AlgorithmC: uses an algebraic identity to compute the sum sum = n * (n+1)/2
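The three algorithms above can be sketched as runnable Java (the class and method names are my own, not from the slides):

```java
public class SumAlgorithms {
    // Algorithm A: left-to-right addition, about 2n+1 operations.
    static long sumA(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++)
            sum += i;
        return sum;
    }

    // Algorithm B: builds each i as 1+1+...+1, about n^2+n+1 operations.
    static long sumB(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= i; j++)
                sum++;
        return sum;
    }

    // Algorithm C: the algebraic identity, 4 operations regardless of n.
    static long sumC(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        int n = 1000;
        // All three compute the same value; only the work differs.
        System.out.println(sumA(n) + " " + sumB(n) + " " + sumC(n));
    }
}
```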

  19. Analysis of Algorithms How do we determine which algorithm (A, B, or C) is fastest? • Consider the size of the problem and the effort involved. (Measure problem size using “n”) • Find an appropriate growth-rate function. • Count the number of operations required by the algorithm

  20. Analysis of Algorithms • AlgorithmA: n+1 assignments, n additions, 0 multiplications, and 0 divisions for TOTAL OP’S: 2n+1 • AlgorithmB: 1 + n(n+1)/2 assignments, n(n+1)/2 additions, 0 multiplications, and 0 divisions for TOTAL OP’S: n² + n + 1 • AlgorithmC: 1 assignment, 1 addition, 1 multiplication, and 1 division for TOTAL OP’S: 4

  21. Analysis of Algorithms • So, • AlgorithmA’s growth-rate function is 2n+1 time units • AlgorithmB’s is n² + n + 1 time units • AlgorithmC’s is constant. • Speed-wise, C is fastest, followed by A and then by B.
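Tabulating the two growth-rate functions for a few values of n makes the gap vivid (a quick sketch; the class and method names are my own):

```java
public class GrowthRates {
    // The growth-rate functions from the slide, as counting functions.
    static long opsA(long n) { return 2 * n + 1; }      // Algorithm A
    static long opsB(long n) { return n * n + n + 1; }  // Algorithm B
    static long opsC(long n) { return 4; }              // Algorithm C

    public static void main(String[] args) {
        System.out.println("n\tA: 2n+1\tB: n^2+n+1\tC: constant");
        for (long n : new long[] {10, 100, 1000, 10000})
            System.out.println(n + "\t" + opsA(n) + "\t" + opsB(n) + "\t" + opsC(n));
    }
}
```

By n = 10000, B needs roughly 100 million operations to A's 20001 and C's 4.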

  22. Analysis of Algorithms • The running time of an algorithm is a function of the size of the input. More data means that the program takes more time.

  23. Analysis of Algorithms • How do we express this, then, in proper notation? • First rule: focus on the large instances of the problem; that is, consider only the dominant term in each growth-rate function. (Here, n²) • The difference between n² + n + 1 and n² is relatively small for large n, and so we can use the term with the largest exponent to describe the growth rate.

  24. Big Oh Notation • Computer Scientists use different notations to represent best, average, and worst case analysis. Big-Oh represents worst case. So: • AlgorithmA is O(n) • AlgorithmB is O(n²) • AlgorithmC is O(1) (aka “constant time”)

  25. Real Life Examples • You are seated at a wedding reception with a table of n people. In preparation for a toast, the waiter pours champagne into n glasses. What is the time complexity? • Someone makes a toast. What is the time complexity? • Everyone clinks glasses with everyone else?

  26. Designing Efficient Algorithms • Generally, we want to process a large amount of data • We want to design an algorithm (step-by-step instructions) that will use the resources (memory and speed) of the computer well.

  27. As we previously mentioned, the amount of time taken is our usual tool to analyze and this is determined by the amount of input – so the running time of the algorithm is given as a function of its input size.

  28. Questions to Ask • Is it always important to be on the most efficient curve? • How much better is one curve than another? • How do you decide which curve a particular algorithm lies on? • How do you design algorithms that avoid being on less efficient curves?

  29. Functions in Order of Increasing Growth Rate

  Function   Name
  C          Constant
  log N      Logarithmic
  log² N     Log-squared
  N          Linear
  N log N    N log N
  N²         Quadratic
  N³         Cubic
  2^N        Exponential

  30. Growth rate of a function is most important when N is sufficiently large. • When input sizes are small, it is best to use the simplest algorithm • Quadratic algorithms are impractical if input size > a few thousand • Cubic algorithms are impractical if input size > a few hundred.

  31. 3 Problems to Analyze • Minimum Element in an Array • Closest Points in the Plane • Collinear Points in the Plane

  32. Minimum Element in an Array • Given an array of N items, find the smallest item. • Obvious Solution: • Maintain a variable “min” that stores the minimum element • Initialize “min” to the first element • Make a sequential scan through the array and update “min” as appropriate • Running Time?
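The obvious solution, sketched in Java (class and method names are my own, not from the slides):

```java
public class MinElement {
    // Sequential scan: initialize min to the first element, then one pass,
    // updating min as appropriate. Each element is examined once, so the
    // running time is O(N).
    static int findMin(int[] a) {
        int min = a[0];
        for (int i = 1; i < a.length; i++)
            if (a[i] < min)
                min = a[i];
        return min;
    }
}
```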

  33. Closest Points in the Plane • Given N points in a plane, find the pair of points that are closest together • Obvious Solution: • Calculate the distance between each pair of points • Retain the minimum distance • Running Time?
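A sketch of the brute-force approach (hypothetical names; the points are passed as parallel coordinate arrays):

```java
public class ClosestPoints {
    // Brute force: check every pair, so O(N^2) distance calculations.
    // Comparing squared distances inside the loop avoids repeated square
    // roots; the minimum is the same either way.
    static double closestPairDistance(double[] x, double[] y) {
        double best = Double.MAX_VALUE;
        for (int i = 0; i < x.length; i++)
            for (int j = i + 1; j < x.length; j++) {
                double dx = x[i] - x[j], dy = y[i] - y[j];
                best = Math.min(best, dx * dx + dy * dy);
            }
        return Math.sqrt(best);
    }
}
```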

  34. Collinear Points in the Plane • Given N points in a plane, determine if any three form a straight line. • Obvious Solution: • Enumerate all groups of 3 points • Running Time?
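A sketch of the brute-force enumeration (hypothetical names; integer coordinates keep the test exact):

```java
public class CollinearPoints {
    // Brute force: examine every triple of points, so O(N^3) checks.
    // Three points are collinear exactly when the cross product
    // (x2-x1)(y3-y1) - (y2-y1)(x3-x1) is zero.
    static boolean anyThreeCollinear(long[] x, long[] y) {
        int n = x.length;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++) {
                    long cross = (x[j] - x[i]) * (y[k] - y[i])
                               - (y[j] - y[i]) * (x[k] - x[i]);
                    if (cross == 0)
                        return true;
                }
        return false;
    }
}
```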

  35. Maximum Contiguous Subsequence Sum Problem Given (possibly negative) integers A1, A2, …, AN, find (and identify the subsequence corresponding to) the maximum value of the sum Ai + Ai+1 + … + Aj. The maximum contiguous subsequence sum is zero if all the integers are negative.

  36. Maximum Contiguous Subsequence Sum Problem Example: Given {-2, 11, -4, 13, -5, 2}, the answer is 20, the contiguous subsequence from items 2 through 4.

  37. Maximum Contiguous Subsequence Sum Problem Designing a Solution: • Consider emptiness • Obvious Solution (aka “Brute Force”) • Can we improve it? (must be a little clever) • Can we further improve it? (must be really clever and/or experienced!)

  38. Maximum Contiguous Subsequence Sum Problem Obvious Solution O(N3) • A direct and exhaustive search (Brute Force Approach) • Pro: Extreme simplicity – easy to program • Con: Least efficient method

  39.
  /**
   * Cubic maximum contiguous subsequence sum algorithm.
   * seqStart and seqEnd represent the actual best sequence.
   */
  public static int maxSubSum1( int[] a ) {
      int maxSum = 0;
      for( int i = 0; i < a.length; i++ )
          for( int j = i; j < a.length; j++ ) {
              int thisSum = 0;
              for( int k = i; k <= j; k++ )
                  thisSum += a[k];
              if( thisSum > maxSum ) {
                  maxSum = thisSum;
                  seqStart = i;
                  seqEnd = j;
              }
          }
      return maxSum;
  }

  40. Maximum Contiguous Subsequence Sum Problem To analyze the algorithm, you basically count the number of times each statement is executed and then pick the dominant one. In our case, the statement inside the 3rd for loop is executed the most – a little less than N³ times – making it the dominant term.

  41. Maximum Contiguous Subsequence Sum Problem SHORTCUT: • We see a loop of potentially size N inside a loop of potentially size N inside a loop of potentially size N – N*N*N potential iterations! • Generally, this cost calculation is off by a constant factor (that gets removed by Big-Oh notation anyway), so we can get away with it.
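That constant factor can be pinned down: the innermost statement actually runs N(N+1)(N+2)/6 times, which a direct count confirms (a sketch; the class and method names are my own):

```java
public class CubicCount {
    // Counts executions of the innermost statement in the cubic algorithm
    // by running the same three loops with a counter in place of the work.
    static long countInnerIterations(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++)
                for (int k = i; k <= j; k++)
                    count++;
        return count;
    }

    // Closed form for the same count: N(N+1)(N+2)/6, roughly N^3/6.
    static long closedForm(long n) {
        return n * (n + 1) * (n + 2) / 6;
    }
}
```

So the N*N*N shortcut overcounts by a factor of about 6, which Big-Oh discards anyway.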

  42. Maximum Contiguous Subsequence Sum Problem Since our cubic algorithm seems to be the result of statements inside of loops, can we lower the running time by removing a loop? Are they all necessary? In some cases, we can’t remove the loop. In this one…

  43. Maximum Contiguous Subsequence Sum Problem Let’s observe that we calculate the contiguous subsequence sums as we go – we don’t need to reinvent the wheel each time (as our algorithm does) – we only need to add one additional number to what we just calculated. Programming from that perspective removes one of the loops and gives us an O(N²) algorithm.

  44.
  /**
   * Quadratic maximum contiguous subsequence sum algorithm.
   * seqStart and seqEnd represent the actual best sequence.
   */
  public static int maxSubSum2( int[] a ) {
      int maxSum = 0;
      for( int i = 0; i < a.length; i++ ) {
          int thisSum = 0;
          for( int j = i; j < a.length; j++ ) {
              thisSum += a[j];
              if( thisSum > maxSum ) {
                  maxSum = thisSum;
                  seqStart = i;
                  seqEnd = j;
              }
          }
      }
      return maxSum;
  }

  45. Maximum Contiguous Subsequence Sum Problem Can we do better? Can we remove yet another loop? We need a clever observation that allows us to eliminate some subsequences from consideration without calculating their sums. Can we do that?

  46. Maximum Contiguous Subsequence Sum Problem Intuitively, if a subsequence’s sum is negative, it can’t be part of the maximum contiguous subsequence. All contiguous subsequences that border the maximum contiguous subsequence must have negative or 0 sums, or they would be included. When a negative subsequence is found, we can not only break the inner loop, we can advance i to j+1 (Proof, Th 5.3 p. 175)

  47. Maximum Contiguous Subsequence Sum Problem • With these observations, we can find the solution to this problem in linear time - O(N).

  48.
  /**
   * Linear-time maximum contiguous subsequence sum algorithm.
   * seqStart and seqEnd represent the actual best sequence.
   */
  public static int maxSubSum3( int[] a ) {
      int maxSum = 0;
      int thisSum = 0;
      for( int i = 0, j = 0; j < a.length; j++ ) {
          thisSum += a[j];
          if( thisSum > maxSum ) {
              maxSum = thisSum;
              seqStart = i;
              seqEnd = j;
          }
          else if( thisSum < 0 ) {
              i = j + 1;
              thisSum = 0;
          }
      }
      return maxSum;
  }
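The versions above reference seqStart and seqEnd, which in the textbook are static fields of the enclosing class. A minimal self-contained sketch (my own class name, with those fields declared) checks the linear version against the example from earlier:

```java
public class MaxSubSumDemo {
    static int seqStart = 0, seqEnd = 0; // set by maxSubSum3

    // Linear-time algorithm, as on the slide, with the fields declared.
    public static int maxSubSum3(int[] a) {
        int maxSum = 0, thisSum = 0;
        for (int i = 0, j = 0; j < a.length; j++) {
            thisSum += a[j];
            if (thisSum > maxSum) {
                maxSum = thisSum;
                seqStart = i;
                seqEnd = j;
            } else if (thisSum < 0) {
                // A negative running sum can't start the best sequence:
                // restart just past j.
                i = j + 1;
                thisSum = 0;
            }
        }
        return maxSum;
    }

    public static void main(String[] args) {
        int[] a = {-2, 11, -4, 13, -5, 2};
        System.out.println(maxSubSum3(a)); // 20, items 2 through 4
    }
}
```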
