Algorithms

CS4HS at Marquette University Algorithms

Which is a problem? • Find the sum of 4 and 7 • Sort a list of words in alphabetical order • State the most beautiful phrase in the English language • Name the smallest even number greater than 2 that cannot be written as the sum of two primes

Computational problems • A problem is a relationship between input data and output data • For every set of input values, there is only one correct set of output • A specific set of input and output is called an instance • Adding 4 and 7 is an instance of the general problem of adding integers

What is an algorithm? • An algorithm is a solution to a problem • It gives a finite series of steps that you can follow from any legal input data to reach the correct output data • Can you give an example?

How good is an algorithm? • How do you measure how long an algorithm takes? • We can describe the number of steps the algorithm takes as a function of the size of input data (usually abbreviated as n) • How does the number of steps needed to solve an instance grow as the input grows?

Big-Oh notation • Let's say an algorithm takes 34n^3 – 5n^2 + 102n + 1471 steps for an input of size n • Ugh. • Since we care about growth when n gets big, the biggest term is what matters • Since different computers are better at different operations, we ignore constant coefficients • We say 34n^3 – 5n^2 + 102n + 1471 is O(n^3) • Cubic growth is what matters
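To see why only the biggest term matters, here is a small Python sketch. The function name `steps` is made up for illustration; it evaluates the step count from the slide and compares it with n^3 alone as n grows.

```python
def steps(n):
    """Step count from the slide: 34n^3 - 5n^2 + 102n + 1471."""
    return 34 * n**3 - 5 * n**2 + 102 * n + 1471


for n in (10, 100, 1000, 10000):
    # The ratio settles near the leading coefficient 34, so the growth is cubic.
    print(n, steps(n) / n**3)
```

The ratio stops changing much once n is large, which is the sense in which the lower-order terms and the constant 34 are ignored.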

Addition by hand • How long does it take to do addition by hand? • Example: 123 + 456 = 579 • Let’s assume the numbers have n digits • n additions + n carries • Running time: O(n)
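The column-by-column procedure can be sketched in Python as follows. The function name `add_by_hand` and the step counter are illustrative assumptions, not part of the slides; the point is that the loop runs once per digit column.

```python
def add_by_hand(a, b):
    """Add two non-negative integers given as digit strings, right to left.

    Returns (sum as a string, number of columns processed).
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with zeros
    digits, carry, column_steps = [], 0, 0
    for i in range(width - 1, -1, -1):
        total = int(a[i]) + int(b[i]) + carry
        digits.append(str(total % 10))  # digit written under the column
        carry = total // 10             # carry into the next column
        column_steps += 1
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), column_steps


print(add_by_hand("123", "456"))
```

For n-digit inputs the loop body runs n times, which matches the O(n) estimate.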

Multiplication by hand • How long does it take to do multiplication by hand? • Example: 123 x 456 gives the partial products 738, 615, and 492 (each shifted one more place to the left), which add up to 56088 • Let’s assume the numbers have n digits • (n multiplications + n carries) x n digits + (n + 1 digits) x n additions • Running time: O(n^2)
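A rough Python sketch of schoolbook multiplication shows where the n x n work comes from. The name `multiply_by_hand` and the counter are assumptions for illustration; the partial products are added into the result as they are produced instead of being written out as separate rows.

```python
def multiply_by_hand(a, b):
    """Schoolbook multiplication of two non-negative integers given as digit strings.

    Returns (product, number of single-digit multiplications).
    """
    result = [0] * (len(a) + len(b))  # result[0] is the ones place
    digit_multiplications = 0
    for i, digit_b in enumerate(reversed(b)):
        carry = 0
        for j, digit_a in enumerate(reversed(a)):
            total = result[i + j] + int(digit_a) * int(digit_b) + carry
            result[i + j] = total % 10
            carry = total // 10
            digit_multiplications += 1
        result[i + len(a)] += carry  # leftover carry of this row
    product = int("".join(str(d) for d in reversed(result)))
    return product, digit_multiplications


print(multiply_by_hand("123", "456"))
```

Two nested loops over the digits give n x n single-digit multiplications, in line with the O(n^2) running time.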

Searching and Sorting

Find my number • I've got a list of n numbers • You want to look at my list to see if your favorite number (42) is on it • How many numbers do you have to look at to see if my list has your number? • How long does that take in Big-Oh notation?

Can we do better? • Is there any way to go smaller than O(n)? • What if the numbers were processed in some way beforehand? • What if you knew that the numbers were sorted in ascending order?

Binary search • Repeatedly divide the search space in half • We play a high-low game • Again, let's say we're looking for 42 • [Figure: each step checks the middle of the remaining range; the values checked are 23 (too low), 31 (too low), 54 (too high), and 42 (found it!)]

How long does it take? • We cut the search space in half every time • At worst, we keep cutting n in half until we get 1 • In that case, we know we didn't find our number • Let’s say x is the number of times we look: n / 2^x = 1, so x = log_2 n • The running time is O(log_2 n)
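Here is a minimal Python sketch of binary search that also counts how many times we look. The function name and the sample list are made up for illustration, and the list is assumed to be sorted in ascending order.

```python
def binary_search(sorted_numbers, target):
    """Return (index or None, number of looks) for target in an ascending list."""
    low, high = 0, len(sorted_numbers) - 1
    looks = 0
    while low <= high:
        middle = (low + high) // 2
        looks += 1
        if sorted_numbers[middle] == target:
            return middle, looks
        if sorted_numbers[middle] < target:
            low = middle + 1   # too low: discard the left half
        else:
            high = middle - 1  # too high: discard the right half
    return None, looks


numbers = list(range(0, 2000, 2))  # 1,000 sorted even numbers (made-up data)
print(binary_search(numbers, 42))  # 42 is on the list
print(binary_search(numbers, 43))  # 43 is not
```

With 1,000 numbers the loop can run at most 10 times, since halving 1,000 ten times leaves nothing to search.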

Interview question • This is a classic interview question asked by Microsoft, Amazon, and similar companies • Imagine that you have 9 red balls • One of them is just slightly heavier than the others, but so slightly that you can’t feel it • You have a very accurate two-pan balance you can use to compare balls • Find the heavy ball in the smallest number of weighings

What’s the smallest possible number? • It’s got to be 8 or fewer • We could easily test one ball against every other ball • There must be some cleverer way to divide them up • Something that is related somehow to binary search

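That’s it! • We can divide the balls in half each time: set one ball aside, weigh 4 against 4, and keep the heavier side (then 2 against 2, then 1 against 1) • If the first weighing balances, it must be the one we left out to begin with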

Nope, we can do better • How? • The key is that you can actually cut the number of balls into three parts each time • We weigh 3 against 3; if they balance, then we know the 3 left out have the heavy ball • When it’s down to 3, weigh 1 against 1, again knowing that it’s the one left out that’s heavy if they balance

Thinking outside the box, er, ball • The cool thing is… • Yes, this is “cool” in the CS sense, not in the real sense • Anyway, the cool thing is that we are trisecting the search space each time • This means that it takes log_3 n weighings to find the heaviest ball • We could do 27 balls in 3 weighings, 81 balls in 4 weighings, etc.
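The trisecting strategy can be simulated with a short Python sketch. The function name `find_heavy_ball` is an assumption for illustration, and the code assumes exactly one ball is heavier while all the others weigh the same. Comparing two sums stands in for one use of the balance.

```python
def find_heavy_ball(weights):
    """Return (index of the heavy ball, number of weighings used)."""
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = (len(candidates) + 2) // 3  # group size, rounded up
        left = candidates[:third]
        right = candidates[third:2 * third]
        aside = candidates[2 * third:]
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)
        weighings += 1
        if left_weight > right_weight:
            candidates = left
        elif right_weight > left_weight:
            candidates = right
        else:
            candidates = aside  # pans balance: the heavy ball was left out
    return candidates[0], weighings


balls = [10] * 9
balls[5] = 11  # the slightly heavier ball
print(find_heavy_ball(balls))
```

Each weighing keeps only one of the three groups, so 9 balls need 2 weighings and 27 balls need 3.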

Bubble sort • CS people have written hundreds of papers just about how to sort things better • Bubble sort is a very simple sorting algorithm • It is not the fastest algorithm • Make a number of passes • In each pass, swap each contiguous pair of items if they're out of order • Keep making passes until no swaps are necessary
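A minimal Python sketch of the passes described above follows. The function name and the pass counter are illustrative; note that the count includes the final pass in which nothing is swapped, since that pass is how the algorithm learns it is done.

```python
def bubble_sort(items):
    """Sort a copy of items with bubble sort; return (sorted list, passes made)."""
    items = list(items)
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                # Neighbors are out of order, so swap them.
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items, passes


print(bubble_sort([7, 45, 0, 54, 37, 108, 51]))
print(bubble_sort([7, 6, 5, 4, 3, 2, 1]))  # reverse-sorted input
```

On the reverse-sorted list of 7 items, 6 passes do the sorting and a seventh confirms that no swaps remain.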

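Single pass example • Run through the whole list, swapping any entries that are out of order • [Figure: one pass over 7, 45, 0, 54, 37, 108, 51: 7 and 45 (no swap), 45 and 0 (swap), 45 and 54 (no swap), 54 and 37 (swap), 54 and 108 (no swap), 108 and 51 (swap), giving 7, 0, 45, 37, 54, 51, 108]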

Activity • Everyone stand up • We're going to use the bubble sort algorithm to sort us all by birth month and day (you can keep the year to yourself) • In each pass, we sequentially swap pairs that are out of order • We keep making passes until no one needs swapping • We are guaranteed that the last person is put in the right place after each pass

How many passes do we need? • How bad could it be? • What if the array was in reverse-sorted order? • One pass would only move the largest number to the bottom • We would need n – 1 passes to sort the whole array • What's the total time for bubble sort? • [Figure: successive passes of bubble sort on the reverse-sorted array 7, 6, 5, 4, 3, 2, 1, ending with 1, 2, 3, 4, 5, 6, 7]

Paths

Google Maps • How does Google Maps find the shortest route from Philadelphia to Milwaukee? • It simplifies all the streets into a graph • A graph has nodes and edges • Nodes represent locations • Edges are (parts of) streets

Shortest paths • We can write a number next to each edge which gives its weight • Weight can represent time, distance, cost: anything, really • The shortest path (lowest total weight) is not always obvious • [Figure: weighted graph on nodes A, B, C, D, and E with edge weights 2, 3, 4, 5, 6, 8, and 16]

What’s the shortest path? • Take a moment and try to find the shortest path from A to E. • The shortest path has length 14 • [Figure: the same weighted graph on nodes A, B, C, D, and E]

How can we always find the shortest path? • On a graph of that size, it isn’t hard to find the shortest path • A Google Maps graph has millions and millions of nodes • How can we come up with an algorithm that will always find the shortest path from one node to another? • Dijkstra's algorithm adds the closest node we can find, expanding our set of stuff we know the best way to get to

Dijkstra’s algorithm notation

Dijkstra’s Algorithm • 1. Start with two sets, S and V: S has the starting node in it (at distance 0), V has everything else, and the distance to every node in V is set to ∞ • 2. Find the node u in V that is closest to the start by way of a node in S, and record that distance as d(u) • 3. For every neighbor v of u in V: if d(v) > d(u) + d(u,v), set d(v) = d(u) + d(u,v) • 4. Move u from V to S • 5. If V is not empty, go back to Step 2
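The steps above can be sketched in Python. This is a simple set-based version, not a tuned implementation: it starts with every node unvisited and gives the starting node distance 0, which has the same effect as placing it in S first. The edge placement in `graph` is a guess chosen to be consistent with the slide's edge weights and its stated shortest-path length of 14; the original figure may differ.

```python
import math


def dijkstra(graph, start):
    """Return the shortest distance from start to every node.

    graph maps each node to a dict of {neighbor: edge weight}.
    """
    distance = {node: math.inf for node in graph}
    distance[start] = 0
    unvisited = set(graph)  # the slide's set V; finished nodes form S
    while unvisited:
        # Pick the unvisited node with the smallest known distance.
        u = min(unvisited, key=lambda node: distance[node])
        unvisited.remove(u)
        for v, weight in graph[u].items():
            if v in unvisited and distance[v] > distance[u] + weight:
                distance[v] = distance[u] + weight
    return distance


# Assumed edges (illustration only, see the note above).
graph = {
    "A": {"B": 6, "C": 2, "E": 16},
    "B": {"A": 6, "C": 3, "D": 5},
    "C": {"A": 2, "B": 3, "D": 8},
    "D": {"B": 5, "C": 8, "E": 4},
    "E": {"A": 16, "D": 4},
}
print(dijkstra(graph, "A"))
```

Scanning for the closest node on every round is fine for a handful of nodes; map-sized graphs would typically use a priority queue for that step.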

Example for Dijkstra • Finding the shortest distance from A to all other nodes • [Figure: weighted graph on nodes A, B, C, D, and E with edge weights 2, 3, 4, 5, 6, 8, and 16]

Traveling salesman • The traveling salesman problem (TSP) also wants to find the shortest path, but it adds two requirements • You have to return to the point where you start • You have to visit all the cities exactly once • Like a UPS guy who wants to make all his dropoffs and return to the shipping center • [Figure: the weighted graph on nodes A, B, C, D, and E with edge weights 2, 3, 4, 5, 6, 8, and 16]

What algorithm should we use? • TSP seems easy! • There must be some way we can adapt Dijkstra's algorithm to work • Suggestions? • The greedy approach always adds the nearest neighbor • A brute force approach tries all possible tours and sees which is shortest

Greedy doesn’t work • We are tempted to always take the closest neighbor, but there are pitfalls • [Figure: a greedy tour shown next to the optimal tour]
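One way to see the pitfall is a tiny Python sketch of the nearest-neighbor rule. The four cities and their distances are hypothetical, picked so that the greedy choice paints itself into a corner; the function names are also made up for illustration.

```python
def tour_length(distances, tour):
    """Total length of a closed tour that returns to its first city."""
    return sum(distances[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def nearest_neighbor_tour(distances, start):
    """Greedy heuristic: always travel to the closest unvisited city."""
    tour = [start]
    unvisited = set(distances) - {start}
    while unvisited:
        here = tour[-1]
        nearest = min(unvisited, key=lambda city: distances[here][city])
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour


# Hypothetical symmetric distances between four cities.
distances = {
    "A": {"B": 1, "C": 2, "D": 10},
    "B": {"A": 1, "C": 1, "D": 2},
    "C": {"A": 2, "B": 1, "D": 1},
    "D": {"A": 10, "B": 2, "C": 1},
}
greedy = nearest_neighbor_tour(distances, "A")
print(greedy, tour_length(distances, greedy))        # greedy tour: length 13
print(tour_length(distances, ["A", "B", "D", "C"]))  # a tour found by hand: length 6
```

Greedy grabs the short edges early and is then forced to take the long edge from D back to A at the end.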

Brute force is brutal • In a completely connected graph, we can try any sequence of nodes • If there are n nodes, there are (n – 1)! tours • For 30 locations, 29! = 8841761993739701954543616000000 • If we can check 1,000,000,000 tours per second, it will only take about 20,000 times the age of the universe to check them all • We will (eventually) get the best answer!
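A brute-force search is easy to write, which is part of its appeal. The Python sketch below tries every ordering on a hypothetical four-city instance and then redoes the slide's arithmetic for 30 locations. The age of the universe is taken as roughly 13.8 billion years, which is an assumption of this sketch.

```python
from itertools import permutations
from math import factorial


def brute_force_tsp(distances, start):
    """Try every ordering of the other cities; return (best tour, its length)."""
    others = [city for city in distances if city != start]
    best_tour, best_length = None, float("inf")
    for ordering in permutations(others):
        tour = [start, *ordering]
        length = sum(distances[a][b] for a, b in zip(tour, tour[1:] + [start]))
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length


# Hypothetical symmetric distances between four cities.
distances = {
    "A": {"B": 1, "C": 2, "D": 10},
    "B": {"A": 1, "C": 1, "D": 2},
    "C": {"A": 2, "B": 1, "D": 1},
    "D": {"A": 10, "B": 2, "C": 1},
}
print(brute_force_tsp(distances, "A"))

# The slide's estimate: 29! tours at one billion tours per second.
seconds = factorial(29) / 1_000_000_000
years = seconds / (365.25 * 24 * 3600)
universe_ages = years / 13.8e9  # assumed age of the universe in years
print(round(universe_ages))
```

Four cities mean only 3! = 6 orderings, so the search finishes instantly; the same loop on 30 cities is where the wait of thousands of universe lifetimes comes from.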

A mathematical interlude • Many of my students have a poor intuitive grasp of mathematics • The following series often comes up: 1 + 2 + 3 + … + n • Anyone know the closed form of this sum? • Students often see this and say, "Isn't that just n factorial?" • n! = 1 × 2 × 3 × … × n does look similar • However, O(n^2) is reasonable while O(n!) is a horrifying demon that algorithm designers have nightmares about
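Assuming the series in question is 1 + 2 + ... + n (which is consistent with the O(n^2) remark above), a few lines of Python show how differently the sum and the factorial behave. The function name `triangular_sum` is made up for illustration.

```python
from math import factorial


def triangular_sum(n):
    """1 + 2 + ... + n, added term by term."""
    return sum(range(1, n + 1))


for n in (5, 10, 20):
    # The closed form n(n + 1)/2 matches the sum; n! multiplies instead of adding.
    print(n, triangular_sum(n), n * (n + 1) // 2, factorial(n))
```

The second and third columns agree, while the factorial column runs away almost immediately.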

NP-complete problems • TSP is just one NP-complete problem • These problems are pretty useful (scheduling, optimizing resource usage, packing boxes) • No known algorithm for an NP-complete problem has a worst-case running time better than exponential, such as O(2^n) • All these problems can be converted into each other efficiently • If you could solve one efficiently, you could solve them all efficiently • The Clay Mathematics Institute has a $1 million prize for such an algorithm • You get the same prize if you could prove that no such algorithm is possible

Computability

Turing machine • A Turing machine is a model of computation • It consists of a head, an infinitely long tape, a set of possible states, and an alphabet of characters that can be written on the tape • A finite list of rules says what it should write and whether it should move left or right given the current symbol and state
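A Turing machine is simple enough to simulate in a few lines of Python. The sketch below is one possible encoding, not a standard one: the rule format, the function name, and the example machine are all assumptions for illustration, and the step cap exists only so the sketch always returns.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) to (symbol to write, move "L" or "R", next state).
    The machine halts when no rule applies. max_steps is a safety cap for this
    sketch only; a real Turing machine has no such limit.
    """
    cells = dict(enumerate(tape))  # sparse tape: unset cells read as blank
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in rules:
            break
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# A made-up machine that flips every bit and halts at the first blank.
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
}
print(run_turing_machine(flip_bits, "10110"))  # prints 01001
```

The whole machine is just the `flip_bits` table; the head, tape, and current state are the only moving parts.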

Computational power • Computational power means the ability to solve a problem • The ability to run an algorithm • Speed is not related to computational power • Turing machines have as much computational power as any known computer • Any algorithm that can be run on any computer can be run on a Turing machine

Halting problem • There are some computational problems that you cannot design an algorithm for • No computer can ever solve the general problem (though sometimes you can solve instances of the problem) • Examples • It's impossible to write a program that can tell if any program will run to completion

Turntables • Douglas Hofstadter uses the metaphor of turntables • Imagine that evil people design records that will shake turntables apart when they're played • Maybe turntable A can play record A and turntable B can play record B • However, if turntable A plays record B, it will shatter

Stuff you have to buy for this proof • Turing machines can perform all possible computations • It's possible to encode the way a Turing machine works such that another Turing machine can read it • It's easy to make a slight change to a Turing machine so that it gives back the opposite answer (or goes into an infinite loop)

Proof by contradiction • You've got a Turing machine M with encoding m • You want to see if M will halt on input x • Assume there is a machine H that can take encoding m and input x • H(m,x) is YES if M halts on x • H(m,x) is NO if M loops forever on x • We create (evil) machine E that takes description m and runs H(m,m) • If H(m,m) is YES, E loops forever • If H(m,m) is NO, E returns YES • [Figure: H takes m and x and answers YES or NO; E feeds m to H, loops forever on YES, and returns YES on NO]

A mind-bending proof • Let's say that e is the description of E • What happens if you feed description e into E? • E(e) says what E will do with itself as input • If it returns YES, that means that E on input e loops forever • But it can't, because it just returned YES • If it loops forever, then E on input e would return YES • But it can't, because it's looping forever! • Our assumption that machine H exists must be wrong
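The construction of E can be written out in Python as a sketch. Here `claims_to_halt` is a hypothetical stand-in for H, and Python functions stand in for machine descriptions; no correct `claims_to_halt` can exist, so the example plugs in a deliberately wrong candidate to show E contradicting it.

```python
def make_evil(claims_to_halt):
    """Build the slide's machine E from a claimed halting decider H.

    claims_to_halt(program, data) is a hypothetical function that should
    return True exactly when program(data) eventually finishes.
    """
    def evil(program):
        if claims_to_halt(program, program):
            while True:  # H said YES, so do the opposite: loop forever
                pass
        return "YES"     # H said NO, so do the opposite: halt

    return evil


def always_no(program, data):
    """A (wrong) candidate H that claims nothing ever halts."""
    return False


evil = make_evil(always_no)
# always_no claims evil(evil) loops forever, yet the call returns at once.
print(always_no(evil, evil), evil(evil))
```

A candidate that answers True for this input fails the other way: `evil(evil)` would then loop forever, so that case is better reasoned about than run.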
