# Approximation Algorithms


By: Ryan Kupfer, Luis Colón, Joe Parisi

CMSC 435 Algorithm Analysis and Design

Approximation Algorithms
What is an Approximation Algorithm?
• Approximation Algorithms:
• Run in polynomial time
• Find solutions that are guaranteed to be close to optimal
Problems where approximation algorithms are used
• 11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem
• 11.2 The Center Selection Problem
• 11.3 Set Cover: A General Greedy Heuristic
• 11.4 The Pricing Method: Vertex Cover
• 11.5 Maximization via the Pricing Method: The Disjoint Paths Problem
• The Traveling Salesman Problem
Greedy Algorithm for Load Balancing: The Problem
• Problem: balance the load across a set of machines by splitting the workload among them
• Let the load on a machine Mi be the sum of the processing times of the jobs assigned to it; we want to minimize the makespan, the maximum load on any machine
• The greedy approach makes one pass through the jobs in any order and puts each job on the machine that currently has the smallest load
• The makespan of our greedy algorithm is not much bigger than the makespan of the optimal solution; however, we cannot compare against the optimal value directly, because we cannot afford to compute it (the problem is NP-hard)
• Therefore, we establish a lower bound on the optimum: a quantity that the optimum can never be less than, no matter how good it is. Comparing the greedy makespan against this bound yields the approximation guarantee
How can we improve this algorithm?
• Sort the jobs in decreasing order of processing time before running the greedy pass; this sorted variant is guaranteed to be within a factor strictly smaller than 2 of the optimum
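A minimal sketch of the greedy load-balancing algorithm and its sorted variant (function names are our own, not from the slides):

```python
import heapq

def greedy_balance(jobs, m):
    """Greedy-Balance: assign each job, in the given order, to the machine
    with the currently smallest load; return the resulting makespan."""
    loads = [(0, i) for i in range(m)]   # min-heap of (load, machine index)
    heapq.heapify(loads)
    for t in jobs:
        load, i = heapq.heappop(loads)   # least-loaded machine
        heapq.heappush(loads, (load + t, i))
    return max(load for load, _ in loads)

def sorted_greedy_balance(jobs, m):
    """Sorted variant: same greedy pass, but on jobs in decreasing order,
    which tightens the approximation guarantee."""
    return greedy_balance(sorted(jobs, reverse=True), m)
```

On jobs `[2, 3, 4, 6, 2, 2]` with 3 machines, the plain greedy pass yields makespan 8, while the sorted variant achieves the optimal makespan of 7.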
11.2 The Center Selection Problem
• Similar in flavor to load balancing, but here we must also decide where to place the servers (centers); to keep the formulation clean and simple, a center may be placed at any point
• The Center Selection Problem is an example where the most natural greedy algorithm does not give a near-optimal solution, but a slightly different greedy version can guarantee one
Designing and Analyzing the Algorithm
• The naive greedy algorithm fails on a problem with just two sites s and z at distance 2 apart: it would place a single center halfway between them (radius 1), while the optimal solution places a center at each site, achieving radius 0
• Knowing the optimal radius helps: if we assume we know it, a modified greedy algorithm finds a solution with radius at most twice the optimum
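A minimal sketch of the farthest-point greedy heuristic for center selection, a standard 2-approximation (sites are assumed to be 2-D points here; names are illustrative):

```python
import math

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def center_selection(sites, k):
    """Farthest-point greedy: pick an arbitrary first center, then
    repeatedly pick the site farthest from all centers chosen so far."""
    centers = [sites[0]]
    while len(centers) < k:
        farthest = max(sites, key=lambda s: min(dist(s, c) for c in centers))
        centers.append(farthest)
    return centers

def covering_radius(sites, centers):
    """The radius needed so every site is within it of some center."""
    return max(min(dist(s, c) for c in centers) for s in sites)
```

For example, `center_selection([(0, 0), (10, 0), (0, 10), (1, 1)], 2)` picks `(0, 0)` and then the farthest remaining site.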
11.3 Set Cover: A General Greedy Heuristic
• Set Cover is a very general problem: many other problems can be expressed as special cases of it, so an approximation algorithm for Set Cover can be applied widely
• An instance of Set Cover consists of a universe U and a collection of subsets of U; a set cover is a subcollection of these sets whose union is all of U
Designing the Algorithm
• The greedy algorithm for this problem builds the cover one set at a time, at each step picking the set that makes the most progress toward covering U
• A good set has small weight and covers many elements; neither property alone is enough for a good approximation algorithm, so we combine the two into the cost per newly covered element (weight divided by the number of still-uncovered elements the set covers), which is a good guide for choosing the next set
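The cost-per-element rule above can be sketched as follows (a minimal sketch; the data representation is our own):

```python
def greedy_set_cover(universe, subsets, weights):
    """Greedy Set Cover: repeatedly pick the set minimizing
    weight / (number of still-uncovered elements it covers).
    Returns the indices of the chosen sets."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # best cost per newly covered element among useful sets
        best = min(
            (i for i in range(len(subsets)) if subsets[i] & uncovered),
            key=lambda i: weights[i] / len(subsets[i] & uncovered),
        )
        cover.append(best)
        uncovered -= subsets[best]
    return cover
```

For instance, with universe `{1, 2, 3, 4}`, subsets `[{1, 2, 3}, {3, 4}, {1}, {2, 4}]`, and weights `[3, 2, 1, 2]`, the greedy picks set 0 (cost 1 per element) and then set 1.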
Analyzing the Set Cover Algorithm
• The greedy algorithm always finds a set cover, but we have to ask how much larger the weight of this cover can be than the weight of the optimal one
• The set cover C selected by Greedy-Set-Cover has weight at most H(d*) times the optimal weight, where d* is the size of the largest set in the collection and H(n) = 1 + 1/2 + … + 1/n is the harmonic number (roughly ln n)
• From this we get the desired bound relating the greedy cover's weight to the optimal weight, giving a good approximation algorithm
11.4 The Pricing Method: Vertex Cover
• We want to find a vertex cover S for which w(S) is minimum. When all weights equal 1, deciding whether there is a vertex cover of weight at most k is the standard decision version of Vertex Cover
• A vertex cover of an undirected graph G = (V, E) is a subset S of its vertices such that each edge has at least one endpoint in S; in other words, for each edge ab in E, at least one of a or b must be an element of S
• Vertex Cover ≤ Set Cover: vertex cover is the special case of set cover in which each vertex becomes the set of edges incident to it (with all weights equal to 1 in the unweighted case)
Designing Pricing Method Algorithm
• For the Vertex Cover Problem, we think of the weights on the nodes as costs, and we think of each edge as having to pay its share of the cost of the vertex cover we find
• The goal of this approximation algorithm is to find a vertex cover and set prices at the same time, using these prices to select the nodes for the vertex cover
• A related application is the Disjoint Paths Problem, which usually arises in network routing; in the special case we consider, each path to be routed has its own designated starting node s and ending node t
• Treat each pair (s, t) as a routing request which asks for a path from s to t
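The pricing method for Vertex Cover described above can be sketched as a primal-dual loop (a minimal sketch; function names and the edge-list representation are our own):

```python
def pricing_vertex_cover(nodes, edges, weight):
    """Pricing method for weighted vertex cover: raise the price charged
    to an edge until one of its endpoints becomes 'tight' (the prices
    charged to that node add up to its weight). The tight nodes form a
    vertex cover whose weight is at most twice the optimum."""
    paid = {v: 0 for v in nodes}   # total price charged to node v so far
    tight = set()
    for (u, v) in edges:
        if u in tight or v in tight:
            continue               # edge already covered by a tight node
        # raise this edge's price until u or v becomes tight
        slack = min(weight[u] - paid[u], weight[v] - paid[v])
        paid[u] += slack
        paid[v] += slack
        for w in (u, v):
            if paid[w] == weight[w]:
                tight.add(w)
    return tight
```

On the path a–b–c with weights 4, 3, 4, the first edge makes b tight, the second edge is then already covered, and the returned cover is {b}, which here is optimal.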
Solving the Disjoint Path Problem with a Pricing Algorithm
• For this algorithm:
• Have the paths pay for the edges
• Edges can be shared among paths, however the more that edge is used, the more costly it becomes
• Distinguish between short and long paths
• Linear programming is a technique that can be very powerful in applying to different sets of problems
• We can apply it to the Vertex Cover Problem
• Linear programming can be seen as a more general version of solving systems of algebraic equations: we optimize a linear objective subject to linear inequalities instead of equations
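As a sketch of how linear programming applies to Vertex Cover, the standard LP relaxation (a common textbook formulation) assigns each node $i$ a fractional variable $x_i$:

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i \in V} w_i x_i \\
\text{subject to}\quad & x_i + x_j \ge 1 \quad \text{for every edge } (i, j) \in E \\
& 0 \le x_i \le 1 \quad \text{for every node } i \in V
\end{aligned}
```

Rounding the optimal fractional solution by taking $S = \{\, i : x_i \ge 1/2 \,\}$ gives a vertex cover: each edge constraint forces at least one endpoint to be at least $1/2$, and $w(S) \le 2 \sum_i w_i x_i \le 2 \cdot \mathrm{OPT}$.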
Traveling Salesman Problem
• The Traveling Salesman Problem is an optimization problem, but it is a natural target for approximation algorithms because it is NP-hard
• The problem: given a number of cities and the costs of traveling from any city to any other city, what is the least-cost round-trip route that visits each city at least once and then returns to the starting city?
Solving the salesman Problem
• Since the Traveling Salesman Problem is NP-hard, we can handle it the way we handle other NP-hard problems: with an approximation algorithm
• Such an algorithm can give a solution within a few percent (often 2%–3%) of the optimal solution, which can be far faster and more cost-effective than an exact algorithm
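For metric instances, where distances obey the triangle inequality, a classic 2-approximation builds a minimum spanning tree and shortcuts a walk of it. A minimal sketch (helper names are our own):

```python
import math

def mst_tsp_tour(points):
    """MST-based 2-approximation for metric TSP: build a minimum spanning
    tree (Prim's algorithm), then output a preorder walk of the tree,
    shortcutting repeated nodes. By the triangle inequality the resulting
    tour costs at most twice the optimal tour."""
    n = len(points)
    def d(i, j):
        return math.hypot(points[i][0] - points[j][0],
                          points[i][1] - points[j][1])
    # Prim's MST, rooted at point 0
    in_tree = [0]
    children = {i: [] for i in range(n)}
    while len(in_tree) < n:
        u, v = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: d(*e))
        children[u].append(v)
        in_tree.append(v)
    # preorder walk of the tree, then return to the start
    tour = []
    def preorder(u):
        tour.append(u)
        for c in children[u]:
            preorder(c)
    preorder(0)
    tour.append(0)
    return tour

def tour_length(points, tour):
    return sum(math.hypot(points[a][0] - points[b][0],
                          points[a][1] - points[b][1])
               for a, b in zip(tour, tour[1:]))
```

On the four corners of a unit square the walk recovers the optimal tour of length 4.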
Problem 1 Overview
• A ship arrives with n containers of weights (w1, w2, ..., wn)
• There is a set of trucks, each of which can hold K units of weight
• Minimize the number of trucks needed to carry all the containers
• The Greedy Algorithm:
• Start with an empty truck, pile on the containers in the order that they arrive, and move on to the next truck when the next container does not fit
• Repeat until there are no more containers
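The greedy loading rule above (a "next-fit" strategy) can be sketched as:

```python
def next_fit_trucks(weights, K):
    """Next-fit greedy: load containers in arrival order; when the next
    container doesn't fit, dispatch the current truck and start a new one.
    Returns the loads of the trucks used."""
    trucks = [[]]
    load = 0
    for w in weights:
        if load + w > K:      # current truck is full for this container
            trucks.append([])
            load = 0
        trucks[-1].append(w)
        load += w
    return trucks
```

With `K = 10` and containers `[6, 5, 4, 3]` this reproduces the three-truck loading discussed below (`[[6], [5, 4], [3]]`), versus the optimal two trucks.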
Problem 1 Continued
• a) Give an example set of weights and a value of K for which this algorithm is not optimal:
• If K is 10 and the set of containers is: {6, 5, 4, 3}:
• The first truck is loaded with container 1 (weight 6); since the next container has weight 5 and would overflow the truck, truck 1 departs with only container 1. The next truck carries containers 2 and 3 (weights 5 and 4), and the last truck carries only container 4 (weight 3): three trucks in total
• The optimal solution uses only two trucks: load containers 1 and 3 (weights 6 and 4) into truck 1 and containers 2 and 4 (weights 5 and 3) into truck 2
Problem 1 Continued
• b) Show that the number of trucks used by this algorithm is within a factor of 2 of the minimum possible number of trucks for any set of weights and any value of K.
• In general, any two consecutive trucks carry a combined load greater than K (otherwise the second truck's first container would have fit on the first truck), so the greedy algorithm uses fewer than 2 · (total weight / K) trucks, which is at most twice the minimum possible number
• The factor of 2 is essentially tight. Suppose each truck holds a maximum weight of 10 (K = 10)
• Given the set of containers:
• S = {6, 5, 6, 5, 6, 5, 6, 5} (a worst-case scenario for this K)
• The greedy algorithm sends each container on its own truck, 8 trucks in total
• A better algorithm that could look ahead and pair the 5s together would require only 6 trucks
Problem 2 Overview
• The idea of this problem is to build a representative set for a large collection of protein molecules whose properties are not completely understood. This would allow the researchers to study the smaller representative set and by inference learn about the whole set
Problem 2 Continued
• a) Given a large set of proteins P and a similarity-distance function dist(p, q) (two proteins are considered similar when dist(p, q) <= delta), give an algorithm to determine a small representative set R:
• Initialize the representative set R to the empty set
• While P still has proteins:
• Select a protein p from P and add it to R
• Remove p, and every protein q with dist(p, q) <= delta, from P
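A minimal sketch of the steps above (assuming `dist` is supplied by the caller; names are illustrative):

```python
def representative_set(proteins, dist, delta):
    """Greedy representative set: repeatedly pick a remaining protein as a
    representative, then discard it and everything within similarity
    distance delta of it."""
    remaining = list(proteins)
    reps = []
    while remaining:
        p = remaining[0]                 # select a remaining protein
        reps.append(p)                   # add it to R
        # remove p and all proteins similar to p
        remaining = [q for q in remaining if dist(p, q) > delta]
    return reps
```

For example, with "proteins" modeled as numbers, `dist = abs difference`, and `delta = 2`, the values `[0, 1, 2, 10, 11, 20]` yield representatives `[0, 10, 20]`.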
Problem 2 Continued
• b) The algorithm given for the Center Selection problem does not solve our protein problem: its guarantee comes from relaxing the radius from delta to 2 · delta, and a representative that is only within 2 · delta of a protein is not acceptable here
Problem 3 Overview
• Given a set A of numbers and a bound B, we would like to find a subset S of A with the maximum feasible value: the sum of S should be as large as possible without exceeding B
Problem 3 Solution
• a) Given the following algorithm:
• S = { }
• T = 0
• For i = 1, 2, …, n
• If T + ai <= B
• S ← S ∪ {ai}
• T ← T + ai
• End If
• End For
• Give an instance in which the total sum of the set S returned by this algorithm is less than half the total sum of some other feasible subset of A
• A = {1, 3, 10}, B = 11, the algorithm above would only return 1 and 3 (total of 4) where an optimal one would return 1 and 10 (total of 11)
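The part (a) algorithm is a direct transcription of the pseudocode above; a minimal sketch:

```python
def greedy_subset_sum(A, B):
    """Scan A in the given order, adding each element that still fits
    under the bound B. Returns the chosen subset and its total."""
    S, T = [], 0
    for a in A:
        if T + a <= B:
            S.append(a)
            T += a
    return S, T
```

On the counterexample `A = [1, 3, 10]`, `B = 11`, it returns `([1, 3], 4)`, less than half the optimal total of 11; scanning the same elements in decreasing order instead reaches the full total of 11 here.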
Problem 3 Solution Continued
• The way our algorithm works is:
• First it sorts the contents of the set in ascending order (using quicksort)
• It then alternates between the largest and smallest remaining elements, adding each to a running total that is compared against the given B
• If the would-be new total is less than or equal to B, the chosen element is added to the feasible set and the running total is updated
Problem 3 Solution Continued
• We now have a feasible set of elements of the full set A, together with the sum of its values (Total); with this value we can evaluate how good the feasible set is for any given run of the algorithm
• To support this claim empirically, we ran five million examples of the algorithm on randomly generated data. For each test run, the set A had between 3 and 50 elements, each element was in the range 1–50, and B was chosen at random between the smallest element and the sum of all elements
Problem 3 Solution Continued
• During the 5,000,000 test runs, there was no case in which the returned feasible subset of A summed to less than half the best possible feasible sum
• Since no feasible subset can ever sum to more than B, and the algorithm's result was never observed below half of the best possible sum, our algorithm returns a result that is at least half the sum of any other feasible subset
Problem 5 Overview
• A company runs a business where clients bring in jobs each day for processing; each job has a processing time t, and there are ten machines
• The company is running the Greedy-Balance algorithm, which may not be the best approximation algorithm available
• We must prove that the Greedy-Balance algorithm will always give a makespan of at most 20 percent above the average load
Problem 5 Solution
• Show that the company's greedy algorithm will always find a solution whose makespan is at most 20% above the average load
• Suppose the total load is at its lowest value, 3000; with 10 machines, the average load per machine is 1/10 of the total, i.e. 300, so staying within 20% means the makespan must not exceed 360
• Greedy-Balance assigns each job to the currently least-loaded machine, so just before the busiest machine received its final job, its load was at most the average; hence the makespan is at most the average load plus the size of one job
• As long as no single job exceeds 60 (that is, 20% of 300), the makespan is therefore at most 300 + 60 = 360, i.e. within 120% of the average load
Problem 10 Overview
• We are given an n x n grid graph G.
• Associated with each node v is a weight w(v), a non-negative integer; the weights of all nodes are distinct
• The goal is to choose an independent set S of nodes of the grid so that the sum of the weights of the nodes in S is as large as possible
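One simple way to get a guaranteed fraction of the optimum on a grid (a common heuristic sketch, not necessarily the intended solution to this exercise) is the checkerboard trick: color the grid like a checkerboard, note that each color class is an independent set, and return the heavier class. Since any independent set weighs at most the total weight, the heavier class, with at least half the total, is a 2-approximation:

```python
def grid_independent_set_2approx(weights):
    """Checkerboard 2-approximation for max-weight independent set on an
    n x n grid: return the heavier of the two checkerboard color classes,
    each of which is an independent set."""
    n = len(weights)
    black = [(i, j) for i in range(n) for j in range(n) if (i + j) % 2 == 0]
    white = [(i, j) for i in range(n) for j in range(n) if (i + j) % 2 == 1]
    def total(cells):
        return sum(weights[i][j] for i, j in cells)
    return black if total(black) >= total(white) else white
```

On the 2 x 2 grid with weights `[[4, 1], [1, 4]]`, the black class `{(0,0), (1,1)}` (weight 8) is returned, which here is also optimal.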