
The Traveling Salesman Problem in Theory & Practice


Presentation Transcript


  1. The Traveling Salesman Problem in Theory & Practice Lecture 9: Optimization 101, or “How I spent my Spring Break” 25 March 2014 David S. Johnson dstiflerj@gmail.com http://davidsjohnson.net Seeley Mudd 523, Tuesdays and Fridays

  2. Outline • Elementary Exhaustive Search • The Value of Pruning • More General Branch-and-Bound • Student Presentation: Jinyu Xie, on “The Complexity of Facets”

  3. Credits • J. L. Bentley, “Faster and faster and faster yet,” UNIX Review 15 (1997), 59-67. • D. L. Applegate, W. J. Cook, S. Dash, A Practical Guide to Discrete Optimization (book in preparation).

  4. Program for Evaluating (N-1)! Tours Assume standard routines for: reading instances, printing output, computing tour lengths, swapping elements. We enumerate tours as follows:

  for (i = 0; i < N; i++) tour[i] = i;
  permute(N-1);

  void permute(int k)
  {
      int i, len;
      if (k == 1) {
          len = tour_length();
          if (len < bestlen) {
              bestlen = len;
              for (i = 0; i < N; i++) besttour[i] = tour[i];
          }
      } else {
          for (i = 0; i < k; i++) {
              tour_swap(i, k-1);
              permute(k-1);
              tour_swap(i, k-1);
          }
      }
  }

  Lines of Code = 86

  5. Results for Basic Program Methodology:
  • Construct lower-triangular distance matrix M for a 100-city random Euclidean instance.
  • Instance I_N consists of the first N rows of M.
  Advantages:
  • Somewhat better correlation between results for different N, so running-time growth rates are less noisy.
  • Fewer instances to keep track of.
  Disadvantages:
  • Results may be too dependent on M, so we should try different M’s.

  6. Results for Basic Program (running times on 3.6 GHz Core i7 processors in an iMac with 32 GB RAM) Suggestions for Speedups? Compiler Optimization: cc -O2 exhaust.c Lines of Code, still = 86

  7. Next Speedup Don’t need to consider both tour orientations. • Only consider the orientation in which city 1 precedes city 0. • Implement this by adding an input flag to permute(), which equals 0 unless city 0 has already been fixed. • (Recall that we fix the cities from right to left, starting with tour[N-1] = N-1). • Lines of Code = 97, an increase of 11.
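  The flag-based orientation pruning described on this slide can be sketched as follows. This is a hedged reconstruction, not the lecture’s actual code: the 5-city line metric, the `leaves` counter, and the driver `solve_small` are illustrative additions, but the recursion mirrors the slide’s scheme of refusing to fix city 1 until city 0 has been fixed, so that city 1 always ends up preceding city 0 and each tour is seen in only one orientation.

```c
#include <limits.h>

#define N 5                     /* toy instance size for illustration */

static int tour[N], besttour[N];
static int bestlen = INT_MAX;
static long leaves = 0;         /* tours actually evaluated */

/* stand-in metric: cities on a line, dist = |a - b| */
static int dist(int a, int b) { return a > b ? a - b : b - a; }

static int tour_length(void)
{
    int i, len = dist(tour[N-1], tour[0]);
    for (i = 0; i < N - 1; i++) len += dist(tour[i], tour[i+1]);
    return len;
}

static void tour_swap(int i, int j)
{
    int t = tour[i]; tour[i] = tour[j]; tour[j] = t;
}

/* zero_fixed stays 0 until city 0 has been assigned its final slot;
   while it is 0 we skip any branch that would fix city 1, so city 1
   is always fixed after (i.e., placed to the left of) city 0 */
static void permute(int k, int zero_fixed)
{
    int i, len;
    if (k == 1) {
        leaves++;
        len = tour_length();
        if (len < bestlen) {
            bestlen = len;
            for (i = 0; i < N; i++) besttour[i] = tour[i];
        }
    } else {
        for (i = 0; i < k; i++) {
            if (!zero_fixed && tour[i] == 1) continue;
            tour_swap(i, k-1);
            permute(k-1, zero_fixed || tour[k-1] == 0);
            tour_swap(i, k-1);
        }
    }
}

int solve_small(void)
{
    int i;
    for (i = 0; i < N; i++) tour[i] = i;   /* tour[N-1] = N-1 stays fixed */
    permute(N - 1, 0);
    return bestlen;
}
```

  With N = 5 this evaluates (N-1)!/2 = 12 tours instead of 24, as the orientation argument predicts.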

  8. Next Speedup More valuable idea: Prune! Plus more efficient (incremental) distance calculations. The tour length is accumulated in a second argument, and a branch is abandoned as soon as its partial length reaches bestlen:

  void permute(int k, int tourlen)
  {
      int i;
      if (tourlen >= bestlen) return;
      if (k == 1) {
          tourlen += dist(tour[0], tour[1]) + dist(tour[N-1], tour[0]);
          if (tourlen < bestlen) {
              bestlen = tourlen;
              for (i = 0; i < N; i++) besttour[i] = tour[i];
          }
      } else {
          for (i = 0; i < k; i++) {
              tour_swap(i, k-1);
              permute(k-1, tourlen + dist(tour[k-1], tour[k]));
              tour_swap(i, k-1);
          }
      }
  }

  Lines of code increases from 97 to 99. O(N) factor of improvement, probably worth one additional city.

  9. Results

  10. Next Speedup tourlen only includes the costs of the edges linking the city in slot k through the city in slot N-1. We would get more effective pruning if we could also take into account the edges linking the cities that haven’t yet been fixed. More precisely, we could use a lower bound on the minimum-length path that involves these cities and links the city in slot k to city N-1. An obvious first choice is the length of a minimum spanning tree connecting all these cities. The pruning test at the top of permute() becomes

  if (tourlen + mst(k+1) >= bestlen) return;

  Lines of code increases from 99 to 134.
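  A spanning-tree bound like the slide’s mst(k+1) could be computed with Prim’s algorithm. The following is a generic sketch under stated assumptions: the matrix `d`, the limit `MAXC`, and the name `mst_length` are illustrative, and the lecture’s code would run this over the not-yet-fixed cities plus the two path endpoints (and cache the results, as later slides describe) rather than over cities 0..n-1.

```c
#include <limits.h>

#define MAXC 16

static int d[MAXC][MAXC];       /* symmetric distance matrix, filled by caller */

/* Prim's algorithm: length of a minimum spanning tree over cities 0..n-1 */
int mst_length(int n)
{
    int in_tree[MAXC] = {0};
    int key[MAXC];              /* cheapest edge from the growing tree */
    int i, j, total = 0;

    for (i = 0; i < n; i++) key[i] = INT_MAX;
    key[0] = 0;
    for (i = 0; i < n; i++) {
        int best = -1;
        for (j = 0; j < n; j++)
            if (!in_tree[j] && (best < 0 || key[j] < key[best])) best = j;
        in_tree[best] = 1;
        total += key[best];     /* cost of attaching 'best' to the tree */
        for (j = 0; j < n; j++)
            if (!in_tree[j] && d[best][j] < key[j]) key[j] = d[best][j];
    }
    return total;
}
```

  Since the MST is never longer than any Hamilton path through the same cities, adding this quantity to tourlen keeps the pruning test safe.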

  11. Results

  12. Next Speedup bestlen isn’t so hot either, at least initially. If we start by simply setting bestlen to the Nearest Neighbor tour length, that seems likely to have a benefit, and it doesn’t take much coding effort. Lines of code goes from 134 to 155.
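  The Nearest Neighbor tour length used to seed bestlen can be sketched as below. This is a hedged illustration, not the lecture’s code: the matrix `d`, the limit `MAXC`, and the name `nn_tour_length` are assumptions, and a real implementation would share the instance data with the search.

```c
#define MAXC 16

static int d[MAXC][MAXC];       /* symmetric distance matrix, filled by caller */

/* Nearest Neighbor from city 0: repeatedly hop to the closest unvisited
   city, then close the tour; the result is an upper bound for bestlen */
int nn_tour_length(int n)
{
    int visited[MAXC] = {0};
    int cur = 0, len = 0, step, i;

    visited[0] = 1;
    for (step = 1; step < n; step++) {
        int best = -1;
        for (i = 0; i < n; i++)
            if (!visited[i] && (best < 0 || d[cur][i] < d[cur][best])) best = i;
        len += d[cur][best];
        visited[best] = 1;
        cur = best;
    }
    return len + d[cur][0];     /* edge closing the tour back to city 0 */
}
```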

  13. Results

  14. Next Speedup • As we know from our earlier discussions of tour-construction heuristics, Nearest Neighbor is a pretty lousy bound, typically 24% above optimal for random Euclidean instances. • Question: How can we get a better bound cheaply? • Answer: Guess low – say at (3/4)NN. • The algorithm will presumably run faster if our initial value of bestlen is less than the optimal tour length. • If no solution is found, we can rerun the algorithm with an incrementally increased initial value for bestlen, say 5% above the previous value. • When our initial value eventually exceeds optimal, it will not exceed it by more than 5% (much better than 24%). • Lines of code increases from 155 to 165.
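  The guess-low restart strategy on this slide can be sketched as a small driver loop. This is an illustration only: `search` here is a stand-in for the exhaustive search (it succeeds exactly when the target exceeds a fixed stand-in optimum), and the names `search`, `solve_with_restarts`, and `opt` are assumptions.

```c
static int opt = 100;   /* stand-in for the (unknown) optimal tour length */
static int runs = 0;    /* how many times the search was launched */

/* stand-in for the exhaustive search: it finds a tour iff some tour is
   strictly shorter than the starting value of bestlen */
static int search(int target) { runs++; return target > opt; }

/* guess low at (3/4) of the Nearest Neighbor length, then raise the
   target by 5% after each failed run, exactly as the slide describes */
int solve_with_restarts(int nn_len)
{
    int target = (3 * nn_len) / 4;
    while (!search(target))
        target += target / 20;  /* +5%, integer arithmetic */
    return target;
}
```

  With an NN length 24% above a stand-in optimum of 100, the loop needs three runs and its final target is within 5% of optimal, matching the slide’s argument.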

  15. Results

  16. Next Speedup • A large amount of our time goes into computing MST’s, and much of it may be repeated for different permutations of the same set of cities. • Could we save time by hashing the results so as to avoid this duplication? • How would that work?

  17. Set Hashing
  • Start by creating random unsigned ints hashkey[i], 0 ≤ i < N.
  • Create an initially empty hashtable MstTable, say of size hashsize = 524288 = 2^19. (We will double the size of MstTable whenever half of its entries become full.)
  • The hash value hval(C) for a set C of cities is obtained by computing the exclusive-or of the |C| values {hashkey[i] : i ∈ C}. (The ith bit of the hash value is 1 if and only if an odd number of the ith bits of the keys equal 1.)
  • Because MstTable can grow, hval(C) will actually refer to the entry with index hval(C) % hashsize (the remainder when hval(C) is divided by hashsize).
  • An entry in the table will consist of two items:
  • The length Bnd of the minimum spanning tree for the cities in C.
  • A bitmap B of length N, with B[i] = 1 if and only if i ∈ C.
  • We will handle the possibility of collisions by using “linear probing”.

  18. Linear Probing
  • To add the MST value for a new set C to the hashtable, we first try MstTable[i] for i = hval(C) % hashsize. If that location already has an entry, we perform the following loop to find a suitable location:
  • While MstTable[i] is full, set i = i+1 (mod hashsize).
  • In the worst case, this could take time proportional to hashsize, but there are simple tabulation-based schemes that can be shown to take amortized constant time per search [Patrascu & Thorup].
  • To check whether we have already computed the MST for C, we follow a similar scheme, starting by setting i = hval(C) % hashsize:
  • While MstTable[i] is not empty:
  • If MstTable[i].bitmap equals the bitmap for C, return MstTable[i].Bnd.
  • Set i = i+1 (mod hashsize).
  • Return “Not Cached”.
  (Lines of code increases from 165 to 274.)
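  The insert/lookup pair above can be sketched as follows. This is a simplified illustration: the table is tiny and never doubled (the slides start at 2^19 and grow it), the hash `hv` is a hypothetical stand-in for the XOR-of-keys hash, and the names `mst_lookup` and `mst_insert` are assumptions.

```c
#define HSIZE 1024      /* toy size; the real table starts at 2^19 and doubles */

struct entry {
    unsigned long long cityset;  /* bitmap identifying the cached set */
    int bnd;                     /* its cached MST lower bound */
    int used;
};
static struct entry MstTable[HSIZE];

/* hypothetical stand-in hash; the real code XORs random per-city keys */
static unsigned int hv(unsigned long long s) { return (unsigned int)(s * 2654435761u); }

/* returns 1 and fills *bnd if the set is cached, 0 for "Not Cached" */
int mst_lookup(unsigned long long cityset, int *bnd)
{
    unsigned int i = hv(cityset) % HSIZE;
    while (MstTable[i].used) {
        if (MstTable[i].cityset == cityset) { *bnd = MstTable[i].bnd; return 1; }
        i = (i + 1) % HSIZE;     /* linear probing past occupied slots */
    }
    return 0;
}

void mst_insert(unsigned long long cityset, int bnd)
{
    unsigned int i = hv(cityset) % HSIZE;
    while (MstTable[i].used)
        i = (i + 1) % HSIZE;
    MstTable[i].cityset = cityset;
    MstTable[i].bnd = bnd;
    MstTable[i].used = 1;
}
```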

  19. Results

  20. Next Speedup Can we improve on the MST lower bound? Yes, but this will require some new ideas. At least, new to this class. They actually come from
  • [Held & Karp, “The traveling-salesman problem and minimum spanning trees,” Operations Res. 18 (1970), 1138-1162]
  • [Held & Karp, “The traveling-salesman problem and minimum spanning trees: Part II,” Math. Programming 1 (1971), 6-25]
  • [Held, Wolfe, & Crowder, “Validation of subgradient optimization,” Math. Programming 6 (1974), 62-88]

  21. p-values We want a lower bound on the shortest path between the city in slot k and city N-1 that involves those two cities and all the as-yet-unfixed cities, that is, a fixed-endpoint Hamilton path in the complete graph induced by these k+2 cities. Suppose we modify the distance matrix for this graph by assigning a potential-function value p(i) to each unfixed city i, and set pdist(i,j) = dist(i,j) + p(i) + p(j), where p(i) = 0 for all fixed cities i. Then the length of the Hamilton path increases by 2∑i p(i), but the length MSTp of the minimum spanning tree might increase by more. The new lower bound would be MSTp - 2∑i p(i). Key goal: find p-values that make this bound as big as possible.
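  The modified distance and the resulting bound can be written down directly. A minimal sketch, assuming a global distance matrix `dmat`, a potential array `p`, and the illustrative names `pdist` and `path_lower_bound`:

```c
#define MAXC 16

static double dmat[MAXC][MAXC];  /* original distances */
static double p[MAXC];           /* potentials; p[i] == 0 for fixed cities */

/* modified distance used when building the spanning tree */
double pdist(int i, int j) { return dmat[i][j] + p[i] + p[j]; }

/* each interior city of the Hamilton path has degree 2, so its potential
   is counted twice in the modified path length; subtracting 2*sum p(i)
   turns a modified tree length back into a bound on the original path */
double path_lower_bound(double mstp, int n)
{
    double s = 0.0;
    int i;
    for (i = 0; i < n; i++) s += p[i];
    return mstp - 2.0 * s;
}
```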

  22. Setting the p-values (I) First, another observation. We can get a possibly better lower bound by computing a value MSTp' that might be larger than MSTp, but such that MSTp' - 2∑i p(i) will still be a lower bound on the Hamilton path length:
  • Let e1 and e2 be the two fixed endpoints for the Hamilton path (the city in slot k and city N-1).
  • Compute an MST for the unfixed vertices. Then add an edge from each ei to the unfixed city whose p-distance to ei is minimum.
  • This guarantees that the two ei both get degree 1 in the resulting tree, something not required for an overall MST (but necessary for our Hamilton path).
  Note that in the optimal Hamilton path, the edges involving the ei have to be at least as long as the ones we chose above, and the p-length of the path of unfixed cities linking the path neighbors of e1 and e2 (involving all the unfixed cities) has to be at least as long as the MST for the unfixed cities.

  23. [Figure: an example with endpoints e1, e2 and d = 4.] Length of overall MST = 13. Length of restricted MST = 7 + 6√2 ≈ 15.48. Length of min Hamilton path = 7 + 5√2 + √5 ≈ 16.30.

  24. Setting the p-values (II) An iterative process, starting with p(i) = 0 for all i, α = 1, and bestMST = 0. While not yet done, do the following:
  • Compute the restricted MST for the current p-values.
  • If MSTp' > bestMST, do the following:
  • For each unfixed city i, let deg(i) be the number of edges adjacent to city i in the current restricted MST.
  • For each unfixed city i, set p(i) = p(i) + α(deg(i) - 2).
  • Update α.
  • Otherwise:
  • If α ≤ 1, quit.
  • Revert to the previous p-values and update α.
  Key Observations: The value of p(i) does not change if city i already has the desired degree of 2. It increases if deg(i) > 2, thus discouraging edges that connect to city i. It decreases if deg(i) = 1, thus encouraging edges to connect to city i.
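  The core update step, p(i) += α(deg(i) - 2) for each unfixed city, can be sketched in isolation. This is an illustration, not the lecture’s code: the array `pot`, the flag array `unfixed`, and the name `update_potentials` are assumptions, and deg[] would be read off the current restricted MST.

```c
#define MAXC 16

static double pot[MAXC];   /* the p-values, all starting at 0 */

/* one subgradient step: nudge each unfixed city's potential toward the
   degree 2 it must have inside the Hamilton path */
void update_potentials(const int deg[], const int unfixed[], int n, double alpha)
{
    int i;
    for (i = 0; i < n; i++)
        if (unfixed[i])
            pot[i] += alpha * (deg[i] - 2);
}
```

  Note that when deg[] comes from a spanning tree, the changes cancel out, which is exactly the side benefit proved on the next slide.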

  25. A Side Benefit Theorem: Under this scheme, we will always have ∑i p(i) = 0. (Note that this means that MSTp' is itself our desired lower bound, with no need for adjustment.) Proof: We start with all values equal to zero, so the claim initially holds. If there are k unfixed cities, then we have k+2 cities total, and the spanning tree has k+1 edges, for a total degree of 2k+2. This is 2k if we ignore the degrees of e1 and e2. For each unfixed city i, the change in its p-value is α(deg(i) - 2), so the total change is α(∑i deg(i) - 2k) = α(2k - 2k) = 0.

  26. Setting the p-values (III) Our heuristic scheme for updating α is the following 2-phase process.
  Phase 1 (Initializing α)
  • If MSTp' > bestMST, set α = 2α.
  • Otherwise, assuming α > 1, set α = α/2 and switch to Phase 2.
  Phase 2 (Main work)
  • If MSTp' > bestMST, leave α unchanged.
  • Otherwise, assuming α > 1, set α = α/2.
  (This was the scheme used in the code supporting the paper [Johnson, McGeoch, & Rothberg, “Asymptotic Experimental Analysis for the Held-Karp Traveling Salesman Bound,” Proc. 7th Ann. ACM-SIAM Symp. on Discrete Algorithms, 1996, pp. 341-350].)

  27. An Aside This scheme was originally developed for finding lower bounds on the optimal tour, not on a min-cost fixed-endpoint Hamilton path. To see the relationship, simply consider the case where e1 = e2, and we add the edges from the merged city to the unfixed cities that have the first and second smallest pdist values. Then we get a lower bound on the minimum-length Hamilton cycle for the union of the set C of unfixed cities and {e1}. This scheme can be used basically unchanged to compute a TSP lower bound on an arbitrary set C of cities by simply choosing some city to play the role of e1 and iterating as previously. It is a theorem of Held, Karp, et al. that there exists a scheme for adjusting α such that the process will converge to the solution of the LP relaxation of the TSP, no matter what choice we make for e1. [This is why the LP relaxation of the TSP is called the “Held-Karp bound”.] Unfortunately, as with the results saying that simulated annealing is guaranteed to find an optimal solution, the scheme for realizing convergence to the HK bound may take exponential time…

  28. One more thing.. We can still hash solutions to avoid redundant computations, but now it is a bit more complicated. The bound doesn’t just depend on the overall set of cities involved, but also on the identities of e1 and e2. One of the two (city N-1) is always the same, so that is not an issue, but the other (the city in slot k) can differ, and so the hash element needs to be augmented to contain not only the bitmap for the set of cities, but also the identity of the latter city. And because of this added restriction, hashing is not nearly as effective. The median percentage of hash misses was around 40% on our testbed, versus something under 1% when we just used MST lower bounds. Fortunately, the new lower bounds are much more effective. (On average, an increase of about 12% above the MST bound.) At a cost of 70 extra lines of code (from 293 to 363).

  29. Results

  30. Next Speedup Maybe we were too conservative in increasing the upper bound target by 5% each time. What about 1%? (No extra lines of code!).

  31. Last Speedup (for now) Why fool around with approximate upper bounds? In practice, for instances this small, the best of 100 runs of Lin-Kernighan is typically optimal. Let’s use that. (In a shell script that runs the Johnson-McGeoch implementation of LK, finds the best of 100, and inputs that value into our exhaustive search to be used as the start value for bestlen.) Adds less than 0.01 seconds to our running time, even for N = 150. However, implicitly increases total lines of code by a substantial factor. But it does let us know the limiting value of better upper bounds.

  32. Results

  33. More Results

  34. More Instances

  35. Random Distance Matrices Note: LK (100) returned the optimal solution value in all cases.

  36. Summary * No others for N < 105, under 8000 seconds at least through N = 105 ** Under 8000 seconds for N < 149

  37. Conclusions Relatively simple pruned exhaustive search over permutations can solve much bigger instances than one might naively expect. For the hardest instance type we considered (random distance matrices) we could still find optimal solutions in minutes when N ≤ 45, and for geometric instances we could often get beyond N = 100. The Applegate-Cook-Dash implementation of the final version of the algorithm seems to do a bit better, perhaps due to its more sophisticated p-value scheme. This is true even though their LK implementation is less sophisticated, which enables them to include it directly in their program while keeping the overall code length under 1000 lines. In particular, their code solved all the record-breaking instances that were solved before 1980 (all by LP-based methods). The 120-city Grötschel instance took their code 412 seconds. http://www.math.uwaterloo.ca/tsp/history/milestone.html

  38. More Observations Constant-factor (or even linear-factor) running-time improvements have only a minor effect on the sizes of instances solvable (although such improvements may add up). The major impact comes from pruning. Even a few percent improvement in the upper or lower bounds used in pruning can have a major impact. The more sophisticated the algorithm, the less predictable its results: adding a single city to an instance can cause an algorithm’s running time to balloon by a factor of almost 10, or shrink by a factor of almost 30. Unfortunately, the permutation-based approach does seem to be running out of gas, and something different is needed.

  39. Another Approach
  • Branch-and-bound over sets of tour edges, not permutations.
  • (This week’s “pruned exhaustive search” approach can be viewed as branch-and-bound over partial permutations.)
  • Lower bounds obtained by solving LPs with a variety of added inequalities (“cuts”).
  • To be continued next week.
