The Traveling Salesman Problem in Theory & Practice

The Traveling Salesman Problem in Theory & Practice Lecture 1 21 January 2014 David S. Johnson dstiflerj@gmail.com http://davidsjohnson.net Seeley Mudd 523, Tuesdays and Fridays

Today’s Outline • Requirements, References, & Introductions • Problem Definition • Applications • Paths and Cycles • Complexity • Introduction to Optimization • Introduction to Approximation • Preview of the Rest of the course

Requirements and Grading • Class presentation of results from the literature. • Written paper: • Survey paper on an approved topic • Report on your own new experimental work • Theoretical paper on new results of your own • Regular class participation.

About Me • Ph.D. in Mathematics from MIT (1973). Thesis: Near-Optimal Bin Packing Algorithms. • 40 years at AT&T (Bell Labs, AT&T Labs – Research), with one year off for good behavior (U. Wisconsin, 1980-81). • Most famous publication: Computers and Intractability: A Guide to the Theory of NP-Completeness (1979, with Mike Garey). • Many theoretical and experimental papers on the TSP with many co-authors, starting with the proof that the Euclidean version is NP-Hard.

Optional Reference Books • The Traveling Salesman Problem, Lawler, Lenstra, Rinnooy Kan, and Shmoys (Editors), Wiley (1985). $377.47 (current amazon.com price, new) • The Traveling Salesman Problem and Its Variations, Gutin and Punnen (Editors), Kluwer (2002). $152.10 • The Traveling Salesman Problem: A Computational Study, Applegate, Bixby, Chvatal, and Cook, Princeton University Press (2006). $57.99/$44.99 (Kindle) • In Pursuit of the Traveling Salesman, Cook, Princeton University Press (2012). $20.64/$15.37 (Kindle)

Web Resources • http://www.math.uwaterloo.ca/tsp/ “The Traveling Salesman Problem” (Bill Cook) • http://dimacs.rutgers.edu/Challenges/TSP/ “The 8th DIMACS Implementation Challenge: The Traveling Salesman Problem” (DSJ) • http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ “TSPLIB” (Testbed of Instances, Gerd Reinelt) • http://davidsjohnson.net/papers.html (DSJ’s downloadable papers on the TSP and other topics) • http://en.wikipedia.org/wiki/Travelling_salesman_problem (Wikipedia Entry -- Much Improved)

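The Traveling Salesman Problem Given: Set of cities $\{c_1,c_2,\dots,c_N\}$. For each pair of cities $\{c_i,c_j\}$, a distance $d(c_i,c_j)$. Find: Permutation $\pi: \{1,\dots,N\} \to \{1,\dots,N\}$ that minimizes the tour length $\sum_{i=1}^{N-1} d(c_{\pi(i)},c_{\pi(i+1)}) + d(c_{\pi(N)},c_{\pi(1)})$.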
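As a minimal sketch of the definition, the Python below takes the objective to be the length of the closed tour (visit the cities in the chosen order, then return to the first) and finds the best order by exhaustive search. The names `tour_length` and `brute_force_tsp` and the four-city instance are illustrative assumptions, not part of the lecture.

```python
from itertools import permutations


def tour_length(order, d):
    """Length of the closed tour that visits the cities in `order` and returns to the start."""
    n = len(order)
    return sum(d[order[i]][order[(i + 1) % n]] for i in range(n))


def brute_force_tsp(d):
    """Try every order that starts at city 0; only practical for very small N."""
    n = len(d)
    candidates = ((0,) + p for p in permutations(range(1, n)))
    best = min(candidates, key=lambda order: tour_length(order, d))
    return best, tour_length(best, d)


# Hypothetical instance: four cities on a line at positions 0, 1, 2, 3.
d = [[abs(i - j) for j in range(4)] for i in range(4)]
best_order, best_len = brute_force_tsp(d)
```

Fixing city 0 as the start loses nothing, since every tour is a cycle and can be rotated to begin there.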

Alternative Definition Given: Graph G = (V,E). Length d(e) for each edge e in E. Find: Minimum length Hamiltonian Circuit in the complete graph G’ on V, where if e = {u,v} is not in E, we assume d(e) = ∞.

N = 10

N = 100

N = 1000

N = 10000

Jan Karel Lenstra

Planar Euclidean Application #1 • Cities: • Holes to be drilled in printed circuit boards

N = 10000

N = 2392

Planar Euclidean Application #2 • Cities: • Wires to be cut in a “Laser Logic” programmable circuit

N = 7397

N = 33,810

N = 85,900

Other Types of Instances • X-ray crystallography • Cities: orientations of a crystal • Distances: time for motors to rotate the crystal from one orientation to the other • High-definition video compression • Cities: binary vectors of length 64 identifying the summands for a particular function • Distances: Hamming distance (the number of terms that need to be added/subtracted to get the next sum)

Data Storage Layout Goal: For each row, have as many consecutive entries as possible (minimizes the number of random accesses)

Asymmetric Applications • Payphone Money Collection with One-Way Streets • Stacker-Crane • No-Wait Flowshop • Disk Scheduling • Compiling to Minimize Branching Cost • Minimum Length Common Superstring

The Stacker Crane Problem

No-Wait Flowshop [Figure: a job, drawn as a task on Processor 1 and a task on Processor 2, and a schedule with one row for Processor 1 and one for Processor 2.]

No-Wait Flowshop [Figure: worked example; numeric labels shown: 2 2 1 1 1 1 3 3 6 5]

Disk Scheduling

Disk Scheduling Cities: locations of the fragments of a file one wants to retrieve. Distance between two fragments = time it takes to move the read head from the end of one to the beginning of the next, taking into account the spinning of the disk.

Compiling to Minimize Branching Cost [Figure: code segment A ending in a branch, with successor B taken with probability $P_B$ and successor C taken with probability $P_C$.] In execution, the delay at the end of the segment is much less if the next instruction to be executed is the next one in the code, say 1 versus $k$. Based on profiling, one can determine the empirical probability that each branch is taken. Following A directly by B causes an expected delay of $P_B + kP_C$. Following A directly by C causes an expected delay of $P_C + kP_B$. Following A directly by anything else causes an expected delay of $k$.
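The three cases can be written out as a small Python sketch. It assumes the two branch probabilities sum to 1; the function name `expected_delay` and the sample values are hypothetical.

```python
def expected_delay(successor, p_b, p_c, k):
    """Expected delay after segment A, whose branch goes to B (prob. p_b) or C (prob. p_c).

    A branch target placed directly after A costs 1; any other target costs k.
    Assumes p_b + p_c == 1.
    """
    if successor == "B":
        return p_b * 1 + p_c * k
    if successor == "C":
        return p_c * 1 + p_b * k
    return k  # neither target is adjacent, so every branch pays k
```

These expected delays serve as the (asymmetric) distances from A to each candidate successor segment.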

Shortest Superstring • Given: Finite set S of strings over some alphabet. • Find: Shortest string that contains all strings in S as substrings. • Cities: Strings in S. • Distances: d(x,y) = |y| - maximum overlap between a suffix of x and a prefix of y. Example: x = “alphabet”, y = “betrayal”. d(x,y) = 5 (overlap “bet”, giving “alphabetrayal”). d(y,x) = 6 (overlap “al”, giving “betrayalphabet”).
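A minimal Python sketch of this distance, using a simple quadratic-time overlap search (the function names are illustrative assumptions):

```python
def max_overlap(x, y):
    """Length of the longest suffix of x that is also a prefix of y."""
    for k in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:k]):
            return k
    return 0


def superstring_distance(x, y):
    """d(x, y) = |y| minus the maximum suffix/prefix overlap: characters added when y follows x."""
    return len(y) - max_overlap(x, y)
```

The distance is asymmetric: it depends on which string comes first.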

Hamiltonian Path versus Cycle • Four variants (both for symmetric and asymmetric TSP). • Cycle • Path between fixed endpoints • Path with fixed starting vertex • Path with unconstrained endpoints. • A code for any one can be adapted to handle any of the others.

Path with Fixed Endpoints: Cycle via Path. Fix a vertex s. Call the Path algorithm once for s and each vertex t in V - {s}. Return the result with the best value of Path Length + dist(t,s).

Path with Fixed Endpoints: Path via Cycle. Add one new vertex and two new edges, joining it to s and to t. Compute the shortest cycle, then delete the added vertex and edges.
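A sketch of the Path via Cycle reduction in Python, under two assumptions the slide does not spell out: the two new edges join the new vertex to s and to t with length 0, and the new vertex has no other edges (length ∞). The brute-force `shortest_cycle` is only a stand-in for whatever Cycle code is available.

```python
from itertools import permutations
import math

INF = math.inf


def shortest_cycle(d):
    """Stand-in Cycle solver: exhaustive search, small instances only."""
    n = len(d)
    best_len, best = INF, None
    for p in permutations(range(1, n)):
        order = (0,) + p
        length = sum(d[order[i]][order[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_len, best = length, order
    return best_len, best


def shortest_path_via_cycle(d, s, t):
    """Shortest Hamiltonian path from s to t in a symmetric instance, via one Cycle call."""
    n = len(d)
    z = n  # index of the added vertex
    ext = [row[:] + [INF] for row in d] + [[INF] * (n + 1)]
    ext[z][s] = ext[s][z] = 0  # assumed length 0
    ext[z][t] = ext[t][z] = 0  # assumed length 0
    length, order = shortest_cycle(ext)
    i = order.index(z)
    path = list(order[i + 1:] + order[:i])  # cut the cycle open at z
    if path[0] != s:
        path.reverse()
    return length, path
```

Because z can only be entered from s or t at finite cost, any finite-length cycle contains s, z, t consecutively, and the rest of the cycle is a Hamiltonian path between s and t.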

Path with One Fixed Endpoint via Path with Two Fixed Endpoints. For each t in V - {s}, find the shortest Hamiltonian path from s to t. Return the best.

Path with Two Fixed Endpoints via Path with One Fixed Endpoint. Add one new vertex t’ with an edge to t. The shortest Hamiltonian path starting with s must end at t’.

Path with No Fixed Endpoints via Path with One Fixed Endpoint. For each s in V, find the shortest Hamiltonian path starting from s. Return the best.

Path with One Fixed Endpoint via Path with No Fixed Endpoint. Add a new vertex s’ and an edge from s’ to s.

Directed via Undirected. Replace each vertex $v_i$, $1 \le i \le N$, by a triplet of vertices $v_i^{in}$, $v_i$, $v_i^{out}$, and edges $\{v_i^{in},v_i\}$ and $\{v_i,v_i^{out}\}$. Replace each directed edge $(v_i,v_j)$ by the undirected edge $\{v_i^{out},v_j^{in}\}$.
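A sketch of this transformation in Python. It assumes the two edges inside each triplet have length 0 and that every pair not joined by an edge has length ∞; the index layout (3i, 3i+1, 3i+2 for the in, middle, and out copies of city i) is an arbitrary choice for illustration.

```python
from itertools import permutations
import math

INF = math.inf


def to_undirected(d):
    """Build the 3N-vertex symmetric instance from an N-city asymmetric distance matrix d."""
    n = len(d)
    u = [[INF] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        v_in, v, v_out = 3 * i, 3 * i + 1, 3 * i + 2
        u[v_in][v] = u[v][v_in] = 0    # assumed length 0
        u[v][v_out] = u[v_out][v] = 0  # assumed length 0
        for j in range(n):
            if j != i:
                # directed edge (i, j) becomes the undirected edge {out_i, in_j}
                u[v_out][3 * j] = u[3 * j][v_out] = d[i][j]
    return u


def brute_force_cycle_length(d):
    """Exhaustive search over all orders starting at vertex 0 (small instances only)."""
    n = len(d)
    return min(
        sum(d[o[k]][o[(k + 1) % n]] for k in range(n))
        for o in ((0,) + p for p in permutations(range(1, n)))
    )
```

Each middle vertex has only two finite edges, so any finite tour passes through in, middle, out consecutively and therefore corresponds to a directed tour (or its reversal) of the same length.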

[Figure: the construction applied to a four-vertex example, with vertices v1 through v4 each replaced by in, middle, and out copies.]

TSP: The Canonical NP-Hard Problem? • Commonly used in the popular press to explain NP-completeness and exponential time to the layman: The number of tours grows as N! (actually (N-1)!/2 for the symmetric case).
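To get a feel for this growth, a small Python sketch that counts distinct tours for N ≥ 3 (the function name is an illustrative assumption):

```python
import math


def num_tours(n, symmetric=True):
    """Number of distinct tours on n >= 3 cities.

    Fixing the start city leaves (n-1)! orders; in the symmetric case each
    tour is counted once per direction, so divide by 2.
    """
    count = math.factorial(n - 1)
    return count // 2 if symmetric else count
```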

$N! = \Omega(2^{N \log N})$ time is not required: $O(N^2 2^N)$ suffices! [Bellman, 1963][Held & Karp, 1962] Algorithmic technique: Dynamic Programming. States: Pairs $[U,j]$ with $2 \le j \le N$ and $\{v_1,v_j\} \subseteq U \subseteq V$. Note: There are $\Theta(N 2^N)$ states $[U,j]$. Values: $X[U,j]$ is the length of the shortest Hamiltonian path, starting with $v_1$ and ending with $v_j$, in the subgraph of G induced by U. Note: The optimal tour length equals $\min\{X[V,j] + d(v_j,v_1) : 2 \le j \le N\}$.

Computing the Values X[U,j]. $X[\{v_1,v_j\},j] = d(v_1,v_j)$, $2 \le j \le N$. Now assume we have already computed $X[U,j]$, $2 \le j \le N$, for all $U$ with $\{v_1,v_j\} \subseteq U \subseteq V$ and $|U| = k$. Let $W$ be such that $v_1 \in W \subseteq V$ and $|W| = k+1$. Suppose $v_i$, $i > 1$, is in $W$. Then $X[W,i] = \min\{X[W - \{v_i\},j] + d(v_j,v_i) : v_j \in W - \{v_1,v_i\}\}$. Computation takes $O(N)$ time for each state $[W,i]$. Since there are $\Theta(N 2^N)$ states overall, this yields an overall running time of $O(N^2 2^N)$.
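The dynamic program can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: cities are numbered 0..N-1 with city 0 playing the role of v1, subsets are encoded as bitmasks, and N ≥ 2 is assumed.

```python
from itertools import combinations


def held_karp(d):
    """Optimal tour length by dynamic programming over states (U, j).

    X[(U, j)] is the length of the shortest Hamiltonian path in the subgraph
    induced by U that starts at city 0 and ends at city j.
    """
    n = len(d)
    X = {}
    # Base case: U = {0, j}.
    for j in range(1, n):
        X[(1 | (1 << j), j)] = d[0][j]
    # Build sets W of size 3, 4, ..., n from sets of the previous size.
    for size in range(3, n + 1):
        for rest in combinations(range(1, n), size - 1):
            W = 1
            for v in rest:
                W |= 1 << v
            for i in rest:
                prev = W & ~(1 << i)  # W - {i}
                X[(W, i)] = min(X[(prev, j)] + d[j][i] for j in rest if j != i)
    full = (1 << n) - 1
    # Close the tour by returning from the last city to city 0.
    return min(X[(full, j)] + d[j][0] for j in range(1, n))
```

The recurrence never assumes d[i][j] == d[j][i], so the same code handles asymmetric instances. The dictionary holds all Θ(N 2^N) states at once, so memory, not time, is usually the first limit in practice.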

Current World Record (2006) N = 85,900, solved using a parallelized version of the Concorde code, Helsgaun’s sophisticated variant on Iterated Lin-Kernighan, and 2719.5 cpu-days.

Concorde • “Branch-and-Cut” approach exploiting linear programming to determine lower bounds on optimal tour length. • Based on 30+ years of theoretical developments in the “Mathematical Programming” community, plus some very good data structures and heuristics work from computer science. • For surprisingly large instances, it finds an optimal tour and proves its optimality (unless it runs out of time/space). • Executables and source code can be downloaded from http://www.tsp.gatech.edu/
