Knuth Prize Lecture 2010


Presentation Transcript


  1. Knuth Prize Lecture 2010 • David S. Johnson • AT&T Labs - Research

  2. 1975 Don Knuth Mike Garey David Johnson

  3. From M.R. Garey, R. L. Graham, D. S. Johnson, and D.E. Knuth, “Complexity Results for Bandwidth Minimization,” SIAM J. Appl. Math. 34:3 (1978), 477-495.

  4. Bob Tarjan Mike Garey David Johnson

  5. 1980s Peter Shor Ed Coffman Mihalis Yannakakis Christos Papadimitriou Ron Graham Dick Karp Endre Szemeredi Laci Lovasz

  6. Acknowledgments • Role Models: Collaboration in Action • Mike Fischer and Albert Meyer • Collaborators “Down the Hall” • Mike Garey, Ron Graham, Ed Coffman, Mihalis Yannakakis, Bob Tarjan, Peter Shor • Honorary “Down the Hall” Collaborators • Christos Papadimitriou, Tom Leighton, Richard Weber, Claire Mathieu • Experimental Inspirations and Collaborators • Jon Bentley, Shen Lin & Brian Kernighan, Lyle & Cathy McGeoch, David Applegate

  7. Approximation Algorithms in Theory and Practice David S. Johnson AT&T Labs – Research Knuth Prize Lecture June 7, 2010

  8. The Lost Cartoon

  9. Impact?

  10. Kanellakis Theory and Practice Award 1996: Public Key Cryptography (Adleman, Diffie, Hellman, Merkle, Rivest, and Shamir) 1997: Data Compression (Lempel and Ziv) 1998: Model Checking (Bryant, Clarke, Emerson, and McMillan) 1999: Splay Trees (Sleator and Tarjan) 2000: Polynomial-Time Interior Point LP Methods (Karmarkar) 2001: Shotgun Genome Sequencing (Myers) 2002: Constrained Channel Coding (Franaszek) 2003: Randomized Primality Tests (Miller, Rabin, Solovay, and Strassen) 2004: AdaBoost Machine Learning Algorithm (Freund and Schapire) 2005: Formal Verification of Reactive Systems (Holzmann, Kurshan, Vardi, and Wolper) 2006: Logic Synthesis and Simulation of Electronic Systems (Brayton) 2007: Gröbner Bases as a Tool in Computer Algebra (Buchberger) 2008: Support Vector Machines (Cortes and Vapnik) 2009: Practice-Oriented Provable-Security (Bellare and Rogaway)

  11. Coping with NP-Completeness at AT&T Part I. The Traveling Salesman Problem

  12. TSP Applications (Bell Labs): • “Laser Logic” (programming FPGAs) • Circuit Board Construction • Circuit Board Inspection • Algorithms Used -- • Double Spanning Tree? (worst-case ratio = 2; sketched below) • Nearest Insertion? (worst-case ratio = 2) • Christofides? (worst-case ratio = 1.5) • Answer: None of the Above
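
For orientation, here is a minimal sketch of the double-spanning-tree heuristic mentioned above: build a minimum spanning tree, then shortcut a traversal of it into a tour, which gives the quoted worst-case ratio of 2 on metric instances. This is an illustrative reconstruction, not the code used at Bell Labs; all names are mine.

```python
import math

def double_spanning_tree_tour(points):
    """Ratio-2 heuristic for metric TSP: MST plus preorder shortcutting.

    points: list of (x, y) tuples.  Returns a tour as a list of indices.
    Illustrative sketch only.
    """
    n = len(points)
    dist = lambda i, j: math.hypot(points[i][0] - points[j][0],
                                   points[i][1] - points[j][1])

    # Prim's algorithm: grow a minimum spanning tree from vertex 0.
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [0] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u

    # "Double" the tree and shortcut: a DFS preorder visits each city once.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour
```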

  13. Testbed: Random Euclidean Instances N = 10

  14. N = 10

  15. N = 100

  16. N = 1000

  17. N = 10000
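
The testbed slides above presumably showed plots of such instances and their tours. Generating instances of this flavor is straightforward; here is a hedged sketch, assuming points drawn uniformly from a square (the DIMACS TSP Challenge generator has its own precise conventions, which may differ):

```python
import random

def random_euclidean_instance(n, side=1_000_000, seed=None):
    """Return n points drawn uniformly at random from a side x side square.

    Illustrative only; not the exact DIMACS TSP Challenge generator.
    """
    rng = random.Random(seed)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]

# Example: the instance sizes shown on the preceding slides.
for n in (10, 100, 1000, 10000):
    pts = random_euclidean_instance(n, seed=n)
    print(n, pts[0])
```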

  18. Lin-Kernighan [Johnson-McGeoch Implementation]: 1.5% off optimal, 1,000,000 cities in 8 minutes at 500 MHz • Iterated Lin-Kernighan [Johnson-McGeoch Implementation]: 0.4% off optimal, 100,000 cities in 75 minutes at 500 MHz • Concorde Branch-and-Cut Optimization [Applegate-Bixby-Chvatal-Cook]: optimum, 1,000 cities in median time of 5 minutes at 2.66 GHz
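
Lin-Kernighan itself is an intricate variable-depth search; as a stand-in for readers unfamiliar with tour-improvement heuristics, here is a minimal 2-opt local-search sketch. 2-opt is a much weaker move than the LK moves used in the Johnson-McGeoch code (which also adds engineering such as neighbor lists and don't-look bits to reach the quoted speeds), so the sketch is illustrative of the local-search idea only.

```python
def two_opt(tour, dist):
    """Simple 2-opt local search: repeatedly reverse a tour segment whenever
    doing so shortens the tour.  A (much) simplified stand-in for the
    Lin-Kernighan moves mentioned above; names are illustrative.

    tour: list of city indices; dist(i, j): symmetric distance function.
    """
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if a == d:          # skip the degenerate wrap-around case
                    continue
                # Replacing edges (a,b) and (c,d) by (a,c) and (b,d)
                # reverses the segment tour[i+1 .. j].
                if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d) - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```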

  19. Running times (in seconds) for 10,000 Concorde runs on random 1000-city planar Euclidean instances (2.66 GHz Intel Xeon processor in a dual-processor PC, purchased late 2002). Range: 7.1 seconds to 38.3 hours

  20. For more on the state-of-the-TSP-art, see http://www2.research.att.com/~dsj/chtsp/index.html/ [DIMACS TSP Challenge] http://www.tsp.gatech.edu/ [Concorde, with instances]

  21. Coping with NP-Completeness at AT&T Part II. Bin Packing

  22. Coping with NP-Completeness at AT&T Part III. Access Network Design [Applegate, Archer, Johnson, Merritt, Phillips, …]

  23. Problem: In “out of region” areas, AT&T does not always have direct fiber connections to our business customers, and hence spends a lot of money to lease lines to reach them. Can we save money by laying our own fiber? • Tradeoff: Capital cost of fiber installation versus monthly cost savings from dropping leases. • Our Task: Identify the most profitable clusters of customers to fiber up.

  24. Key Observation: This can be modeled as a Prize-Collecting Steiner Tree problem, with Prize = Lease Savings and Cost = Annualized Capital Cost. • The Goemans-Williamson primal-dual PCST approximation algorithm should be applicable.

  25. Unfortunate Details • Although the Goemans-Williamson algorithm has a worst-case ratio of 2, this is for the objective function (Edge Cost) + (Amount of Prize Foregone), which isn’t really the correct one here. • Edge costs are capital dollars, prizes are expense dollars, and the two are not strictly comparable. • We don’t have accurate estimates of costs.

  26. Fortunate Details • By using various multipliers on the prize values, we can generate a range of possible clusters, ranking them for instance by the number of years until cumulative lease savings equals capital cost. • Each cluster can itself yield more options if we consider peeling off the least profitable leaves. • Planners can then take our top suggestions and validate them by obtaining accurate cost estimates.
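
A hedged sketch of the ranking step just described, assuming each candidate cluster (e.g., generated by re-running the PCST heuristic with different prize multipliers) comes with an annualized capital cost and an annual lease saving. The field names and data layout are hypothetical; the real planning tool is obviously richer.

```python
def payback_years(capital_cost, annual_lease_savings):
    """Years until cumulative lease savings equal the capital cost."""
    if annual_lease_savings <= 0:
        return float('inf')
    return capital_cost / annual_lease_savings

def rank_clusters(clusters):
    """Rank candidate fiber clusters by payback period, shortest first.

    clusters: iterable of (name, capital_cost, annual_lease_savings) tuples.
    Hypothetical layout, for illustration only.
    """
    return sorted(clusters, key=lambda c: payback_years(c[1], c[2]))

# Example with made-up candidate clusters.
candidates = [("cluster-A", 900_000, 300_000),
              ("cluster-B", 500_000, 100_000),
              ("cluster-C", 1_200_000, 600_000)]
for name, cost, savings in rank_clusters(candidates):
    print(name, round(payback_years(cost, savings), 1), "years to break even")
```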

  27. Coping with NP-Completeness at AT&T Part IV. The More Typical Approaches • Adapt a metaheuristic search-based approach (local search, genetic algorithms, tabu search, GRASP, etc.) • Model the problem as a mixed integer program and use CPLEX, either to solve the MIP if the instance is sufficiently small (often the case), or to solve the LP relaxation, which we then round.
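
The "solve the LP, then round" route is easiest to see on a toy problem; the textbook example is vertex cover, where rounding up every LP variable with value at least 1/2 gives a factor-2 guarantee. A minimal sketch using scipy's linprog follows; the AT&T work used CPLEX on problem-specific formulations, and this toy model is mine.

```python
import numpy as np
from scipy.optimize import linprog

def lp_round_vertex_cover(n, edges):
    """LP relaxation plus rounding for vertex cover (factor-2 guarantee).

    Toy illustration of "solve the LP, then round"; not the actual
    AT&T models or solver.
    """
    c = np.ones(n)                       # minimize the sum of x_v
    A = np.zeros((len(edges), n))
    for k, (u, v) in enumerate(edges):   # x_u + x_v >= 1  ->  -x_u - x_v <= -1
        A[k, u] = A[k, v] = -1.0
    b = -np.ones(len(edges))
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n, method="highs")
    return sorted(v for v in range(n) if res.x[v] >= 0.5)

# Example: a 4-cycle.  The LP may return all 0.5's, which rounds to all
# four vertices -- within the promised factor 2 of the optimal cover of size 2.
print(lp_round_vertex_cover(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```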

  28. Final AT&T Example: Facility Location for Network Monitoring / Host Location for Robust Content Distribution • Special case of the “Cover-by-Pairs” problem [Hassin & Segev, 2005]: • Given a set A of items, a set C of “cover objects”, and a set T ⊆ A × C × C, find a minimum-size subset C′ ⊆ C such that for all a ∈ A, there exist (not-necessarily-distinct) c, c′ ∈ C′ such that (a, c, c′) ∈ T. • Here we are given a graph G = (V, E), with both A and C being subsets of V. [Breslau, Diakonikolas, Duffield, Gu, Hajiaghayi, Karloff, Johnson, Resende, Sen]

  29. [Slide figure: an example network with a Cover Object (potential content location) and an Item (customer for content), with path pairs marked Yes/No.] (a, c, c′) ∈ T iff no vertex b ≠ a is in both a shortest path from a to c and a shortest path from a to c′.
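
A hedged sketch of this membership test, using the standard characterization that a vertex v lies on some shortest a-c path exactly when d(a,v) + d(v,c) = d(a,c). The unit edge lengths, BFS distances, and all names here are my assumptions, not from the talk.

```python
from collections import deque

def bfs_dist(adj, src):
    """Unit-length shortest-path distances from src; adj maps vertex -> neighbors."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def covers(adj, a, c, cprime):
    """(a, c, c') is in T iff no vertex b != a lies on both a shortest path
    from a to c and a shortest path from a to c'.  Illustrative sketch."""
    da, dc, dcp = bfs_dist(adj, a), bfs_dist(adj, c), bfs_dist(adj, cprime)
    if c not in da or cprime not in da:
        return False
    on_path_c = {v for v in adj if v in da and v in dc and da[v] + dc[v] == da[c]}
    on_path_cp = {v for v in adj if v in da and v in dcp and da[v] + dcp[v] == da[cprime]}
    return (on_path_c & on_path_cp) <= {a}

# Toy example: a is covered by (c, c2) because the two shortest paths share only a.
adj = {"a": ["x", "y"], "x": ["a", "c"], "y": ["a", "c2"], "c": ["x"], "c2": ["y"]}
print(covers(adj, "a", "c", "c2"))   # True:  paths a-x-c and a-y-c2 meet only at a
print(covers(adj, "a", "c", "c"))    # False: both paths are a-x-c, sharing x and c
```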

  30. What Theory Tells Us • Our special case is at least as hard to approximate as Cover-by-Pairs. • Cover-by-Pairs is at least as hard to approximate as Label Cover. • Assuming NP ⊄ DTIME(n^O(polylog(n))), no polynomial-time approximation algorithm for Label Cover can be guaranteed to find a solution that is within a ratio of 2^(log^(1-ε) n) of optimal, for any ε > 0.

  31. What Practice Tells Us Algorithms we tried: • CPLEX applied to the integer programming formulation of the corresponding Cover-by-Pairs instance • Greedy algorithm for the Cover-by-Pairs instance (a sketch follows below) • Genetic algorithm for the Cover-by-Pairs instance • Graph-based “Double Hitting Set” Algorithm (HH) that puts together solutions to two specially-constructed hitting-set instances, with Greedy algorithm cleanup
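
A sketch of a natural greedy of the kind listed above: seed with the best single pair of cover objects, then grow one object at a time, always adding the one that newly covers the most items. This is my reconstruction for illustration, not necessarily the exact greedy used in the study.

```python
from itertools import combinations_with_replacement

def greedy_cover_by_pairs(items, cover_objects, T):
    """Greedy heuristic for Cover-by-Pairs (illustrative reconstruction).

    items: iterable of a's; cover_objects: iterable of c's;
    T: set of (a, c, c') triples, treated as symmetric in c and c'.
    """
    cover_objects = list(cover_objects)

    def pair_covers(a, c, cp):
        return (a, c, cp) in T or (a, cp, c) in T

    chosen, uncovered = set(), set(items)

    # Seed: the pair of cover objects that covers the most items by itself.
    best_pair = max(combinations_with_replacement(cover_objects, 2),
                    key=lambda p: sum(1 for a in uncovered if pair_covers(a, p[0], p[1])),
                    default=None)
    if best_pair is None:
        return chosen
    chosen.update(best_pair)
    uncovered = {a for a in uncovered
                 if not any(pair_covers(a, c, cp) for c in chosen for cp in chosen)}

    # Grow: add whichever remaining object newly covers the most items.
    while uncovered:
        gains = {c: sum(1 for a in uncovered
                        if any(pair_covers(a, c, cp) for cp in chosen | {c}))
                 for c in cover_objects if c not in chosen}
        best = max(gains, key=gains.get, default=None)
        if best is None or gains[best] == 0:
            break  # no single further addition covers anything new
        chosen.add(best)
        uncovered = {a for a in uncovered
                     if not any(pair_covers(a, best, cp) for cp in chosen)}
    return chosen
```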

  32. Instance Testbed • Actual ISP networks with 100 to 1000 routers (vertices) • Synthetic wide-area-network structures with 26 to 556 routers, generated using the Georgia Tech Internet Topology Models package.

  33. Computational Results • CPLEX could, in reasonable time, find optimal integer solutions to instances with |A|, |C| < 150, but its running time was clearly growing exponentially. • The Double Hitting Set and Genetic algorithms typically found solutions of size no more than 1.05·OPT (where “OPT” is the maximum of the true optimum, where known, and a lower bound equal to the optimal solution value for the second hitting-set instance the Double Hitting Set algorithm considered).

  34. Only for the largest ISP instance did the results degrade (HH was 46% above the lower bound). • But is this a degradation of the algorithm, or of the quality of our lower bound? • And does it matter? The solution was still far better than the naïve solution and well worth obtaining.

  35. Lessons Learned • Real-world instances were not as worst-case or asymptotic as our theory is. • Champion algorithms from the theory world could be outclassed by ad hoc algorithms with much worse (or unknown) worst-case behavior. • Some algorithms and ideas from the theory world have been successfully applied, often to purposes for which they were not originally designed. • Algorithms from the Operations Research and Metaheuristic communities have perhaps had more real-world impact on coping with NP-hardness than those from TCS.

  36. How to Have More “Real-World” Impact (Today’s Sermon, with Illustrations) • Study problems people might actually want to solve. • Study the algorithms people actually use (or might consider using). • Design for users, not adversaries. • Complement worst-case results with “realistic” average case results. • Implement and experiment.
