Implicit Hitting Set Problems


Implicit Hitting Set Problems. Richard M. Karp Harvard University August 29, 2011. Worst-case Analysis of NP-Hard Problems. Exact solution methods: exponential running time in worst case.

Implicit Hitting Set Problems

Richard M. Karp

Harvard University August 29, 2011

Worst-case Analysis of NP-Hard Problems

  • Exact solution methods: exponential running time in worst case.

  • Polynomial-time approximation algorithms for optimization problems. Approximation ratios are usually unrealistically high.

  • Parameterized complexity: polynomial running time for instances with a fixed parameter, but the dependence on the parameter is usually adverse.

Probabilistic Analysis and Heuristics

  • In probabilistic analysis, problem instances are drawn from simple probability distributions. Often one can prove excellent performance on average. However, the probability distributions may not correspond to real-life instances.

  • Heuristics are often “unreasonably effective,” for reasons not well understood.

  • We seek systematic methods for tuning heuristics and validating them by empirical testing on training sets of representative instances.

Unreasonably Effective Heuristics

  • Large traveling-salesman problems can be solved by quick tour construction methods, local improvement methods or cutting plane methods.

  • Local improvement methods find near-optimal solutions to graph bisection problems.

  • Huge satisfiability problems are routinely solved rapidly by branch-and-bound methods.

  • The greedy set cover algorithm typically gives solutions within a few percent of optimal.

Implicit Optimization Problems

  • Set of constraints defined implicitly by a generation algorithm rather than by an explicit list.

    -- Linear and convex programming: equivalence of separation and optimization

    -- Integer programming: cutting-plane methods

    -- Linear programming: column generation

Hitting Set Problem

  • Ground set V

  • For every v in V, a positive weight c(v).

  • C*: collection of subsets of V (circuits)

  • Goal: Find a set of minimum weight that hits every set in C*

  • Equivalent to set cover problem

Complexity of the Hitting Set Problem

  • NP-hard and hard to approximate within ratio o(log |C*|).

  • Greedy algorithm achieves approximation ratio O(log |C*|):

    Repeat: choose the element v in V that minimizes the ratio of c(v) to the number of remaining sets hit by v; delete the sets hit by v.
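The greedy rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `greedy_hitting_set`, the dictionary `weight`, and the toy instance are all assumptions, and ties are broken arbitrarily by element name.

```python
def greedy_hitting_set(circuits, weight):
    """Greedy heuristic: repeatedly pick the element with the smallest ratio
    of weight to number of still-unhit circuits that it hits."""
    remaining = [set(c) for c in circuits]
    if any(not c for c in remaining):
        raise ValueError("an empty circuit cannot be hit")
    chosen = set()
    while remaining:
        # Count, for each element, how many unhit circuits contain it.
        counts = {}
        for c in remaining:
            for v in c:
                counts[v] = counts.get(v, 0) + 1
        # Smallest weight-per-circuit ratio wins; str(v) only breaks ties.
        best = min(counts, key=lambda v: (weight[v] / counts[v], str(v)))
        chosen.add(best)
        # Delete the circuits that the chosen element hits.
        remaining = [c for c in remaining if best not in c]
    return chosen


# Hypothetical toy instance: three circuits over four weighted elements.
circuits = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
weight = {"a": 1.0, "b": 1.5, "c": 1.0, "d": 2.0}
print(greedy_hitting_set(circuits, weight))
```

On this instance the first pick is "c" (ratio 0.5), which leaves only the circuit {a, b}, and "a" is then cheaper than "b".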

Hitting Set Problem in Practice

  • Greedy algorithm gives good approximate solutions.

  • CPLEX integer programming algorithm often gives optimal solutions rapidly.

Implicit Hitting Set Problem

  • The collection of circuits C* has a compact implicit description.

  • There is a polynomial-time separation oracle which, given a subset H of the ground set, either determines that H is a hitting set or produces a circuit that H does not hit.

    Example: in the feedback vertex set problem, the separation oracle produces the vertex set of a shortest cycle in the subgraph induced by V\H.
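A separation oracle of this kind might look as follows for the directed case. This is a sketch under stated assumptions: the digraph is given as a dictionary `adj` mapping each vertex to a list of successors, the name `shortest_cycle_oracle` is hypothetical, and the search is a plain breadth-first search from every remaining vertex rather than anything taken from the talk.

```python
from collections import deque


def shortest_cycle_oracle(adj, H):
    """Return the vertex set of a shortest directed cycle that avoids H,
    or None if deleting H leaves the digraph acyclic (H is a hitting set)."""
    best = None
    for s in adj:
        if s in H:
            continue
        parent = {s: None}      # BFS tree rooted at s
        queue = deque([s])
        last = None             # vertex whose edge closes a cycle back to s
        while queue and last is None:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w in H:
                    continue
                if w == s:      # first closing edge gives a shortest cycle via s
                    last = u
                    break
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        if last is not None:
            cycle = []
            while last is not None:     # walk the BFS tree back to s
                cycle.append(last)
                last = parent[last]
            if best is None or len(cycle) < len(best):
                best = cycle
    return None if best is None else set(best)


# Hypothetical digraph with a 3-cycle (1, 2, 3) and a 2-cycle (4, 5).
adj = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
print(shortest_cycle_oracle(adj, set()))
print(shortest_cycle_oracle(adj, {4}))
print(shortest_cycle_oracle(adj, {1, 4}))
```

Running one BFS per vertex costs O(|V|(|V| + |E|)) overall, which is polynomial as the definition requires.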



  • Feedback vertex set in a graph or digraph: vertex sets of cycles

  • Feedback edge set in a digraph: edge sets of cycles

  • Max cut: edge sets of odd cycles

  • Steiner tree: edge sets of cycles that partition the required vertices

  • Maximum 2-sat: minimal contradictory sets of 2-element clauses

  • Intersection of k matroids: circuits of each matroid

  • Maximal feasible subset of a set of linear inequalities: minimal infeasible subsets

Naïve Algorithm for Solving Implicit Hitting Set Problem

Repeat until a feasible hitting set H is found:

(1) Given C, a subset of C*, find a minimum-weight hitting set H for C.

(2) Using the separation oracle, find a minimum-cardinality circuit c not hit by H.

(3) Add c to C.

Return H.
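The loop above can be sketched as follows. Everything named here is an assumption made for illustration: `min_weight_hitting_set` is an exhaustive search standing in for an integer-programming solver (usable only on tiny instances), and the toy oracle scans an explicit list `hidden`, which a genuinely implicit problem would not have.

```python
from itertools import combinations


def min_weight_hitting_set(circuits, weight):
    """Exact solver by exhaustive search (exponential time)."""
    elements = sorted({v for c in circuits for v in c})
    best, best_cost = None, float("inf")
    for r in range(len(elements) + 1):
        for cand in combinations(elements, r):
            chosen = set(cand)
            cost = sum(weight[v] for v in chosen)
            if cost < best_cost and all(chosen & c for c in circuits):
                best, best_cost = chosen, cost
    return best


def implicit_hitting_set(oracle, weight):
    """Naive loop: solve exactly on the known circuits, then ask the oracle."""
    circuits = []                                   # C, the circuits found so far
    while True:
        H = min_weight_hitting_set(circuits, weight)    # step (1)
        c = oracle(H)                                   # step (2)
        if c is None:                                   # H hits all of C*
            return H
        circuits.append(frozenset(c))                   # step (3)


# Hypothetical toy instance: the oracle returns a smallest unhit circuit.
hidden = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
weight = {1: 1, 2: 5, 3: 1, 4: 5}


def oracle(H):
    return min((c for c in hidden if not (c & H)), key=len, default=None)


print(implicit_hitting_set(oracle, weight))
```

The returned H is optimal for the whole collection because it is optimal for a subcollection C and, by the oracle's answer, feasible for C*. On the toy instance the loop stops after generating only two of the four circuits.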

Circuit-Finding Subroutine

Input: C, a set of circuits, and H, a hitting set for C.

Repeat until H hits every circuit in C*:

Find a circuit c not hit by H and choose an element x in c; add c to C and add x to H.
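A minimal sketch of this subroutine is given below. The slide does not say how the element x is chosen, so the `pick` parameter (defaulting to the smallest element) is an assumption, as are the function name and the toy oracle over an explicit list `hidden`.

```python
def find_circuits(oracle, circuits, H, pick=min):
    """Extend H until the oracle accepts it, recording every circuit found."""
    circuits = list(circuits)
    H = set(H)
    while True:
        c = oracle(H)
        if c is None:               # H now hits every circuit in C*
            return circuits, H
        circuits.append(frozenset(c))
        H.add(pick(c))              # assumed rule for choosing x in c


# Hypothetical toy oracle: return the first listed circuit that H misses.
hidden = [{1, 2}, {2, 3}, {3, 4}]


def oracle(H):
    for c in hidden:
        if not (c & H):
            return c
    return None


found, H = find_circuits(oracle, [], set())
print(found, H)
```

Each call adds one element that hits the new circuit, so the subroutine makes at most |V| oracle calls that return a circuit.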

Refined Algorithm

  • Input: set of circuits C and hitting set H for C

    (1) Execute the circuit-finding subroutine.

    (2) Repeat until k iterations yield no circuits: construct a greedy hitting set H for C and execute the circuit-finding subroutine.

    (3) Using CPLEX, construct an optimal hitting set H for C.

    If H does not hit every circuit in C*, go to (1).

    Otherwise, return H.
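The three steps can be put together as in the sketch below. Several choices here are assumptions rather than details from the talk: `exact` is an exhaustive search standing in for CPLEX, "k iterations yield no circuits" is read as k consecutive greedy rounds that add nothing to C, the element added by the subroutine is the smallest one in the circuit, and the oracle scans an explicit toy list `hidden`.

```python
from itertools import combinations


def greedy(circuits, weight):
    """Greedy hitting set: lowest weight per newly hit circuit first."""
    remaining, H = list(circuits), set()
    while remaining:
        counts = {}
        for c in remaining:
            for v in c:
                counts[v] = counts.get(v, 0) + 1
        v = min(counts, key=lambda x: (weight[x] / counts[x], x))
        H.add(v)
        remaining = [c for c in remaining if v not in c]
    return H


def exact(circuits, weight):
    """Exhaustive exact solver; a tiny-instance stand-in for CPLEX."""
    elements = sorted({v for c in circuits for v in c})
    best, best_cost = set(elements), sum(weight[v] for v in elements)
    for r in range(len(elements)):
        for cand in combinations(elements, r):
            cost = sum(weight[v] for v in cand)
            if cost < best_cost and all(set(cand) & c for c in circuits):
                best, best_cost = set(cand), cost
    return best


def find_circuits(oracle, circuits, H):
    """Circuit-finding subroutine: extend H until the oracle accepts it."""
    circuits, H = list(circuits), set(H)
    while (c := oracle(H)) is not None:
        circuits.append(frozenset(c))
        H.add(min(c))                       # assumed rule for choosing x in c
    return circuits, H


def refined(oracle, weight, k=1):
    circuits, H = find_circuits(oracle, [], set())              # step (1)
    while True:
        quiet = 0
        while quiet < k:                                        # step (2)
            before = len(circuits)
            H = greedy(circuits, weight)
            circuits, H = find_circuits(oracle, circuits, H)
            quiet = quiet + 1 if len(circuits) == before else 0
        H = exact(circuits, weight)                             # step (3)
        if oracle(H) is None:
            return H
        circuits, H = find_circuits(oracle, circuits, H)        # back to (1)


hidden = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
weight = {1: 1, 2: 5, 3: 1, 4: 5}


def oracle(H):
    return min((c for c in hidden if not (c & H)), key=len, default=None)


print(refined(oracle, weight))
```

The point of the cheap greedy rounds is to collect many circuits before the expensive exact solve is called. With a deterministic greedy rule, as here, a round that finds nothing will find nothing when repeated, so k > 1 only matters if the greedy step is randomized.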



  • Number of circuits generated, number of calls to solver, running time of generator.

Application: Multi-Genome Alignment

  • Highly similar sequences in two genomes constitute an anchor pair. The individual sequences are called anchors.

  • A genome is a linearly ordered sequence of anchors.

  • An alignment is a matrix with a row for each genome, and an assignment of each anchor to a column, respecting the linear orders.

  • An anchor pair is synchronized if its two anchors lie in the same column.

  • Goal: maximize the sum of the weights of the synchronized anchor pairs.

Complexity Bounds

  • The 2-genome problem is equivalent to the maximum-weight increasing subsequence problem and is solvable in time O(n log n), where n is the cardinality of the ground set. The k-genome problem can be solved in time O(n^k) by dynamic programming.
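For the 2-genome case, one standard way to reach O(n log n) is a Fenwick tree that stores prefix maxima. The sketch below assumes the anchor pairs have already been sorted by position in the first genome, so that `values` lists their positions in the second genome and `weights` their (positive) weights; these names and the sample data are hypothetical.

```python
def max_weight_increasing_subsequence(values, weights):
    """Maximum total weight of a strictly increasing subsequence, O(n log n)."""
    # Compress values to ranks 1..m so they can index the tree.
    rank = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    tree = [0.0] * (len(rank) + 1)      # Fenwick tree of prefix maxima

    def query(i):
        """Best chain weight among chains ending at a rank <= i."""
        best = 0.0
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best

    def update(i, val):
        """Record a chain of weight val ending at rank i."""
        while i < len(tree):
            tree[i] = max(tree[i], val)
            i += i & -i

    answer = 0.0
    for v, w in zip(values, weights):
        r = rank[v]
        total = query(r - 1) + w        # extend the best chain ending below v
        update(r, total)
        answer = max(answer, total)
    return answer


print(max_weight_increasing_subsequence([3, 1, 4, 2, 5], [2, 1, 3, 5, 1]))
```

In the sample call the best chain is 1, 2, 5 with total weight 1 + 5 + 1 = 7.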

Alignment as a Hitting Set Problem

  • Ground set: anchor pairs

  • Goal: delete a minimum-weight set of anchor pairs such that the remaining anchor pairs can be simultaneously synchronized.

  • Directed edge (u,v): u precedes v.

  • Undirected edge (u,v): u and v are an anchor pair.

  • Mixed cycle: a cycle of directed and undirected edges that contains at least one directed edge.

  • An edge must be deleted from the set of undirected edges of each mixed cycle (Kececioglu).
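One way to test whether a set of anchor pairs still contains a mixed cycle is to contract the undirected edges and look for a directed cycle in what remains. The sketch below assumes that a mixed cycle follows directed edges forward and undirected edges in either direction; anchors are numbered 0..n-1, and the function name and the small examples are hypothetical. It reports only whether a mixed cycle exists and does not return one.

```python
from collections import defaultdict, deque


def has_mixed_cycle(n, directed, undirected):
    """True if the mixed graph on anchors 0..n-1 contains a mixed cycle."""
    parent = list(range(n))

    def find(x):                        # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in undirected:             # anchor pairs must share a column
        parent[find(u)] = find(v)

    succ, indeg = defaultdict(set), defaultdict(int)
    for u, v in directed:
        a, b = find(u), find(v)
        if a == b:                      # precedence inside one column
            return True
        if b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1

    # Kahn's algorithm: the contracted digraph must be acyclic.
    nodes = {find(x) for x in range(n)}
    queue = deque(x for x in nodes if indeg[x] == 0)
    seen = 0
    while queue:
        a = queue.popleft()
        seen += 1
        for b in succ[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    return seen < len(nodes)


# Genome 1 holds anchors 0 < 1, genome 2 holds anchors 2 < 3.
order = [(0, 1), (2, 3)]
print(has_mixed_cycle(4, order, [(0, 3), (1, 2)]))   # crossing pairs
print(has_mixed_cycle(4, order, [(0, 2), (1, 3)]))   # parallel pairs
```

The crossing pairs (0,3) and (1,2) cannot both be synchronized, which shows up as a 2-cycle after contraction; the parallel pairs contract to a single directed edge.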

Solving the Alignment Problem

  • Run the generic implicit hitting set algorithm, with the anchor pairs as elements and the undirected edge sets of mixed cycles as circuits.

  • Separation oracles: given a putative hitting set H, search for a mixed cycle in the graph induced by the edges not in H.

    Two methods:

    (1) a variant of depth-first search;

    (2) attempt to align the remaining edges until blocked by the occurrence of a mixed cycle.

Performance on 4085 Problems of Aligning Five Worm Genomes

Time (sec.)     # solved    # edges
0 to 0.01       1311        (1; 52; 399)
0.01 to 0.1     764         (20; 203; 549)
0.1 to 1        1086        (26; 450; 1837)
1 to 10         632         (44; 1104; 4645)
10 to 60        151         (65; 1351; 12313)
60 to 600       75          (103; 1136; 14690)
600 to 3600     36          (166; 1236; 13916)

Tuning the Algorithm

  • Within the general algorithmic strategy there are many possible choices: the separation oracle, the greedy algorithm, the version of CPLEX, parameter settings, etc. By tuning these choices on a training set of real-world examples we improved the performance by a factor of several hundred.



  • This is joint work with Erick Moreno Centeno
