1 / 56

CSCI 3130: Formal languages and automata theory - PowerPoint PPT Presentation

Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Formal languages and automata theory. NP and NP-completeness. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Some more problems. A clique is a subset of vertices that are all interconnected. 2. 1.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

PowerPoint Slideshow about 'CSCI 3130: Formal languages and automata theory ' - annice

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

The Chinese University of Hong Kong

CSCI 3130: Formal languages and automata theory

NP and NP-completeness

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

A clique is a subset of vertices

that are all interconnected

2

1

{1, 4}, {2, 3, 4}, {1} are cliques

An independent set is a subset of

vertices so that no pair is connected

4

3

{1, 2}, {1, 3}, {4} are independent sets

there is no independent set of size 3

Graph G

A vertex cover is a set of vertices

that touches (covers) all edges

{2, 4}, {3, 4}, {1, 2, 3} are vertex covers

• A boolean formula is an expression made up of variables, ands, ors, and negations, like

• The formula is satisfiable if one can assign values to the variables so the expression evaluates to true

(x1∨x2 ) ∧ (x2∨x3∨x4) ∧ (x1)

Above formula is satisfiable because this assignment

makes it true:

x1 = Fx2 = Fx3 = T x4 = T

SAT = {〈f〉: fis a satisfiable Boolean formula}

3SAT = {〈f〉: fis a satisfiable Boolean formula in conjunctive normal form with 3 literals per clause}

literal: xi or xi

(x1∨x2∨x2 ) ∧ (x2∨x3∨x4)

clause

literals

CNF: AND of ORs of literals

(conjunctive normal form)

3CNF: CNF with 3 literals per clause

(repetitions are allowed)

CLIQUE = {〈G, k〉: G is a graph with a clique of k vertices}

IS = {〈G, k〉: G is a graph with an independent set of k vertices}

VC = {〈G, k〉: G is a graph with a vertex cover of k vertices}

SAT = {〈f〉: f is a satisfiable Boolean formula}

problem

CLIQUE

3SAT

VC

IS

2O(n)

2O(n)

running time

of best-known algorithm

2O(n)

2O(n)

What do these problems have in common?

• We don’t know how to solve them efficiently

• But if someone told us the solution, we would be able to verify it very quickly

Example:

12

Is (G, 5) in CLIQUE?

9

1,5,9,12,14

1

14

5

13

6

15

2

7

4

10

8

3

11

(x1∨x2 ) ∧ (x2∨x3∨x4) ∧ (x1)

f =

Finding a solution:

Verifying a solution:

FFTT

Try all possible assignments

FFFF

TFFF

substitute

FTFF

TTFF

FFFT

FTFT

TFFT

TTFT

x1 = F x2 = Fx3 = T x4 = T

FFTF

FTTF

TFTF

TTTF

FFTT

FTTT

TFTT

TTTT

evaluate formula

(F ∨T) ∧ (F∨T∨F) ∧ (T)

f =

For n variables, there are 2n

possible assignments

can be done in linear time

Takes exponential time

• A verifier for L is a TM V such that

• s is a potential solution for x

• We say V runs in polynomial time if on every input x, it runs in time polynomial in |x| (for every s)

x ∈ L

V accepts 〈x, s〉 for some s

NP is the class of all languages that have polynomial-time verifiers

3SAT is in NP:

V :=

On input a 3CNF f

and a candidate assignment a,

If a satisfies f accept, otherwise reject.

running time = O(m + n)

m= number of clauses and n = number of variables

CLIQUE is in NP:

V :=

On input 〈G, k〉

and a set of vertices C,

If C has size k and all edges between vertices of

C are present in G, accept, otherwise reject.

running time = O(n2)

because the verifier can ignore the solution

Conceptually, finding solutions can only be harder than checking them

P is contained in NP

decidable

NP (efficiently checkable)

IS

SAT

VC

CLIQUE

P (efficient)

PATH

L01

• Recall how in 1900, Hilbert gave 23 problems that guided mathematics in the 20th century

• In 2000, the Clay Mathematical Institute gave 7 problems for the 21st century

computer science

1 P versus NP

2 The Hodge conjecture

3 The Poincaré conjecture

4 The Riemann hypothesis

5 Yang–Mills existence and mass gap

6 Navier–Stokes existence and smoothness

7 The Birch and Swinnerton-Dyer conjecture

Perelman 2006 (refused money)

Hilbert’s 8th problem

\$1,000,000

• The answer to the questionis not known. But one reason it is believed to be negative is because, intuitively, searching is harder than verifying

• For example, solving homework problems (searching for solutions) is harder than grading (verifying the solution is correct)

Is P equal to NP?

\$1,000,000

Mathematician: Given a mathematical claim, come up with a proof for it.

Scientist: Given a collection of data on some phenomena, ﬁnd a theory explaining it.

Engineer: Given a set of constraints (on cost, physical laws, etc.) come up with a design (of an engine, bridge, etc.) which meets them.

Detective:

Given the crime scene, ﬁnd “who’s done it”.

P =

languages that can be decided on a TM

with polynomial running time

languages whose solutions can be verified

on a TM with polynomial running time

NP =

(solutions can be checked efficiently)

decidable

We believe that NP is bigger than P,

but we are not 100% sure

NP

P

• These (and many others) are in NP

• Their solutions, once found, are easy to verify

• But no efficient algorithms are known for any of them

• The fastest known programs take time ≈2n

CLIQUE = {〈G, k〉: G is a graph with a clique of k vertices}

IS = {〈G, k〉: G is a graph with an independent set of k vertices}

VC = {〈G, k〉: G is a graph with a vertex cover of k vertices}

SAT = {〈f〉: fis a satisfiable Boolean formula}

• We strongly suspect that problems like CLIQUE, SAT, etc. require time ≈2n to solve

• We do not know how to prove this, but what we can prove is that

If any one of them can be solved on a polynomial-time TM, then all of them can be solved

• All these problems are as hard as one another

• Moreover, they are at the “frontier” of NP

• They are at least as hard as any problem in NP

NP

clique

independent set

vertex-cover

P

satisfiability

How to show that a problem R is not easier than a problem Q?

Informally, if R can be solved efficiently, we can solve Q efficiently.

• Formally, we say Q polynomially reduces to R if:

• Given an instance q of problem Q

• There is a polynomial time transformation to an instance f(q) of R

• q is a “yes” instance if and only if f(q) is a “yes” instance

Then, if R is polynomial time solvable, then Q is polynomial time solvable.

If Q is not polynomial time solvable, then R is not polynomial time solvable.

• What do we mean when we say, for example,

• We mean that

• Or, we can convert any polynomial-time TM for IS into one for CLIQUE

“IS is at least as hard as CLIQUE”

If CLIQUE has no polynomial-time TM, then

neither does IS

IS = {〈G, k〉: G is a graph with an independent set

of k vertices}

• Theorem

CLIQUE = {〈G, k〉: G is a graph with a clique of k vertices}

2

1

If IS has a polynomial-time TM,

so does CLIQUE

4

3

{1, 4}, {2, 3, 4}, {1} are

independent sets

{1, 2}, {1, 3}, {4} are cliques

• Proof: Suppose IS has an poly-time TM A

• We want to use it to solve CLIQUE

If IS has a polynomial-time TM, so does CLIQUE

A

for IS

accept

ifG has clique of size k

〈G’, k’ 〉

〈G, k〉

reject

if not

accept

ifG’ has IS of size k

reject

if not

• We look for a polynomial-time TM R that turns the question:

into:

R

“Does G have a clique of size k?”

G, k

G’, k’

“Does G’ have an IS of size k’?”

2

2

1

1

flip all edges

G

G’

4

4

3

3

k’ = k

cliques of size k

ISs of size k’

On input 〈G, k〉

Construct G’ by flipping all edges of G

Set k’ = k

Output 〈G’, k’〉

R

G, k

G’, k’

cliques in G

independent sets in G’

If G’ has an IS of size k,

then G has a clique of size k

If G’ does not have an IS of size k,

then G has no clique of size k

• We showed thatby converting an imaginary TM for IS into one for CLIQUE

• To do this, we came up with a reduction that transforms instances of CLIQUE into ones of IS

If IS has a polynomial-time TM, so does CLIQUE

• Language Lpolynomial-time reduces to L’ if there exists a polynomial-time TM R that takes an instance x of L into instance y of L’ s.t.

x ∈ L if and only if y ∈ L’

L

(CLIQUE)

L’

(IS)

R

= 〈G, k〉

x

y

= 〈G’, k’〉

x ∈ L

y ∈ L’

(G has clique of size k)

(G’ has IS of size k’)

• Saying L reduces to L’ means L is no harder than L’

• In other words, if we can solve L’, then we can also solve L

• Therefore

If Lreduces toL’and L’∈P, then L∈P

acc

R

poly-time TM for L’

x

y

rej

x ∈ L

y ∈ L’

TM accepts

• The direction of the reduction is very important

• Saying “A is easier than B” and “B is easier than A” mean different things

• However, it is possible that L reduces to L’andL’ reduces to L

• This means that L and L’ are as hard as one another

• For example, IS and CLIQUE reduce to one another

(x1∨x2 ) ∧ (x2∨x3∨x4) ∧ (x1)

• So every problem in NP is easier than SAT

• But SAT itself is in NP, so SAT must be the “hardest problem” in NP:

Every L∈NP reduces to SAT

SAT = {f: fis a satisfiable Boolean formula}

E.g.

SAT

NP

P

If SAT∈P, then P = NP

• A language C is NP-complete if:

• Cook-Levin Theorem:

1. C is in NP, and

2.For every L in NP, L reduces to C.

C

NP

3SAT is NP-complete

P

NP-complete

A

B

SAT

A reduces to B

NP

IS

CLIQUE

P

any CFL

PATH

0n1n

NP-complete

A

B

3SAT

A reduces to B

IS

CLIQUE

NP

P

PATH

0n1n

In practice, most of the NP-problems are

either in P (easy) or NP-complete (probably hard)

• Optimistic view:

• Pessimistic view:

If we manage to solve SAT, then wecan also solve

CLIQUE, scheduling, and almost anything

Since we do not believe P = NP, it is unlikely that

we will ever have a fast algorithm for SAT

• Theorem

CLIQUE = {(G, k): G is a graph with a clique of k vertices}

CLIQUE is NP-hard

VC

IS

A clique is a subset of vertices so that all pairs are connected

2

1

CLIQUE

3SAT

{1, 2, 3}, {1, 4}, {4} are cliques

SAT

4

3

Reducing 3SAT to CLIQUE

• Proof: We give a reduction from 3SAT to CLIQUE

3SAT = {f: f is a satisfiable Boolean formula in 3CNF}

CLIQUE = {(G, k): G is a graph with a clique of k vertices}

R

3CNF formula f

(G, k)

G has a cliqueof size k

f is satisfiable

Reducing 3SAT to CLIQUE

• Example: f =

(x1∨x1∨x2 ) ∧ (x1∨x2∨x2) ∧ (x1∨x2∨x3)

x1

x1

x1

x1

x2

x2

x2

x2

x3

Put a vertex for every literal

Put an edge for every consistent pair

Reducing 3SAT to CLIQUE

R

3CNF formula f

(G, k)

R:

On input f, where f is a 3CNF formula with m clauses

Construct the following graph G:

G has 3m vertices, divided into m groups,

one for each literal in f

If a and b are in different groups and a ≠ b,

put an edge (a, b)

Output(G, m)

Reducing 3SAT to CLIQUE

R

3CNF formula f

(G, m)

G has a cliqueof size m

f is satisfiable

x1

x1

x1

x1

x2

x2

x2

x2

x3

f =(x1∨x1∨x2 ) ∧ (x1∨x2∨x2) ∧ (x1∨x2∨x3)

T

T

F

F

F

T

F

F

T

Reducing 3SAT to CLIQUE

R

3CNF formula f

(G, m)

G has a cliqueof size m

f is satisfiable

x1

x1

x1

x1

x2

x2

x2

x2

x3

f =(x1∨x1∨x2 ) ∧ (x1∨x2∨x2) ∧ (x1∨x2∨x3)

F

F

T

T

F

F

T

T

T

Reducing 3SAT to CLIQUE

• Every satisfying assignment of f gives a clique of size m in G

• Conversely, every clique of size m in Ggives a consistent satisfying assignment of f.

R

3CNF formula f

(G, m)

f is satisfiable

G has a clique of size m

VC

IS

CLIQUE

3SAT

SAT

• Theorem

VC = {(G, k): G is a graph with a vertex cover of size k}

2

1

4

3

VC is NP-hard

VC

IS

A vertex cover is a set of vertices that touches (covers) all edges

CLIQUE

3SAT

{2, 4}, {3, 4}, {1, 2, 3}

are vertex covers

SAT

Reducing CLIQUE to VC

• Proof: We describe a reduction from IS to VC

• Example

2

1

R

(G, k)

(G’, k’)

4

3

G has an IS of size k

G’ has a VC of size k’

vertex covers

independent sets

{2, 4}, {3, 4},

{1, 2, 3}, {1, 2, 4},

{1, 3, 4}, {2, 3, 4},

{1, 2, 3, 4}

∅, {1}, {2}, {3}, {4},

{1, 2}, {1, 3}

Reducing IS to VC

• Claim

• Proof

2

1

S is an independent set of G if and

only if S is a vertex cover of G

4

3

VC

IS

{1}

{2}

{3}

{4}

{1, 2}

{1, 3}

{2, 4}

{3, 4}

{1, 2, 3}

{1, 2, 4}

{1, 3, 4}

{2, 3, 4}

{1, 2, 3, 4}

S is an independent set of G

no edge has both endpoints in S

every edge has an endpoint in S

S is a vertex cover of G

Reducing IS to VC

VC

R

(G, k)

(G’, k’)

IS

CLIQUE

R:

On input (G, k),

Output (G, n – k).

3SAT

SAT

G has an IS of size k

G has a VC of size n – k

• We saw a few examples of NP-complete problems, but there are many more

• A surprising fact of life is that most CS problems are either in P or NP-complete

• A 1979 book by Garey and Johnsonlists 100+ NP-complete problems

Instance: A set X and a size s(x) for each x in X.

Question: Is there a subset X’ X such that

PARTITION

SUBSET-SUM

Instance: A set X and a size s(x) for each x in X, and an integer B.

Question: Is there a subset X’ X such that

HAMILTONIAN CYCLE

Instance: A graph G=(V,E).

Question: Does G contains a Hamiltonian cycle,

i.e. a cycle which visits every vertex exactly once?

HAMILTONIAN PATH

Instance: A graph G=(V,E).

Question: Does G contains a Hamiltonian path,

i.e. a path which visits every vertex exactly once?

• Restriction

• Show that a special case is already NP-complete.

• Local replacement

• Replace each basic unit by a different structure.

• Component design

• Design “components” with specific functionality.

Two graphs G1 = (V1,E1) and G2 = (V2,E2) are isomorphic if

bijectionf: V1 →V2

u —v inE1iff f (u)—f (v)inE2

Instance: Two graphs G = (V1,E1) and H = (V2,E2).

Question: Does G contain a subgraph isomorphic to H?

Clique <= Subgraph Isomorphism

A spanning tree is a connected subgraph with |V|-1 edges.

Instance: A graph G=(V,E) and a positive integer k.

Question: Is there a spanning tree for G in which no vertex has degree > k?

Hamiltonian path <= Bounded degree spanning tree

Instance: Collection C of subsets of a set S, and a positive integer k.

Question: Does C contains a cover for S of size k or less, that is,

a subset C’ C with |C’| <= k and ?

Vertex cover <= Minimum Cover

Instance: A graph G and a positive integer k.

Question: Does there exist a subset S of at most k vertices such that

every vertex in V-S is adjacent to at least one vertex in S?

Vertex cover <= dominating set

Instance: A set T of jobs, each has a release time r(t),

a deadline d(t) and a length l(t).

Question: Does there exist a feasible schedule for T?

Partition <= Sequencing within Intervals

Instance: A set X and a size s(x) for each x in X, and an integer B.

Question: Is there a subset X’ X such that

Vertex cover <= subset sum

See this proof and many other problems and reductions from Prof. Cai notes:

http://www.cse.cuhk.edu.hk/~csci3160/LectureNotes/11notes.pdf