The probabilistic method


  1. The probabilistic method. Class presentation for ECE559 - Ramakrishna Gummadi

  2. Technique for constructing proofs • Define a probability space on a collection of objects • If the probability that a certain property holds is nonzero, then there exists at least one object with that property

  3. Outline • Examples • The probability space defined in the proof also suggests efficient randomized algorithms; derandomization • Symmetric Lovász local lemma (LLL) • An algorithmic application of the LLL

  4. Example: Monochromatic cliques • Complete graph on n vertices • When is it possible to color the edges red or blue so as to avoid monochromatic k-cliques? • Objects: colorings • Probability space: color each edge uniformly at random • A union bound over the (n choose k) potential cliques shows that whenever (n, k) satisfies (n choose k) · 2^(-(k choose 2)+1) < 1, such a coloring always exists

  5. Algorithm? • Sample colorings from the probability space used in the proof until we get one with no monochromatic k-clique (see the sketch below) • If each sample succeeds with probability p > 0, the number of independent samplings required is a geometric R.V. with expected value 1/p
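A minimal Python sketch of this resampling loop, assuming a dictionary representation of colorings and a brute-force clique check (both illustrative choices, not from the slides; practical only for small n and k):

```python
import itertools
import random

def random_coloring(n):
    """Color each edge of K_n red or blue, independently and uniformly."""
    return {e: random.choice(("red", "blue"))
            for e in itertools.combinations(range(n), 2)}

def has_monochromatic_clique(coloring, n, k):
    """Brute-force check of every k-subset for a monochromatic k-clique."""
    for clique in itertools.combinations(range(n), k):
        colors = {coloring[e] for e in itertools.combinations(clique, 2)}
        if len(colors) == 1:
            return True
    return False

def find_good_coloring(n, k):
    """Resample until success; expected number of samples is 1/p."""
    while True:
        coloring = random_coloring(n)
        if not has_monochromatic_clique(coloring, n, k):
            return coloring
```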

  6. Expectation argument • If a R.V. on the probability space we defined has mean m, then at least one instance has value at least m (and at least one has value at most m)

  7. Example: Finding a large cut in a graph • Consider a graph with m edges • Objects: cuts • Probability space: place each vertex uniformly at random in one of two sets • R.V.: number of edges crossing the cut • Each edge crosses the cut with probability 1/2, so by linearity of expectation the mean is m/2 • So we have proved that the max cut always has size >= m/2

  8. Algorithm to find a cut of size >= m/2? • Sample cuts according to the probability space in the proof until we get one that's good enough (see the sketch below) • Provable efficiency? • The value of a cut is upper bounded by m => we get a lower bound on the probability that a cut is good enough (p >= 1/(m/2 + 1) in Mitzenmacher & Upfal) => an upper bound on the expected number of samplings needed by the algorithm
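A Python sketch of this sampling loop; the vertex/edge-list representation is an illustrative assumption:

```python
import random

def cut_size(edges, side):
    """Count edges whose endpoints land on different sides of the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])

def large_cut(vertices, edges):
    """Resample uniform random cuts until one has size >= m/2."""
    m = len(edges)
    while True:
        side = {v: random.randrange(2) for v in vertices}
        if cut_size(edges, side) >= m / 2:
            return side
```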

  9. The proof of the channel coding theorem is in the same spirit, with: • Objects: codes • Which probability space to use on the codes? (Trivial ones don't work this time) • Sample codes according to the distribution on X that maximizes I(X;Y) for the channel (X, Y) • (This distribution weights good codes heavily enough to show what we want)

  10. Derandomization • The previous randomized algorithm for finding a cut of size >= m/2 can be converted into a deterministic one • Sequentially make a deterministic choice of which set to place the next vertex in, so as to maximize the conditional expected cut size => this clearly does at least as well as average, i.e., yields a cut of size at least m/2 (Mitzenmacher & Upfal, page 131; see the sketch below)
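A sketch of this method of conditional expectations in Python. Unplaced endpoints contribute 1/2 per edge regardless of the current choice, so the greedy step reduces to a local count over already-placed neighbors (representation assumptions as in the earlier sketches):

```python
def derandomized_cut(vertices, edges):
    """Method of conditional expectations: deterministic cut of size >= m/2."""
    side = {}
    for v in vertices:
        # Only edges from v to already-placed vertices change the conditional
        # expectation: cut_if_placed[s] counts edges cut if v goes to side s.
        cut_if_placed = {0: 0, 1: 0}
        for u, w in edges:
            other = w if u == v else (u if w == v else None)
            if other is not None and other in side:
                cut_if_placed[1 - side[other]] += 1
        # Greedily pick the side that cuts more of these edges.
        side[v] = 0 if cut_if_placed[0] >= cut_if_placed[1] else 1
    return side
```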

  11. Sample and modify • A method for structuring clever sampling: interpret it as sampling in a simple way, followed by a modification step that repairs the sample

  12. Example: Independent sets (pg. 133) • G (n vertices, m edges) has an independent set of size n^2/(4m) (assume m >= n/2) • What probability space on G's independent sets? • One way to generate independent sets: just delete each edge together with an arbitrary vertex it is incident on? • This procedure is too weak to prove what we need (for dense graphs it yields independent sets that are too small)

  13. Solution? "Sample & Modify" • A better way to generate an independent set: • Let d = 2m/n >= 1 be the average degree • First, delete each vertex independently with probability 1 - 1/d (this makes sure the graph isn't too dense) • Now do what we tried earlier: delete each remaining edge together with an arbitrary endpoint • The expected size of the resulting independent set turns out to be exactly what we want, n^2/(4m) (see the sketch below)
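A Python sketch of this sample-and-modify procedure (edge-list representation assumed, as in the earlier sketches):

```python
import random

def independent_set(vertices, edges):
    """Sample-and-modify: expected output size is n^2/(4m) when m >= n/2."""
    n, m = len(vertices), len(edges)
    d = 2 * m / n  # average degree, assumed >= 1
    # Sample: keep each vertex independently with probability 1/d.
    kept = {v for v in vertices if random.random() < 1 / d}
    # Modify: for each edge that survived, delete one (arbitrary) endpoint,
    # so the remaining kept vertices form an independent set.
    for u, v in edges:
        if u in kept and v in kept:
            kept.discard(u)
    return kept
```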

  14. Another example of this procedure, used to prove that there exist dense graphs with large girth, is on pg. 134

  15. Second moment method • If X is a non-negative integer-valued R.V., then Pr(X = 0) <= Var(X)/(E[X])^2 (Chebyshev's inequality, since X = 0 forces |X - E[X]| >= E[X]) • Example application: what is the probability that a random graph in G(n, p) has a 4-clique as n -> infinity? • Threshold behavior at p ~ n^(-2/3) • The technique above can be used to show that for p above this threshold we get a 4-clique with high probability (see the simulation sketch below)
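A small Monte Carlo sketch in Python to see the threshold empirically (brute force, so only for small n; the function names and parameter choices are mine, for illustration):

```python
import itertools
import random

def gnp_has_4clique(n, p):
    """Sample G(n, p) and brute-force check for a 4-clique."""
    adj = {e: random.random() < p
           for e in itertools.combinations(range(n), 2)}
    return any(all(adj[e] for e in itertools.combinations(quad, 2))
               for quad in itertools.combinations(range(n), 4))

def clique_frequency(n, p, trials=100):
    """Empirical probability that G(n, p) contains a 4-clique."""
    return sum(gnp_has_4clique(n, p) for _ in range(trials)) / trials

# Around p ~ n^(-2/3) the frequency should jump from near 0 to near 1,
# e.g. compare clique_frequency(30, 0.2 * 30 ** (-2/3))
# with clique_frequency(30, 5 * 30 ** (-2/3)).
```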

  16. Lovász local lemma • Call E1, E2, …, En bad events. We would like to show that it's 'not all bad' (i.e., there is a nonzero probability that none of these events occurs) • Suppose the Ei have probabilities <= p < 1 • If they were all independent, it is easy to see that we have what we want, since Pr(no bad event) >= (1 - p)^n > 0 • Even limited dependencies can be tackled using the LLL

  17. LLL • Suppose: • For all i, Pr(Ei) <= p • The degree of the dependency graph of the events is bounded by d • 4dp <= 1 • Then there is a nonzero probability that none of these events occurs

  18. Dependency graph • A graph with one vertex per event, and edges between them such that: • Each event Ei is mutually independent of the set of all events it does not share an edge with • Here mutual independence of an event from a set of events means the event is independent of the intersection of the events in every subset of that set

  19. Proof of the LLL: discussed in class exactly as in Mitzenmacher and Upfal, pg. 139 • Application examples: edge-disjoint paths; k-SAT with a restriction on the number of clauses in which a variable can occur (the restriction is a function of k)

  20. Algorithmic use of the LLL (example) • k-SAT problem; m clauses; each variable appears in at most T = 2^(ck) clauses (the constant c is fixed during the proof) • A polynomial (in m) expected-time algorithm (not polynomial in k, though) • The algorithm proceeds in 2 phases: • A subset of the variables is assigned random values in phase 1 • The remaining variables are tackled in the 2nd phase

  21. Algorithmic use of the LLL • Using the LLL, one can show that the random phase-1 assignment can be extended to a complete solution • An exhaustive search in the second phase is easy with high probability • Consider G, a graph on the clauses with an edge connecting two clauses whenever they share at least one variable

  22. Consider the variables sequentially for the phase-1 assignment • Call a clause Ci dangerous iff: • k/2 of its literals have been assigned, and • Ci is still unsatisfied • Start assigning variables randomly, freezing all the variables in clauses that become dangerous in the process • Phase 1 ends when we have considered all variables (see the sketch below)
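A Python sketch of phase 1, assuming clauses are given as sets of (variable, polarity) literal pairs (a representation I'm choosing for illustration):

```python
import random

def phase1(variables, clauses, k):
    """Assign variables in order, freezing those in dangerous clauses.

    A clause is dangerous once k/2 of its literals are assigned while the
    clause is still unsatisfied; its remaining variables are deferred to
    phase 2.  Returns the partial assignment and the deferred variables.
    """
    assignment, deferred = {}, set()
    for x in variables:
        if x in deferred:
            continue  # frozen variable: leave it for phase 2
        assignment[x] = random.choice((False, True))
        for clause in clauses:
            satisfied = any(v in assignment and assignment[v] == polarity
                            for v, polarity in clause)
            assigned = sum(1 for v, _ in clause if v in assignment)
            if not satisfied and assigned >= k / 2:
                # Dangerous clause: freeze its unassigned variables.
                deferred.update(v for v, _ in clause if v not in assignment)
    return assignment, deferred
```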

  23. A clause has 'survived' if it is still unsatisfied after phase 1 (the rest can be ignored) • 'Survived' => the clause is dangerous or has a dangerous neighbor (which froze one of its variables) • Let G1 be the graph on the surviving clauses: each surviving clause has at least k/2 unassigned variables, and at most d = kT neighbors with T = 2^(ck) • The LLL then implies that there exists a satisfying assignment to the deferred variables

  24. In G1, all connected components have size O(log m) with high probability, so the exhaustive second-phase search is polynomial with high probability. Proof outline: • We are basically trying to prove that, under our algorithm's survival scheme, it's not very likely that a large connected component survives • Clause survival is more or less independent. Precisely speaking: when two clauses are at distance at least 4 (i.e., when even their neighbors don't share any variable), their survivals are independent events

  25. What is the probability that a clause survives? (i.e., either the clause or one of its neighbors is dangerous) • A given clause is dangerous with probability at most 2^(-k/2), since exactly k/2 of its variables were given random values (see the calculation below) • So the probability that a given clause survives is at most (d+1)·2^(-k/2), where d = kT (with T = 2^(ck)) is the maximum number of neighbors • So a given set of r clauses in G with pairwise distances at least 4 survives with probability at most ((d+1)·2^(-k/2))^r (due to the independence)
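A short worked version of the first two bullets in LaTeX (a sketch of the standard argument, assuming each assigned value independently fails to satisfy its literal with probability 1/2):

```latex
% A clause is declared dangerous at the moment exactly k/2 of its literals
% have received random values, none of which satisfied the clause:
\Pr[\text{clause dangerous}]
  \;\le\; \left(\tfrac{1}{2}\right)^{k/2} \;=\; 2^{-k/2},
\qquad
\Pr[\text{clause survives}] \;\le\; (d+1)\,2^{-k/2}
\quad \text{(union bound over the clause and its } \le d \text{ neighbors).}
```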

  26. But if G1 were to have a surviving connected component of size L, then we could modify it to exhibit a set of size L/d^3 with the following properties: • 1: All elements are at pairwise distance at least 4 • 2: It is connected in G4, the graph obtained by connecting two clauses iff their distance (in G) is exactly 4 (note that G4 has a degree bound of d^4)

  27. But there can't be too many connected components of size r in G4. More specifically, their number can be bounded by m·d^(8r), using the fact that the degree of G4 is at most d^4 (in fact, this is a very loose bound, particularly when r is very small or very large) • So the probability that such a set of r clauses exists in G1 is at most m·d^(8r)·((d+1)·2^(-k/2))^r

  28. For r = L/d^3, the last line of the previous slide shows this is very unlikely if L = C·log m and C is large enough (see the sketch below) • So with high probability all connected components are of size O(log m), and each component then takes only poly(m) time for the exhaustive search
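A worked version of this final step in LaTeX; the specific requirement on c in the last lines is my assumption about how the constant gets chosen, not something stated on the slides:

```latex
% Combining slides 25-28, with d = kT, T = 2^{ck}, and r = L/d^3:
\Pr[\exists \text{ surviving component of size } L]
  \;\le\; m\, d^{8r} \bigl((d+1)\,2^{-k/2}\bigr)^{r}
  \;=\; m \bigl( d^{8}(d+1)\,2^{-k/2} \bigr)^{r}.
% Assuming c is small enough that d^{8}(d+1)\,2^{-k/2} \le 1/2, this is
% at most m \cdot 2^{-r} = m \cdot m^{-C/d^{3}} (for L = C \log_2 m),
% which tends to 0 once C > d^{3}.
```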

  29. References • Mitzenmacher and Upfal, Probability and Computing, chapter 6 • For the last part, on the algorithmic application of the LLL, the following is also useful: http://valis.cs.uiuc.edu/~sariel/teach/2002/a/notes/13_prob_method_4.pdf
