

  1. Derandomizing Space-Bounded Computations Slides by Elery Pfeffer and Elad Hazan, based on slides by Michael Lewin & Robert Sayegh. Adapted from Oded Goldreich’s course lecture notes by Eilon Reshef.

  2. Introduction In this lecture we will show our main result: BPSPACE(log n) ⊆ DSPACE(log²n). The result will be derived in the following order: • Formal Definitions • Execution Graph Representation • A Pseudorandom Generator based on Universal Hash Functions (UHF) • Analysis of the Execution Graph traversal by the Pseudorandom Generator

  3. Definition of BPSPACE(·) 16.2 BPSPACE(·) is the family of bounded-probability space-bounded complexity classes. Def: The complexity class BPSPACE(s(·)) is the set of all languages L s.t. there exists a randomized TM M that on input x: • M uses at most s(|x|) space • The running time of M is bounded by exp(s(|x|)) • x ∈ L ⟹ Pr[M(x) = 1] ≥ 2/3 • x ∉ L ⟹ Pr[M(x) = 1] ≤ 1/3 Here s(·) is any complexity function s.t. s(·) ≥ log(·). We focus on BPL, namely BPSPACE(log). Without the running-time condition: NSPACE(·) = RSPACE(·) ⊆ BPSPACE(·)

  4. Execution Graphs 16.3 We represent the execution of a BPSPACE machine M on input x as a layered directed graph G_{M,x}: • The vertices in the i-th layer (V_i) correspond to all the possible configurations of M after it has used i random bits. • Each vertex has 2 outgoing edges corresponding to reading a “0” or “1” bit from the random tape. Note: • Width of G_{M,x} = max |V_i| ≤ 2^{s(n)} · s(n) · n ≤ exp(s(n)) • Depth of G_{M,x} = # of layers ≤ exp(s(n))
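To make the representation concrete, here is a minimal Python sketch (not from the slides) of building such a layered graph. The transition function `step`, mapping a configuration and a random bit to the next configuration, is a hypothetical stand-in for the machine M.

```python
# Hypothetical sketch: build the layered execution graph G_{M,x}.
# `step(config, bit)` is an assumed deterministic transition of M
# on reading `bit` from the random tape.

def build_execution_graph(start, step, depth):
    """Layer i maps each configuration reachable after i random bits
    to its two successors (on random bit 0 and on random bit 1)."""
    layers = []
    frontier = {start}
    for _ in range(depth):
        layer = {c: (step(c, 0), step(c, 1)) for c in frontier}
        layers.append(layer)
        frontier = {nxt for pair in layer.values() for nxt in pair}
    return layers
```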

  5. Execution Graph Example (figure: a layered execution graph, annotated with its width — the number of vertices per layer — and its depth — the number of layers)

  6. Execution Graph Definitions The set of final vertices is partitioned into: • V_acc - the set of accepting configurations. • V_rej - the set of rejecting configurations. A random walk on G_{M,x} is a sequence of steps emanating from the initial configuration and traversing the directed edges of G_{M,x} at random. A guided walk on G_{M,x} (with guide R) is a sequence of steps emanating from the initial configuration and traversing the i-th edge in G_{M,x} according to the i-th bit given by the guide R.

  7. Execution Graph Definitions Denote by ACC(G_{M,x},R) the event that the R-guided walk reaches a vertex in V_acc. Thus, Pr[M accepts x] = Pr_R[ACC(G_{M,x},R)]. Summarizing, we learn that for a language L in BPL there exists a (D,W)-graph s.t. • Width: W(n) = exp(s(n)) = poly(n) • Depth: D(n) = exp(s(n)) = poly(n) • For a random guide R: • x ∈ L ⟹ Pr_R[ACC(G_{M,x},R)] ≥ 2/3 • x ∉ L ⟹ Pr_R[ACC(G_{M,x},R)] ≤ 1/3
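As a small illustration (reusing the `build_execution_graph` sketch above), a guided walk and the event ACC translate directly into code:

```python
def guided_walk(layers, start, guide):
    """Traverse the i-th edge according to the i-th bit of the guide R."""
    v = start
    for i, bit in enumerate(guide):
        v = layers[i][v][bit]
    return v

def acc(layers, start, guide, v_acc):
    """ACC(G, R): the R-guided walk ends in an accepting configuration."""
    return guided_walk(layers, start, guide) in v_acc
```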

  8. Execution Graph Definitions We note that the following technical step preserves the behavior of a random walk on G_{M,x}: prune the layers of G_{M,x} s.t. only every l-th layer remains, contracting edges where necessary. Denote the new pruned graph by G. This is done to ease the analysis of the pseudorandom generator further on.

  9. Execution Graph Definitions (figure: the pruned graph G; each edge of G is labeled by an l-bit block of random bits, e.g. 0101…110 or 1100…010, so layer boundaries fall at positions 0, l, 2l, …) Clearly, a random walk on G_{M,x} is equivalent to a random walk on G.

  10. Universal Hash Functions 16.4 Def: A family of functions H = {h: A → B} is called a universal family of hash functions if for every x_1 ≠ x_2 in A and every y_1, y_2 in B: Pr_{h∈H}[h(x_1) = y_1 and h(x_2) = y_2] = 1/|B|². We will use a linear universal family of hash functions seen previously: H_l = {h_{a,b}: {0,1}^l → {0,1}^l}, h_{a,b}(x) = ax + b, with arithmetic over GF(2^l). This family has a succinct representation (2l bits) and can be computed in linear space.
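A minimal sketch of such a linear family, assuming a fixed l = 8 and the (illustrative) AES field polynomial for GF(2^l); the pair (a, b) is exactly the 2l-bit representation of h:

```python
L_BITS = 8     # illustrative block length l
POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^l), reduced modulo POLY."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a >> L_BITS:   # overflow past l bits: reduce
            a ^= POLY
    return res

def h_ab(a, b, x):
    """h_{a,b}(x) = a*x + b over GF(2^l); addition is bitwise XOR."""
    return gf_mul(a, x) ^ b
```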

  11. Universal Hash Functions For every A ⊆ {0,1}^l denote by μ(A) the probability that a random element hits the set A: μ(A) = |A| / 2^l. Hash Proposition: For every universal family of hash functions H_l and for every two sets A, B ⊆ {0,1}^l, all but a 2^{-l/5} fraction of the functions h ∈ H_l satisfy |Pr_x[x ∈ A and h(x) ∈ B] − μ(A) · μ(B)| ≤ 2^{-l/5}. That is, a large fraction of the hash functions extend well to the sets A and B.
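The proposition can be checked empirically at this toy block length (reusing `h_ab` from the sketch above; the sets A and B are arbitrary):

```python
import random

def mixing_error(a, b, A, B):
    """|Pr_x[x in A and h(x) in B] - mu(A)*mu(B)| for h = h_{a,b}."""
    n = 1 << L_BITS
    hits = sum(1 for x in range(n) if x in A and h_ab(a, b, x) in B)
    return abs(hits / n - (len(A) / n) * (len(B) / n))

A = set(random.sample(range(1 << L_BITS), 64))
B = set(random.sample(range(1 << L_BITS), 64))
a = random.randrange(1, 1 << L_BITS)
b = random.randrange(1 << L_BITS)
print(mixing_error(a, b, A, B))   # small for most choices of (a, b)
```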

  12. Construction Overview 16.5 Def: A function H: {0,1}^k → {0,1}^D is a (D,W)-pseudorandom generator if for every (D,W)-graph G: |Pr_{R∈{0,1}^D}[ACC(G,R)] − Pr_{R'∈{0,1}^k}[ACC(G,H(R'))]| ≤ 1/10. Prop: There exists a (D,W)-pseudorandom generator H(·) with k(n) = O(log D · log W). Further, H(·) is computable in space linear in its input.

  13. Construction Overview Corollary: There exists a (D,W)-pseudorandom generator H(·) with the following parameters: • s(n) = Θ(log n) • D(n) ≤ poly(n) • W(n) ≤ poly(n) • k(n) = O(log²n). By trying all seeds of H, it follows that: BPL ⊆ DSPACE(log²n). This is NOT surprising, since: RL ⊆ NL ⊆ DSPACE(log²n)
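The derandomization itself is plain seed enumeration: run the guided walk once per seed and take a majority vote. A hedged sketch, reusing `acc` from above and taking the generator `H` and the seed length `k` as given:

```python
from itertools import product

def derandomized_accept(layers, start, v_acc, H, k):
    """Accept iff a majority of the 2^k seeds lead H to an accepting walk."""
    votes = sum(acc(layers, start, H(seed), v_acc)
                for seed in product((0, 1), repeat=k))
    return 2 * votes > 2 ** k
```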

  14. The Pseudorandom Generator 16.6 We will define a (D,W)-pseudorandom generator H. Assume without loss of generality D ≤ W. H extends strings of length O(l²) to strings of length D ≤ exp(l), for l = Θ(log W). The input to H is the tuple: I = (r, <h_1>, <h_2>, …, <h_{l'}>), where |r| = l, the <h_i> are representations of functions in H_l, and l' = log(D/l). Obviously, |I| = O(l²).

  15. The Pseudorandom Generator (figure: the computation as a complete binary tree of depth l' = 3; each node's left child keeps its parent's value and its right child applies the next hash function, so the leaves read r, h_3(r), h_2(r), h_3(h_2(r)), h_1(r), h_3(h_1(r)), h_2(h_1(r)), h_3(h_2(h_1(r)))) The computation of the PRG can be represented as a complete binary tree of depth l'. The output of H is the concatenation of the binary values of the leaves. Formally: H(r, <h_i>, …, <h_{l'}>) = H(r, <h_{i+1}>, …, <h_{l'}>) ∘ H(h_i(r), <h_{i+1}>, …, <h_{l'}>), starting with H(z) = z.
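The recursion translates almost verbatim into code. A minimal sketch, where `r` is an l-bit block and `hs` is the list (h_i, …, h_{l'}) of hash functions (e.g. instances of `h_ab` above):

```python
def prg(r, hs):
    """Nisan-style expansion: one l-bit block becomes 2^len(hs) blocks."""
    if not hs:
        return [r]        # base case: H(z) = z
    first, rest = hs[0], hs[1:]
    # H(r, h_i..h_l') = H(r, h_(i+1)..h_l') o H(h_i(r), h_(i+1)..h_l')
    return prg(r, rest) + prg(first(r), rest)
```

For example, `prg(r, [h1, h2, h3])` returns the eight leaf blocks of the depth-3 tree in the figure, in left-to-right order.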

  16. Intuition (figure: the resulting pseudorandom sequence laid out block by block at positions 0, l, 2l, …, 7l, and the execution graph traversed using this sequence) The sequence is constructed using only l bits for r and l' hash functions. The output of the pseudorandom generator represents a path in the execution graph.

  17. Analysis 16.7 Claim: H is indeed a (D,W)-pseudorandom generator. We will show that a guided walk on G_{M,x} using the guide H(I) behaves almost like a truly random walk. We will perform a sequence of coarsenings. At each coarsening we will use a new hash function from H_l and reduce the number of truly random bits by a factor of 2. After l' such coarsenings, the only random bits remaining are the l random bits of r and the l' representations of the hash functions.

  18. Analysis At the first coarsening we replace the random guide R = (R_1, R_2, …, R_{D/l}) with the semi-random guide R' = (R_1, h_{l'}(R_1), R_3, h_{l'}(R_3), …, R_{D/l−1}, h_{l'}(R_{D/l−1})), and we will show that this semi-random guide succeeds in “fooling” G_{M,x}. (figure: the odd blocks r_1, r_3, r_5, r_7 remain truly random, and each even block is replaced by h_3 applied to the preceding block)
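In code, this coarsening is a simple block-level substitution (illustrative; `odd_blocks` are the retained truly random blocks R_1, R_3, …, and `h_last` plays the role of h_{l'}):

```python
def coarsen(odd_blocks, h_last):
    """(R1, R3, ...) -> (R1, h(R1), R3, h(R3), ...): the semi-random guide."""
    guide = []
    for r in odd_blocks:
        guide.extend((r, h_last(r)))
    return guide
```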

  19. Intuition |Pr_{R'}[Acc] − Pr_R[Acc]| < ε (figure: the truly random guide r_1, r_3, r_5 above, and below it the semi-random guide built from the seed: r_1, h_3(r_1), r_3, h_3(r_3), r_5, h_3(r_5)) By summing over all l' levels in the tree we will show that the total difference in the probability to accept is less than 1/10.

  20. Analysis At the second coarsening we again replace half of the truly random bits by choosing a new hash function: R' = (R_1, h_{l'}(R_1), R_3, h_{l'}(R_3), …, R_{D/l−1}, h_{l'}(R_{D/l−1})) becomes R'' = (R_1, h_{l'}(R_1), h_{l'−1}(R_1), h_{l'−1}(h_{l'}(R_1)), …). And again we show that this semi-random guide also succeeds in “fooling” G_{M,x}. (figure: now only r_1 and r_5 remain truly random; the remaining blocks are derived from them via h_2 and h_3, e.g. h_2(r_1), h_3(h_2(r_1)))

  21. Analysis And so on, until we have performed l' such coarsenings, upon which we have proven that the generator H(I) is indeed a (D,W)-pseudorandom generator. We recall that the pruned execution graph was denoted G.

  22. Analysis At the first coarsening we replace the random guide R = (R_1, R_2, …, R_{D/l}) with the semi-random guide R' = (R_1, h_{l'}(R_1), R_3, h_{l'}(R_3), …, R_{D/l−1}, h_{l'}(R_{D/l−1})). We show that: |Pr_R[ACC(G,R)] − Pr_{R'}[ACC(G,R')]| < ε

  23. Analysis We perform preprocessing by removing from G all edges (u,v) whose traversal probability is very small, that is, Pr_R[u→v] < 1/W². Denote the new graph by G'. Lemma 1: For ε_1 = 2/W: 1. |Pr_R[ACC(G',R)] − Pr_R[ACC(G,R)]| < ε_1 2. |Pr_{R'}[ACC(G',R')] − Pr_{R'}[ACC(G,R')]| < ε_1 Proof: For the first part, the probability that a random walk uses a low-probability edge is at most D·(1/W²) < 1/W < ε_1. For the second part, we consider two consecutive steps. The first step is truly random, so its probability of traversing a low-probability edge is at most 1/W². On the second step we use the hash proposition for the set {0,1}^l and the set of low-probability edges.

  24. Analysis Proof (continued): For all but a 2^{-l/5} fraction of hash functions the traversal probability is bounded by 1/W² + 2^{-l/5} < 2/W². On the whole, except for a fraction of (D/2)·2^{-l/5} < ε_1/2 of the hash functions, the overall probability to traverse a low-probability edge is bounded by D·(2/W²) < ε_1/2. Thus the total probability is bounded by ε_1. Hence removing low-probability edges does not significantly affect the outcome of G.

  25. Analysis We will show that on G' the semi-random guide R' performs similarly to the truly random guide R. Lemma 2: |Pr_R[ACC(G',R)] − Pr_{R'}[ACC(G',R')]| < ε_2 Proof: Consider first 3 consecutive vertices u, v, w and the sets of edges between them, E_{u,v} and E_{v,w}. The probability that a random walk leaves u and reaches w through v is: Pr_{u-v-w} = Pr_{R_1,R_2∈{0,1}^l}[R_1 ∈ E_{u,v} and R_2 ∈ E_{v,w}]. Since we removed low-probability edges: Pr_{u-v-w} ≥ 1/W⁴

  26. Analysis Proof (continued): The probability that a semi-random walk, determined by hash function h, leaves u and reaches w through v is: Pr^h_{u-v-w} = Pr_{R∈{0,1}^l}[R ∈ E_{u,v} and h(R) ∈ E_{v,w}]. Using the hash proposition with respect to the sets E_{u,v}, E_{v,w}, we learn that except for a 2^{-l/5} fraction of the h's: |Pr^h_{u-v-w} − Pr_{u-v-w}| ≤ 2^{-l/5}. Applying this to all possible triplets, we learn that except for a fraction ε_3 = W³·2^{-l/5} of the hash functions: ∀u,v,w: |Pr^h_{u-v-w} − Pr_{u-v-w}| ≤ 2^{-l/5}

  27. Analysis Proof (continued): Denote by Pr_{u-w} (resp. Pr^h_{u-w}) the probability of reaching w from u for the random (resp. semi-random) guide: Pr_{u-w} = Σ_v Pr_{u-v-w} and Pr^h_{u-w} = Σ_v Pr^h_{u-v-w}. Consequently, if we assume that h is a “good” hash function: |Pr_{u-w} − Pr^h_{u-w}| ≤ W·2^{-l/5} ≤ W⁴·2^{-l/5}·Pr_{u-w} ≤ ε_4·Pr_{u-w} for a large enough constant in l = Θ(log W).

  28. Analysis Proof (continued): Since the probability of traversing any path P in G is the product of the probabilities of traversing its two-hops u-v-w, we learn that: |Pr[R' = P] − Pr[R = P]| ≤ ε_4·Pr[R = P]. Summing over all accepting paths: |Pr_R[ACC(G',R)] − Pr_{R'}[ACC(G',R')]| ≤ ε_4·Pr_R[ACC(G',R)] ≤ ε_4. The probability that h fails to be a good hash function is bounded by ε_3. Therefore, if we define ε_2 = ε_3 + ε_4, we prove the lemma: |Pr_R[ACC(G',R)] − Pr_{R'}[ACC(G',R')]| ≤ ε_2

  29. Analysis Applying both lemmas, we prove that the semi-random guide R' behaves well in the original graph G_{M,x}: |Pr_R[ACC(G_{M,x},R)] − Pr_{R'}[ACC(G_{M,x},R')]| ≤ ε, where ε = 2ε_1 + ε_2 by the triangle inequality. We have proved that the first coarsening succeeds. To proceed, we contract every two adjacent layers of G and create a single edge for every two-hop path taken by R'. Lemmas 1 and 2 can be reapplied consecutively until, after l' iterations, we are left with a bipartite graph and a truly random guide.

  30. Analysis All in all, we have shown: |Pr_{R∈{0,1}^D}[ACC(G,R)] − Pr_{I∈{0,1}^k}[ACC(G,H(I))]| ≤ ε·l' ≤ 1/10, which concludes the proof that H is a (D,W)-pseudorandom generator.

  31. Analysis Problem: h is a hash function that depends on only O(log n) bits, and M is a log-space machine. Why can't M differentiate between a truly random guide and a pseudorandom guide by just looking at four consecutive blocks of the pseudorandom sequence z, h(z), z', h(z'), and fully determining h by solving linear equations in log-space? Solution: During the analysis we required that l = Θ(log W) be large enough. In the underlying computation model this corresponds to the fact that M cannot even retain a description of the hash function h.

  32. Extensions and Related Results 16.8 We have shown that BPL ⊆ DSPACE(log²n), but the running time of the straightforward derandomized algorithm is exp(Θ(log²n)), i.e. quasi-polynomial. Here we sketch the following result: BPL ⊆ SC (“Steve's Class”), where SC is the class of all languages that can be recognized in poly(n) time and polylog(n) space. Thm: BPL ⊆ SC. Proof Sketch: Suppose we could guess a “good” set of hash functions h_1, h_2, …, h_{l'}. Then all that would be left to do is to enumerate over r, which takes poly(n) time. We will show that we can efficiently find a good set of hash functions.

  33. Extensions and Related Results Proof Sketch (continued): We will incrementally fix the hash functions one at a time, from h_{l'} down to h_1. The important point to notice is that, due to the recursive nature of H, whether h is a good hash function or not depends only on the hash functions fixed before it. Therefore it is enough to incrementally find good hash functions, as sketched below. In order to check whether h is a good hash function, we must test whether Lemmas 1 and 2 hold. This requires creating the proper pruned graph G and checking the probabilities on different subsets of edges. Both of these tasks can be performed in poly(n) time.
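A high-level sketch of the incremental search; `candidates()`, enumerating H_l, and `is_good(h, fixed)`, testing Lemmas 1 and 2 against the already-fixed functions, are assumed black boxes:

```python
def fix_hash_functions(l_prime, candidates, is_good):
    """Fix h_{l'}, h_{l'-1}, ..., h_1, one function at a time."""
    fixed = []                      # functions already committed
    for _ in range(l_prime):
        # goodness depends only on the functions fixed before this one
        fixed.append(next(h for h in candidates() if is_good(h, fixed)))
    return fixed[::-1]              # reorder to (h_1, ..., h_{l'})
```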

  34. Extensions and Related Results Proof Sketch (continued): Hence, the total time required is l'·poly(n) = poly(n), and the total space required, O(log²n), is dominated by storing the functions h_1, h_2, …, h_{l'}. Further Results (without proof): 1. BPL ⊆ DSPACE(log^{1.5}n) 2. Every random computation that can be carried out in polynomial time and linear space can also be carried out in polynomial time and linear space using only a linear amount of randomness.
