Noise-Insensitive Boolean-Functions are Juntas

Noise-Insensitive Boolean-Functions are Juntas. Guy Kindler & Muli Safra. Slides prepared with help of: Adi Akavia

Influential People • The theory of the influence of variables on Boolean functions [BL, KKL], and related issues, was introduced to tackle social choice problems, and has since motivated a magnificent sequence of works related to economics [K], percolation [BKS], and hardness of approximation [DS], revolving around the Fourier/Walsh analysis of Boolean functions… • And the really important question:

Where to go for Dinner? Who has suggestions? Each casts a vote in an (electronic) envelope, and the system decides, not necessarily according to majority… It turns out someone (in the Florida wing) has the power to flip some votes. Power influence

Voting Systems • n agents, each voting either “for” (T) or “against” (F); a Boolean function f over n variables is the outcome • The values of the agents (variables) may each, independently, flip with some fixed probability • It turns out: one cannot design an f that is robust to such noise (that is, one that, on average, changes value only with small probability) unless it takes into account only very few of the votes

Dictatorship Def: a Boolean function f: P([n]) → {-1,1} is a monotone e-dictatorship, denoted f_e, if its value is determined by e alone: f_e(x) is T when e ∈ x and F when e ∉ x.

Juntas Def: a Boolean function f: P([n]) → {-1,1} is a j-junta if ∃ J ⊆ [n], where |J| ≤ j, s.t. for every x ⊆ [n]: f(x) = f(x ∩ J). Def: f is an [ε, j]-junta if ∃ j-junta f’ s.t. $\Pr_x[f(x) \neq f'(x)] \le \varepsilon$. Def: f is an [ε, j, p]-junta if ∃ j-junta f’ s.t. $\Pr_{x \sim \mu_p}[f(x) \neq f'(x)] \le \varepsilon$, where $\mu_p$ is the p-biased product distribution. We would tend to omit p.
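The j-junta definition can be checked by brute force on small examples. The following is a minimal sketch, assuming subsets of [n] are represented as Python frozensets; the names `all_subsets`, `find_junta` and `majority_of_first_three` are illustrative and not part of the slides.

```python
from itertools import combinations, product


def all_subsets(n):
    """Every subset of [n] = {1, ..., n}, as frozensets."""
    elements = range(1, n + 1)
    return [frozenset(e for e, bit in zip(elements, bits) if bit)
            for bits in product([0, 1], repeat=n)]


def find_junta(f, n, j):
    """Return some J with |J| <= j and f(x) == f(x & J) for all x, else None."""
    inputs = all_subsets(n)
    for size in range(j + 1):
        for candidate in combinations(range(1, n + 1), size):
            J = frozenset(candidate)
            if all(f(x) == f(x & J) for x in inputs):
                return J
    return None


def majority_of_first_three(x):
    """Hypothetical example: majority vote of agents 1, 2, 3; others ignored."""
    return 1 if len(x & {1, 2, 3}) >= 2 else -1
```

With n = 5, `find_junta(majority_of_first_three, 5, 3)` finds the set {1, 2, 3}, while asking for a 2-junta returns `None`. The search is exponential in n, so it is meant only for illustrating the definition.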

Long-Code • In the long-code L: [n] → {0,1}^{2^n}, each element is encoded by 2^n bits • This is the most extensive binary code, having one bit for every subset in P([n])

Long-Code • Encoding an element e ∈ [n]: • E_e legally encodes an element e if E_e = f_e [Figure: a row of truth-table entries T F F T T]

Long-Code ⇔ Monotone-Dictatorship • The truth-table of a Boolean function over n elements can be considered as a 2^n-bit-long string (each bit corresponding to one input setting, that is, to a subset of [n]). For a long-code, the legal code-words are all monotone dictatorships. How about the Hadamard code?
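The correspondence between long-code words and monotone dictatorships can be made concrete for small n. This is a sketch under the assumption that the bit indexed by a subset x is T exactly when e ∈ x; the function names are hypothetical.

```python
from itertools import product


def subsets_of(n):
    """All subsets of [n] = {1, ..., n}, in one fixed order."""
    return [frozenset(i for i, bit in zip(range(1, n + 1), bits) if bit)
            for bits in product([0, 1], repeat=n)]


def long_code(e, n):
    """Truth table of the monotone dictatorship f_e: one bit per subset of [n]."""
    return [e in x for x in subsets_of(n)]


def decode(word, n):
    """Return e if word is exactly the long-code word of e, else None."""
    for e in range(1, n + 1):
        if word == long_code(e, n):
            return e
    return None
```

Each code-word has 2^n bits, exactly half of which are T, and only n of the 2^(2^n) possible strings are legal code-words.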

Long-code Tests • Def (a long-code test): given a code-word w, probe it in a constant number of entries, and • accept w.h.p if w is a monotone dictatorship • reject w.h.p if w is not close to any monotone dictatorship

Efficient Long-code Tests For some applications, it suffices if the test may accept illegal code-words, as long as those have a short list-decoding: Def (a long-code list-test): given a code-word w, probe it in 2 or 3 places, and • accept w.h.p. if w is a monotone dictatorship, • reject w.h.p. if w is not even approximately determined by a short list of domain elements, that is, if there is no junta J ⊆ [n] s.t. f is close to f’ and f’(x) = f’(x ∩ J) for all x. Note: a long-code list-test distinguishes the case where w is a dictatorship from the case where w is far from a junta.

Background • Thm (Friedgut): a Boolean function f with small average-sensitivity is an [ε,j]-junta • Thm (Bourgain): a Boolean function f with small high-frequency weight is an [ε,j]-junta • Thm (Kindler & Safra): a Boolean function f with small high-frequency weight in a p-biased measure is an [ε,j]-junta • Corollary: a Boolean function f with small noise-sensitivity is an [ε,j]-junta • Parameters: average-sensitivity [BL, KKL, F]; high-frequency weight [KKL, B]; noise-sensitivity [BKS]

Noise-Sensitivity How often does the value of f change when the input is perturbed? [Figure: an input x ⊆ [n], a noise subset I ⊆ [n], and a replacement z on I]

Noise-Sensitivity • Def ($\mu_{\lambda,p,x}$): Let 0<λ<1, and x ∈ P([n]). Then y ~ $\mu_{\lambda,p,x}$ if y = (x\I) ∪ z, where • I ~ $\mu_\lambda^{[n]}$ is a noise subset, and • z ~ $\mu_p^{I}$ is a replacement. Def (λ-noise-sensitivity): let 0<λ<1, then $ns_\lambda(f) = \Pr_{x \sim \mu_p,\ y \sim \mu_{\lambda,p,x}}[f(x) \neq f(y)]$ [When p=½ this is equivalent to flipping each coordinate in x w.p. λ/2.]
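For small n the noise-sensitivity can be computed exactly by enumeration. The sketch below assumes the λ-noise-sensitivity is the probability that f(x) ≠ f(y), where x is p-biased and each coordinate of y is, independently, kept with probability 1-λ and otherwise redrawn with bias p. Inputs are tuples of 0/1 bits, and all function names are illustrative.

```python
from itertools import product


def noise_sensitivity(f, n, lam, p=0.5):
    """Exact Pr[f(x) != f(y)] for p-biased x and its lam-noisy copy y."""
    total = 0.0
    for x in product([0, 1], repeat=n):
        prob_x = 1.0
        for xi in x:
            prob_x *= p if xi else 1 - p
        for y in product([0, 1], repeat=n):
            prob_y_given_x = 1.0
            for xi, yi in zip(x, y):
                # Coordinate kept w.p. 1 - lam, otherwise resampled with bias p.
                q = (1 - lam) * xi + lam * p  # Pr[y_i = 1 | x_i]
                prob_y_given_x *= q if yi else 1 - q
            if f(x) != f(y):
                total += prob_x * prob_y_given_x
    return total


def dictator(x):
    """Hypothetical example: the outcome is the first agent's vote."""
    return 1 if x[0] else -1


def majority3(x):
    """Hypothetical example: majority of three votes."""
    return 1 if sum(x) >= 2 else -1
```

For p = ½ the dictator changes value with probability λ/2, matching the bracketed remark in the definition, while the majority of three votes is already more sensitive to the same noise.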

Fourier/Walsh Transform Write f: {-1,1}^n → {-1,1} as a polynomial. What would be the monomials? • For every set S ⊆ [n] we have a monomial which is the product of all variables in S (the only relevant powers are either 0 or 1). It now makes sense to consider the degree of f, or to break f up according to the degrees of its monomials.
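The polynomial representation can be computed directly for small n. This is a minimal sketch assuming the uniform measure on {-1,1}^n (the p = ½ case), where the coefficient of the monomial for S is the average of f(x) times the product of the variables in S; coordinates are indexed from 0 and the names are illustrative.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Map each S (a tuple of indices) to the coefficient of prod_{i in S} x_i."""
    points = list(product([-1, 1], repeat=n))
    coefficients = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = 0
            for x in points:
                monomial = 1
                for i in S:
                    monomial *= x[i]
                total += f(x) * monomial
            coefficients[S] = total / len(points)
    return coefficients


def majority3(x):
    """Hypothetical example: majority of three ±1 votes."""
    return 1 if sum(x) > 0 else -1
```

For `majority3` the only non-zero coefficients are those of the three singletons and of the full set {0, 1, 2}, so the function splits into a degree-1 part and a degree-3 part, and the squared coefficients sum to 1.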

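High/Low Frequencies and their Weights Def: the high-frequency portion of f: $f^{>k} = \sum_{|S|>k} \hat f(S)\,\chi_S$ Def: the low-frequency portion of f: $f^{\le k} = \sum_{|S|\le k} \hat f(S)\,\chi_S$ Def: the high-frequency-weight is: $\|f^{>k}\|_2^2 = \sum_{|S|>k} \hat f(S)^2$ Def: the low-frequency-weight is: $\|f^{\le k}\|_2^2 = \sum_{|S|\le k} \hat f(S)^2$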

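Low High-Frequency Weight Prop: the λ-noise-sensitivity can be expressed in Fourier transform terms as $ns_\lambda(f) = \frac{1}{2} - \frac{1}{2}\sum_{S}(1-\lambda)^{|S|}\hat f(S)^2$ Prop: Low ns ⇒ Low high-freq weight Proof: By the above proposition, low noise-sensitivity implies that $\sum_{S}(1-\lambda)^{|S|}\hat f(S)^2$ is close to 1; nevertheless, f being a {-1,1} function, Parseval's formula (that the 2-norm of the function and of its Fourier transform are equal) implies $\sum_{S}\hat f(S)^2 = 1$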
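The step from low noise-sensitivity to low high-frequency weight can be written out as a short bound. This is a sketch assuming that $ns_\lambda(f)$ denotes $\Pr[f(x) \neq f(y)]$ and that it satisfies the identity $1 - 2\,ns_\lambda(f) = \sum_S (1-\lambda)^{|S|}\hat f(S)^2$; the cutoff k is a free parameter here.

```latex
\begin{align*}
1 - 2\,ns_\lambda(f)
  &= \sum_{S} (1-\lambda)^{|S|}\,\hat f(S)^2 \\
  &\le \sum_{|S|\le k} \hat f(S)^2 \;+\; (1-\lambda)^{k} \sum_{|S|>k} \hat f(S)^2 \\
  &= 1 - \bigl(1-(1-\lambda)^{k}\bigr) \sum_{|S|>k} \hat f(S)^2
     && \text{(Parseval: } \textstyle\sum_S \hat f(S)^2 = 1\text{)} \\
\Longrightarrow\quad
\sum_{|S|>k} \hat f(S)^2
  &\le \frac{2\,ns_\lambda(f)}{1-(1-\lambda)^{k}} .
\end{align*}
```

Choosing k so that $(1-\lambda)^k = 1/2$, that is $k = \log_{1-\lambda}(1/2)$, bounds the high-frequency weight by $4\,ns_\lambda(f)$.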

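Average and Restriction Def: Let I ⊆ [n], x ∈ P([n]\I); the restriction function is $f_I[x](y) = f(x \cup y)$ for y ⊆ I Def: the average function is Note: [Figure: [n] split into I and its complement, with x fixed outside I and y ranging over the subsets of I]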

Fourier Expansion • Prop: • Prop????: • Corollary:

Variation Def: the variation of f: Prop: the following are equivalent definitions to the variation of f:

Low-freq Variation and Low-freq Average-Sensitivity Def: the low-frequency variation is: Def: the average sensitivity is And in Fourier representation: $as(f) = \sum_{S} |S|\,\hat f(S)^2$ Def: the low-frequency average sensitivity is: $as^{\le k}(f) = \sum_{|S|\le k} |S|\,\hat f(S)^2$
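The average sensitivity can be illustrated by direct counting on small examples. The sketch below assumes the uniform measure on {-1,1}^n and the standard combinatorial reading: the expected number of coordinates whose flip changes the value of f. In that setting it agrees with the Fourier expression that weights each squared coefficient by |S|. The names are illustrative.

```python
from itertools import product


def average_sensitivity(f, n):
    """Expected number of coordinates whose flip changes f(x), x uniform."""
    sensitive_pairs = 0
    for x in product([-1, 1], repeat=n):
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                sensitive_pairs += 1
    return sensitive_pairs / 2 ** n


def dictator(x):
    """Hypothetical example: the outcome is the first vote."""
    return x[0]


def majority3(x):
    """Hypothetical example: majority of three ±1 votes."""
    return 1 if sum(x) > 0 else -1


def parity3(x):
    """Hypothetical example: product of three ±1 votes."""
    return x[0] * x[1] * x[2]
```

A dictatorship has average sensitivity 1, the majority of three votes 1.5, and the parity of three votes 3, which shows how weight on larger characters raises the average sensitivity.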

Main Result Theorem: ∃ a constant > 0 s.t. any Boolean function f: P([n]) → {-1,1} satisfying is an [ε,j]-junta for $j = O(\varepsilon^{-2} k^{3} 2^{k})$. Corollary: fix a p-biased distribution $\mu_p$ over P([n]). Let λ>0 be any parameter. Set $k = \log_{1-\lambda}(1/2)$. Then ∃ a constant > 0 s.t. any Boolean function f: P([n]) → {-1,1} satisfying is an [ε,j]-junta for $j = O(\varepsilon^{-2} k^{3} 2^{k})$.

Of course they’ll have to discuss it over dinner… Where to go for Dinner? Who has suggestions? Each casts a vote in an (electronic) envelope, and the system decides, not necessarily according to majority… It turns out someone (in the Florida wing) has the power to flip some votes. Form a Committee Power influence

First Attempt: Following Friedgut’s Proof Thm: any Boolean function f is an [ε,j]-junta for $j = 2^{O(as(f)/\varepsilon)}$ Proof: • Specify the junta J where, let k = O(as(f)/ε) and fix a threshold $2^{-O(k)}$ • Show the complement of J has small variation [Figure: P([n]) with the junta J marked]

Following Friedgut - Cont Lemma: Proof: Now, let’s bound each argument: Prop: Proof: characters of size > k contribute to the average-sensitivity at least (since ) [Figure: P([n]) with the junta J marked]

Following Friedgut - Cont Prop: Proof: [Callouts: we do not know whether as(f) is small! True only since this is a {-1,0,1} function. So we cannot proceed this way with only the low-frequency average sensitivity!]

If k were 1 Easy case (!?!): If we had a bound on the non-linear weight, we would be done. The linear part is a set of independent characters (the singletons). In order for those to hit close to 1 or -1 most of the time, they must avoid the law of large numbers, namely be almost entirely placed on one singleton [by a Chernoff-like bound]. Thm [FKN, ext.]: Assume f is close to linear; then f is close to shallow (a constant function or a dictatorship)

How to Deal with Dependency between Characters Recall (theorem’s premise) Idea: Let • Partition [n]\J into I_1,…,I_r, for r >> k • w.h.p. f_I[x] is close to linear (low-frequency characters are expected to intersect I in at most 1 element, while the high-frequency weight is low). [Figure: P([n]) with J marked and [n]\J partitioned into I_1, I_2, …, I_r, one part labelled I]

So what? f_I[x] is close to linear ⇒ by FKN, f_I[x] is either a constant function or a dictatorship, for any x. Still, f_I[x] could be a different dictatorship for every x, hence the variation of each i ∈ I might be low [Figure: P([n]) with J marked and [n]\J partitioned into I_1, I_2, …, I_r, one part labelled I]

almost linear ⇒ almost shallow Theorem ([FKN]): ∃ global constant M, s.t. ∀ Boolean function f, ∃ shallow Boolean function g, s.t. • Hence, $\|f_I[x]^{>1}\|^2$ is small ⇒ f_I[x] is close to shallow!

Dictatorship and its Singleton • Prop: if f_I[x] is a dictatorship, then ∃ coordinate i s.t. (where p is the bias). • Corollary (from [FKN]): ∃ global constant M, s.t. ∀ Boolean function h, either or [Figure: weight per character, over the characters {1}, {2}, …, {i}, …, {n}, {1,2}, {1,3}, …, {n-1,n}, …, S, …, {1,..,n}; annotation: total weight of no more than 1-p]

f_I[x] Mostly Constant • Lemma: ∃ >0, s.t. for any and any function g: P([m]) → ℝ • Def: Let D_I be the set of x ∈ P(I), s.t. f_I[x] is a dictatorship • Next we show that |D_I| must be small, hence for most x, f_I[x] is constant.

|D_I| must be small • Lemma: • Proof: let , then [Callouts: Parseval; previous lemma] Each S is counted only for one index i ∈ I. (Otherwise, if S was counted for both i and j in I, then |S ∩ I| > 1!)

Simple Prop • Prop: let {a_i}_{i∈I} be a sub-distribution, that is, Σ_{i∈I} a_i ≤ 1, 0 ≤ a_i; then Σ_{i∈I} a_i² ≤ max_{i∈I}{a_i}. • Proof: [Figure: bar chart of the values a_i for i = 1, 2, 3, …, n, each no more than the maximum a_max]
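The proposition follows in one line by bounding one factor of each square by the maximum. A short derivation, using only the two assumptions stated in the proposition ($a_i \ge 0$ and $\sum_{i\in I} a_i \le 1$):

```latex
\begin{align*}
\sum_{i\in I} a_i^2
  \;=\; \sum_{i\in I} a_i \cdot a_i
  \;\le\; \Bigl(\max_{j\in I} a_j\Bigr) \sum_{i\in I} a_i
  \;\le\; \max_{j\in I} a_j .
\end{align*}
```

The first inequality uses $0 \le a_i \le \max_j a_j$, and the second uses that the $a_i$ sum to at most 1.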

|DI| must be small - Cont • Therefore(since ), • Hence

Obtaining the Lemma • Recall • However, the characters {χ_S}_S are orthonormal, and • It remains to show that indeed: • Prop1: • Prop2:

Obtaining the Lemma – Cont. • Prop3: • Proof: separate by freq: • Small freq: • Large freq: • Corollary(from props 2,3):

Obtaining the Lemma – Cont. • Recall: by corollary from [FKN], Either or • Hence • By Corollary • Combined with Prop1 we obtain: |DI| is small

Important Lemma • Lemma: ∃ >0, s.t. for any and any function g: P([m]) → ℝ, the following holds: [Callouts on the two terms: high-freq; low-freq]

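Beckner/Nelson/Bonami Inequality Def: let $T_\delta$ be the following operator on f: $T_\delta f = \sum_{S} \delta^{|S|}\hat f(S)\,\chi_S$ Thm: for any p ≥ r and δ ≤ ((r-1)/(p-1))^{½}: $\|T_\delta f\|_p \le \|f\|_r$ Corollary: for f s.t. $f^{>k} = 0$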

Probability Concentration • Simple Bound: • Proof: • Low-freq Bound: Let g:P([m])  be of degree k and >0, then >0 s.t. • Proof: recall the corollary: 

Lemma’s Proof • Now, let’s prove the lemma: • Bounding low and high freq separately: [Callouts: simple bound; low-freq bound]

Shallow Function • Def: a function f is linear, if only singletons have non-zero weight • Def: a function f is shallow, if f is either a constant or a dictatorship. • Claim: Boolean linear functions are shallow. [Figure: weight against character size 0, 1, 2, 3, …, k, …, n]
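The claim can be checked exhaustively for a small number of variables. The sketch below makes several assumptions that go beyond the slide: it works over the uniform measure on {-1,1}^3, it reads "linear" as "no weight on characters of size 2 or more" (so constants qualify), and it counts the negation of a dictatorship as shallow. The function names are illustrative.

```python
from itertools import combinations, product


def boolean_functions_of_degree_at_most_one(n):
    """All f: {-1,1}^n -> {-1,1} with zero coefficient on every |S| >= 2."""
    points = list(product([-1, 1], repeat=n))
    large_sets = [S for size in range(2, n + 1)
                  for S in combinations(range(n), size)]
    found = []
    for table in product([-1, 1], repeat=len(points)):
        has_high_weight = False
        for S in large_sets:
            total = 0  # 2^n times the coefficient of the character for S
            for x, value in zip(points, table):
                monomial = 1
                for i in S:
                    monomial *= x[i]
                total += value * monomial
            if total != 0:
                has_high_weight = True
                break
        if not has_high_weight:
            found.append(dict(zip(points, table)))
    return found


def is_shallow(func, n):
    """True if func is constant, a dictatorship, or a negated dictatorship."""
    if len(set(func.values())) == 1:
        return True
    return any(all(v == x[i] for x, v in func.items()) or
               all(v == -x[i] for x, v in func.items())
               for i in range(n))
```

For n = 3 the enumeration covers all 256 truth tables and keeps exactly 8 of them: the 2 constants, the 3 dictatorships and their 3 negations, all of which pass `is_shallow`.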

Boolean Linear ⇒ Shallow • Claim: Boolean linear functions are shallow. • Proof: let f be a Boolean linear function; we next show: • ∃ {i_0} s.t. (i.e. ) • And conclude that either or, i.e. f is shallow

Claim 1 • Claim 1: let f be a Boolean linear function, then ∃ {i_0} s.t. • Proof: w.l.o.g. assume • for any z ⊆ {3,…,n}, consider x_00 = z, x_10 = z ∪ {1}, x_01 = z ∪ {2}, x_11 = z ∪ {1,2} • then • Next value must be far from {-1,1}, • A contradiction! (Boolean function) • Therefore [Figure: number line marking the values 1 and -1]

Claim 2 • Claim 2: let f be a Boolean function, s.t. Then either or • Proof: consider f(∅) and f({i_0}): • Then • but f is Boolean, hence • therefore [Figure: number line marking the values 1, 0 and -1]

Proving FKN: almost-linear ⇒ close to shallow • Theorem: Let f: P([n]) → ℝ be linear, • Let • let i_0 be the index s.t. is maximal, then • Note: f is linear, hence w.l.o.g. assume i_0 = 1; then all we need to show is: We show that in the following claim and lemma.

Corollary • Corollary: Let f be linear, and then ∃ a shallow Boolean function g s.t. • Proof: let , let g be the Boolean function closest to l. Then, this is true, as • is small (by theorem), • and additionally is small, since

Claim 1 • Claim 1: Let f be linear. w.l.o.g. assume; then ∃ global constant c = min{p, 1-p} s.t. [Figure: weight per character, over the characters {}, {1}, {2}, …, {i}, …, {n}, {1,2}, {1,3}, …, {n-1,n}, …, S, …, {1,..,n}; annotation: each of weight no more than c]

Proof of Claim 1 • Proof: assume • for any z ⊆ {3,…,n}, consider x_00 = z, x_10 = z ∪ {1}, x_01 = z ∪ {2}, x_11 = z ∪ {1,2} • then • Next value must be far from {-1,1}! • A contradiction! (to ) [Figure: number line marking the values 1 and -1]
