
# Fourier Analysis and Boolean Function Learning







### Fourier Analysis and Boolean Function Learning

Jeff Jackson

Duquesne University

www.mathcs.duq.edu/~jackson

Themes

• Fourier analysis is central to learning-theoretic results in a wide variety of models

• The results are generally the strongest known for learning Boolean function classes with respect to the uniform distribution

• Work on learning problems has led to some new harmonic results

• Spectral properties of Boolean function classes

• Algorithms for approximating Boolean functions

[Diagram: the uniform learning model. Given a Boolean function class F (e.g., DNF) and accuracy ε > 0, the learning algorithm A draws uniform random examples ⟨x, f(x)⟩ of a target f : {0,1}ⁿ → {0,1} from an example oracle EX(f), and outputs a hypothesis h : {0,1}ⁿ → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.]

• Constant-depth AND/OR circuits (AC0 without the polynomial-size restriction; call this CDC)

• DNF: depth-2 circuit with OR at root

[Figure: a d-level circuit of alternating OR (∨) and AND (∧) gates over variables v1, v2, v3, …, vn; negations allowed at the inputs.]

[Figure: a decision tree over variables v1–v4 with 0/1 leaves, shown evaluating the input x = 11001: the path follows x1 = 1, then x3 = 0, and reaches a leaf giving f(x) = 1.]

• Each function representation has a natural size measure:

• CDC, DNF: # of gates

• DT: # of leaves

• The size s_F(f) of f with respect to class F is the size of the smallest representation of f within F

• For all Boolean f, s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)

Efficient Uniform Learning Model

[Diagram: as in the uniform learning model above, but A must run in time poly(n, s_F, 1/ε).]

• [LMN]: constant-depth circuits are quasi-efficiently (n^{polylog(s/ε)}-time) uniform learnable

• [BT]: monotone Boolean functions are uniform learnable in time roughly 2^{√n · log n}

• Monotone: for all x, i: f(x|x_i=0) ≤ f(x|x_i=1)

• Also exponential in 1/ε (so ε is assumed constant)

• But independent of any size measure

• Assume f : {0,1}ⁿ → {-1,1}

• For all a in {0,1}ⁿ, χ_a(x) ≡ (-1)^{a · x}

• For all a in {0,1}ⁿ, the Fourier coefficient f̂(a) of f at a is f̂(a) ≡ E_{x~U}[f(x) · χ_a(x)]

• Sometimes write, e.g., f̂({1}) for f̂(10…0)
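These definitions are easy to make concrete. Below is a minimal sketch (the helper names are mine, not from the talk): the parity character χ_a, the exact coefficient f̂(a) computed over all of {0,1}ⁿ, and the sampling estimate that the learning algorithms actually use.

```python
import random

def chi(a, x):
    """Parity character chi_a(x) = (-1)^(a . x) for a, x in {0,1}^n."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def fourier_coefficient(f, a, n):
    """Exact f^(a) = E_{x~U}[f(x) * chi_a(x)], summing over all 2^n inputs."""
    total = 0
    for m in range(2 ** n):
        x = [(m >> i) & 1 for i in range(n)]
        total += f(x) * chi(a, x)
    return total / 2 ** n

def estimate_coefficient(f, a, n, samples=4000, seed=0):
    """Monte Carlo estimate of f^(a) from uniform random examples <x, f(x)>."""
    rng = random.Random(seed)
    s = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        s += f(x) * chi(a, x)
    return s / samples
```

For example, the ±1-valued XOR of the first two bits is exactly χ_{110…0}, so its coefficient there is 1 and every other coefficient is 0.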

• [LMN]: if f is a constant-depth circuit of depth d and S = { a : |a| < log^d(s/ε) } ( |a| ≡ # of 1’s in a ), then Σ_{a∉S} f̂²(a) < ε

• [BT]: if f is a monotone Boolean function and S = { a : |a| < √n / ε }, then Σ_{a∉S} f̂²(a) < ε

• [LMN]: Håstad’s Switching Lemma + harmonic analysis

• [BT]: based on [KKL]

• Define AS(f) ≡ n · Pr_{x,i}[f(x|x_i=0) ≠ f(x|x_i=1)]

• If S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε

• For monotone f, harmonic analysis + Cauchy–Schwarz shows AS(f) ≤ √n

• Note: this is tight for MAJ

• For all Boolean f, Σ_a f̂²(a) = 1 (Parseval)

• For S ⊆ {0,1}ⁿ, define g ≡ Σ_{a∈S} f̂(a) χ_a

• [LMN]: Pr[f ≠ sign(g)] ≤ E[(f − g)²] = Σ_{a∉S} f̂²(a)

• Given: ε (and perhaps s, d, ...)

• Determine k such that for S = {a : |a| < k}, Σ_{a∉S} f̂²(a) < ε

• Draw a sufficiently large sample of examples ⟨x, f(x)⟩ to closely estimate f̂(a) for all a ∈ S

• Chernoff bounds: ~ n^k/ε sample size suffices

• Output h ≡ sign(Σ_{a∈S} f̃(a) χ_a), where f̃(a) are the estimated coefficients

• Run time ~ n^{2k}/ε
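The steps above can be sketched as a short "low-degree algorithm" (my own minimal implementation, not the authors' code): estimate every coefficient of weight below k from the sample, then output the sign of the truncated polynomial. For the demonstration the full truth table stands in for a large random sample, so the estimates are exact.

```python
import itertools

def chi(a, x):
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def lmn_learn(examples, n, k):
    """Low-degree algorithm: estimate f^(a) for all |a| < k from the sample,
    then return h(x) = sign(sum_{|a| < k} f~(a) chi_a(x))."""
    m = len(examples)
    coeffs = {}
    for degree in range(k):
        for ones in itertools.combinations(range(n), degree):
            a = tuple(1 if i in ones else 0 for i in range(n))
            coeffs[a] = sum(fx * chi(a, x) for x, fx in examples) / m
    def h(x):
        total = sum(c * chi(a, x) for a, c in coeffs.items())
        return 1 if total >= 0 else -1
    return h

# Majority of the first 3 of 5 bits: its weight-1 coefficients alone already
# determine the sign, so k = 2 suffices to learn it exactly here.
maj = lambda x: 1 if x[0] + x[1] + x[2] >= 2 else -1
inputs = [[(m >> i) & 1 for i in range(5)] for m in range(32)]
examples = [(x, maj(x)) for x in inputs]
h = lmn_learn(examples, 5, 2)
```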

• [KOS]: halfspaces are efficiently uniform learnable (given ε is constant)

• Halfspace: ∃ w ∈ R^{n+1} s.t. f(x) = sign(w · (x∘1))

• If S = {a : |a| < (21/ε)²} then Σ_{a∉S} f̂²(a) < ε

• Apply the LMN algorithm

• A similar result applies for an arbitrary function applied to a constant number of halfspaces

• Intersection of halfspaces is a key learning problem

• [O] (cf. [BKS], [BJTa]):

• The noise sensitivity of f at γ is the probability that corrupting each bit of x independently with probability γ changes f(x)

• NS_γ(f) ≡ ½(1 − Σ_a (1 − 2γ)^{|a|} f̂²(a))

• [KOS]:

• If S = {a : |a| < 1/γ} then Σ_{a∉S} f̂²(a) < 3·NS_γ(f)

• If f is a halfspace then NS_γ(f) < 9√γ
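Noise sensitivity as defined above can be estimated directly by sampling; here is a small sketch (names mine). For a dictator f(x) = x_i, all Fourier weight sits at |a| = 1, so the formula gives NS_γ = ½(1 − (1 − 2γ)) = γ, which the empirical estimate should reproduce.

```python
import random

def noise_sensitivity(f, n, gamma, trials=20000, seed=0):
    """Estimate NS_gamma(f): draw uniform x, flip each bit of x independently
    with probability gamma to get y, and report how often f(x) != f(y)."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [xi ^ (rng.random() < gamma) for xi in x]
        changed += f(x) != f(y)
    return changed / trials

dictator = lambda x: x[0]   # NS_gamma(dictator) = gamma exactly
```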

• [OS]: monotone functions are efficiently learnable given:

• ε is constant

• s_DT(f) is used as the size measure

• Techniques:

• Harmonic analysis: for monotone f, AS(f) ≤ √(log s_DT(f))

• [BT]: if S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε

• Friedgut: ∃ T with |T| ≤ 2^{AS(f)/ε} s.t. Σ_{A⊄T} f̂²(A) < ε

• [KKL] also show that if f is monotone, there is an i such that −f̂({i}) ≥ log²n / n

• Therefore Pr[f(x) = −χ_{i}(x)] ≥ ½ + log²n / 2n

• In general, h s.t. Pr[f = h] ≥ ½ + 1/poly(n,s) is called a weak approximator to f

• If A outputs a weak approximator for every f in F, then F is weakly learnable


[Diagram: the weak learning model, identical to the uniform learning model except that the hypothesis need only satisfy Pr_{x~U}[f(x) ≠ h(x)] < ½ − 1/p(n,s).]

• Draw a set of ~ n² examples ⟨x, f(x)⟩

• For i = 1 to n

• Estimate f̂({i})

• Output h ≡ −χ_{i*}, where i* = argmax_i (−f̂({i}))
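This KKL-based weak learner amounts to: estimate all n degree-1 coefficients and predict with the single most anti-correlated variable. A minimal sketch (helper names mine), under the talk's convention that for monotone f : {0,1}ⁿ → {−1,1} some −f̂({i}) is large:

```python
def chi_i(i, x):
    """Single-variable parity: chi_{i}(x) = (-1)^{x_i}."""
    return -1 if x[i] else 1

def weak_learn(examples, n):
    """Estimate each f^({i}) from the sample; return h = -chi_{i*} for the i*
    with the most negative f^({i}), i.e., maximizing -f^({i})."""
    m = len(examples)
    coeffs = [sum(fx * chi_i(i, x) for x, fx in examples) / m for i in range(n)]
    best = min(range(n), key=lambda i: coeffs[i])
    return lambda x: -chi_i(best, x)

# Sanity check on a dictator function f = x_0 (in +-1 form, f = -chi_0),
# where the "weak" approximator is in fact exact.
f = lambda x: 1 if x[0] else -1
inputs = [[(m >> i) & 1 for i in range(4)] for m in range(16)]
h = weak_learn([(x, f(x)) for x in inputs], 4)
```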

Weak Approximation for MAJ of Constant-Depth Circuits

• Note that adding a single MAJ gate to a CDC destroys the LMN spectral property

• [JKS]: MAJ of CDCs is quasi-efficiently quasi-weakly uniform learnable

• If f is a MAJ of CDCs of depth d, and the number of gates in f is s, then there is an A ∈ {0,1}ⁿ such that

• |A| < log^d s ≡ k

• Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4sn^k)

Weak Learning Algorithm

• Compute k = log^d s

• Draw ~ sn^k examples ⟨x, f(x)⟩

• Repeat over A with |A| < k:

• Estimate f̂(A)

• Until an A is found s.t. f̂(A) > 1/(2sn^k)

• Output h ≡ χ_A

• Run time ~ n^{polylog(s)}
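The coefficient search in this algorithm can be sketched as an exhaustive scan over index vectors of weight below k (a simplified sketch with estimates computed from a fixed sample; names mine):

```python
import itertools

def find_heavy_low_weight(examples, n, k, theta):
    """Scan all A with |A| < k, estimate f^(A) from the sample, and return the
    first (A, estimate) with estimate > theta, or None if no such A exists."""
    m = len(examples)
    for degree in range(k):
        for ones in itertools.combinations(range(n), degree):
            a = tuple(1 if i in ones else 0 for i in range(n))
            est = sum(fx * (-1 if sum(x[i] for i in ones) % 2 else 1)
                      for x, fx in examples) / m
            if est > theta:
                return a, est
    return None

# The parity chi_{110} itself has f^(110) = 1, found at degree 2 after the
# degree-0 and degree-1 estimates all come out as 0.
f = lambda x: -1 if (x[0] ^ x[1]) else 1
inputs = [[(m >> i) & 1 for i in range(3)] for m in range(8)]
found = find_heavy_low_weight([(x, f(x)) for x in inputs], 3, 3, 0.5)
```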

Weak Approximator Proof Techniques

• “Discriminator Lemma” (HMPST)

• Implies one of the CDC’s is a weak approximator to f

• LMN spectral characterization of CDC

• Harmonic analysis

• Beigel result used to extend weak learning to CDC with polylog MAJ gates

Boosting

• In many (not all) cases, uniform weak learning algorithms can be converted to uniform (strong) learning algorithms using a boosting technique ([S], [FS], …)

• Need to learn weakly with respect to near-uniform distributions

• For near-uniform distribution D, find weak hj s.t. Prx~D[hj = f] > ½ + 1/poly(n,s)

• Final h typically MAJ of weak approximators
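The boosting loop can be sketched with a simple AdaBoost-style reweighting (a generic sketch, not the specific [S]/[FS] constructions used in these results): maintain a distribution over the sample, repeatedly call a weak learner against it, and output a weighted-majority vote of the weak hypotheses.

```python
import math

def boost(examples, weak_learner, rounds):
    """AdaBoost-style booster: reweight the sample toward the mistakes of each
    weak hypothesis; the final h is a weighted MAJ of the weak approximators."""
    m = len(examples)
    w = [1.0 / m] * m
    voters = []
    for _ in range(rounds):
        h = weak_learner(examples, w)
        err = sum(wi for wi, (x, fx) in zip(w, examples) if h(x) != fx)
        err = min(max(err, 1e-9), 0.5 - 1e-9)   # clamp away from 0 and 1/2
        alpha = 0.5 * math.log((1 - err) / err)
        voters.append((alpha, h))
        w = [wi * math.exp(-alpha * fx * h(x)) for wi, (x, fx) in zip(w, examples)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in voters) >= 0 else -1

def stump_learner(examples, w):
    """Weak learner: best single-variable predictor h_i(x) = 2 x_i - 1."""
    n = len(examples[0][0])
    def err(i):
        return sum(wi for wi, (x, fx) in zip(w, examples) if 2 * x[i] - 1 != fx)
    i = min(range(n), key=err)
    return lambda x, i=i: 2 * x[i] - 1

# MAJ of 3 bits: each stump alone is only a weak approximator (error 1/4), but
# three rounds of boosting combine them into the exact majority vote.
maj = lambda x: 1 if sum(x) >= 2 else -1
inputs = [[(m >> i) & 1 for i in range(3)] for m in range(8)]
h = boost([(x, maj(x)) for x in inputs], stump_learner, 3)
```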

Strong Learning for MAJ of Constant-Depth Circuits

• [JKS]: MAJ of CDC is quasi-efficiently uniform learnable

• Show that for near-uniform distributions, some parity function is a weak approximator

• Beigel result again extends to CDC with poly-log MAJ gates

• [KP] + boosting: there are distributions for which no parity is a weak approximator

[Diagram: the uniform membership learning model. In addition to accuracy ε > 0, the learner A may query a membership oracle MEM(f) with any x ∈ {0,1}ⁿ and receive f(x); it must output h with Pr_{x~U}[f(x) ≠ h(x)] < ε.]

• [KM]

• L₁(f) ≡ Σ_a |f̂(a)| ≤ s_DT(f)

• If S = {a : |f̂(a)| ≥ ε/L₁(f)} then Σ_{a∉S} f̂²(a) < ε

• [GL]: algorithm (membership oracle) for finding {a : |f̂(a)| ≥ θ} in time ~ n/θ⁶

• So DT is efficiently uniform membership learnable

• Output h has the same form as in LMN: h ≡ sign(Σ_{a∈S} f̃(a) χ_a)
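The KM-style search can be sketched as follows (an exact, exponential-time toy for tiny n; the real algorithm estimates the same subcube weights by sampling with membership queries). It recursively grows prefixes of a, keeping a prefix only if the total squared Fourier weight of coefficients beginning with it is at least θ²; by Parseval, at most 1/θ² prefixes survive per level.

```python
import itertools

def km_find_heavy(f, n, theta):
    """Kushilevitz-Mansour-style search using only membership queries f(x).
    weight(p) = sum of f^(a)^2 over all a extending prefix p, computed via
    E[f(x y) f(x' y) chi_p(x xor x')]; prefixes with weight < theta^2 are
    pruned, since no heavy coefficient can extend them."""
    def weight(p):
        k = len(p)
        total = 0.0
        for x1 in itertools.product((0, 1), repeat=k):
            for x2 in itertools.product((0, 1), repeat=k):
                sign = -1 if sum(pi & (b1 ^ b2)
                                 for pi, b1, b2 in zip(p, x1, x2)) % 2 else 1
                for y in itertools.product((0, 1), repeat=n - k):
                    total += sign * f(list(x1 + y)) * f(list(x2 + y))
        return total / (4 ** k * 2 ** (n - k))
    live = [()]
    for _ in range(n):
        live = [p + (b,) for p in live for b in (0, 1)
                if weight(p + (b,)) >= theta ** 2]
    return live   # the full-length a with f^(a)^2 >= theta^2

# Example: f is the parity of bits 0 and 2, a single "needle" coefficient that
# the search finds without examining most of the 2^n candidates.
f = lambda x: -1 if (x[0] ^ x[2]) else 1
heavy = km_find_heavy(f, 3, 0.5)
```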

Uniform Membership Learning of DNF

• [J]

• ∀ distributions D, ∃ χ_a s.t. Pr_{x~D}[f(x) = χ_a(x)] ≥ ½ + 1/(6s_DNF)

• A modified [GL] can efficiently locate such a χ_a given an oracle for near-uniform D

• Boosters can provide such an oracle when uniform learning

• Boosting provides strong learning

• [BJTb], [KS], [F]

• For near-uniform D, can find χ_a in time ~ ns²

Uniform Learning from a Random Walk Oracle

[Diagram: as in the uniform model, but the random walk oracle RW(f) supplies examples ⟨x, f(x)⟩ along a random walk on {0,1}ⁿ rather than independent uniform draws.]

Random Walk DNF Learning

• [BMOS]

• Noise sensitivity and related quantities can be accurately estimated using a random walk oracle

• NS_γ(f) ≡ ½(1 − Σ_a (1 − 2γ)^{|a|} f̂²(a))

• T_b(f) ≡ Σ_a b^{|a|} f̂²(a)

• Estimating T_b(f) is efficient if |b| is logarithmic

• Only logarithmic |b| is needed to learn DNF [BF]

Random Walk Parity Learning

• [JW] (unpublished)

• Effectively, [BMOS] is limited to finding “heavy” Fourier coefficients f̂(a) for logarithmic |a|

• Using a “breadth-first” variation of KM, one can locate any a with |f̂(a)| > θ in time O(n^{log 1/θ})

• A “heavy” coefficient corresponds to a parity function that weakly approximates f

Uniform Learning from a Classification Noise Oracle

[Diagram: the classification noise oracle EX_η(f) draws uniform random x and returns ⟨x, f(x)⟩ with probability 1 − η and ⟨x, −f(x)⟩ with probability η; given accuracy ε > 0 and error rate η > 0, the learner A must output h with Pr_{x~U}[f(x) ≠ h(x)] < ε.]

Uniform Learning from a Statistical Query Oracle

[Diagram: the statistical query oracle SQ(f) takes a query (q, τ) and returns E_U[q(x, f(x))] ± τ; given accuracy ε > 0, the learner A must output h with Pr_{x~U}[f(x) ≠ h(x)] < ε.]

SQ and Classification Noise Learning

• [K]

• If F is uniform SQ learnable in time poly(n, s_F, 1/ε, 1/τ) then F is uniform CN learnable in time poly(n, s_F, 1/ε, 1/τ, 1/(1−2η))

• Empirically, it is almost always true that if F is efficiently uniform learnable then F is efficiently uniform SQ learnable (i.e., 1/τ poly in the other parameters)

• Exception: F = PAR_n ≡ {χ_a : a ∈ {0,1}ⁿ, |a| ≤ n}

Uniform SQ Hardness for PAR

• [BFJKMR]

• Harmonic analysis shows that for any q, χ_a: E_U[q(x, χ_a(x))] = q̂(0^{n+1}) + q̂(a∘1)

• Thus an adversarial SQ response to (q, τ) is q̂(0^{n+1}) whenever |q̂(a∘1)| < τ

• Parseval: |q̂(b∘1)| < τ for all but 1/τ² Fourier coefficients

• So a ‘bad’ query eliminates only polynomially many coefficients

• Even PAR_{log n} is not efficiently SQ learnable

Uniform Learning from an Attribute Noise Oracle

[Diagram: the attribute noise oracle EX_{D_N}(f) draws uniform random x and r ~ D_N and returns ⟨x⊕r, f(x)⟩; the learner A is given accuracy ε > 0 and the noise model D_N, and must output h with Pr_{x~U}[f(x) ≠ h(x)] < ε.]

Uniform Learning with Independent Attribute Noise

• [BJTa]:

• The LMN algorithm then produces estimates of f̂(a) · E_{r~D_N}[χ_a(r)]

• Example application

• Assume the noise process D_N is a product distribution: D_N(x) = ∏_i (p_i x_i + (1−p_i)(1−x_i))

• Assume p_i < 1/polylog n and 1/ε at most quasi-poly(n) (mild restrictions)

• Then a modified LMN uniform learns attribute-noisy AC0 in quasi-poly time
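The attenuation statement above is easy to demonstrate: under independent attribute noise, the naive coefficient estimate converges to f̂(a) · ∏_{i : a_i = 1}(1 − 2p_i), so when the flip probabilities p_i are known the factor can simply be divided out. A small sketch (names mine; a hypothetical known-noise-rate setting, not the full [BJTa] algorithm):

```python
import random

def chi(a, x):
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def noisy_estimate(f, a, n, p, samples=20000, seed=0):
    """Average f(x) * chi_a(x XOR r) over attribute-noise examples, where bit i
    of x is flipped with probability p[i]; this converges to
    f^(a) * prod_{i: a_i = 1} (1 - 2 p[i])."""
    rng = random.Random(seed)
    s = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        noisy_x = [xi ^ (rng.random() < p[i]) for i, xi in enumerate(x)]
        s += f(x) * chi(a, noisy_x)
    return s / samples

def corrected_estimate(f, a, n, p, **kw):
    """Divide out the attenuation factor E_r[chi_a(r)] to recover f^(a)."""
    atten = 1.0
    for i in range(n):
        if a[i]:
            atten *= 1 - 2 * p[i]
    return noisy_estimate(f, a, n, p, **kw) / atten

# For f = chi_{110} and p_i = 0.1, the raw estimate is attenuated by 0.8^2.
f = lambda x: -1 if (x[0] ^ x[1]) else 1
raw = noisy_estimate(f, (1, 1, 0), 3, [0.1, 0.1, 0.1])
fixed = corrected_estimate(f, (1, 1, 0), 3, [0.1, 0.1, 0.1])
```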

Agnostic Learning Model

[Diagram: the target f : {0,1}ⁿ → {0,1} is an arbitrary Boolean function; from uniform random examples ⟨x, f(x)⟩ and accuracy ε > 0, the learner A must output h in H with Pr_{x~U}[f(x) ≠ h(x)] ≤ opt_H + ε, where opt_H is the error of the best hypothesis in H.]

Agnostic Learning of Halfspaces

• [KKMS]

• An agnostic learning algorithm for H the set of halfspaces

• The algorithm is not Fourier-based (L₁ regression)

• However, a somewhat weaker result can be obtained by simple Fourier analysis

Near-Agnostic Learning via LMN

• [KKMS]:

• Let f be an arbitrary Boolean function

• Fix any set S ⊆ {0,1}ⁿ and fix ε

• Let g be any function s.t.

• Σ_{a∉S} ĝ²(a) < ε and

• Pr[f ≠ g] (call this η) is minimized over all such g

• Then for h learned by LMN by estimating the coefficients of f over S:

• Pr[f ≠ h] < 4η + ε

Summary

• Most uniform-learning results for Boolean function classes depend on harmonic analysis

• Learning theory provides motivation for new harmonic observations

• Even very “weak” harmonic results can be useful in learning-theory algorithms

Some Open Problems

• Efficient uniform learning of monotone DNF

• Best to date for small s_DNF is [Ser], time ~ n·s^{log s} (based on [BT], [M], [LMN])

• Non-uniform learning

• Relatively easy to extend many results to product distributions, e.g. [FJS] extends [LMN]

• Key issue in real-world applicability

Open Problems (cont’d)

• Weaker dependence on ε

• Several algorithms fully exponential (or worse) in 1/ε

• Allows for interpretation of learned hypothesis

References

• Beigel: When Do Extra Majority Gates Help? ...

• [BFJKMR] Blum, Furst, Jackson, Kearns, Mansour, Rudich. Weakly Learning DNF...

• [BJTa] Bshouty, Jackson, Tamon. Uniform-Distribution Attribute Noise Learnability.

• [BJTb] Bshouty, Jackson, Tamon. More Efficient PAC-learning of DNF...

• [BKS] Benjamini, Kalai, Schramm. Noise Sensitivity of Boolean Functions...

• [BMOS] Bshouty, Mossel, O’Donnell, Servedio. Learning DNF from Random Walks.

• [BT] Bshouty, Tamon. On the Fourier Spectrum of Monotone Functions.

• [F] Feldman. Attribute Efficient and Non-adaptive Learning of Parities...

• [FJS] Furst, Jackson, Smith. Improved Learning of AC0 Functions.

• [FS] Freund, Schapire. A Decision-theoretic Generalization of On-line Learning...

• Friedgut: Boolean Functions with Low Average Sensitivity Depend on Few Coordinates.

• [GL] Goldreich, Levin. A Hard-core Predicate for All One-way Functions.

• [HMPST] Hajnal, Maass, Pudlak, Szegedy, Turan. Threshold Circuits of Bounded Depth.

• [J] Jackson. An Efficient Membership-Query Algorithm for Learning DNF...

• [JKS] Jackson, Klivans, Servedio. Learnability Beyond AC0.

• [JW] Jackson, Wimmer. In prep.

• [KKL] Kahn, Kalai, Linial. The Influence of Variables on Boolean Functions.

• [KKMS] Kalai, Klivans, Mansour, Servedio. On Agnostic Boosting and Parity Learning.

• [K] Kearns. Efficient Noise-tolerant learning from Statistical Queries.

• [KM] Kushilevitz, Mansour. Learning Decision Trees using the Fourier Spectrum.

• [KOS] Klivans, O’Donnell, Servedio. Learning Intersections and Thresholds of Halfspaces.

• [KP] Krause, Pudlak. On Computing Boolean Functions by Sparse Real Polynomials.

• [KS] Klivans, Servedio. Boosting and Hard-core Sets.

• [LMN] Linial, Mansour, Nisan. Constant-depth Circuits, Fourier Transform, and Learnability.

• [M] Mansour. An O(nloglog n) Learning Algorithm for DNF...

• [O] O’Donnell. Hardness Amplification within NP.

• [OS] O’Donnell, Servedio. Learning Monotone Functions from Random Examples in Polynomial Time.

• [S] Schapire. The Strength of Weak Learnability.

• [Ser] Servedio. On Learning Monotone DNF under Product Distributions.