### Fourier Analysis and Boolean Function Learning

Jeff Jackson

Duquesne University

www.mathcs.duq.edu/~jackson

Themes

- Fourier analysis is central to learning-theoretic results in a wide variety of models
- These results are generally the strongest known for learning Boolean function classes with respect to the uniform distribution
- Work on learning problems has led to some new harmonic results:
  - Spectral properties of Boolean function classes
  - Algorithms for approximating Boolean functions

Uniform Learning Model

[Slide diagram: uniform learning model. A target function f : {0,1}^n → {0,1} is drawn from a Boolean function class F (e.g., DNF). An example oracle EX(f) supplies uniform random examples <x, f(x)> to the learning algorithm A, which is given an accuracy parameter ε > 0 and must output a hypothesis h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.]

Circuit Classes

- Constant-depth AND/OR circuits (AC0 without the polynomial-size restriction; call this CDC)
- DNF: depth-2 circuit with OR at root

[Slide figure: a constant-depth circuit, d alternating levels of OR (∨) and AND (∧) gates over inputs v1 v2 v3 … vn; negations allowed.]

Function Size

- Each function representation has a natural size measure:
  - CDC, DNF: number of gates
  - DT (decision tree): number of leaves
- The size s_F(f) of f with respect to class F is the size of the smallest representation of f within F
- For all Boolean f, s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)

Efficient Uniform Learning Model

[Slide diagram: the uniform learning model as above, with the additional requirement that A run in time poly(n, s_F, 1/ε).]

Harmonic-Based Uniform Learning

- [LMN]: constant-depth circuits are quasi-efficiently (n^{polylog(s/ε)}-time) uniform learnable
- [BT]: monotone Boolean functions are uniform learnable in time roughly 2^{√n·log n}
  - Monotone: for all x, i: f(x|_{x_i=0}) ≤ f(x|_{x_i=1})
  - Also exponential in 1/ε (so ε is assumed constant)
  - But independent of any size measure

Notation

- Assume f : {0,1}^n → {-1, 1}
- For all a ∈ {0,1}^n, χ_a(x) ≡ (-1)^{a·x}
- For all a ∈ {0,1}^n, the Fourier coefficient f̂(a) of f at a is f̂(a) ≡ E_{x~U}[f(x)·χ_a(x)] (estimated by sampling in the sketch below)
- Sometimes write, e.g., f̂({1}) for f̂(10…0)
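To make the notation concrete, here is a minimal Python sketch, with illustrative names not from the slides, of the character χ_a and a sampling estimate of a single Fourier coefficient; by Chernoff bounds the estimate converges at rate roughly 1/√samples.

```python
import random

def chi(a, x):
    """Parity character chi_a(x) = (-1)^(a . x) for bit-tuples a and x."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def estimate_coefficient(f, a, n, samples=10000):
    """Estimate f_hat(a) = E_{x~U}[f(x) * chi_a(x)] by sampling uniform x;
    f maps {0,1}^n bit-tuples to {-1,+1}."""
    total = 0
    for _ in range(samples):
        x = tuple(random.randint(0, 1) for _ in range(n))
        total += f(x) * chi(a, x)
    return total / samples
```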

Fourier Properties of Classes

- [LMN]: if f is a constant-depth circuit of depth d and S = { a : |a| < log^d(s/ε) }, then Σ_{a∉S} f̂²(a) < ε  ( |a| ≡ # of 1's in a )
- [BT]: if f is a monotone Boolean function and S = { a : |a| < √n / ε }, then Σ_{a∉S} f̂²(a) < ε

Proof Techniques

- [LMN]: Håstad's Switching Lemma + harmonic analysis
- [BT]: based on [KKL]
  - Define the average sensitivity AS(f) ≡ n · Pr_{x,i}[f(x|_{x_i=0}) ≠ f(x|_{x_i=1})]
  - If S = { a : |a| < AS(f)/ε } then Σ_{a∉S} f̂²(a) < ε
  - For monotone f, harmonic analysis + Cauchy-Schwarz show AS(f) ≤ √n
  - Note: this is tight for MAJ (majority)

Function Approximation

- For all Boolean f, f = Σ_a f̂(a)·χ_a, and Parseval gives Σ_a f̂²(a) = 1
- For S ⊆ {0,1}^n, define g ≡ Σ_{a∈S} f̂(a)·χ_a
- [LMN]: Pr_{x~U}[f(x) ≠ sign(g(x))] ≤ E[(f - g)²] = Σ_{a∉S} f̂²(a)

“The” Fourier Learning Algorithm

- Given: ε (and perhaps s, d, ...)
- Determine k such that for S = { a : |a| < k }, Σ_{a∉S} f̂²(a) < ε
- Draw a sufficiently large sample of examples <x, f(x)> to closely estimate f̂(a) for all a ∈ S
  - Chernoff bounds: sample size ~n^k/ε suffices
- Output h ≡ sign(Σ_{a∈S} f̃(a)·χ_a), where f̃(a) is the estimate of f̂(a) (see the sketch below)
- Run time ~n^{2k}/ε
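A runnable sketch of this low-degree algorithm, reusing the chi helper from the Notation sketch above; function and parameter names are illustrative, not from [LMN].

```python
from itertools import combinations

def low_degree_learn(examples, n, k):
    """Low-degree algorithm sketch: estimate every Fourier coefficient of
    degree < k from uniform examples (x, y) with y = f(x) in {-1,+1},
    then predict with the sign of the truncated Fourier expansion."""
    m = len(examples)
    coeffs = {}
    for deg in range(k):
        for idx in combinations(range(n), deg):
            a = tuple(1 if i in idx else 0 for i in range(n))
            coeffs[a] = sum(y * chi(a, x) for x, y in examples) / m
    def h(x):
        g = sum(c * chi(a, x) for a, c in coeffs.items())
        return 1 if g >= 0 else -1
    return h
```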

Halfspaces

- [KOS]: halfspaces are efficiently uniform learnable (given that ε is constant)
  - Halfspace: ∃ w ∈ R^{n+1} s.t. f(x) = sign(w · (x∘1)), where x∘1 is x with a 1 appended
  - If S = { a : |a| < (21/ε)² } then Σ_{a∉S} f̂²(a) < ε
  - Apply the LMN algorithm
- A similar result applies to an arbitrary function of a constant number of halfspaces
  - Intersection of halfspaces is a key learning problem

Halfspace Techniques

- [O] (cf. [BKS], [BJTa]):
  - The noise sensitivity of f at γ is the probability that corrupting each bit of x independently with probability γ changes f(x) (estimated empirically in the sketch below)
  - NS_γ(f) ≡ ½(1 - Σ_a (1-2γ)^{|a|} f̂²(a))
- [KOS]:
  - If S = { a : |a| < 1/γ } then Σ_{a∉S} f̂²(a) < 3·NS_γ(f)
  - If f is a halfspace then NS_γ(f) < 9√γ
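A Monte Carlo sketch of the noise-sensitivity definition above, with illustrative names, assuming f is given as a Python callable on bit-tuples (membership-style access, not just random examples):

```python
import random

def estimate_noise_sensitivity(f, n, gamma, trials=10000):
    """Estimate NS_gamma(f): draw uniform x, flip each bit independently
    with probability gamma, and count how often f's value changes."""
    disagree = 0
    for _ in range(trials):
        x = tuple(random.randint(0, 1) for _ in range(n))
        y = tuple(b ^ (random.random() < gamma) for b in x)
        if f(x) != f(y):
            disagree += 1
    return disagree / trials
```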

Monotone DT

- [OS]: monotone functions are efficiently learnable given that:
  - ε is constant
  - s_DT(f) is used as the size measure
- Techniques:
  - Harmonic analysis: for monotone f, AS(f) ≤ √(log s_DT(f))
  - [BT]: if S = { a : |a| < AS(f)/ε } then Σ_{a∉S} f̂²(a) < ε
  - Friedgut: ∃ T ⊆ {1..n} with |T| ≤ 2^{AS(f)/ε} s.t. Σ_{A⊄T} f̂²(A) < ε

Weak Approximators

- [KKL] also show that if f is monotone, there is an i such that -f̂({i}) ≥ log²n / n
- Therefore Pr[f(x) = -χ_{i}(x)] ≥ ½ + log²n / 2n
- In general, an h s.t. Pr[f = h] ≥ ½ + 1/poly(n, s) is called a weak approximator to f
- If A outputs a weak approximator for every f in F, then F is weakly learnable


Weak Uniform Learning Model

[Slide diagram: as in the uniform learning model, except there is no accuracy parameter; A need only output h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ½ - 1/p(n, s) for some fixed polynomial p.]

Efficient Weak Learning Algorithm for Monotone Boolean Functions

- Draw a set of ~n² examples <x, f(x)>
- For i = 1 to n:
  - Estimate f̂({i})
- Output h ≡ -χ_{i*}, where i* = argmax_i (-f̂({i})) (sketched below)
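A sketch of this weak learner with illustrative names; by the [KKL] bound above, for monotone f some degree-1 coefficient is noticeably negative, and the best single-variable predictor achieves the stated advantage.

```python
def weak_learn_monotone(examples, n):
    """Estimate every degree-1 coefficient f_hat({i}) from one batch of
    examples (x, y) with y = f(x) in {-1,+1}, and output h = -chi_{i*}
    for the index i* maximizing -f_hat({i})."""
    m = len(examples)
    best_i, best_val = 0, float("-inf")
    for i in range(n):
        # f_hat({i}) = E[f(x) * (-1)^{x_i}]
        c = sum(y * (-1 if x[i] else 1) for x, y in examples) / m
        if -c > best_val:
            best_i, best_val = i, -c
    # h(x) = -chi_{i*}(x): predict +1 exactly when x_{i*} = 1
    return lambda x: 1 if x[best_i] else -1
```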

Weak Approximation for MAJ of Constant-Depth Circuits

- Note that adding a single MAJ (majority) gate to a CDC destroys the LMN spectral property
- [JKS]: MAJ of CDCs is quasi-efficiently quasi-weakly uniform learnable
  - If f is a MAJ of CDCs of depth d, and the number of gates in f is s, then there is an A ∈ {0,1}^n such that
    - |A| < log^d s ≡ k
    - Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4s·n^k)

Weak Learning Algorithm for MAJ of CDC

- Compute k = log^d s
- Draw ~s·n^k examples <x, f(x)>
- Repeat over A with |A| < k:
  - Estimate f̂(A)
- Until an A is found with f̂(A) > 1/(2s·n^k)
- Output h ≡ χ_A
- Run time ~n^{polylog(s)} (sketched below)
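A sketch of this search, reusing chi from the Notation sketch; names are illustrative. Note one liberty taken: the slide's version looks for a large positive coefficient, while this sketch also accepts a heavy negative one and outputs the negated parity, which weakly approximates just as well.

```python
import math
from itertools import combinations

def weak_learn_maj_of_cdc(examples, n, s, d):
    """Scan coefficients of degree < k = (log s)^d for one whose
    estimated magnitude clears the 1/(2*s*n^k) threshold, and output
    the corresponding parity (negated if the coefficient is negative)."""
    k = max(1, round(math.log2(s) ** d))
    m = len(examples)
    thresh = 1.0 / (2 * s * n ** k)
    for deg in range(k):
        for idx in combinations(range(n), deg):
            a = tuple(1 if i in idx else 0 for i in range(n))
            c = sum(y * chi(a, x) for x, y in examples) / m
            if abs(c) > thresh:
                sgn = 1 if c > 0 else -1
                return lambda x, a=a, sgn=sgn: sgn * chi(a, x)
    return None  # no heavy coefficient of degree < k was found
```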

Weak Approximator Proof Techniques

- "Discriminator Lemma" [HMPST]
  - Implies one of the CDCs is a weak approximator to f
- LMN spectral characterization of CDC
- Harmonic analysis
- A result of Beigel is used to extend weak learning to CDC with polylog-many MAJ gates

Boosting

- In many (not all) cases, uniform weak learning algorithms can be converted to uniform (strong) learning algorithms using a boosting technique ([S], [FS], ...)
  - Need to learn weakly with respect to near-uniform distributions: for a near-uniform distribution D, find a weak h_j s.t. Pr_{x~D}[h_j = f] > ½ + 1/poly(n, s)
  - The final h is typically a MAJ of the weak approximators

Strong Learning for MAJ of Constant-Depth Circuits

- [JKS]: MAJ of CDC is quasi-efficiently uniform learnable
  - Show that for near-uniform distributions, some parity function is a weak approximator
  - The Beigel result again extends this to CDC with polylog-many MAJ gates
- [KP] + boosting: there are distributions for which no parity is a weak approximator

Uniform Learning from a Membership Oracle

[Slide diagram: as in the uniform learning model, except the example oracle is replaced by a membership oracle MEM(f): A sends any query x and receives f(x). Accuracy parameter ε > 0 as before.]

Uniform Membership Learning of Decision Trees

- [KM]
  - L1(f) ≡ Σ_a |f̂(a)| ≤ s_DT(f)
  - If S = { a : |f̂(a)| ≥ ε/L1(f) } then Σ_{a∉S} f̂²(a) < ε
  - [GL]: an algorithm (membership oracle) for finding { a : |f̂(a)| ≥ θ } in time ~n/θ⁶ (a key subroutine is sketched below)
  - So decision trees can be efficiently uniform membership learned
  - Output h has the same form as LMN: h ≡ sign(Σ_{a∈S} f̃(a)·χ_a)
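The core [KM]/[GL]-style subroutine estimates, via membership queries, the total squared Fourier weight of all coefficients sharing a given index prefix. A hedged sketch under the standard identity; names are illustrative, mem is a ±1-valued membership oracle on n-bit tuples, and chi is as in the Notation sketch.

```python
import random

def bucket_weight(mem, n, prefix, samples=2000):
    """Estimate sum over suffixes b of f_hat(prefix b)^2 using the identity
      sum_b f_hat(ab)^2 = E_{x,y,z}[ f(yx) chi_a(y) * f(zx) chi_a(z) ],
    where a = prefix, y and z are uniform over {0,1}^k (k = len(prefix)),
    and x is uniform over {0,1}^(n-k)."""
    k = len(prefix)
    total = 0.0
    for _ in range(samples):
        x = tuple(random.randint(0, 1) for _ in range(n - k))
        y = tuple(random.randint(0, 1) for _ in range(k))
        z = tuple(random.randint(0, 1) for _ in range(k))
        total += mem(y + x) * chi(prefix, y) * mem(z + x) * chi(prefix, z)
    return total / samples
```

KM then searches prefixes recursively, extending a prefix by 0 and by 1 only while its estimated bucket weight stays above θ²; by Parseval at most 1/θ² prefixes of each length survive, and the surviving full-length strings are the θ-heavy coefficients.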

Uniform Membership Learning of DNF

- [J]
  - ∀ distributions D, ∃ χ_a s.t. Pr_{x~D}[f(x) = χ_a(x)] ≥ ½ + 1/(6·s_DNF)
  - A modified [GL] can efficiently locate such a χ_a given an oracle for a near-uniform D
  - Boosters can provide such an oracle when uniform learning
  - Boosting then provides strong learning
- [BJTb], [KS], [F]
  - For near-uniform D, can find χ_a in time ~n·s²

Uniform Learning from a Random Walk Oracle

[Slide diagram: as in the uniform learning model, except the example oracle is replaced by a random walk oracle RW(f) producing examples <x, f(x)> in which successive x's follow a random walk on {0,1}^n rather than being independent uniform draws. Accuracy parameter ε > 0 as before.]

Random Walk DNF Learning

- [BMOS]
  - Noise sensitivity and related quantities can be accurately estimated using a random walk oracle:
    - NS_γ(f) ≡ ½(1 - Σ_a (1-2γ)^{|a|} f̂²(a))
    - T_b(f) ≡ Σ_a b^{|a|} f̂²(a)
  - Estimating T_b(f) is efficient if |b| is logarithmic
  - Only logarithmic |b| is needed to learn DNF [BF]
Random Walk Parity Learning

- [JW] (unpublished)
  - Effectively, [BMOS] is limited to finding "heavy" Fourier coefficients f̂(a) with logarithmic |a|
  - Using a "breadth-first" variation of [KM], any a with |f̂(a)| > θ can be located in time O(n^{log 1/θ})
  - A "heavy" coefficient corresponds to a parity function that weakly approximates f

Uniform Learning from a Classification Noise Oracle

[Slide diagram: as in the uniform learning model, except the classification noise oracle EX_η(f) draws a uniform random x and returns <x, f(x)> with probability 1-η and <x, -f(x)> with probability η. Inputs to A: accuracy ε > 0 and error rate η > 0.]

Uniform Learning from a Statistical Query Oracle

[Slide diagram: as in the uniform learning model, except that instead of receiving examples, A submits a query (q, τ) to a statistical query oracle SQ(f) and receives a response within τ of E_U[q(x, f(x))]. Accuracy parameter ε > 0 as before.]

SQ and Classification Noise Learning

- [K]
  - If F is uniform SQ learnable in time poly(n, s_F, 1/ε, 1/τ) then F is uniform CN learnable in time poly(n, s_F, 1/ε, 1/τ, 1/(1-2η)) (simulation sketched below)
  - Empirically, it is almost always true that if F is efficiently uniform learnable then F is efficiently uniform SQ learnable (i.e., 1/τ is poly in the other parameters)
    - Exception: F = PAR_n ≡ { χ_a : a ∈ {0,1}^n, |a| ≤ n }
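The [K] simulation admits a compact sketch: split the query q(x, b) into a label-independent part q0(x) = (q(x,1) + q(x,-1))/2 and a correlation part q1(x) = (q(x,1) - q(x,-1))/2; under noise rate η the observed labels shrink the correlation term by exactly (1-2η), so rescaling recovers E_U[q(x, f(x))]. A minimal sketch with illustrative names, assuming labels in {-1,+1} and known η:

```python
def simulate_sq(q, noisy_examples, eta):
    """Answer the statistical query q from classification-noise examples
    (x, y): E[y*q1(x)] = (1-2*eta) * E[f(x)*q1(x)], so dividing the
    observed correlation by (1-2*eta) yields E[q(x, f(x))]."""
    m = len(noisy_examples)
    est_q0 = sum((q(x, 1) + q(x, -1)) / 2 for x, y in noisy_examples) / m
    est_corr = sum(y * (q(x, 1) - q(x, -1)) / 2 for x, y in noisy_examples) / m
    return est_q0 + est_corr / (1 - 2 * eta)
```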

Uniform SQ Hardness for PAR

- [BFJKMR]
  - Harmonic analysis shows that for any q, χ_a: E_U[q(x, χ_a(x))] = q̂(0^{n+1}) + q̂(a∘1)
  - Thus the adversarial SQ response to (q, τ) is q̂(0^{n+1}) whenever |q̂(a∘1)| < τ
  - Parseval: |q̂(b∘1)| < τ for all but 1/τ² of the Fourier coefficients
  - So a 'bad' query eliminates only polynomially many coefficients
  - Even PAR_{log n} is not efficiently SQ learnable

Uniform Learning from an Attribute Noise Oracle

[Slide diagram: as in the uniform learning model, except the attribute noise oracle EX_{D_N}(f) draws a uniform random x, draws r ~ D_N, and returns <x⊕r, f(x)>. Inputs to A: accuracy ε > 0 and noise model D_N.]

Uniform Learning with Independent Attribute Noise

- [BJTa]:
  - The LMN algorithm produces estimates of f̂(a) · E_{r~D_N}[χ_a(r)] (corrected for in the sketch below)
- Example application:
  - Assume the noise process D_N is a product distribution: D_N(x) = ∏_i (p_i·x_i + (1-p_i)(1-x_i))
  - Assume each p_i < 1/polylog n and 1/ε is at most quasi-poly(n) (mild restrictions)
  - Then a modified LMN uniform learns attribute-noisy AC0 in quasi-poly time
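Since E_{r~D_N}[χ_a(r)] = ∏_{i : a_i=1} (1-2p_i) for a product distribution, each naively estimated coefficient can simply be divided by this attenuation factor. A minimal sketch, assuming the flip probabilities p_i are known and each p_i < ½; names are illustrative and chi is as in the Notation sketch.

```python
def denoised_coefficient(noisy_examples, a, p):
    """Recover f_hat(a) from attribute-noisy examples <x XOR r, f(x)>.
    The naive sample average converges to
    f_hat(a) * prod_{i: a_i=1} (1 - 2*p[i]),
    so divide out the attenuation (requires every p[i] < 1/2)."""
    m = len(noisy_examples)
    raw = sum(y * chi(a, x) for x, y in noisy_examples) / m
    atten = 1.0
    for i, ai in enumerate(a):
        if ai:
            atten *= 1 - 2 * p[i]
    return raw / atten
```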

Agnostic Learning Model

[Slide diagram: as in the uniform learning model, except the target f : {0,1}^n → {0,1} is an arbitrary Boolean function. Given a hypothesis class H and accuracy ε > 0, A must output h in H s.t. Pr_{x~U}[f(x) ≠ h(x)] ≤ opt_H + ε, where opt_H is the error of the best hypothesis in H.]

Agnostic Learning of Halfspaces

- [KKMS]
  - An agnostic learning algorithm for H = the class of halfspaces
  - The algorithm is not Fourier-based (it uses L1 regression)
- However, a somewhat weaker result can be obtained by simple Fourier analysis

Near-Agnostic Learning via LMN

- [KKMS]:
  - Let f be an arbitrary Boolean function
  - Fix any set S ⊆ {0,1}^n and fix ε
  - Let g be any function s.t.
    - Σ_{a∉S} ĝ²(a) < ε, and
    - Pr[f ≠ g] (call this η) is minimized among all such g
  - Then for the h learned by LMN by estimating the coefficients of f over S: Pr[f ≠ h] < 4η + ε

Summary

- Most uniform-learning results for Boolean function classes depend on harmonic analysis
- Learning theory provides motivation for new harmonic observations
- Even very “weak” harmonic results can be useful in learning-theory algorithms

Some Open Problems

- Efficient uniform learning of monotone DNF
  - Best to date for small s_DNF is [Ser], time ~n·s^{log s} (based on [BT], [M], [LMN])
- Non-uniform learning
  - Relatively easy to extend many results to product distributions; e.g., [FJS] extends [LMN]
  - A key issue for real-world applicability

Open Problems (cont'd)

- Weaker dependence on ε
  - Several algorithms are fully exponential (or worse) in 1/ε
- Additional proper learning results
  - Proper hypotheses allow for interpretation of the learned hypothesis

References

- [Beigel] Beigel. When Do Extra Majority Gates Help? ...
- [BFJKMR] Blum, Furst, Jackson, Kearns, Mansour, Rudich. Weakly Learning DNF...
- [BJTa] Bshouty, Jackson, Tamon. Uniform-Distribution Attribute Noise Learnability.
- [BJTb] Bshouty, Jackson, Tamon. More Efficient PAC-learning of DNF...
- [BKS] Benjamini, Kalai, Schramm. Noise Sensitivity of Boolean Functions...
- [BMOS] Bshouty, Mossel, O’Donnell, Servedio. Learning DNF from Random Walks.
- [BT] Bshouty, Tamon. On the Fourier Spectrum of Monotone Functions.
- [F] Feldman. Attribute Efficient and Non-adaptive Learning of Parities...
- [FJS] Furst, Jackson, Smith. Improved Learning of AC0 Functions.
- [FS] Freund, Schapire. A Decision-theoretic Generalization of On-line Learning...
- [Friedgut] Friedgut. Boolean Functions with Low Average Sensitivity Depend on Few Coordinates.
- [HMPST] Hajnal, Maass, Pudlak, Szegedy, Turan. Threshold Circuits of Bounded Depth.
- [J] Jackson. An Efficient Membership-Query Algorithm for Learning DNF...
- [JKS] Jackson, Klivans, Servedio. Learnability Beyond AC0.
- [JW] Jackson, Wimmer. In prep.
- [KKL] Kahn, Kalai, Linial. The Influence of Variables on Boolean Functions.
- [KKMS] Kalai, Klivans, Mansour, Servedio. On Agnostic Boosting and Parity Learning.
- [K] Kearns. Efficient Noise-tolerant learning from Statistical Queries.
- [KM] Kushilevitz, Mansour. Learning Decision Trees using the Fourier Spectrum.
- [KOS] Klivans, O’Donnell, Servedio. Learning Intersections and Thresholds of Halfspaces.
- [KP] Krause, Pudlak. On Computing Boolean Functions by Sparse Real Polynomials.
- [KS] Klivans, Servedio. Boosting and Hard-core Sets.
- [LMN] Linial, Mansour, Nisan. Constant-depth Circuits, Fourier Transform, and Learnability.
- [M] Mansour. An O(nloglog n) Learning Algorithm for DNF...
- [O] O’Donnell. Hardness Amplification within NP.
- [OS] O’Donnell, Servedio. Learning Monotone Functions from Random Examples in Polynomial Time.
- [S] Schapire. The Strength of Weak Learnability.
- [Ser] Servedio. On Learning Monotone DNF under Product Distributions.
