
# Correlation Immune Functions and Learning - PowerPoint PPT Presentation

Correlation Immune Functions and Learning. Lisa Hellerstein Polytechnic Institute of NYU Brooklyn, NY Includes joint work with Bernard Rosell (AT&T), Eric Bach and David Page (U. of Wisconsin), and Soumya Ray (Case Western). Identifying relevant variables from random examples.




## Presentation Transcript

### Correlation Immune Functions and Learning

Lisa Hellerstein

Polytechnic Institute of NYU

Brooklyn, NY

Includes joint work with Bernard Rosell (AT&T), Eric Bach and David Page (U. of Wisconsin), and Soumya Ray (Case Western)

x f(x)

(1,1,0,0,0,1,1,0,1,0) 1

(0,1,0,0,1,0,1,1,0,1) 1

(1,0,0,1,0,1,0,0,1,0) 0

• Assume random examples drawn from uniform distribution over {0,1}n

• Look for dependence between input variables and output

If xi is irrelevant, then P(f=1|xi=1) = P(f=1|xi=0)

Hope: if xi is relevant, then P(f=1|xi=1) ≠ P(f=1|xi=0)

But for the previous function f:

xi relevant: P(f=1|xi=1) = 1/2 = P(f=1|xi=0)

xi irrelevant: P(f=1|xi=1) = 1/2 = P(f=1|xi=0)

Finding a relevant variable easy for some functions.

Not so easy for others.
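The conditional-probability test just described can be sketched directly. This is an illustrative sketch, not code from the talk; the example functions (majority-of-3 as an "easy" case, parity as a "hard" case) and all names are mine:

```python
import random

def parity(x):
    # "Not so easy": no single variable correlates with the output
    return sum(x) % 2

def maj3(x):
    # "Easy": each of x[0], x[1], x[2] correlates with the output
    return 1 if x[0] + x[1] + x[2] >= 2 else 0

def correlation_gap(f, n, i, m=20000, seed=0):
    """Estimate |P(f=1 | xi=1) - P(f=1 | xi=0)| from m uniform random examples."""
    rng = random.Random(seed)
    ones, tot = [0, 0], [0, 0]
    for _ in range(m):
        x = [rng.randint(0, 1) for _ in range(n)]
        tot[x[i]] += 1
        ones[x[i]] += f(x)
    return abs(ones[1] / tot[1] - ones[0] / tot[0])

gap_easy = correlation_gap(maj3, 10, 0)    # near the true gap of 1/2
gap_hard = correlation_gap(parity, 10, 0)  # near 0 even though x0 is relevant
```

For majority-of-3 the true gap at x0 is |3/4 − 1/4| = 1/2, so the estimate stands out; for parity it is 0, so no sample size helps this test.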

• Suppose you know r (# of relevant vars)

Assume r << n

(Think of r = log n)

• Get m random examples, where

m = poly(2^r, log n, 1/δ)

• With probability > 1-δ, have enough info to determine which r variables are relevant

• All other sets of r variables can be ruled out
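A brute-force rendering of this argument (hypothetical names; target function and sample size chosen for illustration): a candidate set of r variables is ruled out exactly when two examples agree on it but carry different labels.

```python
import itertools
import random

def consistent(subset, sample):
    """A candidate set survives iff no two examples agree on it but differ in label."""
    seen = {}
    for x, y in sample:
        key = tuple(x[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True

def candidate_relevant_sets(sample, n, r):
    return [s for s in itertools.combinations(range(n), r) if consistent(s, sample)]

# Target: f depends only on x1, x3, x5 (an AND of those bits); n = 8, r = 3.
rng = random.Random(1)
f = lambda x: x[1] & x[3] & x[5]
sample = []
for _ in range(500):
    x = [rng.randint(0, 1) for _ in range(8)]
    sample.append((x, f(x)))
survivors = candidate_relevant_sets(sample, 8, 3)  # whp only {1, 3, 5} survives
```

With enough examples, every wrong r-set is ruled out with high probability, leaving only the truly relevant set.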

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f

(1, 1, 0, 1, 1, 0, 1, 0, 1, 0) 1

(0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0

(1, 1, 1, 0, 0, 0, 0, 0, 0, 0) 1

(0, 0, 0, 1, 1, 0, 0, 0, 0, 0) 0

(1, 1, 1, 0, 0, 0, 1, 1, 1, 1) 0

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f

(1, 1, 0, 1, 1, 0, 1, 0, 1, 0) 1

(0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0

(1, 1, 1, 0, 0, 0, 0, 0, 0, 0) 1

(0, 0, 0, 1, 1, 0, 0, 0, 0, 0) 0

(1, 1, 1, 0, 0, 0, 1, 1, 0, 1) 0

x3, x5, x9 can’t be the relevant variables: examples 3 and 5 agree on (x3, x5, x9) = (1, 0, 0) but have different labels

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 f

(1, 1, 0, 1, 1, 0, 1, 0, 1, 0) 1

(0, 1, 1, 1, 1, 0, 1, 1, 0, 0) 0

(1, 1, 1, 0, 0, 0, 0, 0, 0, 0) 1

(0, 0, 0, 1, 1, 0, 0, 0, 0, 0) 0

(1, 1, 1, 0, 0, 0, 1, 1, 1, 1) 0

x1, x3, x10 ok: no two examples agree on (x1, x3, x10) but differ in label

If the output of f depends on xi, we can detect the dependence (whp) in time poly(n, 2^r) and identify xi as relevant.

### Problematic Functions

Every variable is independent of output of f

P[f=1|xi=0] = P[f=1|xi=1] for all xi

Equivalently, all degree 1 Fourier coeffs = 0

Functions with this property are said to be

CORRELATION-IMMUNE

P[f=1|xi=0] = P[f=1|xi=1] for all xi

Geometrically (e.g. n=2): [figure: the Boolean square with corners 00, 01, 10, 11] For Parity(x1,x2), the corners 00, 01, 10, 11 are labeled 0, 1, 1, 0. Each half of the square (x1=0 vs. x1=1, and x2=0 vs. x2=1) contains exactly one 0 and one 1, so P[f=1|xi=b] = 1/2 for every variable xi and bit b.
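The picture can be checked exhaustively. A small sketch (names mine), computing the exact conditional probabilities over all inputs:

```python
from itertools import product

def cond_prob(f, n, i, b):
    """Exact P[f=1 | xi=b] under the uniform distribution on {0,1}^n."""
    inputs = [x for x in product([0, 1], repeat=n) if x[i] == b]
    return sum(f(x) for x in inputs) / len(inputs)

parity2 = lambda x: (x[0] + x[1]) % 2
# Both variables of Parity(x1,x2) are relevant, yet both look irrelevant
# to a single-variable correlation test:
gaps = [abs(cond_prob(parity2, 2, i, 1) - cond_prob(parity2, 2, i, 0))
        for i in range(2)]
```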

### Correlation-immune functions and decision tree learners

• Decision tree learners in ML

• Popular machine learning approach (CART, C4.5)

• Given a set of examples of a Boolean function, build a decision tree

• Heuristics for decision tree learning

• Greedy, top-down

• Differ in how they choose which variable to put at each node

• Pick the variable with the highest “gain”

• P[f=1|xi=1] = P[f=1|xi=0] means 0 gain

• Correlation-immune functions are problematic for decision tree learners
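The zero-gain problem is easy to reproduce. A sketch (names mine) using entropy-based information gain, as in ID3/C4.5-style learners:

```python
from itertools import product
from math import log2

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(f, n, i):
    """Information gain of splitting on xi, uniform distribution on {0,1}^n."""
    xs = list(product([0, 1], repeat=n))
    p = sum(f(x) for x in xs) / len(xs)
    halves = [[x for x in xs if x[i] == b] for b in (0, 1)]
    cond = sum(len(h) / len(xs) * entropy(sum(f(x) for x in h) / len(h))
               for h in halves)
    return entropy(p) - cond

parity3 = lambda x: sum(x) % 2
and3 = lambda x: x[0] & x[1] & x[2]
parity_gains = [gain(parity3, 3, i) for i in range(3)]  # all 0: greedy is blind
and_gains = [gain(and3, 3, i) for i in range(3)]        # all strictly positive
```

Every split on a correlation-immune function scores 0, so a greedy learner has no signal to start from.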

• Lookahead

• Skewing: An efficient alternative to lookahead for decision tree induction. IJCAI 2003 [Page, Ray]

• Why skewing works: learning difficult Boolean functions with greedy tree learners. ICML 2005 [Rosell, Hellerstein, Ray, Page]

### Story, Part One

• How many difficult functions?

• More than 2^(2^(n-1)) such functions

• How many different hard functions?

• More than 2^(2^(n/2))

SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE…

### Story, Part Two

Roy, B. K. 2002. A Brief Outline of Research on Correlation Immune Functions. In Proceedings of the 7th Australian Conference on Information Security and Privacy (July 03 - 05, 2002). L. M. Batten and J. Seberry, Eds. Lecture Notes in Computer Science, vol. 2384. Springer-Verlag, London, 379-394.

### Correlation-immune functions

• k-correlation immune function

• For every subset S of the input variables s.t. 1 ≤ |S| ≤ k, the output of f is independent of the assignment to S: P[f=1 | any fixed assignment to S] = P[f=1]

• [Xiao, Massey 1988] Equivalently, all Fourier coefficients of degree i are 0, for 1 ≤ i ≤ k
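The Xiao–Massey characterization can be checked by brute force on small functions. A sketch (function names mine) that computes Fourier coefficients with f's output mapped to ±1:

```python
from itertools import combinations, product

def fourier_coeff(f, n, S):
    """Fourier coefficient of f on the index set S (output mapped 0/1 -> +1/-1)."""
    total = 0
    for x in product([0, 1], repeat=n):
        chi = (-1) ** sum(x[i] for i in S)   # character chi_S(x)
        total += (1 - 2 * f(x)) * chi
    return total / 2 ** n

def immunity_order(f, n):
    """Largest k such that all Fourier coefficients of degree 1..k vanish."""
    k = 0
    for d in range(1, n + 1):
        if all(fourier_coeff(f, n, S) == 0 for S in combinations(range(n), d)):
            k = d
        else:
            break
    return k

parity3 = lambda x: sum(x) % 2   # 2-correlation immune
and2 = lambda x: x[0] & x[1]     # not even 1-correlation immune
```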

### Siegenthaler’s Theorem [1984]

If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k.

The algorithm of Mossel, O’Donnell, Servedio [STOC 2003] is based on this theorem.
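Siegenthaler's bound can be verified on a small example. A sketch (names mine): the GF[2] degree is read off the algebraic normal form, computed via the binary Möbius transform of the truth table:

```python
from itertools import product

def gf2_degree(f, n):
    """Degree of f's GF[2] polynomial (ANF), via the binary Moebius transform."""
    deg = 0
    for u in product([0, 1], repeat=n):
        c = 0
        for x in product([0, 1], repeat=n):
            if all(x[i] <= u[i] for i in range(n)):  # x below u pointwise
                c ^= f(x)
        if c:  # monomial prod_{i: u_i = 1} x_i is present in the ANF
            deg = max(deg, sum(u))
    return deg

# parity(x1,x2,x3) is 2-correlation immune (k = 2); its GF[2] polynomial
# is x1 + x2 + x3, so its degree 1 meets the bound n - k = 3 - 2 = 1.
parity3 = lambda x: sum(x) % 2
deg = gf2_degree(parity3, 3)
```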

### End of Story

### Non-uniform distributions

• Correlation-immune functions are defined wrt the uniform distribution

• What if distribution is biased?

e.g. each bit 1 with probability ¾

f(x1,x2) = parity(x1,x2), each bit 1 independently with probability 3/4:

P[f=1|x1=1] = 1/4 ≠ 3/4 = P[f=1|x1=0]

For added irrelevant variables, the two probabilities would still be equal.
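This is easy to confirm exactly. A sketch (names mine) computing the biased conditional probabilities by weighted enumeration:

```python
from itertools import product

def cond_prob_biased(f, n, i, b, p):
    """Exact P[f=1 | xi=b] when each bit is 1 independently with probability p."""
    num = den = 0.0
    for x in product([0, 1], repeat=n):
        if x[i] != b:
            continue
        w = 1.0
        for j in range(n):
            if j != i:
                w *= p if x[j] == 1 else (1 - p)
        den += w
        num += w * f(x)
    return num / den

parity2 = lambda x: (x[0] + x[1]) % 2
gap_biased = abs(cond_prob_biased(parity2, 2, 0, 1, 0.75)
                 - cond_prob_biased(parity2, 2, 0, 0, 0.75))  # |1/4 - 3/4| = 1/2
gap_uniform = abs(cond_prob_biased(parity2, 2, 0, 1, 0.5)
                  - cond_prob_biased(parity2, 2, 0, 0, 0.5))  # 0: immunity returns
```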

### Correlation-immunity wrt p-biased distributions

Definitions

• f is correlation-immune wrt distribution D if

PD[f=1|xi=1] = PD[f=1|xi=0]

for all xi

• p-biased distribution Dp: each bit set to 1 independently with probability p

• For all p-biased distributions D,

PD[f=1|xi=1] = PD[f=1|xi=0]

for all irrelevant xi

Lemma: Let f(x1,…,xn) be a Boolean function with r relevant variables. Then f is correlation immune w.r.t. Dp for at most r-1 values of p.

Proof: Correlation immune wrt Dp means

P[f=1|xi=1] – P[f=1|xi=0] = 0 (*)

for all xi.

Consider a fixed f and xi. The LHS of (*) can be written as a polynomial h(p).

• e.g. f(x1,x2,x3) = parity(x1,x2,x3), under the p-biased distribution Dp:

h(p) = PDp[f=1|x1=1] − PDp[f=1|x1=0] = ( p² + (1−p)² ) − ( p(1−p) + (1−p)p ) = (1−2p)²

If add irrelevant variable, this polynomial doesn’t change

• h(p), for arbitrary f and variable xi, has degree ≤ r−1, where r is the number of relevant variables.

• So f is correlation-immune wrt at most r−1 values of p, unless h(p) is identically 0 for all xi.
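For the parity example, h(p) works out to (1−2p)²: degree 2 = r−1, vanishing only at p = 1/2. A quick numerical check (sketch, names mine):

```python
from itertools import product

def h(p):
    """h(p) = P_Dp[f=1 | x1=1] - P_Dp[f=1 | x1=0] for f = parity(x1,x2,x3)."""
    def cond(b):
        num = den = 0.0
        for x2, x3 in product([0, 1], repeat=2):
            w = (p if x2 else 1 - p) * (p if x3 else 1 - p)
            den += w
            num += w * ((b + x2 + x3) % 2)
        return num / den
    return cond(1) - cond(0)

# Agreement with the closed form (1 - 2p)^2 at several sample points:
checks = [abs(h(p) - (1 - 2 * p) ** 2) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
```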

h(p) = PDp[f=1|xi=1] − PDp[f=1|xi=0], with

PDp[f=1|xi=1] = Σd wd · p^d · (1−p)^(n−1−d)

where wd is the number of inputs x for which f(x)=1, xi=1, and x contains exactly d additional 1’s,

i.e. wd = number of positive assignments of fxi←1 of Hamming weight d

• Similar expression for PDp[f=1|xi=0]

PDp[f=1|xi=1] − PDp[f=1|xi=0] = Σd (wd − rd) · p^d · (1−p)^(n−1−d)

where wd = number of positive assignments of fxi←1 of Hamming weight d

rd = number of positive assignments of fxi←0 of Hamming weight d

Not identically 0 iff wd ≠ rd for some d

### Property of Boolean functions

Lemma: If f has at least one relevant variable, then for some relevant variable xi and some d,

wd ≠ rd

where

wd = number of positive assignments of fxi<-1 of Hamming weight d

rd = number of positive assignments of fxi<-0 of Hamming weight d

How much does it help to have access to examples from different distributions?

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions [Hellerstein, Rosell, Bach, Ray, Page]

• Algorithm to find a relevant variable

• Uses examples from distributions Dp, for

p = 1/(r+1), 2/(r+1), …, r/(r+1)

• Sample size poly((r+1)^r, log n, log 1/δ)

[Essentially same algorithm found independently by Arpe and Mossel, using very different techniques]
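A naive rendering of the multi-distribution idea (my code and thresholds, not the paper's actual algorithm): sample from D_p for several p of the form k/(r+1) and report any variable whose empirical correlation gap is large under some p.

```python
import random

def find_relevant_variable(f, n, r, m=4000, thresh=0.05, seed=0):
    """Sketch: try p = 1/(r+1), ..., r/(r+1); return a variable whose
    empirical correlation gap under some D_p exceeds the threshold."""
    rng = random.Random(seed)
    for k in range(1, r + 1):
        p = k / (r + 1)
        ones = [[0, 0] for _ in range(n)]
        tot = [[0, 0] for _ in range(n)]
        for _ in range(m):
            x = [1 if rng.random() < p else 0 for _ in range(n)]
            y = f(x)
            for i in range(n):
                tot[i][x[i]] += 1
                ones[i][x[i]] += y
        for i in range(n):
            if tot[i][0] and tot[i][1]:
                gap = abs(ones[i][1] / tot[i][1] - ones[i][0] / tot[i][0])
                if gap > thresh:
                    return i
    return None

# Parity of x0, x1, x2 embedded in n = 8 variables: correlation immune under
# the uniform distribution, but exposed under biased D_p.
found = find_relevant_variable(lambda x: (x[0] + x[1] + x[2]) % 2, 8, 3)
```

Under p = 1/4, each relevant variable of the embedded parity has gap (1−2p)² = 1/4, so it is detected; irrelevant variables stay near 0 under every p.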

• Another algorithm to find a relevant variable

• Based on proving (roughly) that if we choose a random p, then h(p)² is likely to be reasonably large. Uses the prime number theorem.

• Uses examples from poly(2^r, log 1/δ) distributions Dp.

• Sample size poly(2^r, log n, log 1/δ)

### Better algorithms?

### Summary

• Finding relevant variables (junta-learning)

• Correlation-immune functions

• Learning from p-biased distributions

### Moral of the Story

• Handbook of integer sequences can be useful in doing literature search

• Eating lunch with the right person can be much more useful