**Exact Learning of Boolean Functions with Queries** Lisa Hellerstein Polytechnic University Brooklyn, NY AMS Short Course on Statistical Learning Theory, 2007

**I. Introduction**

**Learning Boolean formula f**

- Problem: A Boolean formula f is hidden in a black box. We are told that f is from a class C of formulas.
- Task: "Learn" f.
- Example: f(x1, x2, x3) = x1 ∧ x3

**Learning representation f of Boolean function**

- Problem: f is hidden in a black box. We are told that f is from a class C of representations of Boolean functions.
- Task: "Learn" f.
- Example: f = x1 + x2 (mod 2)

**Boolean functions can represent**

- Whether a person is good or bad
- Whether an email message is spam
- Whether a tumor is malignant
- Whether a book is a romance novel
- etc.

**How hard is it to learn target f?**

Need to specify:

- Type of information available
- What's meant by "learning"

Together, these choices define a learning model.

**II. Learning Models**

**Valiant's PAC Model (1984)**

- PAC = Probably Approximately Correct
- Type of info available:
  - Random examples: value of f on "random" points in its domain
- Success criterion:
  - Approximate learning: output h that is approximately functionally equivalent to f

**Query Models (this talk)**

- Type of info available:
  - Oracles that answer questions about f
- Success criterion:
  - Exact learning: output h where h ≡ f
- Want to learn f within a "polynomial" number of queries, in "polynomial" time
  - Polynomial in n and the size of f

**Types of queries**

- Membership queries (point evaluation)
  - Question: What is f(x)?
  - Answer: f(x)
- Equivalence queries
  - Question: Is h ≡ f? (h is the hypothesis)
  - Answer: Yes if so; otherwise a point x such that f(x) ≠ h(x) (x is a counterexample)
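The two query types can be mocked up for small n as a brute-force oracle. This is only an illustrative sketch: the class name `QueryOracle` and its methods are hypothetical, and the equivalence check enumerates all 2^n points, which is feasible only for toy examples.

```python
from itertools import product


class QueryOracle:
    """Hypothetical brute-force oracle for a hidden target f on n Boolean variables."""

    def __init__(self, target, n):
        self._target = target          # the hidden function, as a callable on 0/1 tuples
        self.n = n
        self.membership_count = 0
        self.equivalence_count = 0

    def membership(self, x):
        """Membership query: return f(x)."""
        self.membership_count += 1
        return self._target(x)

    def equivalence(self, h):
        """Equivalence query: return None if h ≡ f, else some counterexample x."""
        self.equivalence_count += 1
        for x in product((0, 1), repeat=self.n):
            if h(x) != self._target(x):
                return x
        return None


# Target from the introduction: f(x1, x2, x3) = x1 ∧ x3
oracle = QueryOracle(lambda x: x[0] & x[2], 3)
print(oracle.membership((1, 0, 1)))
print(oracle.equivalence(lambda x: 0))
```

A real oracle may return any counterexample; this sketch happens to return the first one in lexicographic order.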

**Definition**

A membership and equivalence query algorithm learns a class C of representations if, given

1. oracles to answer membership and equivalence queries for some f in C, and
2. the number n of variables of f,

the algorithm outputs a representation h such that f ≡ h.

We say the algorithm runs in polynomial time if its running time is poly(n, size of f).

**About membership and equivalence queries**

- Assume queries are answered perfectly
- Membership queries
  - Black-box interpolation
  - Perfect answers often not available in practice
- Equivalence queries
  - Can be simulated in the PAC model
    - Test whether f(x) = h(x) on random examples x
  - Relation to mistake-bound learning in the on-line model
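The simulation of an equivalence query by random examples can be sketched as follows. The function name, the uniform distribution, and the `num_samples` parameter are assumptions made for illustration; in the PAC setting the examples would come labeled from an example oracle rather than from calling f directly, and a "no counterexample found" answer only means that h is probably approximately correct, not that h ≡ f.

```python
import random


def sampled_equivalence_query(f, h, n, num_samples, rng):
    """Approximate an equivalence query by testing f(x) == h(x) on random points.

    Returns a counterexample x if one is found among the samples, else None.
    None does NOT prove h ≡ f; it only says no sampled point exposed a difference.
    """
    for _ in range(num_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))  # uniform point (an assumption)
        if f(x) != h(x):
            return x
    return None


rng = random.Random(0)
parity = lambda x: (x[0] + x[1]) % 2          # f = x1 + x2 (mod 2)
print(sampled_equivalence_query(parity, lambda x: 0, 2, 200, rng))
print(sampled_equivalence_query(parity, parity, 2, 200, rng))
```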

**III. Example Query Algorithm**

**E.g. C = Boolean monomials**

- Boolean monomial = conjunction of literals
- Example: f(x1, x2, x3) = ¬x1 ∧ x3, also written ¬x1x3
- Other monomials: x1¬x2 x3 and ¬x2

**Learning monomial f(x1, x2, x3)**

1. Ask an equivalence query: Is f ≡ 0? Suppose we get the counterexample f(1, 0, 1) = 1.
2. For each xi, determine whether it appears in the monomial f with negation, without negation, or not at all.
   - Since x1 = 1 in the counterexample, x1 appears in f without negation, or not at all.
   - Ask a membership query: What is f(0, 0, 1)?
     - If the answer is 0, x1 without negation is in the monomial f.
     - If the answer is 1, x1 does not appear in f at all.
   - Do similarly for x2 and x3.
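The two steps above can be sketched in code. The helper `make_oracles` and the dictionary encoding of a monomial (variable index mapped to the value that variable must take) are assumptions for illustration; the oracle is a brute-force stand-in that only works for small n.

```python
from itertools import product


def make_oracles(f, n):
    """Hypothetical brute-force oracles for a hidden target f, with query counters."""
    counts = {"membership": 0, "equivalence": 0}

    def membership(x):
        counts["membership"] += 1
        return f(x)

    def equivalence(h):
        counts["equivalence"] += 1
        for x in product((0, 1), repeat=n):
            if h(x) != f(x):
                return x
        return None

    return membership, equivalence, counts


def learn_monomial(n, membership, equivalence):
    """Return {i: v} meaning 'x_i must equal v', or None if the target is ≡ 0."""
    a = equivalence(lambda x: 0)        # step 1: is f ≡ 0?
    if a is None:
        return None
    literals = {}
    for i in range(n):                  # step 2: flip one bit of the positive example
        flipped = list(a)
        flipped[i] = 1 - a[i]
        if membership(tuple(flipped)) == 0:
            literals[i] = a[i]          # flipping x_i falsified f, so x_i is in the monomial
    return literals


target = lambda x: (1 - x[0]) & x[2]    # f = ¬x1 ∧ x3 (0-based indices in code)
membership, equivalence, counts = make_oracles(target, 3)
print(learn_monomial(3, membership, equivalence), counts)
```

In the returned dictionary, a value of 1 means the variable appears without negation and a value of 0 means it appears negated. The run uses one equivalence query and n membership queries.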

**Learning Boolean monomials**

- The previous approach learns Boolean monomials with n+1 queries in total (1 equivalence query and n membership queries), in polynomial time
- Can also learn Boolean monomials with equivalence queries alone
- Need exponentially many queries (worst case) with membership queries alone. If the monomial includes all n variables, f = 1 for only one of the 2^n points in its domain
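One standard way to learn monomials with equivalence queries alone is an elimination strategy: start from the conjunction of all 2n literals and delete every literal that a counterexample falsifies. The slides do not say which algorithm is intended, so the sketch below is an assumption; the names are hypothetical and the oracle is a brute-force stand-in.

```python
from itertools import product


def brute_force_equivalence(f, n):
    """Hypothetical equivalence oracle: returns a counterexample or None."""
    def equivalence(h):
        for x in product((0, 1), repeat=n):
            if h(x) != f(x):
                return x
        return None
    return equivalence


def as_function(literals):
    """Conjunction of literals; a literal (i, v) means 'x_i must equal v'."""
    lits = frozenset(literals)
    return lambda x: int(all(x[i] == v for i, v in lits))


def learn_monomial_eq_only(n, equivalence):
    # Start with every literal, so the first hypothesis is identically 0.
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    # The first counterexample leaves n literals and each later one removes
    # at least one more, so n + 2 queries suffice when the target is a monomial.
    for _ in range(n + 2):
        cx = equivalence(as_function(literals))
        if cx is None:
            return literals
        # Counterexamples are positive examples; keep only literals they satisfy.
        literals = {(i, v) for (i, v) in literals if cx[i] == v}
    raise ValueError("target does not appear to be a monomial")


target = lambda x: (1 - x[0]) & x[2]    # f = ¬x1 ∧ x3
print(sorted(learn_monomial_eq_only(3, brute_force_equivalence(target, 3))))
```

The hypothesis always contains every literal of the target, so it never outputs 1 where the target is 0; this is why every counterexample is a positive example.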

**IV. Four interesting representation classes**

**1. DNF Formulas**

- OR of ANDs, e.g. f = ¬x1x2x3 ∨ ¬x1x4 ∨ x1x2x3
- Natural way of describing a classification rule
- Not known whether DNF is learnable in polynomial time with membership and equivalence queries (or in the PAC model)
- Best known algorithm runs in time 2^{Õ(n^{1/3})} (Klivans and Servedio, 2004)

**2. Boolean linear threshold functions**

- Example: f = 1 if x1 + x2 + x3 > 2, and f = 0 otherwise
- Learnable in polynomial time with equivalence queries

**3. Polynomials over GF[2] (integers mod 2)**

- Example: f = x2x3 + x1x3 + x1x2x3
- Learnable in polynomial time with membership and equivalence queries

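**4. Boolean decision trees**

- Learnable in polynomial time with membership and equivalence queries

[Figure: a decision tree with root x1, whose =1 and =0 branches lead to internal nodes testing x3 and x2, and whose four leaves are labeled 1, 0, 0, 1]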

**Representation and size**

- Can represent every Boolean function as a DNF formula, a GF[2] polynomial, or a decision tree
- But the sizes of the representations can be very different
- E.g. the parity function:
  - Representation as a GF[2] polynomial is small: f(x1,…,xn) = x1 + x2 + … + xn (mod 2)
  - Requires a DNF formula of size exponential in n
  - Requires a decision tree of size exponential in n

**V. Learning with Polynomial Number of Queries**

**Halving Algorithm**

Generic algorithm for learning with a polynomial number of queries

- Assume (for simplicity) that we know the size s of the target f
- Keep a set V of all possible f
  - Initially, V contains all representations in C (on n variables) of size s
- Repeat until success:
  - Use V to construct the Majority Hypothesis h
  - Ask an equivalence query with h
  - Either the answer is "yes" (success), or we receive a counterexample
  - If the latter, update V

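**Majority Hypothesis h**

[Figure: the set V, drawn as a region containing candidate functions such as f1, f3, f5, f6, f8]

For each x in the domain of f:

- h(x) = 1 if a majority of the fi's in V have fi(x) = 1
- h(x) = 0 if a majority of the fi's in V have fi(x) = 0

A counterexample to the majority hypothesis eliminates at least half of the fi's in V. The Halving Algorithm therefore receives at most log2(original size of V) counterexamples before its equivalence query succeeds.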
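A minimal sketch of the Halving Algorithm on a toy class follows. It stores V explicitly, which takes exponential space in general, so it illustrates the query bound rather than an efficient algorithm. The function names, the tie-breaking rule (ties go to 0), and the choice of monomials on 3 variables as the class C are assumptions for illustration.

```python
from itertools import product
from math import log2


def brute_force_equivalence(f, n):
    """Hypothetical equivalence oracle: returns a counterexample or None."""
    def equivalence(h):
        for x in product((0, 1), repeat=n):
            if h(x) != f(x):
                return x
        return None
    return equivalence


def halving(version_space, equivalence):
    """Return (final hypothesis, number of equivalence queries asked)."""
    V = list(version_space)
    queries = 0
    while True:
        def h(x, current=V):
            # Majority vote of the candidates still in V (ties go to 0).
            ones = sum(g(x) for g in current)
            return int(2 * ones > len(current))
        queries += 1
        cx = equivalence(h)
        if cx is None:
            return h, queries
        wrong = h(cx)
        # At least half of V voted for the wrong value at cx; discard them.
        V = [g for g in V if g(cx) != wrong]


def monomial(assignment):
    """assignment[i] is 0, 1, or None (x_i negated, unnegated, or absent)."""
    return lambda x: int(all(v is None or x[i] == v for i, v in enumerate(assignment)))


n = 3
C = [monomial(a) for a in product((0, 1, None), repeat=n)]   # 27 candidates
target = monomial((0, None, 1))                               # f = ¬x1 ∧ x3
h, queries = halving(C, brute_force_equivalence(target, n))
print(queries, "queries; bound:", int(log2(len(C))) + 1)
```

The target is never discarded, because it disagrees with the hypothesis at every counterexample, so the loop ends once the majority vote agrees with the target everywhere.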

**VI. Challenge: Learn in polynomial time**

**If restrict hypotheses to be in C**

- May be NP-hard to learn: computational hardness of learning
  - Tools to prove: complexity theory, NP-completeness reductions, non-approximability
- May require an exponential number of queries to learn: informational hardness of learning
  - Tools to prove: structural properties of C, combinatorial arguments

**Example**

- Suppose C is the class of 2-term DNF formulas and we want to learn C with equivalence queries alone
  - E.g. f = ¬x1x3 ∨ x1x2
- It is NP-hard to learn 2-term DNF formulas with equivalence queries alone if hypotheses must be 2-term DNF formulas

**2-term DNF formulas can be factored**

f = ¬x1x3 ∨ x1x2 = (¬x1 ∨ x2)(x3 ∨ x1)(x3 ∨ x2)

(Distribute ∨ over ∧; the clause (¬x1 ∨ x1) is always true and is dropped.)

- Result is a 2-CNF formula
  - AND of ORs in which each OR has at most 2 literals
- Size of the 2-CNF formula is O(n^2)
- 2-CNF formulas can be learned in poly-time with equivalence queries alone (how?)
- Learn a 2-term DNF formula using the algorithm for learning 2-CNF formulas
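The factoring step can be checked mechanically. In this sketch a term is a dictionary from variable index to required value, and a clause is a set of such (index, value) literals; these encodings and the function names are assumptions for illustration.

```python
from itertools import product


def two_term_dnf_to_2cnf(term1, term2):
    """Distribute T1 ∨ T2 into an AND of clauses (a ∨ b), a from T1 and b from T2."""
    clauses = set()
    for a in term1.items():
        for b in term2.items():
            if a[0] == b[0] and a[1] != b[1]:
                continue                      # (x ∨ ¬x) is always true, so drop it
            clauses.add(frozenset({a, b}))    # a == b collapses to a one-literal clause
    return clauses


def eval_dnf(terms, x):
    return int(any(all(x[i] == v for i, v in t.items()) for t in terms))


def eval_cnf(clauses, x):
    return int(all(any(x[i] == v for i, v in c) for c in clauses))


# f = ¬x1 x3 ∨ x1 x2, with 0-based variable indices in code
t1, t2 = {0: 0, 2: 1}, {0: 1, 1: 1}
cnf = two_term_dnf_to_2cnf(t1, t2)
print(len(cnf), "clauses")
print(all(eval_dnf([t1, t2], x) == eval_cnf(cnf, x) for x in product((0, 1), repeat=3)))
```

Each clause pairs one literal from each term, so there are at most n · n clauses, which matches the O(n^2) size bound.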

**Learning 2-CNF**

The 2-CNF formula f = (¬x1 ∨ x2)(x3 ∨ x1)(x3 ∨ x2) can be viewed as a monomial over a new variable set {y1, y2, …}:

- y1 = (¬x1 ∨ x2)
- y2 = (x1 ∨ x2)
- y3 = (x2 ∨ ¬x3)
- etc.

Learn 2-CNF formulas using an algorithm for learning monomials, by translating between the original variables and the new variables.
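One way to make this concrete is to run an elimination-style monomial learner over the clause variables: start with every clause of at most 2 literals and delete the clauses that a counterexample falsifies. The slides leave the choice of monomial algorithm open, so this is a sketch under that assumption; the names are hypothetical and the oracle is a brute-force stand-in.

```python
from itertools import combinations, product


def brute_force_equivalence(f, n):
    """Hypothetical equivalence oracle: returns a counterexample or None."""
    def equivalence(h):
        for x in product((0, 1), repeat=n):
            if h(x) != f(x):
                return x
        return None
    return equivalence


def all_clauses(n):
    """All clauses with 1 or 2 literals; a literal (i, v) is true when x_i == v."""
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    singles = {frozenset({l}) for l in literals}
    pairs = {frozenset(p) for p in combinations(literals, 2) if p[0][0] != p[1][0]}
    return singles | pairs


def clause_true(clause, x):
    return any(x[i] == v for i, v in clause)


def learn_2cnf(n, equivalence):
    """Return (set of clauses, number of equivalence queries)."""
    clauses = all_clauses(n)
    queries = 0
    while True:
        h = lambda x, cur=frozenset(clauses): int(all(clause_true(c, x) for c in cur))
        queries += 1
        cx = equivalence(h)
        if cx is None:
            return clauses, queries
        kept = {c for c in clauses if clause_true(c, cx)}   # drop falsified clauses
        if kept == clauses:
            raise ValueError("target does not appear to be a 2-CNF formula")
        clauses = kept


# Target given as a 2-term DNF: f = ¬x1 x3 ∨ x1 x2
target = lambda x: int((x[0] == 0 and x[2] == 1) or (x[0] == 1 and x[1] == 1))
learned, queries = learn_2cnf(3, brute_force_equivalence(target, 3))
print(len(learned), "clauses after", queries, "queries")
```

Every counterexample removes at least one clause, and there are O(n^2) clauses, so the number of equivalence queries is O(n^2). The hypothesis returned is a 2-CNF formula, not a 2-term DNF formula.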

**Two Useful Techniques**

1. To show C is learnable, find C' such that
   - C' is poly-time learnable
   - each f in C has an equivalent f' in C' of size at most polynomially larger

   Then learn C using the algorithm for C'.
2. Use an existing algorithm with a new variable set.

**BUT…**

Even if we allow hypotheses not from C, it can still be hard to learn C in polynomial time.

If C is a sufficiently rich class of Boolean circuits/formulas:

- Can show that C can represent cryptographic primitives
- Learning C is as hard as breaking the cryptographic primitives

**VII. Learning GF[2] Polynomials and Decision Trees**

**GF[2] Polynomials and Decision Trees**

- Poly-time learnable with membership and equivalence queries using an algorithm for learning Hankel matrix representations (multiplicity automata)
  - This is Useful Technique 1
- Hankel matrix representations are learnable using a variant of the Deterministic Finite Automaton learning algorithm

**Hankel Matrix H of f(x1,…,xn)**

- View f as a function on binary strings
- Rows and columns of H are indexed by all binary strings
- H[x, y] = f(x∘y) if |x| + |y| = n, and H[x, y] = 0 otherwise

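**Hankel matrix of f(x1,x2) = x1 ∨ x2**

Here n = 2, so an entry H[x, y] can be nonzero only when |x| + |y| = 2.

|     | ε | 0 | 1 | 00 | 01 | 10 | 11 | 111 | … |
|-----|---|---|---|----|----|----|----|-----|---|
| ε   | 0 | 0 | 0 | 0  | 1  | 1  | 1  | 0   | … |
| 0   | 0 | 0 | 1 | 0  | 0  | 0  | 0  | 0   | … |
| 1   | 0 | 1 | 1 | 0  | 0  | 0  | 0  | 0   | … |
| 00  | 0 | 0 | 0 | 0  | 0  | 0  | 0  | 0   | … |
| 01  | 1 | 0 | 0 | 0  | 0  | 0  | 0  | 0   | … |
| 10  | 1 | 0 | 0 | 0  | 0  | 0  | 0  | 0   | … |
| 11  | 1 | 0 | 0 | 0  | 0  | 0  | 0  | 0   | … |
| …   | … | … | … | …  | …  | …  | …  | …   | … |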
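The definition can be checked on this example by building the finite part of H indexed by strings of length at most n (all other entries are 0, so they do not affect the rank) and computing its rank. The slides do not say over which field the rank is taken; this sketch assumes the rationals, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def binary_strings(max_len):
    """All binary strings of length 0..max_len, as tuples, shortest first."""
    return [s for k in range(max_len + 1) for s in product((0, 1), repeat=k)]


def hankel(f, n):
    """Finite part of the Hankel matrix: H[x][y] = f(x∘y) if |x|+|y| = n, else 0."""
    index = binary_strings(n)
    H = [[f(x + y) if len(x) + len(y) == n else 0 for y in index] for x in index]
    return index, H


def rank(matrix):
    """Rank over the rationals via Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


f_or = lambda x: x[0] | x[1]          # f(x1, x2) = x1 ∨ x2
index, H = hankel(f_or, 2)
print(H[0])                           # row indexed by the empty string ε
print(rank(H))
```

The row for ε reproduces the first row of the matrix above, restricted to columns of length at most 2.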

**Learning Hankel matrices of Boolean functions**

- Can represent the Hankel matrix compactly
  - Suffices to specify a particular O(r^2) entries, where r is the rank of the matrix
- Running time of the Hankel matrix algorithm is polynomial in r and n
- Lemma: If f(x1,…,xn) is a GF[2] polynomial with s terms, then the rank of its Hankel matrix is poly(n, s)
- Lemma: If f(x1,…,xn) is a decision tree with s nodes, then the rank of its Hankel matrix is poly(n, s)
- Use the Hankel matrix algorithm to learn GF[2] polynomials and decision trees

**VIII. Summary**

- Definition of query learning models
- Halving algorithm for learning with a polynomial number of equivalence queries
- Techniques for polynomial-time learning
- Examples of classes learnable in polynomial time
- Barriers to polynomial-time learning

**Selected References**

Learning Models

- Valiant, L. G. A theory of the learnable. Communications of the ACM, 1984
- Angluin, D. Queries and concept learning. Machine Learning 2(4), 1988

Learning Algorithms

- Beimel, A., Bergadano, F., Bshouty, N. H., Kushilevitz, E., and Varricchio, S. Learning functions represented as multiplicity automata. Journal of the ACM (3), 2000
- Maass, W. and Turan, G. On the complexity of learning with counterexamples. Proc. of the 30th IEEE Symposium on Foundations of Computer Science (FOCS), 1989
- Klivans, A. and Servedio, R. Learning DNF in time 2^{Õ(n^{1/3})}. Journal of Computer and System Sciences 68(2), 2004

**Hardness of Learning**

- Kearns, M. J. and Valiant, L. G. Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM (1), 1994
- Angluin, D. Negative results for equivalence queries. Machine Learning (5), 1990
- Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., and Wilkins, D. How many queries are needed to learn? J. ACM (43), 1996