
Linear sketching for Functions over the Boolean Hypercube

Learn about linear sketching techniques for functions over the Boolean hypercube and their applications in distributed computing and streaming. Explore deterministic and randomized sketching and how randomization handles noise.


Presentation Transcript


  1. Linear Sketching for Functions over the Boolean Hypercube. Grigory Yaroslavtsev (Director @ Center for Algorithms and ML, Indiana University, Bloomington) http://grigory.us/blog with Sampath Kannan (U. of Pennsylvania), Elchanan Mossel (MIT) and Swagato Sanyal (IIT Kharagpur)

  2. “Three Schools of Thought” in Algorithms & Complexity • Boston (MIT & Harvard) • Youthful & innovative attacks on problems, driven by PhD students with new ideas (“grad student descent”) • “Relentless optimism ;)”: faster algorithms, e.g. sublinear time, gradient descent, deep learning

  3. “Three Schools of Thought” in Algorithms & Complexity • New York & Chicago (Princeton, NYU, U Chicago) • Abstract and skeptical theory building, driven by fundamental questions • “Life is hard…”: polynomial-time, hardness of approximation, lower bounds, beyond-worst case analysis

  4. “Three Schools of Thought” in Algorithms & Complexity • Bay Area (Stanford & Berkeley) • No time for philosophy, driven by applications and societal needs • “Let’s start a company and change society!”: machine learning/AI, fairness, social networks, privacy

  5. This talk • “Boston school” • Fast, optimistic and specific: sublinear time, streaming, distributed • “New York & Chicago school” • General theory, lower bounds, beyond-worst-case analysis

  6. $\mathbb{F}_2$-Sketching • Input: $x \in \{0,1\}^n$ • Parity = linear function over $\mathbb{F}_2$: $\chi_S(x) = \bigoplus_{i \in S} x_i$ for $S \subseteq [n]$ • Deterministic $\mathbb{F}_2$-sketch: a set of $k$ parities $\chi_{S_1}, \dots, \chi_{S_k}$ from which $f(x)$ can be recovered • Randomized $\mathbb{F}_2$-sketch: a distribution over $k$ parities (random $S_1, \dots, S_k$) from which $f(x)$ can be recovered with high probability
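
The definition above can be illustrated with a minimal sketch in Python (the helper names `parity` and `sketch` are mine, not from the talk): a deterministic $\mathbb{F}_2$-sketch of $x$ is just the tuple of parities of $x$ over the chosen sets.

```python
# Minimal F2-sketch illustration (assumed helper names, not the authors' code).
# A parity chi_S(x) is the XOR of the bits of x indexed by S: a linear map over F2.

def parity(S, x):
    """chi_S(x): XOR of the bits x[i] for i in S."""
    acc = 0
    for i in S:
        acc ^= x[i]
    return acc

def sketch(sets, x):
    """Deterministic F2-sketch: the tuple (chi_S1(x), ..., chi_Sk(x))."""
    return tuple(parity(S, x) for S in sets)

x = [1, 0, 1, 1]
sets = [{0, 1}, {2, 3}]
print(sketch(sets, x))  # (1, 0)
```

For a function of Fourier dimension $k$, these $k$ bits are all a recovery procedure ever needs to see.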

  7. $\mathbb{F}_2$-Sketching • Known: $f\colon \{0,1\}^n \to \{0,1\}$ • Question: can one recover $f(x)$ from a small ($k \ll n$) linear sketch over $\mathbb{F}_2$? • Allow randomized computation (99% success) • Probability over the choice of the random sets $S_1, \dots, S_k$ • Sets are known at recovery time • Recovery is deterministic (w.l.o.g.)

  8. Puzzle: Open Problem 78 on Sublinear.info • Shared randomness; Alice holds $x \in \{0,1\}^n$, Bob holds $y \in \{0,1\}^n$ and must compute $f(x \oplus y)$ from Alice's single message • Conjecture: the (almost) shortest message is a randomized $\mathbb{F}_2$-sketch of $f$ • https://sublinear.info/index.php?title=Open_Problems:78 • Ask me later: variants of this conjecture for simultaneous communication, multi-player, product distributions, etc.

  9. Motivation: Distributed Computing • Distributed computation among $M$ machines: machine $i$ holds $x_i$ and $x = x_1 \oplus \dots \oplus x_M$ (more generally, $x = x_1 + \dots + x_M$) • The machines compute sketches locally: $\chi_{S_1}(x_i), \dots, \chi_{S_k}(x_i)$ • They send them to the coordinator, who computes $\chi_{S_1}(x), \dots, \chi_{S_k}(x)$ (coordinate-wise XORs) • The coordinator computes $f(x)$ with $O(kM)$ bits of communication
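
The coordinator trick rests on linearity: the XOR of local sketches equals the sketch of the XOR of the local inputs. A small check, with hypothetical helper names of my own:

```python
# Because each parity is linear over F2, XOR-ing the machines' sketches
# coordinate-wise yields the sketch of the global input x = x1 XOR x2.
# (Illustrative code with assumed names, not the authors' implementation.)

def parity(S, x):
    acc = 0
    for i in S:
        acc ^= x[i]
    return acc

def sketch(sets, x):
    return tuple(parity(S, x) for S in sets)

def xor_vec(a, b):
    return [ai ^ bi for ai, bi in zip(a, b)]

sets = [{0, 2}, {1, 3}]
x1, x2 = [1, 1, 0, 0], [0, 1, 1, 0]
x = xor_vec(x1, x2)                               # global input x = x1 XOR x2

local = [sketch(sets, x1), sketch(sets, x2)]      # computed on separate machines
combined = tuple(a ^ b for a, b in zip(*local))   # coordinate-wise XOR at the coordinator
assert combined == sketch(sets, x)
print(combined)  # (0, 0)
```

With $M$ machines and $k$ parities, each machine ships $k$ bits, giving the $O(kM)$ communication bound from the slide.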

  10. Motivation: Streaming • $x \in \{0,1\}^n$ is generated through a sequence of updates • Update $i_j \in [n]$: update $i_j$ flips the bit of $x$ at position $i_j$ • Example updates: $(1, 3, 8, 3)$ • An $\mathbb{F}_2$-sketch allows one to recover $f(x)$ with $k$ bits of space
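
A sketch can be maintained in a stream without storing $x$: flipping bit $i$ toggles exactly the sketch coordinates whose set contains $i$. A small simulation (assumed names; the reference copy of $x$ is kept only to verify the invariant):

```python
# Streaming maintenance of an F2-sketch under bit-flip updates.
# An update that flips bit i toggles sketch coordinate j iff i is in S_j.

def parity(S, x):
    acc = 0
    for i in S:
        acc ^= x[i]
    return acc

n = 10
sets = [{0, 2, 7}, {1, 3, 8}]
state = [0] * len(sets)        # sketch of the all-zeros vector
x = [0] * n                    # reference copy, only to check correctness

for i in (1, 3, 8, 3):         # the update sequence from the slide (0-indexed here)
    x[i] ^= 1                  # the stream flips bit i ...
    for j, S in enumerate(sets):
        if i in S:
            state[j] ^= 1      # ... and each affected sketch bit is toggled

assert state == [parity(S, x) for S in sets]
print(x, state)
```

Note how the repeated update 3 cancels itself out, exactly as XOR updates should.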

  11. Frequently Asked Questions • Q: Why XOR updates instead of $\pm 1$ updates? • A: There are no known cases where it helps to know the sign • Q: Some applications? • A: Essentially all dynamic graph streaming algorithms can be based on $\ell_0$-sampling, and $\ell_0$-sampling can be done optimally using $\mathbb{F}_2$-sketching [Kapralov et al., FOCS'17] • Q: Why not allow approximation? • A: Stay tuned [Y.'17]

  12. Deterministic vs. Randomized • Fact: $f$ has a deterministic sketch of size $k$ if and only if $f(x) = g(\chi_{S_1}(x), \dots, \chi_{S_k}(x))$ • Equivalent to "$f$ has Fourier dimension $k$" • Randomization can help: • $OR(x_1, \dots, x_n)$ has Fourier dimension $n$ • Pick $k$ random sets $S_1, \dots, S_k$ • If there is $i$ such that $\chi_{S_i}(x) = 1$, output 1; otherwise output 0 • Error probability $2^{-k}$
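
The randomized OR sketch above can be simulated directly. If $x \ne 0$, a uniformly random parity of $x$ equals 1 with probability $1/2$, so $k$ independent parities miss a nonzero input with probability $2^{-k}$ (a hedged demo with names of my own; parameters are illustrative):

```python
# Randomized F2-sketch for OR: pick k uniformly random sets and report 1
# iff some parity of x is 1. One-sided error: only nonzero x can be missed.

import random

def parity(S, x):
    acc = 0
    for i in S:
        acc ^= x[i]
    return acc

def or_from_sketch(x, k, rng):
    n = len(x)
    sets = [{i for i in range(n) if rng.random() < 0.5} for _ in range(k)]
    return 1 if any(parity(S, x) for S in sets) else 0

rng = random.Random(0)
n, k, trials = 16, 10, 1000
x = [0] * n
x[5] = 1                         # OR(x) = 1
errors = sum(or_from_sketch(x, k, rng) == 0 for _ in range(trials))
print(errors / trials)           # empirically close to 2**-10
```

On the all-zeros input every parity is 0, so the sketch never errs in that direction.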

  13. Fourier Analysis • Notation switch: from $f\colon \{0,1\}^n \to \{0,1\}$ to $f\colon \{-1,1\}^n \to \{-1,1\}$ • Functions as vectors form a vector space: $f \in \mathbb{R}^{2^n}$ • Inner product on functions = "correlation": $\langle f, g \rangle = \frac{1}{2^n} \sum_x f(x) g(x) = \mathbb{E}_x[f(x) g(x)]$ (for Boolean functions only)

  14. "Main Characters" are Parities • For $S \subseteq [n]$ let the character $\chi_S(x) = \prod_{i \in S} x_i$ • Fact: every function is uniquely represented as a multilinear polynomial $f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, \chi_S(x)$ • $\hat{f}(S) = \langle f, \chi_S \rangle$, a.k.a. the Fourier coefficient of $f$ on $S$ • Parseval: $\sum_S \hat{f}(S)^2 = 1$ for Boolean $f$
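
The expansion and Parseval's identity can be checked by brute force on a small function (helper names are mine; $\mathrm{Maj}_3$ is used as the example):

```python
# Brute-force Fourier coefficients over {-1,+1}^n, checking Parseval for
# a Boolean function: the squared coefficients must sum to 1.

from itertools import product

def chi(S, x):
    p = 1
    for i in S:
        p *= x[i]
    return p

def fourier_coeff(f, S, n):
    """\\hat f(S) = E_x[f(x) chi_S(x)] over uniform x in {-1,+1}^n."""
    pts = list(product([-1, 1], repeat=n))
    return sum(f(x) * chi(S, x) for x in pts) / len(pts)

n = 3
maj = lambda x: 1 if sum(x) > 0 else -1   # Majority on {-1,+1}^3
subsets = [[i for i in range(n) if t >> i & 1] for t in range(2 ** n)]
coeffs = {tuple(S): fourier_coeff(maj, S, n) for S in subsets}
assert abs(sum(c * c for c in coeffs.values()) - 1) < 1e-9   # Parseval
print(coeffs[(0,)])   # 0.5, since Maj_3 = (x1 + x2 + x3 - x1*x2*x3) / 2
```

The exponential enumeration is of course only for illustration; the point of sketching is to avoid ever touching all $2^n$ inputs.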

  15. Fourier Dimension • Fourier sets $S$ ≡ vectors in $\mathbb{F}_2^n$ • "$f$ has Fourier dimension $d$" = a $d$-dimensional subspace $A$ in the Fourier domain has all the weight: $\sum_{S \in A} \hat{f}(S)^2 = 1$ • Pick a basis $S_1, \dots, S_d$ of $A$: • Sketch: $\chi_{S_1}(x), \dots, \chi_{S_d}(x)$ • For every $S \in A$ there exist basis vectors $S_{i_1}, \dots, S_{i_t}$ with $S = S_{i_1} \oplus \dots \oplus S_{i_t}$, so $\chi_S(x)$ is determined by the sketch

  16. Deterministic Sketching and Noise • Suppose $f = g + \text{noise}$, where $g$ has Fourier dimension $d$ and the "noise" has bounded norm • Sparse Fourier noise (via [Sanyal'15]): $d$-dimensional + "$\ell_0$-noise" • $s$ = # of non-zero Fourier coefficients of the noise (a.k.a. its "Fourier sparsity") • Linear sketch size: $d + O(\sqrt{s} \log s)$ • Our work: this can't be improved even with randomness and even for uniform $x$, e.g. for the "addressing function"

  17. How Randomization Handles Noise • $\ell_0$-noise in the original domain (via hashing à la OR): $d$-dimensional + "$\ell_0$-noise" on $s$ points • Randomized $\mathbb{F}_2$-sketch size: $d + O(\log s)$ • Optimal (but only existentially) • $\ell_1$-noise in the Fourier domain (via [Bruck, Smolensky '92; Grolmusz '97]): $d$-dimensional + "$\ell_1$-noise" of Fourier $\ell_1$-norm $L$ • Randomized $\mathbb{F}_2$-sketch size: $d + O(L^2)$ • Example: $d$-dimensional + small decision tree / DNF / etc.

  18. Randomized Sketching: Hardness • Suppose $f$ is not $\epsilon$-concentrated on $d$-dimensional Fourier subspaces: • For every $d$-dimensional Fourier subspace $A$: $\sum_{S \in A} \hat{f}(S)^2 \le \epsilon$ • Thm. Any $d$-dimensional $\mathbb{F}_2$-sketch makes error $\ge \frac{1 - \epsilon}{2}$ • The converse doesn't hold, i.e. concentration alone is not enough

  19. Randomized Sketching: Hardness • Not $\epsilon$-concentrated on $d$-dimensional Fourier subspaces: • Almost all symmetric functions, i.e. $f(x) = h\big(\sum_i x_i\big)$ • If not Fourier-close to a constant or to the parity $\chi_{[n]}$ • E.g. Majority (not an extractor) • Tribes (balanced DNF) • Recursive majority: $\mathrm{Maj}_3 \circ \mathrm{Maj}_3 \circ \cdots \circ \mathrm{Maj}_3$

  20. Approximate Fourier Dimension • Not $\epsilon$-concentrated on $d$-dimensional Fourier subspaces: for every $d$-dimensional Fourier subspace $A$, $\sum_{S \in A} \hat{f}(S)^2 \le \epsilon$ • Then any $d$-dimensional linear sketch makes error $\ge \frac{1}{2}(1 - \epsilon)$ • Definition (Approximate Fourier Dimension): $\dim_\epsilon(f)$ = smallest $d$ such that $f$ is $\epsilon$-concentrated on some Fourier subspace of dimension $d$
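
The notion of $\epsilon$-concentration can be computed explicitly for tiny functions: the Fourier weight of $f$ inside the subspace spanned by given generator sets (a hedged illustration with names of my own; sets are encoded as bitmasks over $[n]$):

```python
# Fourier weight of f on the span of given generator sets, to illustrate
# epsilon-concentration. Brute force, small n only; assumed helper names.

from itertools import product

def fourier_coeff(f, mask, n):
    pts = list(product([-1, 1], repeat=n))
    total = 0
    for x in pts:
        p = f(x)
        for i in range(n):
            if mask >> i & 1:
                p *= x[i]
        total += p
    return total / len(pts)

def weight_on_span(f, gens, n):
    span = {0}                         # the subspace of F2^n spanned by gens
    for g in gens:
        span |= {s ^ g for s in span}
    return sum(fourier_coeff(f, m, n) ** 2 for m in span)

n = 3
maj = lambda x: 1 if sum(x) > 0 else -1
# The span of the two singletons {1},{2} captures half the weight of Maj_3.
print(weight_on_span(maj, [0b001, 0b010], n))  # 0.5
```

So $\mathrm{Maj}_3$ is $0.5$-concentrated on a 2-dimensional Fourier subspace, while any subspace containing all of its weight must be the full 3-dimensional space.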

  21. Sketching over the Uniform Distribution + Approximate Fourier Dimension • Sketching error over the uniform distribution of $x \in \{0,1\}^n$ • A $d$-dimensional sketch gives error $1 - \epsilon$ when $f$ is $\epsilon$-concentrated on a $d$-dimensional subspace: • Fix such a $d$-dimensional subspace $A$ • Output $\mathrm{sign}\big(\sum_{S \in A} \hat{f}(S)\, \chi_S(x)\big) \in \{-1, +1\}$: error $\le 1 - \epsilon$ • We show a basic refinement: error $\le \frac{1 - \epsilon}{2}$ • Pick $A$ from a carefully chosen distribution

  22. 1-way Communication Complexity of XOR-functions • Shared randomness; Alice holds $x$, Bob holds $y$ and computes $F(x, y) = f(x \oplus y)$ from Alice's single message $M$ • Examples: • $f(z) = OR(z)$: (not-)Equality • $f(z) = 1$ iff $\|z\|_0 > d$: Hamming Distance $> d$ • $R^1_\delta(F)$ = min $|M|$ so that Bob's error prob. is $\le \delta$
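
The (not-)Equality example can be simulated end to end. Since $\chi_S(x \oplus y) = \chi_S(x) \oplus \chi_S(y)$, Alice's sketch of $x$ lets Bob evaluate the randomized OR sketch of $x \oplus y$ (a hedged simulation; function names and parameters are mine):

```python
# One-way protocol for F(x, y) = OR(x XOR y), i.e. not-Equality.
# Shared randomness fixes the k random sets; Alice sends k parity bits of x.

import random

def parity(S, x):
    acc = 0
    for i in S:
        acc ^= x[i]
    return acc

def not_equal(x, y, k, rng):
    n = len(x)
    sets = [{i for i in range(n) if rng.random() < 0.5} for _ in range(k)]  # shared
    message = [parity(S, x) for S in sets]             # Alice -> Bob: k bits
    # chi_S(x XOR y) = chi_S(x) XOR chi_S(y), so Bob needs only his own parities.
    return 1 if any(m ^ parity(S, y) for m, S in zip(message, sets)) else 0

rng = random.Random(1)
x = [1, 0, 1, 1, 0, 0, 1, 0]
print(not_equal(x, list(x), 20, rng))   # 0: equal inputs are never misclassified
print(not_equal(x, [0] * 8, 20, rng))   # 1 except with probability 2**-20
```

The message length is $k$ bits regardless of $n$, matching the slide's point that a randomized $\mathbb{F}_2$-sketch upper-bounds $R^1_\delta(F)$.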

  23. Communication Complexity of XOR-functions • Well-studied (often for 2-way communication): • [Montanaro, Osborne], arXiv'09 • [Shi, Zhang], QIC'09 • [Tsang, Wong, Xie, Zhang], FOCS'13 • [O'Donnell, Wright, Zhao, Sun, Tan], CCC'14 • [Hatami, Hosseini, Lovett], FOCS'16 • Connections to the log-rank conjecture [Lovett'14]: • Even the special case for XOR-functions is still open

  24. Deterministic 1-way Communication Complexity of XOR-functions • Alice holds $x$, Bob holds $y$ • $D^1(F)$ = min $|M|$ so that Bob is always correct • [Montanaro-Osborne'09]: $D^1(F)$ = deterministic $\mathbb{F}_2$-sketch complexity of $f$ = Fourier dimension of $f$

  25. 1-way Communication Complexity of XOR-functions • Shared randomness; Alice holds $x$, Bob holds $y$ • $R^1_\delta(F)$ = min $|M|$ so that Bob's error prob. is $\le \delta$ • $R^1_\delta(F) \le$ randomized $\mathbb{F}_2$-sketch complexity of $f$ (error $\delta$) • Conjecture: $R^1_\delta(F) \approx$ randomized $\mathbb{F}_2$-sketch complexity?

  26. $R^1_\delta(F) \approx$ randomized $\mathbb{F}_2$-sketch complexity? As we show, this holds for: • Majority, Tribes, recursive majority, the addressing function • (Almost all) symmetric functions • Degree-$d$ $\mathbb{F}_2$-polynomials • The analogous question for 2-way communication is wide open [HHL'16]

  27. Distributional 1-way Communication under the Uniform Distribution • Alice holds $x$, Bob holds $y$ • $D^{1,U}_\epsilon(F)$ = min $|M|$ so that Bob's error prob. is $\le \epsilon$ over the uniform distribution of $(x, y)$ • Enough to consider deterministic messages only • Motivation: streaming/distributed computing with random input

  28. Communication for the Uniform Distribution • Our bounds are optimal up to constant factors in the dimension and the error • A $d$-dimensional linear sketch achieves (nearly) the smallest error of any $d$-bit one-way protocol under the uniform distribution

  29. Application: Random Streams • $x \in \{0,1\}^n$ generated via a stream of updates • Each update flips a uniformly random coordinate • Goal: maintain $f(x)$ during the stream (error prob. $\epsilon$) • Question: how much space is necessary? • Answer: $\Theta(\dim_\epsilon(f))$, and the best algorithm is an $\mathbb{F}_2$-sketch • After the first $O(n \log n)$ updates the input is (close to) uniform • Big open question: • Is the same true if $x$ is not uniform? • True for VERY LONG streams (via [LNW'14]) • How about short ones? • An answer would follow from our conjecture, if true

  30. Approximate $\mathbb{F}_2$-Sketching [Y.'17] • Normalize: $\|f\|_2 = 1$ • Question: can one compute an estimate $Z$ with $\mathbb{E}\big[(Z - f(x))^2\big] \le \epsilon$ from a small ($k \ll n$) linear sketch over $\mathbb{F}_2$?

  31. Approximate $\mathbb{F}_2$-Sketching [Y.'17] • Interesting facts: • All results under the uniform distribution generalize directly to approximate sketching • Fourier $\ell_1$-sampling (sample $S$ with probability proportional to $|\hat{f}(S)|$) has optimal dependence on the parameters • Open problem: is $\ell_1$-sampling optimal for Boolean functions?

  32. $\mathbb{F}_2$-Sketching of Valuation Functions [Y.'17] • Additive ($f(x) = \sum_i w_i x_i$): optimal sketch via weighted Gap Hamming • Budget-additive ($f(x) = \min\big(b, \sum_i w_i x_i\big)$) • Coverage: optimal (via $\ell_0$-sampling) • Matroid rank • $\alpha$-Lipschitz submodular functions: communication lower bound, via a large family of matroids from [Balcan, Harvey'10]

  33. Thanks! Questions? • Other stuff [Karpov, Y.]: • Linear Threshold Functions: resolves a communication conjecture of [MO'09] • Simple neural nets: LTF(ORs), LTF(LTFs) • Blog post: http://grigory.us/blog/the-binary-sketchman

  34. $D^{1,U}(F)$ and Approximate Fourier Dimension • Thm: If $\dim_\epsilon(f) = d$ then $D^{1,U}_\delta(F) \ge d$, where $\delta = \frac{1 - \epsilon}{2}$

  35. $D^{1,U}(F)$ and Approximate Fourier Dimension • A $c$-bit message partitions Alice's inputs into $2^c$ "rectangles"; average "rectangle" size $= 2^{n - c}$ • A subspace $A$ distinguishes $x_1$ and $x_2$ if there exists $S \in A$ with $\chi_S(x_1) \neq \chi_S(x_2)$ • Lem 1: Fix a $d$-dimensional subspace $A$: typical $x_1$ and $x_2$ in a typical "rectangle" are distinguished by $A$ • Lem 2: If a $d$-dimensional subspace $A$ distinguishes $x_1$ and $x_2$, and 1) $f$ is $\epsilon$-concentrated on $A$, 2) $f$ is not $\epsilon$-concentrated on any $(d - 1)$-dimensional subspace, then Bob errs noticeably on one of $x_1, x_2$

  36. $D^{1,U}(F)$ and Approximate Fourier Dimension • Thm (restated): If $\dim_\epsilon(f) = d$ then $D^{1,U}_{(1 - \epsilon)/2}(F) \ge d$ • Proof idea: bound the error for a fixed "rectangle", then average over a "typical rectangle"

  37. Example: Majority • Majority function: $\mathrm{Maj}(x) = 1$ iff $\sum_i x_i \ge n/2$ • $\hat{f}(S)$ only depends on $|S|$ • $\hat{f}(S) \neq 0$ only if $|S|$ is odd (for odd $n$) • A $d$-dimensional subspace with the most Fourier weight: spanned by singletons • Set $S_i = \{i\}$, $i = 1, \dots, d$
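
The slide's two structural claims about Majority can be verified by brute force on $\mathrm{Maj}_5$ (assumed helper names; exhaustive enumeration for illustration only):

```python
# Check for Maj_5: the Fourier coefficient depends only on |S| (symmetry),
# and vanishes on all even levels (Maj is an odd function for odd n).

from itertools import product

def fourier_coeff(f, S, n):
    pts = list(product([-1, 1], repeat=n))
    total = 0
    for x in pts:
        p = f(x)
        for i in S:
            p *= x[i]
        total += p
    return total / len(pts)

n = 5
maj = lambda x: 1 if sum(x) > 0 else -1
by_size = {}
for t in range(2 ** n):
    S = [i for i in range(n) if t >> i & 1]
    by_size.setdefault(len(S), set()).add(round(fourier_coeff(maj, S, n), 9))

assert all(len(v) == 1 for v in by_size.values())                  # depends only on |S|
assert all(v == {0.0} for k, v in by_size.items() if k % 2 == 0)   # even levels vanish
print({k: min(v) for k, v in sorted(by_size.items())})
```

The weight concentrated on the singleton level is what makes the low-dimensional subspace spanned by $\{1\}, \dots, \{d\}$ the natural candidate in the slide.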
