# Discrepancy and SDPs

Discrepancy and SDPs. Nikhil Bansal (TU Eindhoven). Outline. Discrepancy: definitions and applications Basic results: upper/lower bounds Partial Coloring method (non-constructive) SDPs: basic method Algorithmic Spencer’s Result Lovett-Meka result Lower bounds via SDP duality (Matousek).

## Discrepancy and SDPs

Nikhil Bansal (TU Eindhoven)

### Outline

Discrepancy: definitions and applications

Basic results: upper/lower bounds

Partial Coloring method (non-constructive)

SDPs: basic method

Algorithmic Spencer’s Result

Lovett-Meka result

Lower bounds via SDP duality (Matousek)

### Material

Classic: Geometric Discrepancy by J. Matousek

Papers:

Bansal. Constructive algorithms for discrepancy minimization, FOCS 2010

Matousek. The determinant lower bound is almost tight

Lovett, Meka. Discrepancy minimization by walking on the edges

Survey with fewer technical details:

Bansal. …

### Discrepancy: What is it?

Study of gaps in approximating the continuous by the discrete.

Original motivation: Numerical Integration/ Sampling

Problem: How well can you approximate a region by discrete points

Discrepancy:

Max over intervals I

|(# points in I) – (length of I)|

### Discrepancy: What is it?

Study of gaps in approximating the continuous by the discrete.

Problem: How uniformly can you distribute points in a grid.

“Uniform” : For every axis-parallel rectangle R

| (# points in R) - (Area of R) | should be low.

Discrepancy:

Max over rectangles R

|(# points in R) – (Area of R)|

[Figure: points in an $n^{1/2} \times n^{1/2}$ grid]

### Distributing points in a grid

Problem: How uniformly can you distribute points in a grid.

“Uniform” : For every axis-parallel rectangle R

| (# points in R) - (Area of R) | should be low.

[Figure: three placements of $n = 64$ points]

Uniform: $n^{1/2}$ discrepancy

Random: $n^{1/2} (\log\log n)^{1/2}$ discrepancy

Van der Corput set: $O(\log n)$ discrepancy!

### Quasi-Monte Carlo Methods

With $n$ random samples: Error $\propto 1/\sqrt{n}$

Quasi-Monte Carlo methods: Error $\propto \mathrm{Disc}/n$

Can discrepancy be $O(1)$ for 2d grid?

No. $\Omega(\log n)$ [Schmidt …]

$d$ dimensions: $O(\log^{d-1} n)$ [Halton-Hammersley]

$\Omega(\log^{(d-1)/2} n)$ [Roth]

$\Omega(\log^{(d-1)/2 + \eta} n)$ [Bilyk, Lacey, Vagharshakyan '08]

### Discrepancy: Example 2

Input: n points placed arbitrarily in a grid.

Color them red/blue such that each rectangle is colored as evenly as possible

Discrepancy: max over rect. R ( | # red in R - # blue in R | )

Continuous: Color each element

1/2 red and 1/2 blue (0 discrepancy)

Discrete:

Random has about $O(n^{1/2} \log^{1/2} n)$

Can achieve $O(\log^{2.5} n)$

### Combinatorial Discrepancy

Universe: $U = \{1, \dots, n\}$

Subsets: $S_1, S_2, \dots, S_m$

Color elements red/blue so each set is colored as evenly as possible.

Find $\chi : [n] \to \{-1,+1\}$ to

minimize $|\chi(S)|_\infty = \max_S \left| \sum_{i \in S} \chi(i) \right|$

If $A$ is the $m \times n$ incidence matrix:

$\mathrm{Disc}(A) = \min_{x \in \{-1,1\}^n} |Ax|_\infty$
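
A brute-force sketch of the definition above, for intuition only: it enumerates all $2^n$ colorings, so it is usable only for tiny $n$. The function name `discrepancy` and the example matrix are illustrative assumptions, not part of the talk.

```python
from itertools import product

def discrepancy(A):
    """Brute-force disc(A): min over x in {-1,+1}^n of max_j |(Ax)_j|."""
    n = len(A[0])
    best_value, best_x = None, None
    for x in product((-1, 1), repeat=n):
        # Largest imbalance over all sets (rows) under coloring x.
        value = max(abs(sum(a * s for a, s in zip(row, x))) for row in A)
        if best_value is None or value < best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Hypothetical 3-set system on 4 elements (rows are sets, columns are elements).
A = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
disc_value, coloring = discrepancy(A)
print(disc_value, coloring)
```

An alternating coloring balances all three sets here, so the discrepancy of this small system is 0.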

### Applications

CS: Computational Geometry, Comb. Optimization, Monte-Carlo simulation, Machine learning, Complexity, Pseudo-Randomness, …

Math: Dynamical Systems, Combinatorics, Mathematical Finance,

Number Theory, Ramsey Theory, Algebra, Measure Theory, …

### Rounding

Lovász-Spencer-Vesztergombi '86

Given any matrix $A$ and $x \in \mathbb{R}^n$,

can round $x$ to $\tilde{x} \in \mathbb{Z}^n$ s.t.

$|Ax - A\tilde{x}|_\infty < \mathrm{Herdisc}(A)$

Proof: Round the bits one by one.

Can we find it efficiently?

Nothing known until recently.

Thm [B'10]. Can efficiently round so that

Error $\le O(\sqrt{\log m \log n}) \cdot \mathrm{Herdisc}(A)$

### More rounding approaches

Bin Packing

Refined further by Rothvoss (entropy rounding method)

### Dynamic Data Structures

N points in a 2-d region.

Weights update over time.

Query: Given an axis-parallel rectangle R, determine the total weight on points in R.

Preprocess:

• Low query time

• Low update time (upon weight change)

### Example

Line:

Query = O(n) Update = 1

Query = 1 Update = O(n^2)

Query = 2 Update = O(n)

Query = O(log n) Update = O(log n)

Recursively can get for 2-d.
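
One standard way to reach the $O(\log n)$ query / $O(\log n)$ update trade-off on a line is a Fenwick (binary indexed) tree. The talk does not say which structure it has in mind, so the sketch below is an assumed stand-in, and the class and method names are illustrative.

```python
class FenwickTree:
    """Point updates and interval-sum queries, both in O(log n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed internal array

    def update(self, i, delta):
        """Add delta to the weight of point i (0-indexed)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # move to the next node covering point i

    def prefix_sum(self, i):
        """Total weight of points 0 .. i-1."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)  # strip the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Total weight of points lo .. hi-1."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)

# Hypothetical weights on 8 points of a line.
ft = FenwickTree(8)
for position, weight in enumerate([5, 3, 7, 9, 6, 4, 1, 2]):
    ft.update(position, weight)
print(ft.range_sum(2, 5))
```

In matrix terms, each query reads few stored sums (row sparsity) and each update touches few stored sums (column sparsity).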

Query shapes: circles, arbitrary rectangles, aligned triangles

Turns out $t_q t_u \ge n^{1/2}/\log^2 n$ ?

Larsen: $t_q t_u \ge \mathrm{disc}(S)^2/\log^2 n$

### Sketch of idea

A good data structure implies

$D = A P$

$A$ = row sparse (low query time)

$P$ = column sparse (low update time)

### Best Known Algorithm

Random: Color each element i independently as

x(i) = +1 or -1 with probability ½ each.

Thm: Discrepancy = $O((n \log n)^{1/2})$

Pf: For each set, expect $O(n^{1/2})$ discrepancy

Standard tail bounds: $\Pr\left[ \left| \sum_{i \in S} x(i) \right| \ge c\, n^{1/2} \right] \approx e^{-c^2}$

Union bound + choose $c \approx (\log n)^{1/2}$

Analysis tight: Random actually incurs $\Omega((n \log n)^{1/2})$.
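
A quick experiment, as a sketch, comparing a random coloring against the $(n \log n)^{1/2}$ scale. The random set system (each element joins each set with probability ½), the seed, and the function name are assumptions made for illustration.

```python
import math
import random

def random_coloring_discrepancy(n, m, seed=0):
    """Discrepancy of one uniformly random coloring on a random set system."""
    rng = random.Random(seed)
    # Hypothetical instance: m random subsets of {0, ..., n-1}.
    sets = [[i for i in range(n) if rng.random() < 0.5] for _ in range(m)]
    # Color each element +1 or -1 with probability 1/2 each.
    x = [rng.choice((-1, 1)) for _ in range(n)]
    return max(abs(sum(x[i] for i in s)) for s in sets)

n = 400
observed = random_coloring_discrepancy(n, n)
reference = math.sqrt(n * math.log(n))
print(observed, round(reference, 1))
```

The observed value should be on the order of `reference`, though the constant depends on the instance and the seed.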

### Better Colorings Exist!

[Spencer 85]: (Six standard deviations suffice)

Always exists coloring with discrepancy $\le 6 n^{1/2}$

(In general for arbitrary $m$, discrepancy = $O(n^{1/2} \log^{1/2}(m/n))$)

Tight: For $m = n$, cannot beat $0.5\, n^{1/2}$ (Hadamard matrix, “orthogonal” sets)

Inherently non-constructive proof

(pigeonhole principle on exponentially large universe)

Challenge: Can we find it algorithmically ?

Certain algorithms do not work [Spencer]

Conjecture [Alon-Spencer]: May not be possible.

### Beck Fiala Thm

$U = \{1, \dots, n\}$, Sets: $S_1, S_2, \dots, S_m$

Suppose each element lies in at most $t$ sets ($t \ll n$).

[Beck-Fiala '81]: Discrepancy $\le 2t - 1$.

(elegant linear algebraic argument, algorithmic result)

Beck-Fiala Conjecture: $O(t^{1/2})$ discrepancy possible

Other results: $O(t^{1/2} \log t \log n)$ [Beck]

$O(t^{1/2} \log n)$ [Srinivasan]

$O(t^{1/2} \log^{1/2} n)$ [Banaszczyk]

Non-constructive

[Figure: elements $1, 2, \dots, n$ with copies $1', 2', \dots, n'$, and sets $S_1, S_2$ with copies $S'_1, S'_2$]

### Approximating Discrepancy

Question: If a set system has low discrepancy (say $\ll n^{1/2}$),

can we find a good discrepancy coloring?

[Charikar, Newman, Nikolov '11]:

Even 0 vs. $\Omega(n^{1/2})$ is NP-hard

(Matousek): What if the system has low hereditary discrepancy?

$\mathrm{herdisc}(U, S) = \max_{U' \subseteq U} \mathrm{disc}(U', S|_{U'})$

Robust measure of discrepancy (often same as discrepancy)

Widely used: TU set systems, Geometry, …
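
To see why hereditary discrepancy is the more robust measure, here is a brute-force sketch on a tiny system where duplicating elements drives the discrepancy to 0 while the hereditary discrepancy stays positive. The function names and the example system are assumptions for illustration, and the enumeration is exponential, so this only runs for very small $n$.

```python
from itertools import combinations, product

def disc(sets, universe):
    """Brute-force discrepancy of the set system restricted to `universe`."""
    universe = list(universe)
    best = None
    for signs in product((-1, 1), repeat=len(universe)):
        color = dict(zip(universe, signs))
        # Elements outside the sub-universe are simply ignored.
        worst = max(abs(sum(color[i] for i in s if i in color)) for s in sets)
        best = worst if best is None else min(best, worst)
    return best

def herdisc(sets, n):
    """Maximum of disc over all sub-universes U' of {0, ..., n-1}."""
    return max(
        disc(sets, sub)
        for size in range(n + 1)
        for sub in combinations(range(n), size)
    )

# Hypothetical example: elements 0, 1, 2 and their copies 3, 4, 5.
# Each set is a pair from {0, 1, 2} together with the copies of that pair.
sets = [{0, 1, 3, 4}, {1, 2, 4, 5}, {0, 2, 3, 5}]
print(disc(sets, range(6)), herdisc(sets, 6))
```

Coloring every element opposite to its copy balances each set, so the discrepancy is 0. Restricting to $\{0, 1, 2\}$ leaves the three pairs, and some pair must be monochromatic, so the hereditary discrepancy is 2.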

### Our Results

Thm 1: Can get Spencer’s bound constructively.

That is, $O(n^{1/2})$ discrepancy for $m = n$ sets.

Thm 2: If each element lies in at most $t$ sets, get bound of $O(t^{1/2} \log n)$ constructively (Srinivasan’s bound)

Thm 3: For any set system, can find

Discrepancy $\le O(\log(mn)) \cdot$ Hereditary discrepancy.

Other Problems: Constructive bounds (matching current best)

k-permutation problem [Spencer, Srinivasan, Tetali]

Geometric problems, …

### Relaxations: LPs and SDPs

Not clear how to use.

Linear Program is useless. Can color each element ½ red and ½ blue. Discrepancy of each set = 0!

SDPs (LP on $v_i \cdot v_j$; cannot control the dimension of the $v$’s)

$\left| \sum_{i \in S} v_i \right|^2 \le n \quad \forall S$

$|v_i|^2 = 1$

Intended solution: $v_i = (+1, 0, \dots, 0)$ or $(-1, 0, \dots, 0)$.

Trivially feasible: $v_i = e_i$ (all $v_i$’s orthogonal)

Yet, SDPs will be a major tool.
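
A small numeric check of why the basic SDP carries no information: the orthogonal solution $v_i = e_i$ satisfies every constraint because the squared norm of a sum of orthogonal unit vectors is just the set size. The set system and helper name below are hypothetical.

```python
def squared_norm_of_sum(vectors, S):
    """Return |sum_{i in S} v_i|^2 for a list of equal-length vectors."""
    dim = len(vectors[0])
    total = [sum(vectors[i][k] for i in S) for k in range(dim)]
    return sum(c * c for c in total)

n = 6
# Hypothetical set system on elements 0..5.
sets = [[0, 1, 2], [1, 3, 4, 5], [0, 1, 2, 3, 4, 5]]

# Trivial solution: v_i = e_i, all vectors orthogonal.
basis = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]

# Intended solution: a genuine +1/-1 coloring placed on the first coordinate.
signs = (1.0, -1.0, 1.0, -1.0, 1.0, -1.0)
intended = [[s] + [0.0] * (n - 1) for s in signs]

trivial_values = [squared_norm_of_sum(basis, S) for S in sets]
intended_values = [squared_norm_of_sum(intended, S) for S in sets]
print(trivial_values, intended_values)
```

Both solutions meet the bound $n$, but only the second one corresponds to a coloring, which is why the bound has to be tightened before the SDP becomes useful.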

### Punch line

SDP very helpful if “tighter” bounds needed for some sets.

$\left| \sum_{i \in S} v_i \right|^2 \le 2n$

$\left| \sum_{i \in S'} v_i \right|^2 \le n/\log n$ (tighter bound for $S'$)

$|v_i|^2 \le 1$

Not a priori clear why one can do this.

Entropy Method.

Algorithm will construct coloring over time and use several SDPs in the process.

### Talk Outline

Introduction

The Method

Low Hereditary discrepancy -> Good coloring

Spencer’s $O(n^{1/2})$ bound

### Slight improvement

Can be improved to $O(\sqrt{n})/2^n$

If you pick a random $\{-1,1\}$ coloring $s$, then

w.p. say $\ge 1/2$, $|a \cdot s| \le c \sqrt{n}$

So at least $2^{n-1}$ colorings $s$ have $|a \cdot s| \le c \sqrt{n}$

### Algorithmically

Easy: 1/poly(n) (How?)

[Karmarkar-Karp '81]: $\approx 1/n^{\log n}$

Huge gap: Major open question

Remark: {-1,+1} not enough. Really need color 0 also.

E.g. a_1 = 1, a_2=…=a_n = 1/(2n)

### Yet another enhancement

There is a $\{-1,0,1\}$ coloring with at least

$n/2$ nonzero ($\pm 1$) entries s.t. $\left| \sum_i a_i s_i \right| \le n/2^{n/5}$

Make buckets of size $2n/2^{n/5}$

At least $2^{4n/5}$ sums fall in the same bucket

Claim: Some two $s'$ and $s''$ in the same bucket differ in at least $n/2$ coordinates

Again consider $s = (s' - s'')/2$
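
The basic pigeonhole step behind these bounds can be tried by brute force on a small instance: among all $2^n$ signed sums, two must be close, and half their difference is a nonzero $\{-1,0,1\}$ coloring with a tiny sum. This sketch covers only that basic step, not the additional requirement of $n/2$ nonzero entries, and the weights and function name are hypothetical.

```python
from itertools import product

def closest_pair_partial_coloring(a):
    """Find s in {-1,0,1}^n, s != 0, from the two closest signed sums."""
    n = len(a)
    # All 2^n signed sums, sorted so that close sums become neighbors.
    sums = sorted(
        (sum(ai * si for ai, si in zip(a, s)), s)
        for s in product((-1, 1), repeat=n)
    )
    # Adjacent pair with the smallest gap.
    gap, s1, s2 = min(
        ((hi[0] - lo[0], hi[1], lo[1]) for lo, hi in zip(sums, sums[1:])),
        key=lambda item: item[0],
    )
    # Half the difference of two sign vectors has entries in {-1, 0, 1}.
    s = tuple((u - v) // 2 for u, v in zip(s1, s2))
    return s, abs(sum(ai * si for ai, si in zip(a, s)))

# Hypothetical weights in [0, 1].
a = [0.91, 0.52, 0.37, 0.83, 0.14, 0.66, 0.29, 0.75]
s, value = closest_pair_partial_coloring(a)
print(s, value)
```

Since the $2^n$ sums lie in an interval of length at most $2n$, the smallest adjacent gap is at most $2n/(2^n - 1)$, so the returned coloring has $|a \cdot s| \le n/(2^n - 1)$.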

### Proof of Claim

Claim: Any set of $2^{4n/5}$ vertices of the Boolean cube has two vertices at Hamming distance at least $n/2$.

[Kleitman '66] Isoperimetry for the cube.

Hamming ball $B(v, r)$ has the smallest diameter for a given number of vertices.

$|B(v, n/4)| < 2^{4n/5}$

### Algorithm (at high level)

Each dimension: An Element

Each vertex: A Coloring

Cube: $\{-1,+1\}^n$

Algorithm: “Sticky” random walk

Each step generated by rounding a suitable SDP.

Moves in various dimensions are correlated, e.g. $\eta^t_1 + \eta^t_2 \approx 0$

Analysis: Few steps to reach a vertex (walk has high variance)

$\mathrm{Disc}(S_i)$ does a random walk (with low variance)

### An SDP

Hereditary disc. $\lambda$ $\Rightarrow$ the following SDP is feasible

SDP:

Low discrepancy: $\left| \sum_{i \in S_j} v_i \right|^2 \le \lambda^2$

$|v_i|^2 = 1$

Obtain $v_i \in \mathbb{R}^n$

Rounding:

Pick random Gaussian $g = (g_1, g_2, \dots, g_n)$,

each coordinate $g_i$ is iid $N(0,1)$

For each $i$, consider $\eta_i = g \cdot v_i$

### Properties of Rounding

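Lemma: If $g \in \mathbb{R}^n$ is a random Gaussian, then for any $v \in \mathbb{R}^n$,

$g \cdot v$ is distributed as $N(0, |v|^2)$

Pf: $N(0, a^2) + N(0, b^2) = N(0, a^2 + b^2)$, so $g \cdot v = \sum_i v(i) g_i \sim N(0, \sum_i v(i)^2)$

Recall: $\eta_i = g \cdot v_i$

• Each $\eta_i \sim N(0,1)$

• For each set $S$, $\sum_{i \in S} \eta_i = g \cdot \left( \sum_{i \in S} v_i \right) \sim N(0, \le \lambda^2)$ (std deviation $\le \lambda$)

SDP:

$|v_i|^2 = 1$

$\left| \sum_{i \in S} v_i \right|^2 \le \lambda^2$

The $\eta$’s mimic a low discrepancy coloring (but are not $\{-1,+1\}$)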
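
The lemma on Gaussian projections can be checked empirically. The sketch below samples $g \cdot v$ for a fixed, hypothetical vector $v$ and compares the sample variance with $|v|^2$; the function names, seed, and number of trials are assumptions for illustration.

```python
import random

def project(g, v):
    """Inner product g . v."""
    return sum(gi * vi for gi, vi in zip(g, v))

def sample_projections(v, trials, seed=0):
    """Draw `trials` values of g . v with g having iid N(0,1) coordinates."""
    rng = random.Random(seed)
    return [
        project([rng.gauss(0.0, 1.0) for _ in v], v)
        for _ in range(trials)
    ]

v = [0.5, -1.0, 2.0, 0.25]              # hypothetical vector
squared_norm = sum(c * c for c in v)    # |v|^2 = 5.3125
samples = sample_projections(v, 20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(squared_norm, round(mean, 3), round(variance, 3))
```

The sample mean should be close to 0 and the sample variance close to $|v|^2$, up to sampling error.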

[Figure: the color of an element as a random walk over time, between $-1$ and $+1$]

### Algorithm Overview

Construct coloring iteratively.

Initially: Start with coloring $x_0 = (0, 0, 0, \dots, 0)$ at $t = 0$.

At time $t$: Update coloring as $x_t = x_{t-1} + \epsilon (\eta^t_1, \dots, \eta^t_n)$

($\epsilon$ tiny: $1/n$ suffices)

$x_t(i) = \epsilon (\eta^1_i + \eta^2_i + \dots + \eta^t_i)$

Color of element $i$: Does a random walk over time with step size $\approx \epsilon N(0,1)$

Fixed if it reaches $-1$ or $+1$.

Set $S$: $x_t(S) = \sum_{i \in S} x_t(i)$ does a random walk with step $\epsilon N(0, \le \lambda^2)$
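
A minimal sketch of the sticky walk's mechanics, under a loud simplifying assumption: instead of solving an SDP at each step, it uses independent $N(0,1)$ increments (what the trivial orthogonal solution would produce). It therefore shows only how colors drift and freeze at $\pm 1$, and gives none of the discrepancy guarantees of the actual algorithm. The names `sticky_walk`, `eps`, and `max_steps` are illustrative.

```python
import random

def sticky_walk(n, eps, max_steps, seed=0):
    """Run the walk until every color is frozen at -1 or +1."""
    rng = random.Random(seed)
    x = [0.0] * n              # start at the all-zeros coloring
    alive = set(range(n))      # elements whose color is not yet fixed
    steps = 0
    while alive and steps < max_steps:
        for i in list(alive):
            # Assumption: independent Gaussian step, standing in for g . v_i
            # from a rounded SDP solution.
            x[i] += eps * rng.gauss(0.0, 1.0)
            if abs(x[i]) >= 1.0:
                # Sticky boundary: clamp and never move this element again.
                x[i] = 1.0 if x[i] > 0 else -1.0
                alive.discard(i)
        steps += 1
    return x, steps

coloring, steps_used = sticky_walk(n=20, eps=0.1, max_steps=100000)
print(steps_used, coloring)
```

With step scale `eps`, an element typically needs on the order of $1/\epsilon^2$ steps to reach the boundary, which matches the time horizon used in the analysis.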

### Analysis

Consider time $T = O(1/\epsilon^2)$

Claim 1: With prob. ½, at least $n/2$ elements reach $-1$ or $+1$.

Pf: Each element does a random walk with step size $\approx \epsilon$.

Recall: A random walk with step 1 is $\approx O(t^{1/2})$ away in $t$ steps.

A trouble: Various element updates are correlated

Consider basic walk $x(t+1) = x(t) \pm 1$ with prob ½ each

Define energy $\Phi(t) = x(t)^2$

$E[\Phi(t+1)] = \tfrac{1}{2}(x(t)+1)^2 + \tfrac{1}{2}(x(t)-1)^2 = x(t)^2 + 1 = \Phi(t) + 1$

Expected energy = $n$ at $t = n$.

Claim 2: Each set has $O(\lambda)$ discrepancy in expectation.

Pf: For each $S$, $x_t(S)$ does a random walk with step size $\approx \epsilon \lambda$
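
The energy identity for the basic $\pm 1$ walk can be verified exactly for short walks by enumerating all paths. This is only a sanity check of the one-dimensional statement, and the function name is an assumption.

```python
from itertools import product

def expected_energy(t):
    """Exact E[x(t)^2] for the +/-1 walk started at 0, by full enumeration."""
    total = 0
    for steps in product((-1, 1), repeat=t):
        total += sum(steps) ** 2   # energy of the endpoint of this path
    return total / 2 ** t          # all 2^t paths are equally likely

energies = [expected_energy(t) for t in range(11)]
print(energies)
```

Each step raises the expected energy by exactly 1, so the expected energy after $t$ steps is $t$.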

### Analysis

Consider time $T = O(1/\epsilon^2)$

Claim 1: With prob. ½, at least $n/2$ variables reach $-1$ or $+1$.

$\Rightarrow$ Everything colored in $O(\log n)$ rounds.

Claim 2: Each set has $O(\lambda)$ discrepancy in expectation per round.

$\Rightarrow$ Expected discrepancy of a set at end = $O(\lambda \log n)$

Thm: Obtain a coloring with discrepancy $O(\lambda \log(mn))$

Pf: By Chernoff, the prob. that disc($S$) $\ge 2 \cdot$ Expectation $+ O(\lambda \log m) = O(\lambda \log(mn))$ is tiny (poly($1/m$)).

### Recap

At each step of walk, formulate SDP on unfixed variables.

Use some (existential) property to argue SDP is feasible

Rounding SDP solution -> Step of walk

Properties of walk:

High Variance -> Quick convergence

Low variance for discrepancy on sets -> Low discrepancy

### Refinements

Spencer’s six std deviations result:

Goal: Obtain $O(n^{1/2})$ discrepancy for any set system on $m = O(n)$ sets.

Random coloring has $n^{1/2} (\log n)^{1/2}$ discrepancy

Previous approach seems useless:

Expected discrepancy for a set is $O(n^{1/2})$,

but some random walks will deviate by up to a $(\log n)^{1/2}$ factor

Need an additional idea to prevent this.

### Spencer’s $O(n^{1/2})$ result

Partial Coloring Lemma: For any system with $m$ sets, there exists a coloring on $\ge n/2$ elements with discrepancy $O(n^{1/2} \log^{1/2}(2m/n))$

[For $m = n$, disc = $O(n^{1/2})$]

Algorithm for total coloring:

Repeatedly apply partial coloring lemma

Total discrepancy

$O(n^{1/2} \log^{1/2} 2)$ [Phase 1]

$+\ O((n/2)^{1/2} \log^{1/2} 4)$ [Phase 2]

$+\ O((n/4)^{1/2} \log^{1/2} 8)$ [Phase 3]

$+ \dots = O(n^{1/2})$
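
The phase sum above converges because the $2^{-k/2}$ decay beats the $\sqrt{k}$ growth of the logarithmic factor. The sketch below evaluates the series with $n^{1/2}$ factored out, ignoring the hidden constants in each $O(\cdot)$ and using natural logarithms; the function name and the cutoff of 60 phases are assumptions.

```python
import math

def total_discrepancy_constant(phases=60):
    """Sum over phases k of (1/2^k)^{1/2} * log^{1/2}(2^{k+1})."""
    return sum(
        math.sqrt(2.0 ** -k) * math.sqrt(math.log(2.0 ** (k + 1)))
        for k in range(phases)
    )

constant = total_discrepancy_constant()
print(round(constant, 3))
```

The result is a modest absolute constant, so the total over all phases stays within a constant factor of the first phase's $n^{1/2}$ bound.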

X1 = ( 1,-1, 1 , …,1,-1,-1)

X2 = (-1,-1,-1, …,1, 1, 1)

X = ( 1, 0, 1 , …,0,-1,-1)

### Proving Partial Coloring Lemma

Beautiful Counting argument (entropy method + pigeonhole)

Idea: Too many colorings ($2^n$), but few “discrepancy profiles”

Key Lemma: There exist $k = 2^{4n/5}$ colorings $X_1, \dots, X_k$ such that

every two $X_i, X_j$ are “similar” for every set $S_1, \dots, S_n$.

Some $X_1, X_2$ differ on $\ge n/2$ positions

Consider $X = (X_1 - X_2)/2$

Pf: $X(S) = (X_1(S) - X_2(S))/2 \in [-10\, n^{1/2}, 10\, n^{1/2}]$
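
The difference trick is easy to see on a concrete pair of colorings: positions where the two agree become 0, positions where they differ keep a sign, and set sums are halved differences. The vectors and sets below are hypothetical, and the helper names are illustrative.

```python
def partial_coloring(x1, x2):
    """Return (x1 - x2) / 2 entrywise; entries lie in {-1, 0, 1}."""
    return [(a - b) // 2 for a, b in zip(x1, x2)]

def set_sum(x, s):
    """Sum of the colors of the elements in set s."""
    return sum(x[i] for i in s)

x1 = [1, -1, 1, 1, -1, -1]
x2 = [-1, -1, -1, 1, 1, 1]
x = partial_coloring(x1, x2)

# Hypothetical sets, given as lists of element indices.
sets = [[0, 1, 4], [2, 3, 5], [0, 1, 2, 3, 4, 5]]
for s in sets:
    print(s, set_sum(x1, s), set_sum(x2, s), set_sum(x, s))
```

Here the two colorings differ on 4 of the 6 positions, so the resulting partial coloring is nonzero on those 4 positions.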

### A useful generalization

There exists a partial coloring with non-uniform discrepancy bound $\Delta_S$ for set $S$

Even if $\Delta_S = \Theta(n^{1/2})$ in some average sense

### An SDP

Suppose there exists partial coloring $X$:

1. On $\ge n/2$ elements

2. Each set $S$ has $|X(S)| \le \Delta_S$

SDP:

Low discrepancy: $\left| \sum_{i \in S_j} v_i \right|^2 \le \Delta_{S_j}^2$

Many colors: $\sum_i |v_i|^2 \ge n/2$

$|v_i|^2 \le 1$

Obtain $v_i \in \mathbb{R}^n$

Pick random Gaussian $g = (g_1, g_2, \dots, g_n)$,

each coordinate $g_i$ is iid $N(0,1)$

For each $i$, consider $\eta_i = g \cdot v_i$

### Algorithm

Initially write SDP with $\Delta_S = c\, n^{1/2}$

Each set $S$ does a random walk and expects to reach

discrepancy of $O(\Delta_S) = O(n^{1/2})$

Some sets will become problematic.

Reduce their $\Delta_S$ on the fly.

Not many problematic sets, and entropy penalty low.

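[Figure: a set's discrepancy starting at 0, with zones Danger 1, Danger 2, Danger 3, … marked by the thresholds $20\, n^{1/2}$, $30\, n^{1/2}$, $35\, n^{1/2}$]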

### Concluding Remarks

Construct coloring over time by solving sequence of SDPs (guided by existence results)

Works quite generally

Can be derandomized [Bansal-Spencer]

(use entropy method itself for derandomizing + usual techniques)

E.g. Deterministic six standard deviations can be viewed as a way to derandomize something stronger than Chernoff bounds.

### Rest of the talk

• How to generate $\eta_i$ with required properties.

• How to update $\Delta_S$ over time.

Show $n^{1/2} (\log \log \log n)^{1/2}$ bound.

### Why so few algorithms?

• Often algorithms rely on continuous relaxations.

• Linear Program is useless. Can color each element ½ red and ½ blue.

• Improved results of Spencer, Beck, Srinivasan, … based on clever counting (entropy method).

• Pigeonhole Principle on exponentially large systems (seems inherently non-constructive)

### Partial Coloring Lemma

Suppose we have discrepancy bound $\Delta_S$ for set $S$.

Consider $2^n$ possible colorings

Signature of a coloring $X$: $(b(S_1), b(S_2), \dots, b(S_m))$

Want partial coloring with signature (0,0,0,…,0)

### Progress Condition

Energy increases at each step:

E(t) = \sum_i x_i(t)^2

Initially energy =0, can be at most n.

Expected value of E(t) = E(t-1) + \sum_i \gamma_i(t)^2

Markov’s inequality.

### Missing Steps

• How to generate the \eta_i

• How to update \Delta_S over time

### Partial Coloring

X1 = (1,-1, 1 , …, 1,-1,-1)

X2 = (-1,-1,-1, …, 1,1, 1)

If exist two colorings X1,X2

1. Same signature (b1,b2,…,bm)

2. Differ in at least n/2 positions.

Consider X = (X1 –X2)/2

• -1 or 1 on at least n/2 positions, i.e. partial coloring

• Has signature (0,0,0,…,0)

$X(S) = (X_1(S) - X_2(S))/2$, so $|X(S)| \le \Delta_S$ for all $S$.

Can show that there are $2^{4n/5}$ colorings with same signature.

So, some two will differ on $> n/2$ positions. (Pigeonhole)

### Spencer’s $O(n^{1/2})$ result

Let us prove the lemma for m = n

$\mathrm{Ent}(b_1) \le 1/5$

### Proving Partial Coloring Lemma

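[Figure: the range of $X(S)$ split into buckets labeled …, $-2$, $-1$, $0$, $1$, $2$, …, with boundaries at $-30\, n^{1/2}$, $-10\, n^{1/2}$, $10\, n^{1/2}$, $30\, n^{1/2}$]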

Pf: Associate with coloring $X$ the signature $(b_1, b_2, \dots, b_n)$

($b_i$ = bucket in which $X(S_i)$ lies)

Wish to show: There exist $2^{4n/5}$ colorings with same signature

Choose $X$ randomly: Induces distribution $\mu$ on signatures.

Entropy($\mu$) $\le n/5$ implies some signature has prob. $\ge 2^{-n/5}$.

Entropy($\mu$) $\le \sum_i$ Entropy($b_i$) [Subadditivity of Entropy]

$b_i = 0$ w.p. $\approx 1 - 2e^{-50}$,

$= 1$ w.p. $\approx e^{-50}$

$= 2$ w.p. $\approx e^{-450}$

….
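
To see how small the entropy of one signature coordinate is, the sketch below plugs the approximate bucket probabilities listed above into the entropy formula. It keeps only buckets $0, \pm 1, \pm 2$ (further buckets are even less likely) and treats the approximations as exact, both of which are simplifying assumptions; the function name is illustrative.

```python
import math

def signature_coordinate_entropy():
    """Entropy (in bits) of one bucket index b_i under the stated probabilities."""
    # Probability of bucket +k (and, by symmetry, of bucket -k) for k = 1, 2.
    q = [math.exp(-50.0), math.exp(-450.0)]
    tail = 2.0 * sum(q)                 # total mass outside bucket 0
    # log1p keeps precision: 1 - tail rounds to 1.0 in floating point.
    log_p0 = math.log1p(-tail)
    entropy_nats = -(1.0 - tail) * log_p0
    for p in q:
        entropy_nats += -2.0 * p * math.log(p)   # buckets +k and -k
    return entropy_nats / math.log(2.0)

entropy_bits = signature_coordinate_entropy()
print(entropy_bits)
```

The value is many orders of magnitude below $1/5$, so summing over $n$ sets stays far under the $n/5$ budget and leaves room to use much tighter buckets for some sets.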

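For each set $S$, consider the “bucketing”

[Figure: buckets labeled …, $-2$, $-1$, $0$, $1$, $2$, … with boundaries at $-3\Delta_S$, $-\Delta_S$, $\Delta_S$, $3\Delta_S$, $5\Delta_S$]

Bucket of $n^{1/2}/100$ has penalty $\approx \ln(100)$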

### A useful generalization

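Partial coloring with non-uniform discrepancy $\Delta_S$ for set $S$

Suffices to have $\sum_S \mathrm{Ent}(b_S) \le n/5$

Or, if $\Delta_S = \lambda_S n^{1/2}$, then $\sum_S g(\lambda_S) \le n/5$

$g(\lambda) \approx e^{-\lambda^2/2}$ for $\lambda > 1$

$g(\lambda) \approx \ln(1/\lambda)$ for $\lambda < 1$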

### Recap

Partial Coloring:S¼ 10 n1/2 gives low entropy

) 24n/5 colorings exist with same signature.

) some X1,X2 with large hamming distance.

(X1 – X2) /2 gives the desired partial coloring.

Trouble: 24n/5/2n is an exponentially small fraction.

Only if we could find the partial coloring efficiently…