Learning Submodular Functions

1 / 36

# Learning Submodular Functions - PowerPoint PPT Presentation

Learning Submodular Functions. Maria Florina Balcan. LGO, 11/16/2010. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A. Submodular functions. V={1,2, …, n}; set-function f : 2 V ! R.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Learning Submodular Functions' - nenet

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Learning Submodular Functions

Maria FlorinaBalcan

LGO, 11/16/2010

TexPoint fonts used in EMF.

Read the TexPoint manual before you delete this box.: AAAAAAAA

Submodular functions

V={1,2, …, n}; set-function f : 2V! R

• Concave Functions Let h : R!R be concave. For each S µ V, let f(S) = h(|S|)

f(S)+f(T) ¸ f(S Å T) + f(S [ T), 8 S,TµV

• Decreasing marginal return

f(T [ {x})-f(T)¸ f(S [ {x})-f(S), 8 S,TµV, Tµ S, x not in T

Examples:

• Vector Spaces Let V={v1,,vn}, each vi2Fn.For each S µ V, let f(S) = rank(V[S])

S[T

SÅT

Submodular set functions
• Set function f on V is called submodular if

For all S,T µ V: f(S)+f(T) ¸ f(S[T)+f(SÅT)

• Equivalent diminishing returns characterization:

+

¸

+

T

S

+

x

S

Large improvement

Submodularity:

T

+

x

Small improvement

For TµS, xS, f(T [{x}) – f(T) ¸f(S [{x}) – f(S)

Example: Set cover

Want to cover floorplan with discs

Place sensorsin building

Possiblelocations V

For S µ V: f(S) = “area (# locations) covered by sensors placed at S”

Node predicts

values of positions

Formally:

W finite set, collection of n subsets Wiµ W

For S µ V={1,…,n} define f(S) = |i2S Wi|

x

x

Set cover is submodular

T={W1,W2}

W1

W2

f(T[{x})-f(T)

¸

f(S[{x})-f(S)

W1

W2

W3

W4

S = {W1,W2,W3,W4}

Submodular functions

V={1,2, …, n}; set-function f : 2V! R

• Concave Functions Let h : R!R be concave. For each S µ V, let f(S) = h(|S|)

f(S)+f(T) ¸ f(S Å T) + f(S [ T), 8 S,TµV

• Decreasing marginal return

f(T [ {x})-f(T)· f(S [ {x})-f(S), 8 S,TµV, S µ T, x not in T

Examples:

• Vector Spaces Let V={v1,,vn}, each vi2Fn.For each S µ V, let f(S) = rank(V[S])
Submodular functions

V={1,2, …, n}; set-function f : 2V! R

f(S)+f(T) ¸ f(S Å T) + f(S [ T), 8 S,TµV

Monotone:

f(S) · f(T) , 8 S µ T

Non-negative:

f(S) ¸ 0, 8 S µ V

Submodular functions
• A lot of work on optimization and submodularity.
• Can be minimized in polynomial time.
• Algorithmic game theory
• decreasing marginal utilities.
• Substantial interest in the ML community recently.
• Tutorials, workshops at ICML, NIPS, etc.
• www.submodularity.org/ owned by ML.
Learnability of Submodular Fns
• Important to also understand their learnability.
• Exact learning with value queries
• Previous Work:

Goemans, Harvey, Iwata, Mirrokni, SODA 2009

• [GHIM’09] Model
• There is an unknown submodular target function.
• Algorithm allowed to (adaptively) pick sets and query the value of the target on those sets.
• Can we learn the target with a polynomial number of queries in poly time?
• Output a function that approximates the target within a factor of ® on every single subset.
Exact learning with value queries

Goemans, Harvey, Iwata, Mirrokni, SODA 2009

• Theorem: (General upperbound)

9 an alg. for learning a submodular function with an approx. factor O(n1/2).

• Theorem: (General lower bound)
• Any alg. for learning a submodular must have an approx. factor of (n1/2).
Problems with the GHIM model
• - Lower bound fails if our goal is to do well on most of the points.
• - Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to get exactly from a poly number of queries
• - Well known that value queries are undesirable in some learning applications.

Is there a better model that gets around these problems?

Problems with the GHIM model
• - Lower bound fails if our goal is to do well on most of the points.
• - Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to get exactly from a poly number of queries
• - Well known that value queries are undesirable in some learning applications.

Learning submodularfns in a distributional learning setting [BH10]

Our model: Passive Supervised Learning

Data Source

Distribution D on {0,1}n

Expert / Oracle

Learning Algorithm

Labeled Examples

(x1,f(x1)),…, (xk,f(xk))

f : {0,1}n R+

Alg.outputs

g : {0,1}n R+

Our model: Passive Supervised Learning

Data Source

Distribution D on {0,1}n

Expert / Oracle

Learning Algorithm

Labeled Examples

(x1,f(x1)),…, (xk,f(xk))

• Algorithm sees (x1,f(x1)),…, (xk,f(xk)),xii.i.d. from D

f : {0,1}n R+

• Algorithm produces “hypothesis” g. (Hopefully g ¼f)
• Prx1,…,xm[ Prx[g(x)·f(x)·®g(x)]¸1-²] ¸1-±

Alg.outputs

g : {0,1}n R+

• “Probably MostlyApproximatelyCorrect”
Main results
• Theorem: (Our general upper bound)

9 an alg. for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approx. factor O(n1/2).

• Note:
• Much simpler alg. compared to GIHM’09
• Theorem: (Our general lower bound)
• No algorithm can PMAC learn the class of non-neg., monotone, submodular fns with an approx. factorõ(n1/3).
• Note:
• The GIHM’09 lower bound fails in our model.
• Theorem: (Product distributions)

Matroid rank functions, const. approx.

A General Upper Bound
• Theorem:

9 an alg. for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approx. factor O(n1/2).

Subaddtive Fns are Approximately Linear
• Let f be non-negative, monotone and subadditive
• Claim:f can be approximated to within factor nby a linear functiong.
• Proof Sketch: Let g(S) = s in S f({s}).Then f(S)·g(S) ·n ¢f(S).

Subadditive: f(S)+f(T) ¸ f(S[ T) 8 S,T µ V

Monotonicity: f(S) · f(T) 8 Sµ T

Non-negativity: f(S) ¸ 0 8 S µ V

Subaddtive Fns are Approximately Linear
• f(S) ·g(S) ·n¢f(S).

n¢f

g

f

V

PMAC Learning Subadditive Fns
• fnon-negative, monotone,subadditiveapproximated to within factor nby a linear functiong,
• g (S) =w¢Â (S).
• Sample S from D; flip a coin.
• Labeled examples((Â(S), f(S) ), +) and ((Â(S), n¢f(S) ), -) are linearly separable inRn+1.
• Idea: learn a linear separator. Use std sample complex.
• Problem: data noti.i.d.
• Solution: create a related distribution.
• If heads add ((Â(S), f(S) ), +).
• Else add ((Â(S), n¢f(S) ), -).
PMAC Learning Subadditive Fns
• Algorithm:
• Note:
• Deal with the set {S:f(S)=0 } separately.

Input: (S1, f(S1)) …, (Sm, f(Sm))

• For each Si, flip a coin.
• If heads add ((Â(S), f(Si) ), +).
• Else add ((Â(S), n¢f(Si) ), -).
• Learn a linear separator u=(w,-z) in Rn+1.
• Output:g(S)=1/(n+1) w ¢Â (S).
• Theorem: For m = £(n/²), g approximates f to within a factor n on a 1-² fraction of the distribution.
PMAC Learning Submodular Fns
• Algorithm:
• Note:
• Deal with the set {S:f(S)=0 } separately.

Input: (S1, f(S1)) …, (Sm, f(Sm))

• For each Si, flip a coin.
• If heads add ((Â(S), f2(S_i)) ), +).
• Else add ((Â(S), n f2(S_i) ), -).
• Learn a linear separator u=(w,-z) in Rn+1.
• Output:g(S)=1/(n+1)1/2 w ¢Â (S)
• Theorem: For m = £(n/²), g approximates f to within a factor \sqrt{n} on a 1-² fraction of the distribution.

Proof idea: f non-negative, monotone, submodular approximated to within factor \sqrt{n} by a \sqrt{linear function}.

[GHIM, 09]

PMAC Learning Submodular Fns
• Algorithm:
• Note:
• Deal with the set {S:f(S)=0 } separately.

Input: (S1, f(S1)) …, (Sm, f(Sm))

• For each Si, flip a coin.
• If heads add ((Â(S), f2(S_i)) ), +).
• Else add ((Â(S), n f2(S_i) ), -).
• Learn a linear separator u=(w,-z) in Rn+1.
• Output:g(S)=1/(n+1)1/2 w ¢Â (S)
• Much simpler than [GIHM09]. More robust to variations.
• the target only needs to be within an ¯ factor of a submodularfnc.
• 9 a submodularfnc that agrees with target on all but a ´ fraction of the points (on the points it disagrees it can be arbitrarily far).
• [the alg is inefficient in this case]
A General Lower Bound
• Theorem: (Our general lower bound)
• No algorithm can PMAC learn the class of non-neg., monotone, submodular fns with an approx. factorõ(n1/3).

Plan:

Use the fact that any matroid rank fnc is submodular.

Construct a hard family of matroid rank functions.

High=n1/3

X

X

L=nlog log n

X

X

Low=log2n

A1

AL

A2

A3

… … …. ….

Partition Matroids

A1, A2, …, Akµ V={1,2, …, n}, all disjoint;ui· |Ai|-1

• E.g., n=5, A1={1,2,3}, A2={3,4,5}, u1=u2=2.

Ind={I: |I ÅAj| ·uj, for all j }

Then (V, Ind) is a matroid.

If sets Ai are not disjoint, then (V,Ind) might not be a matroid.

• {1,2,4,5} and {2,3,4} both maximal sets in Ind; do not have the same cardinality.
Almost partition matroids

k=2, A1, A2µ V (not necessarily disjoint); ui· |Ai|-1

Ind={I: |I ÅAj| ·uj , |I Å (A1[A2)| ·u1 +u2 - |A1ÅA2|}

Then (V,Ind) is a matroid.

Almost partition matroids

More generally

f : 2[k]! Z

A1, A2, …, Akµ V={1,2, …, n}, ui· |Ai|-1;

=<0

f(J)= j 2 J uj +|A(J)|-j 2J|Aj|, 8 J µ [k]

Ind= { I: |I ÅA(J)| · f(J), 8 J µ [k] }

Then (V, Ind) is a matroid (if nonempty).

Rewrite f, f(J)=|A(J)|-j 2 J(|Aj| - uj), 8 J µ [k]

A generalization of partition matroids

f : 2[k]! Z

More generally

f(J)=|A(J)|-j 2 J(|Aj| - uj), 8 J µ [k]

Ind= { I: |I ÅA(J)| · f(J), 8 J µ [k] }

Then (V, Ind) is a matroid (if nonempty).

Uncrossing argument

Proof technique:

For a set I, define T(I) to be the set of tight constraints

T(I)= {J µ [k], |I ÅA(J)|=f(J)}

8 I 2Ind, J1, J22 T(I), then (J1 [J22 T(I)) or (J1Å J2 =)

Ind is the family of independent sets of a matroid.

A generalization of almost partition matroids

f : 2[k]! Z,

f(J)=|A(J)|-j 2 J(|Aj| -uj), 8 J µ [k]; ui· |Ai|-1

Note: This requires k· n (for k > n, f becomes negative)

But we want k=n^{log log n}.

Do some sort of truncation to allow k>>n.

f(J) is (¹, ¿) good if f(J) ¸ 0 for J µ [k], |J| ·¿ and

f(J) ¸¹ for J µ [k], ¿·|J| · 2¿ -2

h(J)=f(J) if |J| ·¿ and h(J)=¹, otherwise.

Ind= { I: |I ÅA(J)| · h(J), 8 J µ [k] }

Then (V,Ind) is a matroid (if nonempty).

A generalization of partition matroids

Let L = nlog log n. Let A1, A2, …, AL be random subsets of V.

(Ai -- include each elem of V indep with prob n-2/3. )

Let ¹=n^{1/3} log2 n, u=log2 n, ¿=n1/3

Each subset J µ {1,2, …, L} induces a matroids.t. for any i not in J, Ai is indep in this matroid

• Rank(Ai), i not in J, is roughly |Ai| (i.e., £(n^{1/3})),
• The rank of sets Aj, j in J is u=log2 n.

High=n1/3

X

X

L=nlog log n

X

X

Low=log2n

A1

AL

A2

A3

… … …. ….

Product distributions, Matroid Rank Fns

Talagrand implies:

• Let D be a product distribution on V, R=rank(X), X drawn from D. If E[R] ¸ 4000,
• [Chekuri, Vondrak ’09] and [Vondrak ’10] prove a slightly more general result by two different techniques
• If E[R]· 500 log(1/²),

Related Work:

Product distributions, Matroid Rank Fns

Talagrand implies:

• Let D be a product distribution on V, R=rank(X), X drawn from D. If E[R] ¸ 4000,
• Let ¹= i=1m f (xi) / m
• Let g be the constant function with value ¹
• If E[R]· 500 log(1/²),
• Algorithm:
• This achieves approximation factor O(log(1/²)) on a 1-² fraction of points, with high probability.
Conclusions and Open Questions
• Analyze intrinsic learnability of submodular fns
• Our analysis reveals interesting novel extremal and structural properties of submodular fns.
• Open questions
• Improve (n1/3) lower bound to (n1/2)
• Non-monotone submodular functions
Other interesting structural properties
• Let h : R!R+be concave, non-decreasing. For each Sµ V, let f(S) = h(|S|)
• Claim: These functions f are submodular, monotone, non-negative.

V

;

Theorem:Every submodular function looks like this.

Lots of

approximately

usually.

Let f be a non-negative, monotone, submodular, 1-Lipschitz function.For any ²>0, there exists a concave function h : [0,n] !Rs.t.for every k2[0,n], and for a 1-² fraction of SµV with |S|=k,we have:

V

;

• Theorem

h(k) ·f(S) · O(log2(1/²))¢h(k).

In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k.

Proof: Based on Talagrand’s Inequality.

Conclusions and Open Questions
• Analyze intrinsic learnability of submodular fns
• Our analysis reveals interesting novel extremal and structural properties of submodular fns.
• Open questions
• Improve (n1/3) lower bound to (n1/2)
• Non-monotone submodular functions
• Any algorithm?
• Lower bound better than (n1/3)