
### Learning Submodular Functions

Maria Florina Balcan

LGO, 11/16/2010


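Submodular functions

$V = \{1, 2, \ldots, n\}$; set function $f : 2^V \to \mathbb{R}$

$f(S) + f(T) \ge f(S \cap T) + f(S \cup T), \quad \forall S, T \subseteq V$

- Decreasing marginal return:

$f(T \cup \{x\}) - f(T) \ge f(S \cup \{x\}) - f(S), \quad \forall S, T \subseteq V,\ T \subseteq S,\ x \notin S$

Examples:

- Concave functions: Let $h : \mathbb{R} \to \mathbb{R}$ be concave. For each $S \subseteq V$, let $f(S) = h(|S|)$.

- Vector spaces: Let $V = \{v_1, \ldots, v_n\}$, each $v_i \in \mathbb{F}^n$. For each $S \subseteq V$, let $f(S) = \operatorname{rank}(V[S])$.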

Submodular set functions

- Set function $f$ on $V$ is called submodular if, for all $S, T \subseteq V$: $f(S) + f(T) \ge f(S \cup T) + f(S \cap T)$

- Equivalent diminishing returns characterization: for $T \subseteq S$, $x \notin S$: $f(T \cup \{x\}) - f(T) \ge f(S \cup \{x\}) - f(S)$

[Figure: adding $x$ to the smaller set $T$ gives a large improvement; adding $x$ to the larger set $S$ gives a small improvement.]
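The two characterizations above can be checked by brute force on a small ground set. The sketch below is illustrative only: the helper names (`subsets`, `is_submodular`, `has_diminishing_returns`) and the two test functions are assumptions, not part of the talk, and the enumeration is exponential in $n$.

```python
from itertools import combinations
from math import sqrt

def subsets(V):
    """Yield every subset of V as a frozenset."""
    items = sorted(V)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def is_submodular(f, V, tol=1e-9):
    """Check f(S) + f(T) >= f(S | T) + f(S & T) for all S, T."""
    all_sets = list(subsets(V))
    return all(f(S) + f(T) + tol >= f(S | T) + f(S & T)
               for S in all_sets for T in all_sets)

def has_diminishing_returns(f, V, tol=1e-9):
    """Check f(T + x) - f(T) >= f(S + x) - f(S) for all T <= S, x not in S."""
    all_sets = list(subsets(V))
    for S in all_sets:
        for T in all_sets:
            if not T <= S:
                continue
            for x in V - S:
                if f(T | {x}) - f(T) + tol < f(S | {x}) - f(S):
                    return False
    return True

V = frozenset(range(5))
concave_card = lambda S: sqrt(len(S))  # concave h applied to |S|
convex_card = lambda S: len(S) ** 2    # convex h, used as a negative example
```

On this toy instance both checks accept `concave_card` and both reject `convex_card`, which is what the equivalence of the two definitions predicts.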

Example: Set cover

- Want to cover floorplan with discs

- Place sensors in building

- Possible locations $V$

- For $S \subseteq V$: $f(S)$ = "area (# locations) covered by sensors placed at $S$"

[Figure: each node predicts values of positions within some radius.]

Formally:

- $W$ finite set, collection of $n$ subsets $W_i \subseteq W$

- For $S \subseteq V = \{1, \ldots, n\}$ define $f(S) = \left|\bigcup_{i \in S} W_i\right|$
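A minimal sketch of the coverage function $f(S) = |\bigcup_{i \in S} W_i|$ defined above. The three sensor locations and the points they cover are a made-up instance, chosen only to show the marginal gain of a sensor shrinking as more sensors are already placed.

```python
def coverage(W_sets):
    """Return f with f(S) = size of the union of W_i over i in S."""
    def f(S):
        covered = set()
        for i in S:
            covered |= W_sets[i]
        return len(covered)
    return f

# Hypothetical instance: 3 sensor locations covering points of W = {a, ..., e}.
W_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = coverage(W_sets)

# Marginal gain of sensor 2 on top of nothing vs. on top of sensors 1 and 3.
gain_small_base = f({2}) - f(set())         # sensor 2 newly covers b and c
gain_large_base = f({1, 2, 3}) - f({1, 3})  # b and c are already covered
```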


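Submodular functions

$V = \{1, 2, \ldots, n\}$; set function $f : 2^V \to \mathbb{R}$

Submodular: $f(S) + f(T) \ge f(S \cap T) + f(S \cup T), \quad \forall S, T \subseteq V$

Monotone: $f(S) \le f(T), \quad \forall S \subseteq T$

Non-negative: $f(S) \ge 0, \quad \forall S \subseteq V$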

Submodular functions

- A lot of work on optimization and submodularity.

- Can be minimized in polynomial time.

- Algorithmic game theory

- decreasing marginal utilities.

- Substantial interest in the ML community recently.

- Tutorials, workshops at ICML, NIPS, etc.

- www.submodularity.org/ owned by ML.

Learnability of Submodular Fns

- Important to also understand their learnability.

- Exact learning with value queries

- Previous Work:

Goemans, Harvey, Iwata, Mirrokni, SODA 2009

- [GHIM’09] Model

- There is an unknown submodular target function.

- Algorithm allowed to (adaptively) pick sets and query the value of the target on those sets.

- Can we learn the target with a polynomial number of queries in poly time?

- Output a function that approximates the target within a factor of $\alpha$ on every single subset.

Exact learning with value queries

Goemans, Harvey, Iwata, Mirrokni, SODA 2009

- Theorem: (General upper bound)

$\exists$ an alg. for learning a submodular function with an approx. factor $O(n^{1/2})$.

- Theorem: (General lower bound)

- Any alg. for learning a submodular function must have an approx. factor of $\Omega(n^{1/2})$.

Problems with the GHIM model

- Lower bound fails if our goal is to do well on most of the points.

- Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to learn exactly from a polynomial number of queries.

- Well known that value queries are undesirable in some learning applications.

Is there a better model that gets around these problems?


Learning submodular fns in a distributional learning setting [BH10]


Our model: Passive Supervised Learning

- Data source: distribution $D$ on $\{0,1\}^n$

- Expert / oracle: target $f : \{0,1\}^n \to \mathbb{R}_+$

- Algorithm sees labeled examples $(x_1, f(x_1)), \ldots, (x_m, f(x_m))$, with $x_i$ i.i.d. from $D$

- Algorithm produces a "hypothesis" $g : \{0,1\}^n \to \mathbb{R}_+$. (Hopefully $g \approx f$.)

- $\Pr_{x_1, \ldots, x_m}\left[\, \Pr_x[\, g(x) \le f(x) \le \alpha\, g(x) \,] \ge 1 - \epsilon \,\right] \ge 1 - \delta$

- "Probably Mostly Approximately Correct" (PMAC)
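The inner probability in the PMAC guarantee can be estimated on a sample: count how often $g(x) \le f(x) \le \alpha\, g(x)$ holds. The sketch below is a toy illustration; the function name `pmac_fraction`, the target, the hypothesis, and the four sample points are all made up.

```python
def pmac_fraction(f, g, sample, alpha):
    """Fraction of sampled points x with g(x) <= f(x) <= alpha * g(x)."""
    hits = sum(1 for x in sample if g(x) <= f(x) <= alpha * g(x))
    return hits / len(sample)

# Hypothetical sample from {0,1}^3, with a toy target and a toy hypothesis.
sample = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
f = lambda x: sum(x)  # target: cardinality of the set encoded by x
g = lambda x: 1.0     # hypothesis: the constant function 1

frac = pmac_fraction(f, g, sample, alpha=2)
```

Here the sandwich holds only for the points with one or two ones, so `frac` is 0.5; a PMAC learner would need this fraction to be at least $1 - \epsilon$ under $D$.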

Main results

- Theorem: (Our general upper bound)

$\exists$ an alg. for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approx. factor $O(n^{1/2})$.

- Note:

- Much simpler alg. compared to GHIM'09

- Theorem: (Our general lower bound)

- No algorithm can PMAC-learn the class of non-neg., monotone, submodular fns with an approx. factor $\tilde{o}(n^{1/3})$.

- Note:

- The GHIM'09 lower bound fails in our model.

- Theorem: (Product distributions)

Matroid rank functions, const. approx.

A General Upper Bound

- Theorem:

$\exists$ an alg. for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approx. factor $O(n^{1/2})$.

Subadditive Fns are Approximately Linear

- Let $f$ be non-negative, monotone and subadditive.

- Claim: $f$ can be approximated to within factor $n$ by a linear function $g$.

- Proof Sketch: Let $g(S) = \sum_{s \in S} f(\{s\})$. Then $f(S) \le g(S) \le n \cdot f(S)$.

Subadditive: $f(S) + f(T) \ge f(S \cup T), \quad \forall S, T \subseteq V$

Monotonicity: $f(S) \le f(T), \quad \forall S \subseteq T$

Non-negativity: $f(S) \ge 0, \quad \forall S \subseteq V$
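The sandwich $f(S) \le g(S) \le n \cdot f(S)$ from the proof sketch can be verified exhaustively on a small example. The sketch below uses a made-up coverage function as the subadditive target; the sets in `W_sets` are an assumption for illustration.

```python
from itertools import combinations

# Hypothetical coverage instance: f is non-negative, monotone and subadditive.
W_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a", "e"}}
V = frozenset(W_sets)
n = len(V)

def f(S):
    """Number of points covered by the sets indexed by S."""
    return len(set().union(*(W_sets[i] for i in S)))

def g(S):
    """Linear function from the proof sketch: sum of singleton values."""
    return sum(f({s}) for s in S)

def nonempty_subsets(ground):
    items = sorted(ground)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

sandwich_holds = all(f(S) <= g(S) <= n * f(S) for S in nonempty_subsets(V))
```

The left inequality uses subadditivity and the right one uses monotonicity, since each of the at most $n$ singleton values is at most $f(S)$.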

PMAC Learning Subadditive Fns

- $f$ non-negative, monotone, subadditive can be approximated to within factor $n$ by a linear function $g$,

- $g(S) = w \cdot \chi(S)$.

- Labeled examples $((\chi(S), f(S)), +)$ and $((\chi(S), n \cdot f(S)), -)$ are linearly separable in $\mathbb{R}^{n+1}$.

- Idea: learn a linear separator. Use standard sample complexity results.

- Problem: data not i.i.d.

- Solution: create a related distribution.

- Sample $S$ from $D$; flip a coin.

- If heads add $((\chi(S), f(S)), +)$.

- Else add $((\chi(S), n \cdot f(S)), -)$.

PMAC Learning Subadditive Fns

- Algorithm:

Input: $(S_1, f(S_1)), \ldots, (S_m, f(S_m))$

- For each $S_i$, flip a coin.

- If heads add $((\chi(S_i), f(S_i)), +)$.

- Else add $((\chi(S_i), n \cdot f(S_i)), -)$.

- Learn a linear separator $u = (w, -z)$ in $\mathbb{R}^{n+1}$.

- Output: $g(S) = \frac{1}{n+1}\, w \cdot \chi(S)$.

- Note:

- Deal with the set $\{S : f(S) = 0\}$ separately.

- Theorem: For $m = \Theta(n/\epsilon)$, $g$ approximates $f$ to within a factor $n$ on a $1 - \epsilon$ fraction of the distribution.
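The data-construction step of the algorithm above can be sketched as follows. This is not the full learner: it only builds the coin-flipped examples in $\mathbb{R}^{n+1}$ and checks that one known vector, $u = (w, -1)$ with $w_s = f(\{s\})$, puts every example on the correct (closed) side. The learning step itself (e.g. a linear program) is omitted. The coverage target, the sampling distribution, and all names are assumptions for illustration.

```python
import random

# Hypothetical subadditive target: a coverage function on V = {0, 1, 2, 3}.
W_sets = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d", "e"}, 3: {"a", "e"}}
n = len(W_sets)

def f(S):
    return len(set().union(*(W_sets[i] for i in S)))

def chi(S):
    """Indicator vector of S in {0,1}^n."""
    return [1 if i in S else 0 for i in range(n)]

def build_examples(samples, rng):
    """Map each (S, f(S)) to a labeled point in R^{n+1} via a coin flip."""
    examples = []
    for S, value in samples:
        if value == 0:
            continue  # the set {S : f(S) = 0} is handled separately
        if rng.random() < 0.5:
            examples.append((chi(S) + [value], +1))
        else:
            examples.append((chi(S) + [n * value], -1))
    return examples

rng = random.Random(0)
drawn = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(50)]
examples = build_examples([(S, f(S)) for S in drawn], rng)

# Known separator: w_s = f({s}), z = 1, so u . (chi(S), y) = g(S) - y.
u = [f({i}) for i in range(n)] + [-1]
separated = all(label * sum(a * b for a, b in zip(u, point)) >= 0
                for point, label in examples)
```

The check succeeds for any coin flips because $g(S) - f(S) \ge 0$ and $g(S) - n \cdot f(S) \le 0$ for the linear $g$ of the previous slide.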

PMAC Learning Submodular Fns

- Algorithm:

Input: $(S_1, f(S_1)), \ldots, (S_m, f(S_m))$

- For each $S_i$, flip a coin.

- If heads add $((\chi(S_i), f^2(S_i)), +)$.

- Else add $((\chi(S_i), n \cdot f^2(S_i)), -)$.

- Learn a linear separator $u = (w, -z)$ in $\mathbb{R}^{n+1}$.

- Output: $g(S) = \frac{1}{(n+1)^{1/2}}\, w \cdot \chi(S)$

- Note:

- Deal with the set $\{S : f(S) = 0\}$ separately.

- Theorem: For $m = \Theta(n/\epsilon)$, $g$ approximates $f$ to within a factor $\sqrt{n}$ on a $1 - \epsilon$ fraction of the distribution.

Proof idea: $f$ non-negative, monotone, submodular can be approximated to within factor $\sqrt{n}$ by the square root of a linear function [GHIM'09].

PMAC Learning Submodular Fns


- Much simpler than [GHIM'09]. More robust to variations.

- The target only needs to be within a $\beta$ factor of a submodular function.

- $\exists$ a submodular function that agrees with the target on all but an $\eta$ fraction of the points (on the points where it disagrees, it can be arbitrarily far).

- [the alg. is inefficient in this case]

A General Lower Bound

- Theorem: (Our general lower bound)

- No algorithm can PMAC-learn the class of non-neg., monotone, submodular fns with an approx. factor $\tilde{o}(n^{1/3})$.

Plan:

Use the fact that any matroid rank fnc is submodular.

Construct a hard family of matroid rank functions.

[Figure: sets $A_1, A_2, A_3, \ldots, A_L$ with $L = n^{\log\log n}$; each set has rank either High $= n^{1/3}$ or Low $= \log^2 n$.]

Partition Matroids

$A_1, A_2, \ldots, A_k \subseteq V = \{1, 2, \ldots, n\}$, all disjoint; $u_i \le |A_i| - 1$

$\mathrm{Ind} = \{ I : |I \cap A_j| \le u_j \text{ for all } j \}$

Then $(V, \mathrm{Ind})$ is a matroid.

If the sets $A_i$ are not disjoint, then $(V, \mathrm{Ind})$ might not be a matroid.

- E.g., $n = 5$, $A_1 = \{1,2,3\}$, $A_2 = \{3,4,5\}$, $u_1 = u_2 = 2$.

- $\{1,2,4,5\}$ and $\{2,3,4\}$ are both maximal sets in $\mathrm{Ind}$; they do not have the same cardinality.
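The counterexample above is small enough to enumerate. The sketch below lists all sets satisfying $|I \cap A_j| \le u_j$ for the overlapping blocks and collects the sizes of the maximal ones; in a matroid all maximal independent sets would have the same size. Helper names are illustrative.

```python
from itertools import combinations

def independent_sets(V, blocks, caps):
    """All I with |I & A_j| <= u_j for every block A_j with cap u_j."""
    result = []
    for r in range(len(V) + 1):
        for c in combinations(sorted(V), r):
            I = frozenset(c)
            if all(len(I & A) <= u for A, u in zip(blocks, caps)):
                result.append(I)
    return result

def maximal_sets(ind):
    """Members of ind that are not strictly contained in another member."""
    ind_set = set(ind)
    return [I for I in ind_set if not any(I < J for J in ind_set)]

V = {1, 2, 3, 4, 5}
ind = independent_sets(V, [{1, 2, 3}, {3, 4, 5}], [2, 2])
maximal = maximal_sets(ind)
sizes = sorted({len(I) for I in maximal})
```

For this instance `sizes` contains both 3 and 4, confirming that the family is not the independent sets of a matroid.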

Almost partition matroids

$k = 2$, $A_1, A_2 \subseteq V$ (not necessarily disjoint); $u_i \le |A_i| - 1$

$\mathrm{Ind} = \{ I : |I \cap A_j| \le u_j,\ |I \cap (A_1 \cup A_2)| \le u_1 + u_2 - |A_1 \cap A_2| \}$

Then (V,Ind) is a matroid.

Almost partition matroids

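More generally

$A_1, A_2, \ldots, A_k \subseteq V = \{1, 2, \ldots, n\}$, $u_i \le |A_i| - 1$; write $A(J) = \bigcup_{j \in J} A_j$.

$f : 2^{[k]} \to \mathbb{Z}$

$f(J) = \sum_{j \in J} u_j + |A(J)| - \sum_{j \in J} |A_j|, \quad \forall J \subseteq [k]$

(the term $|A(J)| - \sum_{j \in J} |A_j|$ is $\le 0$)

$\mathrm{Ind} = \{ I : |I \cap A(J)| \le f(J),\ \forall J \subseteq [k] \}$

Then $(V, \mathrm{Ind})$ is a matroid (if nonempty).

Rewrite $f$: $f(J) = |A(J)| - \sum_{j \in J} (|A_j| - u_j), \quad \forall J \subseteq [k]$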
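A brute-force sketch of this construction on a tiny instance, taking $A(J)$ to be the union of the $A_j$ with $j \in J$ (which matches the $k = 2$ case). It computes $f(J)$, builds $\mathrm{Ind}$, and tests the matroid axioms directly. The instance ($A_1 = \{1,2,3\}$, $A_2 = \{3,4,5\}$, $u_1 = u_2 = 2$) and the helper names are chosen for illustration; the enumeration is exponential and only meant for very small $n$ and $k$.

```python
from itertools import combinations

def powerset(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def make_f(A, u):
    """f(J) = |A(J)| - sum_{j in J} (|A_j| - u_j), with A(J) the union of A_j."""
    def f(J):
        union = set().union(*(A[j] for j in J))
        return len(union) - sum(len(A[j]) - u[j] for j in J)
    return f

def independent_sets(V, A, u):
    """All I with |I & A(J)| <= f(J) for every J subset of the index set."""
    f = make_f(A, u)
    index_sets = powerset(A.keys())
    return {I for I in powerset(V)
            if all(len(I & set().union(*(A[j] for j in J))) <= f(J)
                   for J in index_sets)}

def is_matroid(ind):
    """Non-empty, downward closed, and satisfies the exchange property."""
    if frozenset() not in ind:
        return False
    closed = all(I - {x} in ind for I in ind for x in I)
    exchange = all(any(I | {x} in ind for x in J - I)
                   for I in ind for J in ind if len(I) < len(J))
    return closed and exchange

V = {1, 2, 3, 4, 5}
A = {1: {1, 2, 3}, 2: {3, 4, 5}}
u = {1: 2, 2: 2}
f = make_f(A, u)
ind = independent_sets(V, A, u)
```

On this instance $f(\{1,2\}) = 5 - 1 - 1 = 3$, which equals $u_1 + u_2 - |A_1 \cap A_2|$, and the resulting family passes the matroid check.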


Uncrossing argument

Proof technique:

For a set $I$, define $T(I)$ to be the set of tight constraints:

$T(I) = \{ J \subseteq [k] : |I \cap A(J)| = f(J) \}$

$\forall I \in \mathrm{Ind}$ and $J_1, J_2 \in T(I)$: either $J_1 \cup J_2 \in T(I)$ or $J_1 \cap J_2 = \emptyset$.

$\mathrm{Ind}$ is the family of independent sets of a matroid.

A generalization of almost partition matroids

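$f : 2^{[k]} \to \mathbb{Z}$,

$f(J) = |A(J)| - \sum_{j \in J} (|A_j| - u_j), \quad \forall J \subseteq [k]$; $u_i \le |A_i| - 1$

Note: This requires $k \le n$ (for $k > n$, $f$ becomes negative).

But we want $k = n^{\log\log n}$.

Do some sort of truncation to allow $k \gg n$.

$f$ is $(\mu, \tau)$-good if $f(J) \ge 0$ for $J \subseteq [k]$, $|J| \le \tau$, and

$f(J) \ge \mu$ for $J \subseteq [k]$, $\tau \le |J| \le 2\tau - 2$.

$h(J) = f(J)$ if $|J| \le \tau$, and $h(J) = \mu$ otherwise.

$\mathrm{Ind} = \{ I : |I \cap A(J)| \le h(J),\ \forall J \subseteq [k] \}$

Then $(V, \mathrm{Ind})$ is a matroid (if nonempty).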

A generalization of partition matroids

Let $L = n^{\log\log n}$. Let $A_1, A_2, \ldots, A_L$ be random subsets of $V$.

($A_i$: include each element of $V$ independently with probability $n^{-2/3}$.)

Let $\mu = n^{1/3} \log^2 n$, $u = \log^2 n$, $\tau = n^{1/3}$.

Each subset $J \subseteq \{1, 2, \ldots, L\}$ induces a matroid s.t. for any $i \notin J$, $A_i$ is independent in this matroid.

- Rank($A_i$), $i \notin J$, is roughly $|A_i|$ (i.e., $\Theta(n^{1/3})$).

- The rank of the sets $A_j$, $j \in J$, is $u = \log^2 n$.


Product distributions, Matroid Rank Fns

Talagrand implies:

- Let $D$ be a product distribution on $V$, $R = \operatorname{rank}(X)$, $X$ drawn from $D$.

- If $\mathbb{E}[R] \ge 4000$,

- If $\mathbb{E}[R] \le 500 \log(1/\epsilon)$,

Related Work:

- [Chekuri, Vondrak '09] and [Vondrak '10] prove a slightly more general result by two different techniques.

Product distributions, Matroid Rank Fns

- Algorithm:

- Let $\mu = \sum_{i=1}^{m} f(x_i) / m$.

- Let $g$ be the constant function with value $\mu$.

- This achieves approximation factor $O(\log(1/\epsilon))$ on a $1 - \epsilon$ fraction of points, with high probability.
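A minimal sketch of the constant-hypothesis step: average the observed values and output that constant. The target used here (the rank function of a uniform matroid, $\min(|x|, 150)$), the product distribution (independent fair coins), and the factor-2 window used to measure concentration are all assumptions made for illustration; they are not the constants from the talk.

```python
import random

def learn_constant(labeled):
    """Return the constant hypothesis g(x) = mu and the empirical mean mu."""
    mu = sum(value for _, value in labeled) / len(labeled)
    return (lambda x: mu), mu

def draw(n, p, rng):
    """One point of {0,1}^n with independent Bernoulli(p) coordinates."""
    return tuple(int(rng.random() < p) for _ in range(n))

rng = random.Random(0)
n, p, m = 200, 0.5, 500
rank = lambda x: min(sum(x), 150)  # hypothetical matroid rank function

train = [(x, rank(x)) for x in (draw(n, p, rng) for _ in range(m))]
g, mu = learn_constant(train)

# Fraction of fresh points whose value lies within a factor 2 of mu.
fresh = [draw(n, p, rng) for _ in range(m)]
within = sum(1 for x in fresh if mu / 2 <= rank(x) <= 2 * mu) / m
```

Because the rank is tightly concentrated around its mean under a product distribution, nearly all fresh points fall inside the window in this toy run.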

Conclusions and Open Questions

- Analyze intrinsic learnability of submodular fns

- Our analysis reveals interesting novel extremal and structural properties of submodular fns.

- Open questions

- Improve $\tilde{\Omega}(n^{1/3})$ lower bound to $\tilde{\Omega}(n^{1/2})$

- Non-monotone submodular functions

Other interesting structural properties

- Let $h : \mathbb{R} \to \mathbb{R}_+$ be concave, non-decreasing. For each $S \subseteq V$, let $f(S) = h(|S|)$.

- Claim: These functions $f$ are submodular, monotone, non-negative.

[Figure: plot of $f$ over sets ordered from $\emptyset$ to $V$.]

Theorem (informal): Every submodular function looks like this. (Lots of "approximately" and "usually".)

- Theorem: Let $f$ be a non-negative, monotone, submodular, 1-Lipschitz function. For any $\epsilon > 0$, there exists a concave function $h : [0, n] \to \mathbb{R}$ s.t. for every $k \in [0, n]$, and for a $1 - \epsilon$ fraction of $S \subseteq V$ with $|S| = k$, we have:

$h(k) \le f(S) \le O(\log^2(1/\epsilon)) \cdot h(k)$.

In fact, $h(k)$ is just $\mathbb{E}[f(S)]$, where $S$ is uniform on sets of size $k$.

Proof: Based on Talagrand's Inequality.
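For a small ground set, $h(k) = \mathbb{E}[f(S)]$ over uniform sets of size $k$ can be computed exactly by enumeration. The sketch below does this for a made-up coverage function and checks that the resulting sequence is concave in $k$. It illustrates only the definition of $h$ and its concavity, not the concentration bound, and this toy $f$ is not 1-Lipschitz.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical monotone submodular target: a coverage function.
W_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a", "e"}}
V = sorted(W_sets)

def f(S):
    return len(set().union(*(W_sets[i] for i in S)))

def h(k):
    """Exact E[f(S)] for S uniform over the subsets of V of size k."""
    sets_k = list(combinations(V, k))
    return Fraction(sum(f(S) for S in sets_k), len(sets_k))

values = [h(k) for k in range(len(V) + 1)]
increments = [b - a for a, b in zip(values, values[1:])]
# Concave in k means the successive increments never increase.
is_concave = all(x >= y for x, y in zip(increments, increments[1:]))
```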

Conclusions and Open Questions

- Analyze intrinsic learnability of submodular fns

- Our analysis reveals interesting novel extremal and structural properties of submodular fns.

- Open questions

- Improve $\tilde{\Omega}(n^{1/3})$ lower bound to $\tilde{\Omega}(n^{1/2})$

- Non-monotone submodular functions

- Any algorithm?
- Lower bound better than $\tilde{\Omega}(n^{1/3})$
