
Learning Submodular Functions

Maria Florina Balcan

LGO, 11/16/2010


Submodular functions

V = {1, 2, …, n}; set function f : 2^V → R

f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V

  • Equivalently, decreasing marginal return:

f(T ∪ {x}) - f(T) ≥ f(S ∪ {x}) - f(S), ∀ T ⊆ S ⊆ V, x ∉ S

Examples:

  • Concave functions: let h : R → R be concave. For each S ⊆ V, let f(S) = h(|S|).
  • Vector spaces: let V = {v1, …, vn}, each vi ∈ F^n. For each S ⊆ V, let f(S) = rank(V[S]).
Submodular set functions

  • A set function f on V is called submodular if, for all S, T ⊆ V:

f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T)

  • Equivalent diminishing returns characterization:

[Figure: adding an element x to the smaller set T gives a large improvement; adding the same x to the larger set S gives only a small improvement.]

Submodularity: for T ⊆ S, x ∉ S, f(T ∪ {x}) - f(T) ≥ f(S ∪ {x}) - f(S)

Example: Set cover

Want to cover a floorplan with discs: place sensors in a building, with possible locations V; each sensor covers the positions within some radius.

For S ⊆ V: f(S) = area (number of locations) covered by the sensors placed at S.

Formally: W is a finite set, with a collection of n subsets Wi ⊆ W. For S ⊆ V = {1, …, n}, define f(S) = |∪_{i∈S} Wi|.
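A small illustration (mine, not from the talk) of the coverage function and its diminishing returns, using concrete sets:

```python
# Coverage function f(S) = |union of W_i for i in S|, plus a check of
# diminishing returns for a concrete T, a superset S, and a new sensor x.
W = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}, 4: {"d", "e"}, 5: {"e", "f"}}

def coverage(S):
    return len(set().union(*(W[i] for i in S))) if S else 0

T, S, x = {1, 2}, {1, 2, 3, 4}, 5
gain_T = coverage(T | {x}) - coverage(T)   # 2: x newly covers e and f
gain_S = coverage(S | {x}) - coverage(S)   # 1: e is already covered by S
assert gain_T >= gain_S                    # submodularity in action
```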

Set cover is submodular

[Figure: T = {W1, W2} and S = {W1, W2, W3, W4}; the new set x overlaps regions already covered by S but not by T.]

f(T ∪ {x}) - f(T) ≥ f(S ∪ {x}) - f(S)

Submodular functions

V = {1, 2, …, n}; set function f : 2^V → R

f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T), ∀ S, T ⊆ V

Monotone: f(S) ≤ f(T), ∀ S ⊆ T

Non-negative: f(S) ≥ 0, ∀ S ⊆ V
Submodular functions
  • A lot of work on optimization and submodularity:
  • Submodular functions can be minimized in polynomial time.
  • They model decreasing marginal utilities in algorithmic game theory.
  • Substantial interest in the ML community recently:
  • Tutorials and workshops at ICML, NIPS, etc.
  • www.submodularity.org/ owned by ML.
Learnability of Submodular Fns
  • Important to also understand their learnability.
  • Previous work: exact learning with value queries [Goemans, Harvey, Iwata, Mirrokni, SODA 2009].
  • The [GHIM'09] model:
  • There is an unknown submodular target function.
  • The algorithm is allowed to (adaptively) pick sets and query the value of the target on those sets.
  • Can we learn the target with a polynomial number of queries, in polynomial time?
  • Goal: output a function that approximates the target within a factor α on every single subset.
Exact learning with value queries

Goemans, Harvey, Iwata, Mirrokni, SODA 2009

  • Theorem (general upper bound): there exists an algorithm for learning a submodular function with an approximation factor O(n^{1/2}).
  • Theorem (general lower bound): any algorithm for learning a submodular function must have an approximation factor of Ω(n^{1/2}).
Problems with the GHIM model
  • The lower bound fails if our goal is to do well on most of the points.
  • Many simple functions that are easy to learn in the PAC model (e.g., conjunctions) are impossible to recover exactly from a polynomial number of queries.
  • It is well known that value queries are undesirable in some learning applications.

Is there a better model that gets around these problems?


Learning submodular fns in a distributional learning setting [BH10]

Our model: Passive Supervised Learning

Data source: a distribution D on {0,1}^n and an expert/oracle labeling examples with an unknown target f : {0,1}^n → R+.

  • The algorithm sees labeled examples (x1, f(x1)), …, (xk, f(xk)), with the xi drawn i.i.d. from D.
  • The algorithm produces a "hypothesis" g : {0,1}^n → R+. (Hopefully g ≈ f.)
  • Guarantee: Pr_{x1,…,xm}[ Pr_x[ g(x) ≤ f(x) ≤ α·g(x) ] ≥ 1-ε ] ≥ 1-δ
  • "Probably Mostly Approximately Correct" (PMAC)
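A tiny sketch (my own illustration, with a made-up f and g) of what the PMAC guarantee measures: the fraction of fresh points on which g is within a factor α of f.

```python
# Empirically estimate Pr_x[g(x) <= f(x) <= alpha * g(x)] on fresh
# samples from D (here: the uniform distribution on {0,1}^n).
import numpy as np

rng = np.random.default_rng(0)

def pmac_success_rate(f, g, alpha, n=20, m=10_000):
    X = rng.integers(0, 2, size=(m, n))
    fx = np.array([f(x) for x in X])
    gx = np.array([g(x) for x in X])
    return np.mean((gx <= fx) & (fx <= alpha * gx))

f = lambda x: np.sqrt(x.sum())       # a toy target
g = lambda x: np.sqrt(x.sum()) / 2   # underestimates f by a factor 2
print(pmac_success_rate(f, g, alpha=2.0))   # 1.0
```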
Main results
  • Theorem (our general upper bound): there exists an algorithm for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approximation factor O(n^{1/2}).
  • Note: a much simpler algorithm compared to GHIM'09.
  • Theorem (our general lower bound): no algorithm can PMAC-learn the class of non-negative, monotone, submodular fns with an approximation factor Õ(n^{1/3}).
  • Note: the GHIM'09 lower bound fails in our model.
  • Theorem (product distributions): matroid rank functions can be PMAC-learned with a constant approximation factor.

A General Upper Bound
  • Theorem: there exists an algorithm for PMAC-learning the class of non-negative, monotone, submodular fns (w.r.t. an arbitrary distribution) with an approximation factor O(n^{1/2}).

Subadditive Fns are Approximately Linear
  • Let f be non-negative, monotone and subadditive.
  • Claim: f can be approximated to within a factor n by a linear function g.
  • Proof sketch: let g(S) = Σ_{s∈S} f({s}). Then f(S) ≤ g(S) ≤ n·f(S): the first inequality follows by repeatedly applying subadditivity, the second since monotonicity gives f({s}) ≤ f(S) for each s ∈ S and |S| ≤ n.

Subadditive: f(S) + f(T) ≥ f(S ∪ T), ∀ S, T ⊆ V
Monotonicity: f(S) ≤ f(T), ∀ S ⊆ T
Non-negativity: f(S) ≥ 0, ∀ S ⊆ V
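A quick numeric check of the sandwich f ≤ g ≤ n·f (an illustration I added), reusing the coverage function from the set-cover example:

```python
# Verify f(S) <= g(S) <= n * f(S) for g(S) = sum of f({s}) over s in S,
# with f the coverage function over a small family of sets.
from itertools import chain, combinations

W = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
V, n = set(W), len(W)

def f(S):
    return len(set().union(*(W[i] for i in S))) if S else 0

def g(S):
    return sum(f({s}) for s in S)

for S in chain.from_iterable(combinations(V, r) for r in range(n + 1)):
    S = set(S)
    assert f(S) <= g(S) <= n * f(S)
print("sandwich holds on all subsets")
```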

[Figure: over the subsets of V, the linear function g is sandwiched between f and n·f: f(S) ≤ g(S) ≤ n·f(S).]

PMAC Learning Subadditive Fns
  • f non-negative, monotone, subadditive is approximated to within a factor n by a linear function g, g(S) = w·χ(S).
  • The labeled examples ((χ(S), f(S)), +) and ((χ(S), n·f(S)), -) are linearly separable in R^{n+1}.
  • Idea: learn a linear separator; use standard sample complexity results.
  • Problem: the data is not i.i.d.
  • Solution: create a related distribution. Sample S from D and flip a coin: if heads, add ((χ(S), f(S)), +); else add ((χ(S), n·f(S)), -).
PMAC Learning Subadditive Fns
  • Algorithm:
  • Input: (S1, f(S1)), …, (Sm, f(Sm))
  • For each Si, flip a coin. If heads, add ((χ(Si), f(Si)), +); else add ((χ(Si), n·f(Si)), -).
  • Learn a linear separator u = (w, -z) in R^{n+1}.
  • Output: g(S) = (1/(n+1)) w·χ(S).
  • Note: deal with the set {S : f(S) = 0} separately.
  • Theorem: For m = Θ(n/ε), g approximates f to within a factor n on a 1-ε fraction of the distribution.
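A sketch of this reduction in code (my reconstruction under stated assumptions: sklearn's LinearSVC stands in for any halfspace learner, the separator is rescaled by z, and the {S : f(S) = 0} special case is ignored):

```python
# Reduce PMAC learning of a subadditive f to learning a halfspace:
# points (chi(S_i), f(S_i)) are positive, (chi(S_i), n*f(S_i)) negative.
import numpy as np
from sklearn.svm import LinearSVC

def pmac_learn_subadditive(chi, values, n, seed=0):
    """chi: (m, n) array of indicator vectors chi(S_i); values: f(S_i)."""
    rng = np.random.default_rng(seed)
    heads = rng.integers(0, 2, size=len(values)).astype(bool)
    last = np.where(heads, values, n * np.asarray(values))  # last coordinate
    X = np.column_stack([chi, last])                        # points in R^{n+1}
    y = np.where(heads, 1, -1)
    u = LinearSVC(C=1e4).fit(X, y).coef_[0]                 # u = (w, -z)
    w, z = u[:n], -u[n]
    return lambda s_chi: (w @ s_chi) / (z * (n + 1))        # hypothesis g(S)
```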
PMAC Learning Submodular Fns
  • Algorithm:
  • Input: (S1, f(S1)), …, (Sm, f(Sm))
  • For each Si, flip a coin. If heads, add ((χ(Si), f²(Si)), +); else add ((χ(Si), n·f²(Si)), -).
  • Learn a linear separator u = (w, -z) in R^{n+1}.
  • Output: g(S) = ((1/(n+1)) w·χ(S))^{1/2}.
  • Note: deal with the set {S : f(S) = 0} separately.
  • Theorem: For m = Θ(n/ε), g approximates f to within a factor √n on a 1-ε fraction of the distribution.

Proof idea: f non-negative, monotone, submodular is approximated to within a factor √n by the square root of a linear function [GHIM'09].
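In code, the submodular variant only squares the values and takes a square root at the end (same assumptions as the sketch above):

```python
# Same reduction as pmac_learn_subadditive, applied to f^2; the square
# root of the learned linear function approximates f within sqrt(n).
def pmac_learn_submodular(chi, values, n, seed=0):
    g_sq = pmac_learn_subadditive(chi, np.asarray(values) ** 2, n, seed)
    return lambda s_chi: np.sqrt(max(g_sq(s_chi), 0.0))
```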

PMAC Learning Submodular Fns
  • The algorithm is much simpler than GHIM'09, and more robust to variations:
  • The target only needs to be within a factor β of a submodular fnc.
  • Or: there exists a submodular fnc that agrees with the target on all but an η fraction of the points (on the points where it disagrees, it can be arbitrarily far). [The algorithm is inefficient in this case.]
A General Lower Bound
  • Theorem (our general lower bound): no algorithm can PMAC-learn the class of non-negative, monotone, submodular fns with an approximation factor Õ(n^{1/3}).

Plan:
  • Use the fact that any matroid rank fnc is submodular.
  • Construct a hard family of matroid rank functions.

[Figure: L = n^{log log n} sets A1, A2, …, AL; on each Ai the rank is either high (n^{1/3}) or low (log² n).]

Partition Matroids

A1, A2, …, Ak ⊆ V = {1, 2, …, n}, all disjoint; ui ≤ |Ai| - 1

Ind = { I : |I ∩ Aj| ≤ uj for all j }

Then (V, Ind) is a matroid.

If the sets Ai are not disjoint, then (V, Ind) might not be a matroid (see the sketch below):
  • E.g., n = 5, A1 = {1,2,3}, A2 = {3,4,5}, u1 = u2 = 2: {1,2,4,5} and {2,3,4} are both maximal sets in Ind, yet they do not have the same cardinality.
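A small script (illustrative) checking that counterexample: both sets are maximal independent sets of different sizes, so the exchange property fails.

```python
# Partition-matroid independence test Ind = {I : |I & A_j| <= u_j}, and
# the overlapping-A_i counterexample from the slide.
A = {1: {1, 2, 3}, 2: {3, 4, 5}}
u = {1: 2, 2: 2}
V = {1, 2, 3, 4, 5}

def independent(I):
    return all(len(I & A[j]) <= u[j] for j in A)

def maximal(I):
    return independent(I) and not any(independent(I | {x}) for x in V - I)

print(maximal({1, 2, 4, 5}), maximal({2, 3, 4}))  # True True, sizes 4 vs 3
```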
Almost partition matroids

k = 2; A1, A2 ⊆ V (not necessarily disjoint); ui ≤ |Ai| - 1

Ind = { I : |I ∩ Aj| ≤ uj for j = 1, 2, and |I ∩ (A1 ∪ A2)| ≤ u1 + u2 - |A1 ∩ A2| }

Then (V, Ind) is a matroid.
Almost partition matroids

More generally: A1, A2, …, Ak ⊆ V = {1, 2, …, n}, ui ≤ |Ai| - 1; define f : 2^[k] → Z by

f(J) = Σ_{j∈J} uj + |A(J)| - Σ_{j∈J} |Aj|, ∀ J ⊆ [k]

where A(J) = ∪_{j∈J} Aj (note |A(J)| - Σ_{j∈J} |Aj| ≤ 0), and

Ind = { I : |I ∩ A(J)| ≤ f(J), ∀ J ⊆ [k] }

Then (V, Ind) is a matroid (if nonempty).

Rewriting f: f(J) = |A(J)| - Σ_{j∈J} (|Aj| - uj), ∀ J ⊆ [k]
A generalization of partition matroids

f : 2^[k] → Z, f(J) = |A(J)| - Σ_{j∈J} (|Aj| - uj), ∀ J ⊆ [k]

Ind = { I : |I ∩ A(J)| ≤ f(J), ∀ J ⊆ [k] }

Then (V, Ind) is a matroid (if nonempty).

Proof technique: an uncrossing argument.

For a set I, define T(I) to be the set of tight constraints: T(I) = { J ⊆ [k] : |I ∩ A(J)| = f(J) }.

∀ I ∈ Ind and J1, J2 ∈ T(I): either J1 ∪ J2 ∈ T(I) or J1 ∩ J2 = ∅.

Hence Ind is the family of independent sets of a matroid.

A generalization of almost partition matroids

f : 2^[k] → Z, f(J) = |A(J)| - Σ_{j∈J} (|Aj| - uj), ∀ J ⊆ [k]; ui ≤ |Ai| - 1

Note: this requires k ≤ n (for k > n, f becomes negative), but we want k = n^{log log n}. So we do some sort of truncation to allow k ≫ n.

f is (μ, τ)-good if f(J) ≥ 0 for all J ⊆ [k] with |J| ≤ τ, and f(J) ≥ μ for all J ⊆ [k] with τ ≤ |J| ≤ 2τ - 2.

Define h(J) = f(J) if |J| ≤ τ, and h(J) = μ otherwise.

Ind = { I : |I ∩ A(J)| ≤ h(J), ∀ J ⊆ [k] }

Then (V, Ind) is a matroid (if nonempty).

A generalization of partition matroids

Let L = n^{log log n}, and let A1, A2, …, AL be random subsets of V (each Ai includes each element of V independently with probability n^{-2/3}, so |Ai| ≈ n^{1/3}).

Let μ = n^{1/3} log² n, u = log² n, τ = n^{1/3}.

Each subset J ⊆ {1, 2, …, L} induces a matroid s.t. for any i ∉ J, Ai is independent in this matroid:
  • Rank(Ai) for i ∉ J is roughly |Ai| (i.e., Θ(n^{1/3})).
  • The rank of the sets Aj, j ∈ J, is u = log² n.

[Figure: the hard family A1, …, AL; each Ai has rank either high (n^{1/3}) or low (log² n), depending on the hidden set J.]

Product distributions, Matroid Rank Fns

Talagrand's inequality implies:
  • Let D be a product distribution on V, and R = rank(X) for X drawn from D. If E[R] ≥ 4000, …
  • If E[R] ≤ 500 log(1/ε), …

  • Algorithm:
  • Let μ = Σ_{i=1}^m f(xi) / m.
  • Let g be the constant function with value μ.
  • This achieves an approximation factor O(log(1/ε)) on a 1-ε fraction of points, with high probability.

Related work: [Chekuri, Vondrak '09] and [Vondrak '10] prove a slightly more general result by two different techniques.
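The product-distribution algorithm is a one-liner; a sketch (my illustration, using the rank of a uniform matroid as f):

```python
# Output the constant function equal to the empirical mean of f;
# concentration of matroid rank under product distributions makes
# this a good hypothesis.
import numpy as np

def learn_constant(values):
    mu = float(np.mean(values))
    return lambda x: mu

# toy check: f(x) = min(|x|, r), the rank of a uniform matroid
rng = np.random.default_rng(0)
n, r, m = 100, 30, 1000
X = rng.random((m, n)) < 0.5                 # x ~ product distribution
g = learn_constant(np.minimum(X.sum(axis=1), r))
print(g(None))                               # ~30: |x| concentrates near 50
```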
Conclusions and Open Questions
  • We analyze the intrinsic learnability of submodular fns.
  • Our analysis reveals interesting novel extremal and structural properties of submodular fns.
  • Open questions:
  • Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2}).
  • Non-monotone submodular functions.
Other interesting structural properties
  • Let h : R → R+ be concave, non-decreasing. For each S ⊆ V, let f(S) = h(|S|).
  • Claim: these functions f are submodular, monotone, non-negative.

[Figure: a concave curve of f(S) against |S|, rising from ∅ to V.]
Theorem: every submodular function looks like this (lots of them, approximately, usually).

  • Theorem: Let f be a non-negative, monotone, submodular, 1-Lipschitz function. For any ε > 0, there exists a concave function h : [0, n] → R s.t. for every k ∈ [0, n], and for a 1-ε fraction of the sets S ⊆ V with |S| = k, we have:

h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k).

In fact, h(k) is just E[f(S)], where S is uniform on sets of size k.

Proof: based on Talagrand's inequality.

Conclusions and Open Questions
  • We analyze the intrinsic learnability of submodular fns.
  • Our analysis reveals interesting novel extremal and structural properties of submodular fns.
  • Open questions:
  • Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2}).
  • Non-monotone submodular functions: any algorithm? A lower bound better than Ω(n^{1/3})?