
New Theoretical Frameworks for Machine Learning

Maria-Florina Balcan

Thesis Proposal

05/15/2007



Thanks to My Committee

Avrim Blum

Manuel Blum

Tom Mitchell

Yishay Mansour

Santosh Vempala



The Goal of the Thesis

New Theoretical Frameworks for Modern Machine Learning Paradigms

Connections between Machine Learning Theory and Algorithmic Game Theory


New Frameworks for Modern Learning Paradigms

Modern Learning Paradigms

Incorporating Unlabeled Data in the Learning Process

  • Semi-supervised Learning

  • Active Learning

Kernel based Learning

Qualitative gap between theory and practice

Unified theoretical treatment is lacking

Our Contributions

Semi-supervised learning

- a unified PAC framework

Active Learning

- new positive theoretical results

A theory of learning with general similarity functions

Extensions to clustering

With Avrim and Santosh


New Frameworks for Modern Learning Paradigms

Modern Learning Paradigms

Incorporating Unlabeled Data in the Learning Process

Kernel, Similarity based Learning and Clustering

Qualitative gap between theory and practice

Unified theoretical treatment is lacking

Our Contributions

Semi-supervised learning

- a unified PAC framework

Active Learning

- new positive theoretical results

A theory of learning with general similarity functions

Extensions to clustering

With Avrim and Santosh



Machine Learning Theory and Algorithmic Game Theory

Brief Overview of Our Results

Mechanism Design, ML, and Pricing Problems

Generic Framework for reducing problems of incentive-compatible mechanism design to standard algorithmic questions.

[Balcan-Blum-Hartline-Mansour, FOCS 2005, JCSS 2007]

Approximation Algorithms for Item Pricing.

[Balcan-Blum, EC 2006]

  • Revenue maximization in combinatorial auctions with single-minded consumers



The Goal of the Thesis

New Theoretical Frameworks for Modern Machine Learning Paradigms

  • Semi-Supervised and Active Learning

  • Similarity Based Learning and Clustering

Connections between Machine Learning Theory and Algorithmic Game Theory

  • Use MLT techniques for designing and analyzing auctions in the context of Revenue Maximization


The Goal of the Thesis

New Theoretical Frameworks for Modern Machine Learning Paradigms

Incorporating Unlabeled Data in the Learning Process

Semi-supervised learning (SSL)

- An Augmented PAC model for SSL [Balcan-Blum, COLT 2005; book chapter, “Semi-Supervised Learning”, 2006]

Active Learning (AL)

- Generic agnostic AL procedure [Balcan-Beygelzimer-Langford, ICML 2006]

- Margin based AL of linear separators [Balcan-Broder-Zhang, COLT 2007]

Kernel, Similarity based learning and Clustering

- Connections between kernels, margins and feature selection [Balcan-Blum-Vempala, MLJ 2006]

- A general theory of learning with similarity functions [Balcan-Blum, ICML 2006]

- Extensions to Clustering [Balcan-Blum-Vempala, work in progress]



Part I: Incorporating Unlabeled Data in the Learning Process

Semi-Supervised Learning

A unified PAC-style framework

[Balcan-Blum, COLT 2005; book chapter, “Semi-Supervised Learning”, 2006]



Standard Supervised Learning Setting

  • X – instance/feature space

  • $S=\{(x, l)\}$ - set of labeled examples

    • labeled examples - assumed to be drawn i.i.d. from some distribution D over X and labeled by some target concept $c^* \in C$

    • labels $\in \{-1,1\}$ - binary classification

  • Want to do optimization over S to find some hypothesis h, but we want h to have small error over D.

  • $\mathrm{err}(h)=\Pr_{x\sim D}[h(x)\neq c^*(x)]$

  • Classic models for learning from labeled data.

  • Statistical Learning Theory (Vapnik)

  • PAC (Valiant)



Standard Supervised Learning Setting

Sample Complexity

  • E.g., Finite Hypothesis Spaces, Realizable Case

  • In PAC, can also talk about efficient algorithms.
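As a worked example of the finite, realizable case, the textbook PAC bound says that $m \ge \frac{1}{\epsilon}\left(\ln|C| + \ln\frac{1}{\delta}\right)$ labeled examples suffice so that, with probability at least $1-\delta$, every $h \in C$ consistent with S has $\mathrm{err}(h) \le \epsilon$. The helper name below is hypothetical; it just evaluates that standard formula.

```python
import math

def pac_sample_size(class_size: int, epsilon: float, delta: float) -> int:
    """Labeled examples sufficient for a finite class C in the realizable case.

    Standard bound: m >= (1/epsilon) * (ln|C| + ln(1/delta)). With this many
    i.i.d. examples, with probability >= 1 - delta every hypothesis consistent
    with the sample has true error <= epsilon.
    """
    if class_size < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |C| >= 1 and epsilon, delta in (0, 1)")
    return math.ceil((math.log(class_size) + math.log(1 / delta)) / epsilon)

# Example: |C| = 2**20 hypotheses, epsilon = 0.05, delta = 0.01.
print(pac_sample_size(2 ** 20, 0.05, 0.01))  # 370
```

The bound grows only logarithmically in |C|, which is why shrinking the effective class (as the SSL model below does with unlabeled data) matters mainly through ln|C|.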



Semi-Supervised Learning

Hot topic in recent years in Machine Learning.

  • Several methods have been developed to try to use unlabeled data to improve performance, e.g.:

    • Transductive SVM [Joachims ’98]

    • Co-training [Blum & Mitchell ’98], [Balcan-Blum-Yang ’04]

    • Graph-based methods [Blum & Chawla ’01], [ZGL03]

Scattered Theoretical Results…



An Augmented PAC model for SSL [BB05]

Extends PAC naturally to fit SSL.

Can generically analyze:

  • When will unlabeled data help and by how much.

  • How much data should I expect to need to perform well.

Key Insight

Unlabeled data is useful if we have beliefs not only about the form of the target, but also about its relationship with the underlying distribution.

Different algorithms are based on different assumptions about how data should behave.

Challenge – how to capture many of the assumptions typically used.


[Figure: + and − points; separator found by SVM on labeled data only vs. by Transductive SVM]

Example of “typical” assumption: Margins

The separator goes through low-density regions of the space (large margin).

  • assume we are looking for a linear separator

  • belief: there should exist one with large separation


[Figure: web-page example with text “Prof. Avrim Blum” and link text “My Advisor”; x1 = text info, x2 = link info, x = link info & text info]

Another Example: Self-consistency

Agreement between two parts: co-training [BM98].

- examples contain two sufficient sets of features, $x = \langle x_1, x_2 \rangle$

- the belief is that the two parts of the example are consistent, i.e. $\exists\, c_1, c_2$ such that $c_1(x_1)=c_2(x_2)=c^*(x)$

For example, if we want to classify web pages: $x = \langle x_1, x_2 \rangle$, where $x_1$ is the text info and $x_2$ is the link info.


Problems thinking about SSL in the PAC model

$S_u=\{x_i\}$: unlabeled examples drawn i.i.d. from D

$S_l=\{(x_i, y_i)\}$: labeled examples drawn i.i.d. from D and labeled by some target concept c*.

PAC model talks of learning a class C under (known or unknown) distribution D.

  • Not clear what unlabeled data can do for you.

  • Doesn’t give you any info about which $c \in C$ is the target function.

We extend the PAC model to capture these (and more) uses of unlabeled data.

  • Give a unified framework for understanding when and why unlabeled data can help.


Proposed Model, Main Idea (1)

Augment the notion of a concept class C with a notion of compatibility $\chi$ between a concept and the data distribution.

“learn C” becomes “learn $(C,\chi)$” (i.e. learn class C under compatibility notion $\chi$)

Express relationships that one hopes the target function and underlying distribution will possess.

Idea: use unlabeled data & the belief that the target is compatible to reduce C down to just {the highly compatible functions in C}.


Proposed Model, Main Idea (2)

Idea: use unlabeled data & our belief to reduce size(C) down to size(highly compatible functions in C) in our sample complexity bounds.

Need to be able to analyze how much unlabeled data is needed to uniformly estimate compatibilities well.

Require that the degree of compatibility be something that can be estimated from a finite sample.

  • Require $\chi$ to be an expectation over individual examples:

    • $\chi(h,D)=\mathbb{E}_{x\sim D}[\chi(h,x)]$: compatibility of h with D, where $\chi(h,x)\in[0,1]$

    • $\mathrm{err}_{unl}(h)=1-\chi(h,D)$: incompatibility of h with D (unlabeled error rate of h)


[Figure: a highly compatible (large-margin) separator between + and − points]

Margins, Compatibility

Margins: the belief is that there should exist a large-margin separator.

Incompatibility of h and D (unlabeled error rate of h): the probability mass within distance $\gamma$ of h.

Can be written as an expectation over individual examples, $\chi(h,D)=\mathbb{E}_{x\sim D}[\chi(h,x)]$, where $\chi(h,x)=0$ if $\mathrm{dist}(x,h)\le\gamma$ and $\chi(h,x)=1$ if $\mathrm{dist}(x,h)>\gamma$.
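For a linear separator $h(x)=\mathrm{sign}(w\cdot x+b)$, $\mathrm{dist}(x,h)=|w\cdot x+b|/\|w\|$, so the 0/1 margin compatibility above is easy to evaluate. A minimal pure-Python sketch; $\gamma$ is a free parameter and the function names and points are hypothetical.

```python
import math
from typing import Sequence

def distance_to_hyperplane(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Euclidean distance from x to the hyperplane {z : w.z + b = 0}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def chi_margin(w: Sequence[float], b: float, x: Sequence[float], gamma: float) -> float:
    """0/1 margin compatibility: 0 if x lies within distance gamma of h, else 1."""
    return 0.0 if distance_to_hyperplane(w, b, x) <= gamma else 1.0

def unlabeled_error_rate(w, b, points, gamma: float) -> float:
    """Empirical err_unl(h): fraction of unlabeled points within distance gamma of h."""
    return sum(1.0 - chi_margin(w, b, x, gamma) for x in points) / len(points)

# Separator x1 = 0 in the plane; one of the four points falls inside the margin.
pts = [(-2.0, 1.0), (-1.5, -1.0), (0.1, 0.3), (2.0, 0.0)]
print(unlabeled_error_rate((1.0, 0.0), 0.0, pts, gamma=0.5))  # 0.25
```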


Margins, Compatibility

If we do not want to commit to $\gamma$ in advance, define $\chi(h,x)$ to be a smooth function of $\mathrm{dist}(x,h)$.

Illegal notion of compatibility: the largest $\gamma$ such that D has probability mass exactly zero within distance $\gamma$ of h. (This cannot be written as an expectation over individual examples, so it cannot be estimated from a finite sample.)


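Co-Training, Compatibility

Co-training: examples come as pairs $\langle x_1, x_2 \rangle$ and the goal is to learn a pair of functions $\langle h_1, h_2 \rangle$.

Hope is that the two parts of the example are consistent.

Legal (and natural) notion of compatibility:

- the compatibility of $\langle h_1,h_2 \rangle$ and D: $\chi(\langle h_1,h_2\rangle, D)=\Pr_{\langle x_1,x_2\rangle\sim D}[h_1(x_1)=h_2(x_2)]$

- can be written as an expectation over examples: $\chi(\langle h_1,h_2\rangle, \langle x_1,x_2\rangle)=1$ if $h_1(x_1)=h_2(x_2)$, and $0$ otherwise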
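The agreement-based compatibility can be estimated the same way as before, by averaging the per-example indicator over unlabeled pairs. A minimal sketch; the two view classifiers and the numeric "features" (count of a word in the page text, number of incoming links) are hypothetical toy stand-ins for the text and link views.

```python
from typing import Callable, Sequence, Tuple, TypeVar

V1 = TypeVar("V1")  # first view, e.g. page text
V2 = TypeVar("V2")  # second view, e.g. link info

def cotraining_compatibility(h1: Callable[[V1], int], h2: Callable[[V2], int],
                             pairs: Sequence[Tuple[V1, V2]]) -> float:
    """Empirical chi(<h1,h2>, D): fraction of unlabeled pairs <x1,x2> on which the two views agree."""
    return sum(1.0 if h1(x1) == h2(x2) else 0.0 for x1, x2 in pairs) / len(pairs)

# Hypothetical views: x1 = count of "advisor" in the page text, x2 = number of incoming links.
def h_text(n_words: int) -> int:
    return 1 if n_words > 0 else -1

def h_link(n_links: int) -> int:
    return 1 if n_links >= 2 else -1

pairs = [(3, 5), (0, 0), (1, 0), (0, 4)]
print(cotraining_compatibility(h_text, h_link, pairs))  # 0.5
```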



Types of Results in the [BB05] Model

As in PAC, can discuss algorithmic and sample complexity issues.

Sample Complexity issues that we can address:

  • How much unlabeled data we need:

    • depends both on the complexity of C and on the complexity of our notion of compatibility.

  • Ability of unlabeled data to reduce the number of labeled examples needed:

    • compatibility of the target

    • (various) measures of the helpfulness of the distribution

  • Give both uniform convergence bounds and $\epsilon$-cover based bounds.


Examples of results: Sample Complexity, Uniform Convergence Bounds

Finite Hypothesis Spaces, Doubly Realizable Case

ALG: pick a compatible concept that agrees with the labeled sample.

$C_{D,\chi}(\epsilon) = \{h \in C : \mathrm{err}_{unl}(h) \le \epsilon\}$

Bound the number of labeled examples as a measure of the helpfulness of D with respect to $\chi$

  • a helpful D is one in which $C_{D,\chi}(\epsilon)$ is small
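The ALG on this slide (keep only hypotheses that are compatible on the unlabeled sample and consistent with the labeled sample) can be sketched for a tiny finite class. Assumptions for illustration only: C is a grid of 41 thresholds on the real line, compatibility is the margin notion with a fixed $\gamma$, and the data are made up.

```python
from typing import List, Sequence, Tuple

def threshold_classifier(t: float, x: float) -> int:
    return 1 if x >= t else -1

def compatible_consistent(thresholds: Sequence[float],
                          unlabeled: Sequence[float],
                          labeled: Sequence[Tuple[float, int]],
                          gamma: float) -> List[float]:
    """Thresholds with zero empirical unlabeled error and zero labeled error."""
    result = []
    for t in thresholds:
        # Doubly realizable case: no unlabeled point may lie within gamma of t ...
        compatible = all(abs(x - t) > gamma for x in unlabeled)
        # ... and every labeled example must be classified correctly.
        consistent = all(threshold_classifier(t, x) == y for x, y in labeled)
        if compatible and consistent:
            result.append(t)
    return result

grid = [i / 10 for i in range(-20, 21)]          # |C| = 41 thresholds in [-2, 2]
unlabeled = [-1.8, -1.5, -1.2, 1.1, 1.4, 1.9]    # low-density gap around 0
labeled = [(-1.5, -1), (1.4, 1)]
survivors = compatible_consistent(grid, unlabeled, labeled, gamma=0.55)
print(len(survivors), survivors[0], survivors[-1])  # 12 -0.6 0.5
```

All survivors sit in the low-density gap, so any of them can be output; the unlabeled data has shrunk the effective class well below the thresholds consistent with the two labels alone.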



Sample Complexity Subtleties

Uniform Convergence Bounds

Depends both on the complexity of C and on the complexity of $\chi$

Distribution-dependent measure of complexity

$\epsilon$-Cover bounds: much better than Uniform Convergence bounds.

  • For algorithms that behave in a specific way:

    • first use the unlabeled data to choose a representative set of compatible hypotheses

    • then use the labeled sample to choose among these
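The two-stage strategy just described can be sketched as follows, under stated assumptions: hypotheses are callables, the distance between two hypotheses is their empirical disagreement rate on the unlabeled sample, and a greedy pass builds an $\epsilon$-cover; the labeled sample then picks the cover element with the fewest mistakes. Greedy covering is one simple choice, not necessarily the construction used in [BB05]; all names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Hyp = Callable[[float], int]

def disagreement(h1: Hyp, h2: Hyp, unlabeled: Sequence[float]) -> float:
    """Empirical Pr_x[h1(x) != h2(x)], estimated on unlabeled data only."""
    return sum(h1(x) != h2(x) for x in unlabeled) / len(unlabeled)

def greedy_cover(hyps: Sequence[Hyp], unlabeled: Sequence[float], eps: float) -> List[Hyp]:
    """Stage 1: every hypothesis ends up within eps disagreement of some chosen one."""
    cover: List[Hyp] = []
    for h in hyps:
        if all(disagreement(h, c, unlabeled) > eps for c in cover):
            cover.append(h)
    return cover

def pick_by_labels(cover: Sequence[Hyp], labeled: Sequence[Tuple[float, int]]) -> Hyp:
    """Stage 2: choose the cover element with the fewest labeled mistakes."""
    return min(cover, key=lambda h: sum(h(x) != y for x, y in labeled))

def make_threshold(t: float) -> Hyp:
    return lambda x: 1 if x >= t else -1

unlabeled = [j / 100 for j in range(100)]            # 100 points spread over [0, 1)
hyps = [make_threshold(i / 100) for i in range(101)]
cover = greedy_cover(hyps, unlabeled, eps=0.1)
best = pick_by_labels(cover, [(0.2, -1), (0.8, 1)])
print(len(hyps), len(cover))  # 101 10
```

Only the cover (10 hypotheses here, instead of 101) needs to be compared on labeled data, which is where the distribution-dependent savings come from.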


Sample Complexity Implications of Our Analysis

Ways in which unlabeled data can help

  • If c* is highly compatible and we have enough unlabeled data, then we can reduce the search space (from C down to just those $h \in C$ whose estimated unlabeled error rate is low).

  • By providing an estimate of D, unlabeled data can allow a more refined distribution-specific notion of hypothesis space size (e.g. the size of the smallest $\epsilon$-cover).

Subsequent Work, E.g.:

P. Bartlett, D. Rosenberg, AISTATS 2007

J. Shawe-Taylor et al., Neurocomputing 2007


Efficient Co-training of linear separators

  • Assume independence given the label

    • both parts of the example are drawn (independently) from $D^+$, or both from $D^-$.

  • [Blum & Mitchell] show we can co-train (in polynomial time) if we have enough labeled data to produce a weakly-useful hypothesis to begin with.

  • [BB05] shows we can learn (in polynomial time) with only a single labeled example.

  • Key point: independence given the label implies that the functions with low $\mathrm{err}_{unl}$ rate are:

    • close to c*

    • close to $\neg c^*$

    • close to the all positive function

    • close to the all negative function

Idea: use unlabeled data to generate a polynomial number of candidate hypotheses such that at least one is weakly-useful (uses the Outlier Removal Lemma). Plug into [BM98].


Modern Learning Paradigms: Our Contributions

Modern Learning Paradigms

Incorporating Unlabeled Data in the Learning Process

Semi-supervised learning (SSL)

- An Augmented PAC model for SSL [Balcan-Blum, COLT 2005; Balcan-Blum, book chapter, “Semi-Supervised Learning”, 2006]

Active Learning (AL)

- Generic agnostic AL procedure [Balcan-Beygelzimer-Langford, ICML 2006]

- Margin based AL of linear separators [Balcan-Broder-Zhang, COLT 2007]

Kernel, Similarity based learning and Clustering

- Connections between kernels, margins and feature selection [Balcan-Blum-Vempala, MLJ 2006]

- A general theory of learning with similarity functions [Balcan-Blum, ICML 2006]

- Extensions to Clustering [Balcan-Blum-Vempala, work in progress]



Part II: Similarity Functions for Learning

[Balcan-Blum, ICML 2006]

Extensions to Clustering

(With Avrim and Santosh, work in progress)



Kernels and Similarity Functions

Kernels have become a powerful tool in ML.

  • Useful in practice for dealing with many different kinds of data.

  • Elegant theory about what makes a given kernel good for a given learning problem.

Our Work: analyze more general similarity functions.

  • In the process we describe ways of constructing good data dependent kernels.


Kernels

  • A kernel K is a pairwise similarity function such that there exists an implicit mapping $\phi$ with $K(x,y)=\phi(x)\cdot\phi(y)$.

  • Point is: many learning algorithms can be written so that they only interact with the data via dot-products.

  • If we replace $x\cdot y$ with $K(x,y)$, the algorithm acts implicitly as if the data were in the higher-dimensional $\phi$-space.

  • If the data is linearly separable by a large margin in $\phi$-space, we don’t have to pay in terms of data or computation time.

If the margin is $\gamma$ in $\phi$-space, we only need about $1/\gamma^2$ examples to learn well.
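A concrete instance of the kernel trick: in two dimensions the quadratic kernel $K(x,y)=(x\cdot y)^2$ equals $\phi(x)\cdot\phi(y)$ for the explicit map $\phi(x)=(x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$, so an algorithm that only uses $K$ behaves as if it worked in the 3-dimensional $\phi$-space without ever building it. The example is a standard textbook illustration, not taken from the slides.

```python
import math

def poly2_kernel(x, y):
    """Quadratic kernel K(x, y) = (x . y)^2, computed without any explicit mapping."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map for the quadratic kernel in two dimensions."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
# Both values equal 1 up to floating-point rounding.
print(poly2_kernel(x, y), dot(phi(x), phi(y)))
```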


General Similarity Functions

We provide: a characterization of good similarity functions for a learning problem that:

1) Talks in terms of natural direct properties:

  • no implicit high-dimensional spaces

  • no requirement of positive-semidefiniteness

2) If K satisfies these properties for our given problem, then this has implications for learning.

3) Is broad: includes the usual notion of “good kernel” (one that induces a large margin separator in $\phi$-space).


A First Attempt: Definition satisfying properties (1) and (2)

Let P be a distribution over labeled examples (x, l(x))

  • $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[K(x,y)\mid l(y)\neq l(x)]+\gamma$

  • Example: suppose that any two positives have $K(x,y)\ge 0.2$, any two negatives have $K(x,y)\ge 0.2$, but for a positive and a negative $K(x,y)$ is uniformly random in $[-1,1]$. Then same-label similarity averages at least 0.2 while cross-label similarity averages 0, so K is $(0, 0.2)$-good.

Note: this might not be a legal kernel.



A First Attempt: How to use it?

  • $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[K(x,y)\mid l(y)\neq l(x)]+\gamma$

Algorithm

  • Draw $S^+$ of $O((1/\gamma^2)\ln(1/\delta^2))$ positive examples.

  • Draw $S^-$ of $O((1/\gamma^2)\ln(1/\delta^2))$ negative examples.

  • Classify x based on which set gives the better score (higher average similarity to x).

Guarantee: with probability $\ge 1-\delta$, error $\le \epsilon+\delta$.

Proof

  • Hoeffding: for any given “good x”, the probability of error w.r.t. x (over the draw of $S^+$, $S^-$) is at most $\delta^2$.

  • By Markov, there is at most a $\delta$ chance that the error rate over GOOD is more than $\delta$. So the overall error rate is $\le \epsilon+\delta$.
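A minimal sketch of the first-attempt algorithm: compare the average similarity of x to the sampled positives and to the sampled negatives. Assumptions: the hidden constant in $d=O((1/\gamma^2)\ln(1/\delta^2))$ is set to 1, ties go to +1, and the similarity function and class-conditional sampling ranges are hypothetical; as the slide notes, such a K need not be a legal (PSD) kernel.

```python
import math
import random
from typing import Callable, Sequence

def sample_size(gamma: float, delta: float) -> int:
    """d = (1/gamma^2) * ln(1/delta^2); the hidden O(.) constant is set to 1 (an assumption)."""
    return math.ceil(math.log(1 / delta ** 2) / gamma ** 2)

def classify(x, s_pos: Sequence, s_neg: Sequence, K: Callable) -> int:
    """Predict +1 if x is on average more similar to S+ than to S-, else -1 (ties go to +1)."""
    score_pos = sum(K(x, y) for y in s_pos) / len(s_pos)
    score_neg = sum(K(x, z) for z in s_neg) / len(s_neg)
    return 1 if score_pos >= score_neg else -1

# Hypothetical similarity in [-1, 1] on the real line; positives near +1, negatives near -1.
def K(x: float, y: float) -> float:
    return max(-1.0, 1.0 - abs(x - y))

rng = random.Random(0)
d = sample_size(gamma=0.5, delta=0.1)
s_pos = [rng.uniform(0.5, 1.5) for _ in range(d)]
s_neg = [rng.uniform(-1.5, -0.5) for _ in range(d)]
print(d, classify(0.8, s_pos, s_neg, K), classify(-0.9, s_pos, s_neg, K))  # 19 1 -1
```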


[Figure: positive and negative points, with some positives more similar to negatives than to a typical positive]

A First Attempt: Not Broad Enough

  • $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[K(x,y)\mid l(y)\neq l(x)]+\gamma$

  • $K(x,y)=x\cdot y$ has a large margin separator but doesn’t satisfy our definition.


A First Attempt: Not Broad Enough

  • $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[K(x,y)\mid l(y)\neq l(x)]+\gamma$

[Figure: the same positive and negative points, with a region R highlighted]

Idea: would work if we didn’t pick y’s from top-left.

Broaden to say: OK if there exists a non-negligible region R such that most x are on average more similar to $y\in R$ of the same label than to $y\in R$ of the other label.


Broader/Main Definition

  • $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if there exists a weighting function $w(y)\in[0,1]$ such that at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[w(y)K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[w(y)K(x,y)\mid l(y)\neq l(x)]+\gamma$


Main Definition, How to Use It

  • $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if there exists a weighting function $w(y)\in[0,1]$ such that at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[w(y)K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[w(y)K(x,y)\mid l(y)\neq l(x)]+\gamma$

Algorithm

  • Draw $S^+=\{y_1,\dots,y_d\}$, $S^-=\{z_1,\dots,z_d\}$, $d=O((1/\gamma^2)\ln(1/\delta^2))$.

  • Use them to “triangulate” the data:

$F(x) = [K(x,y_1),\dots,K(x,y_d),K(x,z_1),\dots,K(x,z_d)]$.

  • Take a new set of labeled examples, project them to this space, and run your favorite algorithm for learning linear separators.

Point is: with probability $\ge 1-\delta$, there exists a linear separator of error $\le \epsilon+\delta$ at margin $\gamma/4$.

($w = [w(y_1),\dots,w(y_d),-w(z_1),\dots,-w(z_d)]$)
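A minimal sketch of the “triangulation” step: map each x to its similarities with the landmarks $y_1,\dots,y_d$ and $z_1,\dots,z_d$, then hand the mapped vectors to a linear-separator learner. A plain perceptron stands in for “your favorite algorithm”; the similarity function, landmarks, and training data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def landmark_features(x, s_pos: Sequence, s_neg: Sequence, K: Callable) -> List[float]:
    """F(x) = [K(x,y1), ..., K(x,yd), K(x,z1), ..., K(x,zd)]."""
    return [K(x, y) for y in s_pos] + [K(x, z) for z in s_neg]

def perceptron(data: Sequence[Tuple[List[float], int]], epochs: int = 50) -> List[float]:
    """Plain perceptron (no bias term) on already-mapped feature vectors."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for f, label in data:
            if label * sum(wi * fi for wi, fi in zip(w, f)) <= 0:
                w = [wi + label * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:  # separable in F-space: stop early
            break
    return w

def predict(w: List[float], f: List[float]) -> int:
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else -1

def K(x: float, y: float) -> float:
    return max(-1.0, 1.0 - abs(x - y))               # hypothetical similarity in [-1, 1]

s_pos, s_neg = [0.7, 1.2], [-0.8, -1.1]               # landmarks drawn from each class
train = [(0.9, 1), (1.4, 1), (0.6, 1), (-0.7, -1), (-1.3, -1), (-0.5, -1)]
mapped = [(landmark_features(x, s_pos, s_neg, K), y) for x, y in train]
w = perceptron(mapped)
print([predict(w, landmark_features(x, s_pos, s_neg, K)) for x in (1.0, -1.0)])  # [1, -1]
```

Because the learner only sees the explicit 2d-dimensional vectors F(x), nothing here requires K to be positive semidefinite.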


Main Definition, Implications

Algorithm

  • Draw $S^+=\{y_1,\dots,y_d\}$, $S^-=\{z_1,\dots,z_d\}$, $d=O((1/\gamma^2)\ln(1/\delta^2))$.

  • Use them to “triangulate” the data:

$F(x) = [K(x,y_1),\dots,K(x,y_d),K(x,z_1),\dots,K(x,z_d)]$.

Guarantee: with prob. $\ge 1-\delta$, there exists a linear separator of error $\le \epsilon+\delta$ at margin $\gamma/4$.

Implications

If K is an arbitrary sim. function that is an $(\epsilon,\gamma)$-good sim. function, then the mapping F yields an $(\epsilon+\delta,\gamma/4)$-good kernel function (a legal kernel).


Good Kernels are Good Similarity Functions

Main Definition: $K:(x,y)\to[-1,1]$ is an $(\epsilon,\gamma)$-good similarity for P if there exists a weighting function $w(y)\in[0,1]$ such that at least a $1-\epsilon$ probability mass of x satisfy:

$\mathbb{E}_{y\sim P}[w(y)K(x,y)\mid l(y)=l(x)] \ge \mathbb{E}_{y\sim P}[w(y)K(x,y)\mid l(y)\neq l(x)]+\gamma$

Theorem

  • An $(\epsilon,\gamma)$-good kernel is an $(\epsilon',\gamma')$-good similarity function under the main definition.

Our proofs incurred some penalty:

$\epsilon' = \epsilon + \epsilon_{extra}$, $\gamma' = \gamma^3\epsilon_{extra}$.

Nati Srebro (COLT 2007) has improved the bounds.



Sample complexity is roughly

Learning with Multiple Similarity Functions

  • Let $K_1,\dots,K_r$ be similarity functions such that some (unknown) convex combination of them is $(\epsilon,\gamma)$-good.

Algorithm

  • Draw $S^+=\{y_1,\dots,y_d\}$, $S^-=\{z_1,\dots,z_d\}$, $d=O((1/\gamma^2)\ln(1/\delta^2))$.

  • Use them to “triangulate” the data:

$F(x) = [K_1(x,y_1),\dots,K_r(x,y_d),K_1(x,z_1),\dots,K_r(x,z_d)]$.

Guarantee: The induced distribution $F(P)$ in $\mathbb{R}^{2dr}$ has a separator of error $\le \epsilon+\delta$ at margin at least
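With several similarity functions the mapping concatenates all r similarities to each landmark, giving $2dr$ coordinates. A minimal sketch; $K_1$ and $K_2$ are hypothetical, and the coordinates are grouped landmark by landmark (one fixed ordering; any consistent ordering works for a linear learner).

```python
from typing import Callable, List, Sequence

def multi_landmark_features(x, s_pos: Sequence, s_neg: Sequence,
                            kernels: Sequence[Callable]) -> List[float]:
    """F(x) = [K1(x,y1), ..., Kr(x,y1), ..., Kr(x,yd), K1(x,z1), ..., Kr(x,zd)]."""
    return ([K(x, y) for y in s_pos for K in kernels]
            + [K(x, z) for z in s_neg for K in kernels])

def K1(x: float, y: float) -> float:
    return max(-1.0, 1.0 - abs(x - y))   # hypothetical distance-based similarity

def K2(x: float, y: float) -> float:
    return 1.0 if x * y > 0 else -1.0    # hypothetical sign-agreement similarity

# d = 2 landmarks per class and r = 2 similarity functions give 2dr = 8 coordinates.
f = multi_landmark_features(0.5, [1.0, 2.0], [-1.0, -2.0], [K1, K2])
print(len(f), f)  # 8 [0.5, 1.0, -0.5, 1.0, -0.5, -1.0, -1.0, -1.0]
```

The resulting vectors can be fed to the same linear-separator learner as in the single-function case.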


Implications

  • Theory that provides a formal way of understanding kernels as similarity functions.

  • Algorithms work for sim. fns that aren’t necessarily PSD.

  • Suggests a natural approach for using similarity functions to augment the feature vector in an “anytime” way.

    • E.g., features for a document can be the list of words in it, plus its similarity to a few “landmark” documents.

  • Formal justification for “Feature Generation for Text Categorization using World Knowledge”, GM’05

Mugizi has proposed work on this



Clustering via Similarity Functions

(Work in Progress, with Avrim and Santosh)


What if only unlabeled examples available?

Consider the following setting:

  • Given data set S of n objects [documents, web pages].

  • There is some (unknown) “ground truth” clustering. Each x has true label l(x) in {1,…,t} [topic].

  • Goal: produce a hypothesis h of low error up to isomorphism of label names.

People have traditionally considered mixture models here.

Can we say something in our setting?


What if only unlabeled examples available?

  • Suppose our similarity function satisfies the stronger condition:

  • Ground truth is “stable” in that, for all clusters C, C’, for all A ⊆ C, A’ ⊆ C’: A and A’ are not both more attracted to each other than to their own clusters. (K(x,y) is the attraction between x and y.)

  • Then we can construct a tree (hierarchical clustering) such that the correct clustering is some pruning of this tree.
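The slide does not say how the tree is built. One natural candidate, shown here only as an assumption-laden sketch and not necessarily the algorithm used in this work, is bottom-up average linkage: repeatedly merge the two current clusters with the highest average pairwise similarity K, recording every cluster created. The toy similarity matrix is hypothetical, and the final check only tests that on this one example both ground-truth clusters appear as nodes of the tree (so the correct clustering is a pruning).

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

def average_linkage_tree(K: Sequence[Sequence[float]]) -> List[FrozenSet[int]]:
    """Return every cluster (tree node) created by bottom-up average linkage on matrix K."""
    clusters = [frozenset([i]) for i in range(len(K))]
    nodes = list(clusters)
    while len(clusters) > 1:
        # Merge the pair of clusters with the largest average pairwise similarity.
        a, b = max(combinations(clusters, 2),
                   key=lambda p: sum(K[i][j] for i in p[0] for j in p[1])
                   / (len(p[0]) * len(p[1])))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        nodes.append(a | b)
    return nodes

# Points 0-2 form one ground-truth cluster and points 3-4 another (hypothetical values).
K = [[1.0, 0.9, 0.8, 0.1, 0.2],
     [0.9, 1.0, 0.7, 0.2, 0.1],
     [0.8, 0.7, 1.0, 0.3, 0.2],
     [0.1, 0.2, 0.3, 1.0, 0.9],
     [0.2, 0.1, 0.2, 0.9, 1.0]]
tree = average_linkage_tree(K)
print(frozenset({0, 1, 2}) in tree, frozenset({3, 4}) in tree)  # True True
```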


[Figure: example hierarchy; fashion (Dolce & Gabbana, Coco Chanel) and sports (soccer, volleyball, gymnastics)]


Main point

  • Exploring the question: what are minimal conditions on a similarity function that allow it to be useful for clustering?

  • Have considered two relaxations of the Clustering objective:

    • List Clustering: output a small number of candidate clusterings.

    • Hierarchical clustering: output a tree such that the right answer is some pruning of it.

  • Allow for the right answer to be identified with a little bit of additional feedback.


Modern Learning Paradigms: Future Work

Modern Learning Paradigms

Incorporating Unlabeled Data in the Learning Process

Active Learning

- Margin based AL of linear separators: extend the analysis to a more general class of distributions, e.g. log-concave.

Kernel, Similarity based learning and Clustering

Learning with Sim. Functions

- Alternative/tighter definitions and connections.

Clustering via Sim. Functions

- Can we get an efficient alg. for the stability of large subsets property?

- Interactive Feedback



MLT and Algorithmic Game Theory, Future Work

Mechanism Design, ML, and Pricing Problems

Revenue maximization in combinatorial auctions with general preferences.

Extend BBHM’05 to the limited supply setting.

Approximation algorithms for the case of pricing below cost.


Timeline

  • Plan to finish in a year

Summer 07

  • Revenue Maximization in General Comb. Auctions, limited and unlimited supply.

Fall 07

  • Clustering via Similarity Functions

  • Active Learning under Log-Concave Distributions

Spring 08

Wrap-up; writing; job search!



Thank you!

