Advanced Machine Learning & Perception

Instructor: Tony Jebara, Columbia University



Boosting

  • Combining Multiple Classifiers

  • Voting

  • Boosting

  • AdaBoost

  • Based on material by Y. Freund, P. Long & R. Schapire



Combining Multiple Learners

  • Have many simple learners, also called base learners or weak learners, each with classification error < 0.5

  • Combine or vote them to get a higher accuracy

  • No free lunch: there is no guaranteed best approach here

  • Different approaches:

    • Voting: combine learners with fixed weights

    • Mixture of Experts: adapt the learners and a variable weight/gating function

    • Boosting: actively search for the next base learners and vote

    • Cascading, Stacking, Bagging, etc.



Voting

  • Have T classifiers

  • Average their predictions with fixed weights: $y = \sum_{t=1}^T \alpha_t h_t(x)$

  • Like a mixture of experts, but the weights are constant with respect to the input

[Figure: voting architecture. Experts $h_1, \dots, h_T$ each receive the input x; their outputs are combined with fixed weights $\alpha_t$ to produce the output y.]
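A minimal sketch of fixed-weight voting in Python, assuming the experts are already-trained classifiers with a scikit-learn-style `predict` method returning ±1 labels (the names `experts` and `alphas` are illustrative, not from the slides):

```python
import numpy as np

def vote(experts, alphas, X):
    """Fixed-weight voting: y = sign(sum_t alpha_t * h_t(x)).

    experts -- list of T trained classifiers mapping X -> {-1, +1}
    alphas  -- list of T fixed weights (independent of the input)
    """
    # Weighted sum of the experts' predictions; the weights never change with x.
    scores = sum(a * h.predict(X) for a, h in zip(alphas, experts))
    return np.sign(scores)
```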



Mixture of Experts

  • Have T classifiers or experts and a gating function

  • Average their predictions with variable, input-dependent weights: $y = \sum_{t=1}^T \alpha_t(x)\, h_t(x)$

  • But, adapt the parameters of the gating function and of the experts (fixed total number T of experts)

[Figure: mixture-of-experts architecture. The input feeds both the experts and a gating network; the gating network produces the input-dependent weights that combine the experts' outputs.]
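By contrast with voting, a mixture of experts makes the weights a function of the input. A minimal sketch with a softmax gating network over linear gate scores; the linear gate and all names here are illustrative assumptions, not the lecture's exact model:

```python
import numpy as np

def gate_weights(V, x):
    """Softmax gating function: alpha_t(x) grows when gate t 'claims' input x."""
    z = V @ x                    # one score per expert (V has shape T x d)
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()           # T input-dependent weights summing to 1

def moe_predict(experts, V, x):
    """y = sum_t alpha_t(x) * h_t(x): variable weights, fixed number T of experts."""
    alphas = gate_weights(V, x)
    return sum(a * h(x) for a, h in zip(alphas, experts))
```

In training, both the experts' parameters and the gate parameters V would be adapted jointly, which is what distinguishes this from fixed-weight voting.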



Boosting

  • Actively find complementary or synergistic weak learners

  • Train next learner based on mistakes of previous ones.

  • Average prediction with fixed weights

  • Find next learner by training on weighted versions of data.

[Figure: the boosting loop. The weak learner is trained on weighted training data and outputs a weak rule $h_t$; the weak rule joins the ensemble of learners, the ensemble's mistakes determine the new weights on the data, and the ensemble's weighted vote gives the prediction.]



AdaBoost

  • Most popular weighting scheme

  • Define the margin for point i as $m_i = y_i f(x_i)$, where $f(x) = \sum_t \alpha_t h_t(x)$

  • Find an $h_t$ and a weight $\alpha_t$ to minimize the cost function, the sum of exp-margins: $J = \sum_i \exp(-y_i f(x_i))$

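In those terms the cost is easy to state in code; a small sketch, assuming `f` returns the ensemble score $\sum_t \alpha_t h_t(x)$ (the function name is illustrative):

```python
import numpy as np

def exp_margin_cost(f, X, y):
    """AdaBoost cost: sum_i exp(-m_i), with margin m_i = y_i * f(x_i).

    Confidently correct points (large positive margin) contribute almost
    nothing; mistakes (negative margin) are penalized exponentially.
    """
    margins = y * f(X)
    return np.exp(-margins).sum()
```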



AdaBoost

  • Choose the base learner and $\alpha_t$ to minimize $Z_t$

  • Recall that the error of the base classifier $h_t$ must be $\epsilon_t < 1/2$

  • For binary h, AdaBoost puts this weight on weak learners: $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ (instead of the more general rule $\alpha_t = \arg\min_\alpha Z_t(\alpha)$)

  • AdaBoost picks the following weights on the data for the next round (here $Z_t$ is the normalizer so the weights sum to 1): $D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
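Putting the two rules together gives the full loop. A compact sketch of AdaBoost on ±1 labels, using scikit-learn decision stumps as the binary weak learners; the stump choice is an assumption for illustration, since any weak learner with weighted error below 1/2 will do:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T):
    """AdaBoost with depth-1 trees (stumps); y must be in {-1, +1}."""
    N = len(y)
    D = np.full(N, 1.0 / N)                     # data weights, initially uniform
    learners, alphas = [], []
    for t in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()                # weighted error, needs eps < 1/2
        if eps <= 0 or eps >= 0.5:
            break                               # perfect or no better than chance
        alpha = 0.5 * np.log((1 - eps) / eps)   # weight on this weak learner
        D = D * np.exp(-alpha * y * pred)       # up-weight the mistakes
        D = D / D.sum()                         # Z_t: renormalize to sum to 1
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def ensemble_predict(learners, alphas, X):
    """Weighted vote: sign of f(x) = sum_t alpha_t h_t(x)."""
    f = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(f)
```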



Decision Trees

[Figure: a decision tree on features X and Y. The root tests X > 3; its 'no' branch predicts -1, and its 'yes' branch tests Y > 5, predicting +1 on 'yes' and -1 on 'no'. The tree carves the (X, Y) plane into rectangular regions labeled +1 and -1.]
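Read as code, the depicted tree is just nested thresholds; a tiny sketch, with the leaf labels as best recoverable from the figure:

```python
def tree_predict(x, y):
    """The depicted decision tree: test X > 3, then Y > 5 on the 'yes' branch."""
    if x > 3:
        return +1 if y > 5 else -1
    return -1
```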



Decision tree as a sum

[Figure: the same tree rewritten as a sum of node scores. A root prediction node contributes -0.2; the X > 3 splitter contributes +0.1 on 'yes' and -0.1 on 'no'; under the 'yes' branch, the Y > 5 splitter contributes +0.2 on 'yes' and -0.3 on 'no'. The final classification is the sign of the summed contributions along the path.]



An alternating decision tree

[Figure: an alternating decision tree. In addition to the root prediction (-0.2) and the X > 3 splitter with its nested Y > 5 splitter, a second splitter Y < 1 (contributing +0.7 on 'yes' and 0.0 on 'no') hangs off the root in parallel, so a prediction node may have several splitter children. The output is the sign of the sum of all prediction values reached.]
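Evaluating an alternating decision tree means summing the prediction values of every node the input reaches, then taking the sign. A sketch following the figure; the numeric values are read off the garbled slide and should be treated as illustrative only:

```python
def adt_predict(x, y):
    """Alternating decision tree: sum all reached prediction values, take the sign."""
    score = -0.2                      # root prediction node
    score += 0.1 if x > 3 else -0.1   # first splitter
    if x > 3:                         # nested splitter under the 'yes' branch
        score += 0.2 if y > 5 else -0.3
    # A second splitter hangs off the root *in parallel* with X > 3; this is
    # what makes the tree "alternating": prediction nodes can have many splitters.
    score += 0.7 if y < 1 else 0.0
    return 1 if score >= 0 else -1
```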



Example: Medical Diagnostics

  • Cleve dataset from the UC Irvine database.

  • Heart disease diagnostics (+1 = healthy, -1 = sick).

  • 13 features from tests (real-valued and discrete).

  • 303 instances.



Ad-Tree Example



Cross-validated accuracy



AdaBoost Convergence

[Figure: loss as a function of the margin. AdaBoost's exponential loss upper-bounds the 0-1 loss; LogitBoost and BrownBoost use alternative losses. Negative margins are mistakes, positive margins are correct.]

  • Rationale?

  • Consider a bound on the training error:

    $\frac{1}{N}\sum_{i=1}^N \mathbf{1}\left[y_i \neq \mathrm{sign}(f(x_i))\right] \le \frac{1}{N}\sum_i e^{-y_i f(x_i)}$ (exp bound on step)

    $= \frac{1}{N}\sum_i \prod_t e^{-\alpha_t y_i h_t(x_i)}$ (definition of f(x))

    $= \prod_t Z_t$ (recursive use of Z)

  • AdaBoost is essentially doing gradient descent on this.

  • Convergence?
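The bound is easy to check empirically; a sketch that compares the training error with $\frac{1}{N}\sum_i e^{-y_i f(x_i)}$, which equals $\prod_t Z_t$ for the uniform-start, normalized-weight loop sketched earlier (names reused from that hypothetical sketch):

```python
import numpy as np

def train_error_and_bound(learners, alphas, X, y):
    """Check: training error <= (1/N) sum_i exp(-y_i f(x_i)) = prod_t Z_t."""
    f = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    err = np.mean(np.sign(f) != y)        # 0-1 training error
    bound = np.mean(np.exp(-y * f))       # exp-margin bound on it
    return err, bound                     # err <= bound always holds
```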



AdaBoost Convergence

  • Convergence? Consider the binary $h_t$ case, with error $\epsilon_t = \frac{1}{2} - \gamma_t$:

    $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)} = \sqrt{1 - 4\gamma_t^2} \le e^{-2\gamma_t^2}$

    so the training error is at most $\prod_t Z_t \le e^{-2\sum_t \gamma_t^2} \le e^{-2\gamma^2 T}$ when every $\gamma_t \ge \gamma$.

So, the final learner's training error converges to zero exponentially fast in T if each weak learner is at least gamma better than chance!
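For instance, if every weak learner beats chance by γ = 0.1, the bound $e^{-2\gamma^2 T}$ drops below 1/N for quite modest T, at which point the training error (a multiple of 1/N) must be exactly zero; a quick check:

```python
import numpy as np

gamma, N = 0.1, 10_000
for T in (100, 500, 1000):
    print(T, np.exp(-2 * gamma**2 * T))
# T=500 gives exp(-10) ~ 4.5e-5 < 1/N, i.e. zero training mistakes
```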



Curious phenomenon

Boosting decision trees: using fewer than 10,000 training examples, we fit more than 2,000,000 parameters, yet the test error keeps improving rather than overfitting.



Explanation using margins

[Figure: the 0-1 loss plotted against the margin $y \cdot f(x)$.]



Explanation using margins

  • No examples with small margins!

[Figure: the 0-1 loss against the margin; after boosting, the training-margin distribution has moved away from zero.]


Experimental Evidence



AdaBoost Generalization

  • Also, a VC analysis gives a generalization bound: $\text{err} \le \hat{\text{err}} + \tilde{O}\left(\sqrt{\frac{Td}{N}}\right)$ (where d is the VC dimension of the base classifier)

  • But this suggests more iterations → overfitting!

  • A margin analysis is possible; redefine the margin as: $\text{margin}_f(x, y) = \frac{y\sum_t \alpha_t h_t(x)}{\sum_t |\alpha_t|}$

  • Then we have: $\Pr[\text{err}] \le \hat{\Pr}\left[\text{margin}_f(x,y) \le \theta\right] + \tilde{O}\left(\sqrt{\frac{d}{N\theta^2}}\right)$, independent of T
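The normalized margin is straightforward to compute, which is how margin-distribution plots like the ones above are produced; a sketch reusing the hypothetical `learners`/`alphas` from earlier:

```python
import numpy as np

def normalized_margins(learners, alphas, X, y):
    """margin_i = y_i * sum_t alpha_t h_t(x_i) / sum_t |alpha_t|, in [-1, +1]."""
    f = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return y * f / np.sum(np.abs(alphas))

# Empirical margin CDF at threshold theta, the first term of the bound:
# np.mean(normalized_margins(learners, alphas, X, y) <= theta)
```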



AdaBoost Generalization

  • Suggests this optimization problem: maximize the minimum margin,

    $\max_\alpha \min_i \frac{y_i \sum_t \alpha_t h_t(x_i)}{\sum_t |\alpha_t|}$



AdaBoost Generalization

  • Proof Sketch



UCI Results

[Table: % test error rates on the UCI benchmark datasets.]
