Advanced Machine Learning & Perception
Instructor: Tony Jebara

Presentation Transcript
Boosting
  • Combining Multiple Classifiers
  • Voting
  • Boosting
  • AdaBoost
  • Based on material by Y. Freund, P. Long & R. Schapire
Combining Multiple Learners
  • Have many simple learners, also called base learners or weak learners, which have a classification error of < 0.5
  • Combine or vote them to get a higher accuracy
  • No free lunch: there is no guaranteed best approach here
  • Different approaches:
    • Voting: combine learners with fixed weights
    • Mixture of Experts: adjust the learners and a variable weight/gating function
    • Boosting: actively search for the next base learners and vote
    • Cascading, Stacking, Bagging, etc.
Voting
  • Have T classifiers h_1, ..., h_T
  • Average their predictions with fixed weights: y = Σ_t a_t h_t(x)
  • Like mixture of experts, but the weights are constant with respect to the input

[Figure: experts h_1, ..., h_T each receive the input x; their outputs are combined with fixed weights a_1, ..., a_T to produce the prediction y.]
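A minimal sketch of fixed-weight voting, assuming a set of already-trained base classifiers that output +1/-1; the classifier list, weights, and test point below are hypothetical illustrations, not taken from the slides:

```python
import numpy as np

def vote(classifiers, weights, x):
    """Fixed-weight voting: sign of the weighted sum of the base predictions."""
    total = sum(a * h(x) for a, h in zip(weights, classifiers))
    return np.sign(total)

# Hypothetical base learners (decision stumps on a 2-D input x = [x0, x1]).
classifiers = [
    lambda x: 1.0 if x[0] > 3 else -1.0,
    lambda x: 1.0 if x[1] > 5 else -1.0,
    lambda x: 1.0 if x[0] + x[1] > 7 else -1.0,
]
weights = [0.5, 0.3, 0.2]  # fixed weights a_t, independent of the input

print(vote(classifiers, weights, np.array([4.0, 6.0])))  # -> 1.0
```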

Mixture of Experts
  • Have T classifiers or experts and a gating function
  • Average their predictions with variable, input-dependent weights
  • But adapt the parameters of both the gating function and the experts (fixed total number T of experts)

[Figure: a gating network takes the input and produces a weight for each expert; the experts' outputs are combined according to these weights to give the output.]
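A minimal sketch of a mixture of experts with a softmax gating function; the linear experts, parameter shapes, and random data here are illustrative assumptions, not the slide's model:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mixture_of_experts(x, expert_weights, gate_weights):
    """x: input vector (d,); expert_weights: (T, d); gate_weights: (T, d)."""
    gate = softmax(gate_weights @ x)      # input-dependent mixing weights, sum to 1
    expert_outputs = expert_weights @ x   # each (here: linear) expert's prediction
    return gate @ expert_outputs          # weighted average of the expert outputs

rng = np.random.default_rng(0)
x = rng.normal(size=4)
print(mixture_of_experts(x, rng.normal(size=(3, 4)), rng.normal(size=(3, 4))))
```

In training, both the gate parameters and the expert parameters would be adapted (for example by gradient descent or EM), which is what distinguishes this from fixed-weight voting.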

Boosting
  • Actively find complementary or synergistic weak learners
  • Train next learner based on mistakes of previous ones.
  • Average prediction with fixed weights
  • Find next learner by training on weighted versions of data.

[Figure: the boosting loop. Weighted training data is fed to the weak learner, which produces a weak rule; the rule joins the ensemble of learners, the data weights are updated, and the ensemble makes the final prediction.]

AdaBoost
  • Most popular weighting scheme
  • Define the margin for point i as m_i = y_i f(x_i), where f(x) = Σ_t a_t h_t(x)
  • Find an h_t and a weight a_t to minimize the cost function, the sum of exponentiated negative margins: J = Σ_i exp(-m_i) = Σ_i exp(-y_i f(x_i))
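As a small worked example of this cost, a sketch that computes the margins and the exponential loss for a fixed ensemble; the predictions, labels, and weights below are made up for illustration:

```python
import numpy as np

# Hypothetical predictions of T=3 weak learners on N=5 points (rows: learners).
H = np.array([[ 1,  1, -1,  1, -1],
              [ 1, -1, -1,  1,  1],
              [-1,  1,  1,  1, -1]], dtype=float)
y = np.array([ 1,  1, -1,  1, -1], dtype=float)   # true labels
a = np.array([0.8, 0.4, 0.3])                     # weights a_t on the weak learners

f = a @ H                      # ensemble score f(x_i) = sum_t a_t h_t(x_i)
margins = y * f                # margin m_i = y_i f(x_i)
cost = np.exp(-margins).sum()  # exponential cost, sum_i exp(-m_i)
print(margins, cost)
```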


AdaBoost
  • Choose the base learner h_t and the weight a_t to minimize the cost (equivalently, the normalizer Z_t at each round)
  • Recall that the weighted error ε_t of the base classifier h_t must be < 1/2
  • For binary h, AdaBoost puts this weight on weak learners: a_t = (1/2) ln((1 - ε_t) / ε_t) (instead of the more general rule of minimizing Z_t over a_t directly)
  • AdaBoost picks the following weights on the data for the next round (here Z_t is the normalizer so the weights sum to 1): D_{t+1}(i) = D_t(i) exp(-a_t y_i h_t(x_i)) / Z_t
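Putting these update rules together, a compact sketch of AdaBoost with decision stumps as the weak learners; the stump search and the toy data are illustrative choices, not taken from the slides:

```python
import numpy as np

def best_stump(X, y, w):
    """Weak learner: the axis-aligned threshold stump with lowest weighted error."""
    n, d = X.shape
    best = (np.inf, None)
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for sign in (1.0, -1.0):
                pred = np.where(X[:, j] > thresh, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, (j, thresh, sign))
    return best  # (weighted error eps_t, stump parameters)

def stump_predict(params, X):
    j, thresh, sign = params
    return np.where(X[:, j] > thresh, sign, -sign)

def adaboost(X, y, T=20):
    n = len(y)
    D = np.full(n, 1.0 / n)                        # data weights D_1(i) = 1/n
    ensemble = []                                  # list of (a_t, stump parameters)
    for _ in range(T):
        eps, params = best_stump(X, y, D)
        eps = np.clip(eps, 1e-12, 1 - 1e-12)
        a = 0.5 * np.log((1 - eps) / eps)          # a_t = 1/2 ln((1-eps_t)/eps_t)
        h = stump_predict(params, X)
        D = D * np.exp(-a * y * h)                 # reweight: up-weight the mistakes
        D = D / D.sum()                            # Z_t normalization
        ensemble.append((a, params))
    return ensemble

def predict(ensemble, X):
    f = sum(a * stump_predict(p, X) for a, p in ensemble)
    return np.sign(f)

# Toy usage on random data with a simple planted rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
ens = adaboost(X, y, T=20)
print("training error:", np.mean(predict(ens, X) != y))
```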
Decision Trees

[Figure: a small decision tree over two features X and Y, with tests X > 3 and Y > 5 at the internal nodes and ±1 labels at the leaves, shown alongside the corresponding axis-parallel partition of the (X, Y) plane.]
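As a sketch, a tree of the kind in the figure is just nested threshold tests; the leaf labels here are hypothetical and chosen only to show the structure:

```python
def tree_predict(x, y):
    """A two-level decision tree over features X and Y (leaf labels are illustrative)."""
    if x > 3:
        return +1 if y > 5 else -1
    return -1
```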

Decision tree as a sum

[Figure: the same tree rewritten as a sum of real-valued contributions. Each node contributes a score (values such as +0.2, +0.1, -0.1, -0.2, -0.3 appear on the branches); the scores along the path taken are added up and the final label is the sign of the sum.]
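A sketch of the sum form: each test contributes a score according to its outcome and the prediction is the sign of the total; the particular scores are placeholders, not necessarily the numbers on the slide:

```python
def tree_as_sum(x, y):
    """Decision tree rewritten as a sum of per-node scores (scores are illustrative)."""
    score = -0.2                          # baseline contribution at the root
    score += +0.1 if x > 3 else -0.3      # contribution of the X > 3 test
    if x > 3:                             # the Y > 5 test is only reached when X > 3
        score += +0.2 if y > 5 else -0.1
    return 1 if score > 0 else -1         # final label = sign of the summed scores
```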

An alternating decision tree

[Figure: an alternating decision tree. Alongside the X > 3 and Y > 5 tests, a new test Y < 1 is attached at the root with its own contributions (values such as +0.7, +0.2, +0.1, 0.0, -0.1, -0.2, -0.3 appear on the branches). Every test whose precondition holds contributes its score, and the final label is the sign of the total.]
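A sketch of evaluating an alternating decision tree: every rule whose precondition holds adds its score, so several branches can contribute at once; the rules and scores below are hypothetical, loosely following the figure:

```python
def adt_predict(x, y):
    """Alternating decision tree: sum the scores of all applicable rules, take the sign."""
    score = -0.2                          # root prediction node (constant contribution)

    # Root-level decision nodes: their preconditions are always satisfied, so both fire.
    score += +0.1 if x > 3 else -0.3
    score += +0.7 if y < 1 else 0.0

    # A nested decision node whose precondition is "x > 3".
    if x > 3:
        score += +0.2 if y > 5 else -0.1

    return 1 if score > 0 else -1
```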

Example: Medical Diagnostics
  • Cleve dataset from the UC Irvine database.
  • Heart disease diagnostics (+1 = healthy, -1 = sick).
  • 13 features from tests (real-valued and discrete).
  • 303 instances.
AdaBoost Convergence

[Figure: loss plotted against the margin. The 0-1 loss is a step at zero (mistakes to the left, correct classifications to the right); AdaBoost's exponential loss and the losses used by LogitBoost and BrownBoost are shown on the same axes.]
  • Rationale? Consider a bound on the training error: the fraction of training mistakes is at most (1/N) Σ_i exp(-y_i f(x_i)) = Π_t Z_t (derivation sketched below, using the exp bound on the 0-1 step, the definition of f(x), and the recursive use of the normalizer Z_t).
  • AdaBoost is essentially doing gradient descent on this bound.
  • Convergence?
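For reference, a sketch of this standard bound, reconstructed from the three annotated steps above (notation as in the earlier slides):

```latex
\begin{align*}
\frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\big[\, y_i \neq \operatorname{sign}(f(x_i)) \,\big]
  &\le \frac{1}{N}\sum_{i=1}^{N} e^{-y_i f(x_i)}
  && \text{(exp bound on the 0--1 step)} \\
  &= \frac{1}{N}\sum_{i=1}^{N} \exp\!\Big(-y_i \sum_{t=1}^{T} a_t h_t(x_i)\Big)
  && \text{(definition of } f(x)\text{)} \\
  &= \prod_{t=1}^{T} Z_t
  && \text{(recursive use of } D_{t+1}(i) = D_t(i)\, e^{-a_t y_i h_t(x_i)} / Z_t\text{)}
\end{align*}
```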

AdaBoost Convergence

  • Convergence? Consider the binary h_t case (the calculation is sketched below).
  • So the final learner's training error converges to zero exponentially fast in T if each weak learner is better than chance by at least γ, i.e. ε_t ≤ 1/2 - γ.
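A reconstruction of the standard binary-case calculation behind this claim, writing γ_t = 1/2 - ε_t ≥ γ:

```latex
\begin{align*}
Z_t &= \sum_i D_t(i)\, e^{-a_t y_i h_t(x_i)}
     = (1-\varepsilon_t)\, e^{-a_t} + \varepsilon_t\, e^{a_t}
     = 2\sqrt{\varepsilon_t (1-\varepsilon_t)}, \\
\prod_{t=1}^{T} Z_t
    &= \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2}
    \;\le\; \exp\!\Big(-2\sum_{t=1}^{T} \gamma_t^2\Big)
    \;\le\; e^{-2\gamma^2 T}.
\end{align*}
```

Since the training error is at most Π_t Z_t, it falls below 1/N, and hence reaches zero, after roughly T = (ln N)/(2γ²) rounds.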

Curious phenomenon

[Figure: training and test error curves for boosted decision trees; the test error continues to fall even after the training error reaches zero.]

  • Using <10,000 training examples we fit >2,000,000 parameters

Explanation using margins

[Figure: loss versus margin, as in the convergence slide, with the margins of the training examples overlaid. No examples with small margins!!]

AdaBoost Generalization
  • Also, a VC analysis gives a generalization bound (where d is the VC dimension of the base classifier): P_test[error] ≤ P_train[error] + Õ(√(T d / N))
  • But this suggests that more iterations → overfitting!
  • A margin analysis is possible; redefine the margin as the normalized margin: margin(x, y) = y Σ_t a_t h_t(x) / Σ_t a_t
  • Then have, for any θ > 0: P_test[y f(x) ≤ 0] ≤ P_train[margin(x, y) ≤ θ] + Õ(√(d / (N θ²))), a bound that does not grow with the number of rounds T
AdaBoost Generalization

  • Suggests this optimization problem: maximize the minimum normalized margin over the training set (sketched below)
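A reconstruction of the optimization this suggests, in the standard max-min-margin form (the exact formulation on the original slide may differ):

```latex
\[
\max_{a \ge 0} \;\; \min_{i = 1, \dots, N} \;\;
\frac{y_i \sum_{t=1}^{T} a_t h_t(x_i)}{\sum_{t=1}^{T} a_t}
\]
```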
