- 83 Views
- Uploaded on
- Presentation posted in: General

Advanced Machine Learning & Perception

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Tony Jebara, Columbia University

Instructor: Tony Jebara

Tony Jebara, Columbia University

- Combining Multiple Classifiers
- Voting
- Boosting
- Adaboost
- Based on material by Y. Freund, P. Long & R. Schapire

Tony Jebara, Columbia University

- Have many simple learners
- Also called base learners or
- weak learners which have a classification error of <0.5
- Combine or vote them to get a higher accuracy
- No free lunch: there is no guaranteed best approach here
- Different approaches:
- Voting
- combine learners with fixed weight

- Mixture of Experts
- adjust learners and a variable weight/gate fn
- Boosting
- actively search for next base-learners and vote
- Cascading, Stacking, Bagging, etc.

Tony Jebara, Columbia University

- Have T classifiers
- Average their prediction with weights
- Like mixture of experts but weight is constant with input

y

a

a

a

Expert

Expert

Expert

x

Tony Jebara, Columbia University

- Have T classifiers or experts and a gating fn
- Average their prediction with variable weights
- But, adapt parameters of the gating function
- and the experts (fixed total number T of experts)

Output

Gating

Network

Expert

Expert

Expert

Input

Tony Jebara, Columbia University

- Actively find complementary or synergistic weak learners
- Train next learner based on mistakes of previous ones.
- Average prediction with fixed weights
- Find next learner by training on weighted versions of data.

training data

weak rule

weak learner

ensemble of learners

weights

prediction

Tony Jebara, Columbia University

- Most popular weighting scheme
- Define margin for point i as
- Find an ht and find weight at to min the cost function
- sum exp-margins

training data

weak rule

weak learner

ensemble of learners

weights

prediction

Tony Jebara, Columbia University

- Choose base learner & at to
- Recall error of
- base classifier ht must be
- For binary h, Adaboost puts this weight on weak learners:
- (instead of the
- more general rule)
- Adaboost picks the following for the weights on data for
- the next round (here Z is the normalizer to sum to 1)

Tony Jebara, Columbia University

Y

no

no

yes

yes

5

X

3

-1

+1

X>3

-1

Y>5

-1

-1

+1

Tony Jebara, Columbia University

Y

-1

+1

+0.2

-0.1

+0.1

X>3

no

yes

-1

-0.3

+0.1

sign

-0.1

Y>5

no

yes

-0.3

+0.2

X

-0.2

-0.2

Tony Jebara, Columbia University

Y

+0.2

Y<1

-0.1

+0.1

-1

+1

no

yes

X>3

+0.7

0.0

no

yes

-0.3

+0.1

-0.1

-1

sign

Y>5

no

yes

-0.3

+0.2

+1

X

-0.2

+0.7

Tony Jebara, Columbia University

- Cleve dataset from UC Irvine database.
- Heart disease diagnostics (+1=healthy,-1=sick)
- 13 features from tests (real valued and discrete).
- 303 instances.

Tony Jebara, Columbia University

Ad-Tree Example

Tony Jebara, Columbia University

Tony Jebara, Columbia University

Logitboost

Loss

Brownboost

0-1 loss

Margin

Mistakes

Correct

- Rationale?
- Consider bound on
- the training error:
- Adaboost is essentially doing gradient descent on this.
- Convergence?

exp bound on step

definition of f(x)

recursive use of Z

Tony Jebara, Columbia University

- Convergence? Consider the binary ht case.

So, the final learner converges exponentially fast in T if each weak learner is at least better than gamma!

Tony Jebara, Columbia University

Boosting decision trees

Using <10,000 training examples we fit >2,000,000 parameters

Tony Jebara, Columbia University

0-1 loss

Margin

Tony Jebara, Columbia University

No examples with small margins!!

0-1 loss

Margin

Tony Jebara, Columbia University

Tony Jebara, Columbia University

- Also, a VC analysis gives a generalization bound:
- (where d is VC of
- base classifier)
- But, more iterations overfitting!
- A margin analysis is possible, redefine margin as:
- Then have

Tony Jebara, Columbia University

- Suggests this optimization problem:

Margin

Tony Jebara, Columbia University

- Proof Sketch

Tony Jebara, Columbia University

% test error rates