
Bayesian Learning VC Dimension

Jahwan Kim

2000. 5. 24

AIPR Lab. CSD., KAIST


Contents

  • Bayesian learning

    • General idea, & an example

  • Parametric vs. nonparametric statistical inference

  • Model capacity and generalizability

  • Further readings

Bayesian learning

  • Conclusions are drawn from hypotheses constructed from the given data.

  • Predictions are made from the hypotheses, weighted by their posterior probabilities.

Bayesian learning: Formulation

  • X is the prediction, the H's are the hypotheses, and D is the given data.

  • Requires calculating P(H|D) for every H, which is intractable in many cases.
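The formula itself did not survive this transcript; the rule the slide describes is presumably the standard posterior-weighted prediction:

```latex
% Prediction as a posterior-weighted vote over all hypotheses.
P(X \mid D) = \sum_{i} P(X \mid H_i)\, P(H_i \mid D)
```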

Bayesian learning: Maximum a posteriori hypothesis

  • Take H that maximizes the a posteriori probability P(H|D).

  • How do we find such H? Use Bayes’ rule:
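The equation on the original slide is missing here; applying Bayes' rule and dropping P(D), which does not depend on H, gives the usual MAP criterion:

```latex
H_{\mathrm{MAP}}
  = \arg\max_H P(H \mid D)
  = \arg\max_H \frac{P(D \mid H)\, P(H)}{P(D)}
  = \arg\max_H P(D \mid H)\, P(H)
```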

Bayesian learning, continued

  • P(D) remains fixed for all H, so it can be ignored in the maximization.

  • P(D|H) is the likelihood that the given data is observed under H.

  • P(H), the prior probability, has been the source of debate.

    • If the prior is too biased, we get underfitting.

    • Sometimes a uniform prior is appropriate. In that case the prior drops out of the maximization, and we choose the maximum likelihood hypothesis.
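A minimal numerical sketch of these points (the hypotheses, data, and priors are assumed for illustration, not taken from the slides): with a uniform prior the MAP hypothesis coincides with the maximum likelihood one, while a biased prior can change the choice.

```python
import numpy as np

# Three candidate coin biases and data of 8 heads in 10 flips.
hypotheses = np.array([0.3, 0.5, 0.8])   # P(heads) under each hypothesis H
heads, flips = 8, 10

# Likelihood P(D|H) up to the binomial coefficient, which is the same
# for every H and therefore cancels in the argmax.
likelihood = hypotheses**heads * (1 - hypotheses)**(flips - heads)

priors = {
    "uniform": np.array([1/3, 1/3, 1/3]),
    "biased toward fair coin": np.array([0.1, 0.8, 0.1]),
}
for name, prior in priors.items():
    posterior = likelihood * prior
    posterior /= posterior.sum()         # normalize by P(D)
    print(f"{name}: MAP = {hypotheses[np.argmax(posterior)]}, "
          f"posterior = {np.round(posterior, 3)}")
# uniform prior -> MAP = 0.8 (same as maximum likelihood);
# the biased prior pulls the MAP choice to 0.5.
```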

Bayesian learning: Parameter estimation

  • Problem: Find p(x|D) when

    • We know the form of the pdf, i.e., the pdf is parametrized by θ and written p(x|θ).

    • The a priori pdf p(θ) is known.

    • Data D is given.

  • We only have to find p(θ|D), since then we may use
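The formula is missing from the transcript; it is presumably the standard predictive integral, averaging the parametric pdf over the posterior:

```latex
p(x \mid D) = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta
```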

Jahwan Kim, Dept. of CS, KAIST


Parameter estimation continued l.jpg
Parameter estimation, continued

  • By Bayes' rule, p(θ|D) can be expressed through p(D|θ) and the prior p(θ) (first formula below).

  • Assume also that each sample in D is drawn independently from the same pdf, i.e., the samples are i.i.d. Then p(D|θ) factors over the samples (second formula below).

    • This gives the formal solution to the problem.
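The two formulas on the original slide are not preserved; the standard forms they refer to are:

```latex
% Bayes' rule for the parameter posterior:
p(\theta \mid D)
  = \frac{p(D \mid \theta)\, p(\theta)}
         {\int p(D \mid \theta)\, p(\theta)\, d\theta},
% and, for i.i.d. samples D = {x_1, ..., x_n}:
p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)
```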

Parameter estimation: Example

  • One-dimensional normal distribution

    • Two parameters, μ and σ.

    • Assume that p(μ) is normal with known mean m and variance s.

    • Assume also that σ is known.

    • Then
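The equation is missing from the transcript; with p(x|μ) = N(μ, σ²) and prior p(μ) = N(m, s) (writing s for the prior variance, as on the slide), the posterior is presumably:

```latex
p(\mu \mid D) \;\propto\; p(\mu) \prod_{k=1}^{n} p(x_k \mid \mu)
```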

Example, continued

  • A term quadratic in μ appears in the exponent of the resulting expression

    (complete the square, or compute it directly).

  • Namely, p(μ|D) is also normal.

    • Its mean and variance are given by the formulas below,

      where x̄ₙ denotes the sample mean.
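The formulas themselves are missing; completing the square gives the standard conjugate-normal result (s the prior variance, σ² the known data variance, x̄ₙ the sample mean of the n samples):

```latex
p(\mu \mid D) = N(\mu_n, s_n), \qquad
\mu_n = \frac{n\, s\, \bar{x}_n + \sigma^2 m}{n\, s + \sigma^2}, \qquad
s_n = \frac{s\, \sigma^2}{n\, s + \sigma^2}
```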

Estimation of mean

  • As n goes to infinity, p(μ|D) approaches the Dirac delta function centered at the sample mean.
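A small numerical sketch of this behavior (the parameter values are assumed for illustration, not taken from the slides): as n grows, the posterior mean approaches the sample mean and the posterior variance shrinks toward zero.

```python
import numpy as np

# Conjugate-normal posterior for the mean of N(mu, sigma^2), sigma known.
rng = np.random.default_rng(0)
true_mu, sigma = 2.0, 1.0   # data pdf N(true_mu, sigma^2)
m, s = 0.0, 1.0             # prior p(mu) = N(m, s), s the prior variance

for n in (1, 10, 100, 10_000):
    x = rng.normal(true_mu, sigma, size=n)
    xbar = x.mean()
    post_mean = (n * s * xbar + sigma**2 * m) / (n * s + sigma**2)
    post_var = (s * sigma**2) / (n * s + sigma**2)
    print(f"n={n:6d}  posterior mean={post_mean:+.3f}  variance={post_var:.2e}")

# The variance behaves like sigma^2/n for large n, so p(mu|D) tends to a
# spike (Dirac delta) at the sample mean.
```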

Two main approaches to statistical inference

  • Parametric inference

    • The investigator should know the problem well.

    • The model contains a finite number of unknown parameters.

  • Nonparametric inference

    • Used when there is no reliable a priori information about the problem.

    • Typically requires a large number of samples.

Capacity of models

  • Well-known fact:

    • If a model is too complicated, it doesn’t generalize well;

    • if too simple, it doesn’t represent well.

  • How do we measure model capacity?

    • In classical statistics, by the number of parameters, or degrees of freedom.

    • In the (new) statistical learning theory, by the VC dimension.

VC dimension

  • The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a model: for a class of indicator functions, it is the largest number of points that the class can shatter, i.e., label in all possible ways.

VC dimension: Examples

  • It is not always equal to the number of parameters (see the sketch below):

    • the family of linear classifiers {sgn(ax+by+c)} in the 2D plane has VC dimension 3, matching its three parameters, but

  • the one-parameter family {sgn(sin ax)} (in one dimension) has infinite VC dimension!
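A brute-force sketch of what "VC dimension 3" means for lines (illustrative code, not from the slides): a labeling search shows that sgn(ax+by+c) can realize all 8 labelings of 3 points in general position, but not the XOR labeling of 4 points. The random search certifies shattering only probabilistically; a False result for a separable labeling is possible in principle but vanishingly unlikely here.

```python
import numpy as np
from itertools import product

def shatters(points, trials=20_000, seed=0):
    """Check (by random search) whether sgn(a*x + b*y + c) realizes
    every +/-1 labeling of the given 2D points."""
    rng = np.random.default_rng(seed)
    X = np.c_[points, np.ones(len(points))]   # append 1 to absorb the bias c
    for labels in product([-1, 1], repeat=len(points)):
        w = rng.standard_normal((trials, 3))  # random (a, b, c) candidates
        signs = np.sign(X @ w.T)              # one column per candidate line
        if not np.any(np.all(signs == np.array(labels)[:, None], axis=0)):
            return False                      # this labeling was never realized
    return True

three = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # general position
four_xor = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(shatters(three))     # True: 3 points shattered, so VC dimension >= 3
print(shatters(four_xor))  # False: the XOR labeling defeats every line
```

The sgn(sin ax) family escapes this kind of count: by choosing a suitably large frequency a, the single parameter can realize any labeling of points placed at, e.g., xᵢ = 10⁻ⁱ.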

Theorem from STL on VC dimension and generalizability
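The theorem stated on the original slide is not preserved in this transcript. The standard result from Vapnik's Statistical Learning Theory that fits this title bounds the true risk R by the empirical risk R_emp plus a confidence term depending on the VC dimension h and the sample size n; for indicator functions it reads, with probability at least 1 − η:

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

The bound holds simultaneously for all functions in the class, formalizing the trade-off above: a small VC dimension relative to n keeps the confidence term small, while a rich class may fit the data well yet generalize poorly.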

Further readings

  • Vapnik, Statistical Learning Theory, Ch. 0 and Sections 1.1-1.3

  • Haykin, Neural Networks, Sections 2.13-2.14

  • Duda & Hart, Pattern Classification and Scene Analysis, Sections 3.3-3.5
