
# Bayesian Learning VC Dimension







### Bayesian Learning VC Dimension

Jahwan Kim

May 24, 2000

AIPR Lab., Dept. of CS, KAIST

### Contents

• Bayesian learning: general idea and an example

• Parametric vs. nonparametric statistical inference

• Model capacity and generalizability

• Further readings

Jahwan Kim, Dept. of CS, KAIST

• Conclusions are drawn from hypotheses constructed from the given data.

• Predictions are made from the hypotheses, weighted by their posterior probabilities.


### Bayesian Learning: Formulation

• P(X|D) = Σ_H P(X|H) P(H|D), where X is the prediction, the H's are the hypotheses, and D is the given data.

• This requires calculating P(H|D) for all H's, which is intractable in many cases.
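The sum over hypotheses weighted by their posteriors can be sketched concretely. A minimal Python illustration over a hypothetical three-hypothesis coin-bias space (the hypotheses, prior, and data are invented for illustration, not from the slides):

```python
# Full Bayesian prediction over a small discrete hypothesis space:
#   P(X|D) = sum_H P(X|H) P(H|D)
# Hypothetical coin-bias example; the names and numbers are mine.

def bayesian_predict(hypotheses, priors, data):
    """Return P(next observation = 1 | data) by averaging over hypotheses.

    hypotheses: list of P(x=1 | H) values
    priors:     list of P(H) values (summing to 1)
    data:       list of 0/1 observations
    """
    # Unnormalized posterior P(H|D) ∝ P(D|H) P(H), with i.i.d. likelihood
    posts = []
    for h, p in zip(hypotheses, priors):
        lik = 1.0
        for x in data:
            lik *= h if x == 1 else (1.0 - h)
        posts.append(lik * p)
    z = sum(posts)                      # the normalizer P(D)
    posts = [w / z for w in posts]
    # Prediction: each hypothesis's prediction, weighted by its posterior
    return sum(h * w for h, w in zip(hypotheses, posts))

# Three hypotheses about a coin's bias, uniform prior, data favoring heads
print(bayesian_predict([0.25, 0.5, 0.75], [1/3, 1/3, 1/3], [1, 1, 1, 0]))
# about 0.63
```

With a uniform prior and data favoring heads, the prediction lands between the hypotheses' values, pulled toward the best-supported one.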


### Bayesian Learning: Maximum a Posteriori Hypothesis

• Take the H that maximizes the a posteriori probability P(H|D).

• How do we find such an H? Use Bayes’ rule: P(H|D) = P(D|H) P(H) / P(D).


### Bayesian Learning (continued)

• P(D) remains fixed for all H.

• P(D|H) is the likelihood that the given data is observed under H.

• P(H), the prior probability, has been the source of debate.

• If too biased, we get underfitting.

• Sometimes a uniform prior is appropriate. In that case, we choose the maximum likelihood hypothesis.
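The contrast between the maximum likelihood and MAP choices can be sketched as follows; the hypothesis space, prior, and data are hypothetical, chosen so the two rules disagree:

```python
# ML vs. MAP hypothesis selection on a small discrete space.
# Hypothetical coin-bias setup; the numbers are mine, not from the slides.

def likelihood(h, data):
    # i.i.d. Bernoulli likelihood P(D|H) for a coin with bias h
    out = 1.0
    for x in data:
        out *= h if x == 1 else (1.0 - h)
    return out

def pick_map(hypotheses, prior, data):
    # MAP: argmax_H P(D|H) P(H).  With a uniform prior this reduces to
    # the maximum likelihood hypothesis.
    return max(hypotheses, key=lambda h: likelihood(h, data) * prior[h])

hyps = [0.3, 0.5, 0.9]
uniform = {h: 1 / 3 for h in hyps}
biased = {0.3: 0.1, 0.5: 0.8, 0.9: 0.1}  # strong prior belief in a fair coin
data = [1, 1, 1]
print(pick_map(hyps, uniform, data))  # ML choice: 0.9
print(pick_map(hyps, biased, data))   # MAP choice: 0.5 (prior dominates)
```

Three heads in a row make 0.9 the maximum likelihood hypothesis, but a prior strongly biased toward a fair coin pulls the MAP choice to 0.5, illustrating how a biased prior can underfit.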


### Bayesian Learning: Parameter Estimation

• Problem: Find p(x|D) when

• We know the form of the pdf, i.e., the pdf is parametrized by θ, written as p(x|θ).

• The a priori pdf p(θ) is known.

• The data D is given.

• We only have to find p(θ|D), since then we may use p(x|D) = ∫ p(x|θ) p(θ|D) dθ.


• By Bayes’ rule, p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ.

• Assume also that each sample in D is drawn independently from the same pdf, i.e., the data is i.i.d. Then p(D|θ) = p(x₁|θ) ⋯ p(xₙ|θ).

• This gives the formal solution to the problem.


### Parameter Estimation: Example

• One-dimensional normal distribution: p(x|μ) ~ N(μ, σ²).

• Two parameters, μ and σ.

• Assume that p(μ) is normal with known mean m and variance s.

• Assume also that σ is known.

• Then p(μ|D) ∝ p(x₁|μ) ⋯ p(xₙ|μ) p(μ).


• A term quadratic in μ appears in the exponent of this expression (or compute it directly).

• Namely, p(μ|D) is also normal.

• Its mean and variance are given by

μₙ = (ns / (ns + σ²)) x̄ₙ + (σ² / (ns + σ²)) m,  σₙ² = sσ² / (ns + σ²),

where x̄ₙ is the sample mean.


• As n goes to infinity, p(μ|D) approaches the Dirac delta function centered at the sample mean.
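The update above can be checked numerically. A small Python sketch of the posterior mean/variance formulas under the slide's naming (prior mean m, prior variance s, known data variance σ²); the function name and the fake data are mine:

```python
# Conjugate-normal update from the example: the data variance sigma2 is
# known, and the prior on the mean is N(m, s), with s the prior variance.

def posterior_of_mean(m, s, sigma2, data):
    n = len(data)
    xbar = sum(data) / n                       # sample mean
    denom = n * s + sigma2
    mu_n = (n * s / denom) * xbar + (sigma2 / denom) * m
    var_n = s * sigma2 / denom
    return mu_n, var_n

# With more data, the posterior mean moves toward the sample mean and the
# posterior variance shrinks toward 0 (the Dirac-delta limit).
for n in (1, 10, 1000):
    print(n, posterior_of_mean(m=0.0, s=1.0, sigma2=1.0, data=[2.0] * n))
```

With one observation the posterior mean sits halfway between the prior mean 0 and the observation 2; with a thousand observations it is essentially the sample mean and the variance is near zero.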


### Parametric vs. Nonparametric Statistical Inference

• Parametric inference

• The investigator should know the problem well.

• The model contains a finite number of unknown parameters.

• Nonparametric inference

• There is no reliable a priori information about the problem.

• The number of samples required is large.


### Model Capacity and Generalizability

• A well-known fact:

• If a model is too complicated, it doesn’t generalize well;

• if it is too simple, it doesn’t represent the data well.

• How do we measure model capacity?

• In classical statistics, by the number of parameters, or degrees of freedom.

• In the (new) statistical learning theory, by the VC dimension.


• The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a model: the largest number of points that the model’s functions can shatter, i.e., classify in all possible ways.


### VC Dimension: Examples

• It is not always equal to the number of parameters:

• The family of lines {sgn(ax + by + c)} in the 2D plane has VC dimension 3, but

• the one-parameter family {sgn(sin ax)} (in one dimension) has infinite VC dimension!
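The 2D-line claim can be checked by brute force. A Python sketch (the point set and candidate pool are mine) verifying that three non-collinear points are shattered by sgn(ax + by + c), i.e., VC dimension ≥ 3; showing it is exactly 3 would also require that no four points can be shattered:

```python
# Check that three non-collinear points in the plane are shattered by
# halfplane classifiers sgn(ax + by + c).

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A small candidate pool of (a, b, c); enough to realize every labeling
# of these three particular points.
candidates = [(a, b, c) for a in (-1, 0, 1)
              for b in (-1, 0, 1)
              for c in (-0.5, 0.5)]

def labels(w, pts):
    # Sign pattern the classifier w = (a, b, c) assigns to the points
    a, b, c = w
    return tuple(1 if a * x + b * y + c > 0 else -1 for x, y in pts)

achieved = {labels(w, points) for w in candidates}
print(len(achieved))  # 8 = 2**3 distinct labelings: the points are shattered
```

All 2³ = 8 sign patterns are realized, so the three points are shattered by this family.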


### Theorem from Statistical Learning Theory on VC Dimension and Generalizability
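The theorem’s statement did not survive extraction. A commonly quoted form of Vapnik’s generalization bound, which a slide with this title would plausibly show (a reconstruction, not the slide’s exact content): with probability at least 1 − η, simultaneously for all functions of a class with VC dimension h,

$$R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) + \sqrt{\frac{h\left(\ln(2n/h) + 1\right) - \ln(\eta/4)}{n}},$$

where n is the number of samples. The empirical risk is a reliable proxy for the true risk only when h is small relative to n, tying capacity to generalizability.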


### Further Readings

• Vapnik, Statistical Learning Theory, Ch. 0 and Sections 1.1–1.3

• Haykin, Neural Networks, Sections 2.13–2.14

• Duda & Hart, Pattern Classification and Scene Analysis, Sections 3.3–3.5
