2 bayes decision theory
1 / 21

- PowerPoint PPT Presentation

  • Updated On :

2. Bayes Decision Theory. Prof. A.L. Yuille Stat 231. Fall 2004. Decisions with Uncertainty. Bayes Decision Theory is a theory for how to make decisions in the presence of uncertainty. Input data x. Salmon y= +1, Sea Bass y=-1. Learn decision rule: f(x) taking values .

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about '' - vui

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
2 bayes decision theory l.jpg

2. Bayes Decision Theory

Prof. A.L. Yuille

Stat 231. Fall 2004.

Decisions with uncertainty l.jpg
Decisions with Uncertainty

  • Bayes Decision Theory is a theory for how to make decisions in the presence of uncertainty.

  • Input data x.

  • Salmon y= +1, Sea Bass y=-1.

  • Learn decision rule: f(x) taking values

Decision rule for fish l.jpg
Decision Rule for Fish.

  • Classify fish as Salmon or Sea Bass by decision rule f(x).

Basic ingredients l.jpg
Basic Ingredients.

  • Assume there are probability distributions for generating the data.

  • P(x|y=1) and P(x|y=-1).

  • Loss function L(f(x),y) specifies the loss of making decision f(x) when true state is y.

  • Distribution P(y). Prior probability on y.

  • Joint Distribution P(x,y) = P(x|y) P(y).

Minimize the risk l.jpg
Minimize the Risk

  • The risk of a decision rule f(x) is:

  • Bayes Decision Rule f*(x):

  • The Bayes Risk:

Minimize the risk6 l.jpg
Minimize the Risk.

  • Write P(x,y) = P(y|x) P(x).

  • Then we can write the Risk as:

  • The best decision for input x is f*(x):

Bayes rule l.jpg
Bayes Rule.

  • Posterior distribution P(y|x):

  • Likelihood function P(x|y)

  • Prior P(y).

  • Bayes Rule has been controversial (historically) because of the Prior P(y) (subjective?).

  • But in Bayes Decision Theory, everything starts from the joint distribution P(x,y).

Slide8 l.jpg

  • The Risk is based on averaging over all possible x & y. Average Loss.

  • Alternatively, can try to minimize the worst risk over x & y. Minimax Criterion.

  • This course uses the Risk, or average loss.

Generative discriminative l.jpg
Generative & Discriminative.

  • Generative methods aim to determine probability models P(x|y) & P(y).

  • Discriminative methods aim directly at estimating the decision rule f(x).

  • Vapnik argues for Discriminative Methods: Don’t solve a harder problem than you need to. Only care about the probabilities near the decision boundaries.

Discriminant functions l.jpg
Discriminant Functions.

  • For two category case the Bayes decision rule depends on the discriminant function:

  • The Bayes decision rule is of form:

  • Where T is a threshold, which is determined by the loss function.

Two state case l.jpg
Two-State Case

  • Detect “target” or “non-target”.

  • Let loss function pay a penalty of 1 for misclassification, 0 otherwise.

  • Risk becomes Error. Bayes Risk becomes Bayes Error.

  • Error is the sum of false positives F+ (non- targets classified as targets) and false negatives F- (targets classified as non-targets).

Gaussian example 1 l.jpg
Gaussian Example: 1

  • Is a bright light flashing?

  • n is no. photons emitted by dim or bright light.

8 gaussian example 2 l.jpg
8. Gaussian Example: 2

  • are Gaussians with

    means and s.d. .

  • Bayes decision rule selects “dim” if ;

  • Errors:

Example multidimensional gaussian distributions l.jpg
Example: Multidimensional Gaussian Distributions.

  • Suppose the two classes have Gaussian distributions for P(x|y).

  • Different means

    but same covariance

  • The discriminant function is a plane:

  • Alternatively, seek a planar decision rule without attempting to model the distributions.

  • Only care about the data near the decision boundary.

Generative vrs discriminant l.jpg
Generative vrs. Discriminant.

  • The Generative approach will attempt to estimate the Gaussian distributions from data – and then derive the decision rule.

  • The Discriminant approach will seek to estimate the decision rule directly by learning the discriminant plane.

  • In practice, we will not know the form of the distributions of the form of the discriminant.

Gaussian l.jpg

  • Gaussian Case with unequal covariance.

Discriminative models features l.jpg
Discriminative Models & Features.

  • In practice, the Discriminative methods are usually defined based on features extracted from the data. (E.g. length and brightness of fish).

  • Calculate features z=h(x).

  • Bayes Decision Theory says that this throws away information.

  • Restrict to a sub-class of possible decision rules – those that can be expressed in terms of features z=h(x).

Bayes decision rule and learning l.jpg
Bayes Decision Rule and Learning.

  • Bayes Decision Theory assumes that we know, or can learn, the distributions P(x|y).

  • This is often not practical, or extremely difficult.

  • In real problems, you have a set of classified data

  • You can attempt to learn P(x|y=+1) & P(x|y=-1) from these (next few lectures).

  • Parametric & Non-parametric approaches.

  • Question: when do you have enough data to learn these probabilities accurately?

  • Depends on the complexity of the model.

Machine learning l.jpg
Machine Learning.

  • Replace Risk by Empirical Risk

  • How does minimizing the empirical risk relate to minimizing the true risk?

  • Key Issue: When can we generalize? Be confident that the decision rule we have learnt on the training data will yield good results on unseen data?

Machine learning20 l.jpg
Machine Learning

  • Vapnik’s theory gives a mathematically elegant way of answering these issues.

  • It assumes that the data is sampled from an unknown distribution.

  • Vapnik’s theory gives bounds for when we can generalize.

  • Unfortunately these bounds are very conservative.

  • In practice, train on part of dataset and test on other part(s).

Extensions to multiple classes l.jpg
Extensions to Multiple Classes

Conceptually straightforward – see Duda, Hart & Stork.

The decision partitionsf the feature space into k subspaces