2. Bayes Decision Theory

- By **vui**


Decisions with Uncertainty

- Bayes Decision Theory is a theory for how to make decisions in the presence of uncertainty.
- Input data x.
- Salmon y = +1, Sea Bass y = -1.
- Learn a decision rule f(x) taking values in {+1, -1}.

Decision Rule for Fish.

- Classify fish as Salmon or Sea Bass by decision rule f(x).

Basic Ingredients.

- Assume there are probability distributions for generating the data.
- P(x|y=1) and P(x|y=-1).
- Loss function L(f(x),y) specifies the loss of making decision f(x) when true state is y.
- Distribution P(y). Prior probability on y.
- Joint Distribution P(x,y) = P(x|y) P(y).

Minimize the Risk

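- The risk of a decision rule f(x) is the expected loss: $R(f) = \sum_{x,y} L(f(x), y)\, P(x,y)$ (the sum over x becomes an integral when x is continuous).
- Bayes Decision Rule f*(x): $f^* = \arg\min_f R(f)$.
- The Bayes Risk: $R(f^*) = \min_f R(f)$.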

Minimize the Risk.

- Write P(x,y) = P(y|x) P(x).
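- Then we can write the Risk as: $R(f) = \sum_x P(x) \sum_y L(f(x), y)\, P(y|x)$.
- The best decision for input x is f*(x): $f^*(x) = \arg\min_{f(x)} \sum_y L(f(x), y)\, P(y|x)$. Because P(x) is non-negative, the risk can be minimized separately for each x.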
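The pointwise minimization can be sketched on a toy problem. This is only an illustration: the discretized input values, the joint probabilities, and the 0-1 loss below are hypothetical choices, not taken from the slides.

```python
# Hypothetical joint distribution P(x, y) over a discretized fish length x
# and class y (+1 salmon, -1 sea bass). The six entries sum to 1.
xs = ["short", "medium", "long"]
ys = [+1, -1]
joint = {
    ("short", +1): 0.24, ("medium", +1): 0.12, ("long", +1): 0.04,
    ("short", -1): 0.06, ("medium", -1): 0.18, ("long", -1): 0.36,
}

def loss(decision, y):
    """0-1 loss (an assumption): penalty 1 for a wrong decision, 0 otherwise."""
    return 0.0 if decision == y else 1.0

def risk(rule):
    """Risk of a rule (a dict x -> decision): sum over x, y of L(f(x), y) P(x, y)."""
    return sum(loss(rule[x], y) * p for (x, y), p in joint.items())

def bayes_rule():
    """For each x, pick the decision with the smallest expected loss.

    Minimizing sum_y L(d, y) P(x, y) gives the same decision as minimizing
    sum_y L(d, y) P(y | x), because P(x) is a positive constant for fixed x.
    """
    rule = {}
    for x in xs:
        rule[x] = min(ys, key=lambda d: sum(loss(d, y) * joint[(x, y)] for y in ys))
    return rule

f_star = bayes_rule()
bayes_risk = risk(f_star)
```

With these numbers the rule labels short fish as salmon and the others as sea bass, and no other rule on this table has lower risk.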

Bayes Rule.

- Posterior distribution P(y|x): $P(y|x) = P(x|y)\, P(y) / P(x)$, where $P(x) = \sum_y P(x|y)\, P(y)$.
- Likelihood function P(x|y)
- Prior P(y).
- Bayes Rule has been controversial (historically) because of the Prior P(y) (subjective?).
- But in Bayes Decision Theory, everything starts from the joint distribution P(x,y).
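As a rough sketch of Bayes Rule in code, the posterior can be computed by normalizing likelihood times prior. The likelihood table and prior values here are made-up numbers for illustration only.

```python
# Hypothetical likelihoods P(x | y) and prior P(y); not from the slides.
likelihood = {
    +1: {"short": 0.6, "medium": 0.3, "long": 0.1},  # P(x | y=+1)
    -1: {"short": 0.1, "medium": 0.3, "long": 0.6},  # P(x | y=-1)
}
prior = {+1: 0.4, -1: 0.6}

def posterior(x, likelihood, prior):
    """Return P(y | x) for every y via Bayes Rule."""
    unnormalized = {y: likelihood[y][x] * prior[y] for y in prior}
    evidence = sum(unnormalized.values())  # P(x) = sum_y P(x | y) P(y)
    return {y: p / evidence for y, p in unnormalized.items()}

post_medium = posterior("medium", likelihood, prior)
post_short = posterior("short", likelihood, prior)
```

For "medium" the two likelihoods are equal, so the posterior simply reproduces the prior; for "short" the likelihood shifts the belief toward y = +1.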

Risk.

- The Risk is based on averaging over all possible x & y. Average Loss.
- Alternatively, can try to minimize the worst risk over x & y. Minimax Criterion.
- This course uses the Risk, or average loss.

Generative & Discriminative.

- Generative methods aim to determine probability models P(x|y) & P(y).
- Discriminative methods aim directly at estimating the decision rule f(x).
- Vapnik argues for Discriminative Methods: Don’t solve a harder problem than you need to. Only care about the probabilities near the decision boundaries.

Discriminant Functions.

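- For the two-category case the Bayes decision rule depends on the discriminant function, the log-likelihood ratio: $\log \frac{P(x|y=1)}{P(x|y=-1)}$.
- The Bayes decision rule is of the form: f(x) = +1 if $\log \frac{P(x|y=1)}{P(x|y=-1)} > T$, and f(x) = -1 otherwise.
- Here T is a threshold, which is determined by the loss function and the prior P(y).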
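One way to see how the loss function sets the threshold: assuming the discriminant is the log-likelihood ratio log P(x|y=+1)/P(x|y=-1), comparing the expected losses of the two decisions gives the threshold computed below. The function names and the example loss tables are hypothetical, and the formula assumes that a wrong decision always costs more than a correct one.

```python
import math

def threshold(loss, prior):
    """Threshold T on the log-likelihood ratio log P(x|+1)/P(x|-1).

    loss[(decision, truth)] is the loss table. Decide +1 when the expected
    loss of +1 is below that of -1, which rearranges to
        P(x|+1)/P(x|-1) > (L(+1,-1) - L(-1,-1)) P(-1) / ((L(-1,+1) - L(+1,+1)) P(+1)).
    """
    num = (loss[(+1, -1)] - loss[(-1, -1)]) * prior[-1]
    den = (loss[(-1, +1)] - loss[(+1, +1)]) * prior[+1]
    return math.log(num / den)

def decide(log_likelihood_ratio, T):
    return +1 if log_likelihood_ratio > T else -1

equal_prior = {+1: 0.5, -1: 0.5}
loss_01 = {(+1, +1): 0.0, (+1, -1): 1.0, (-1, +1): 1.0, (-1, -1): 0.0}
# Hypothetical asymmetric loss: missing a +1 costs ten times a false alarm.
loss_costly_miss = {(+1, +1): 0.0, (+1, -1): 1.0, (-1, +1): 10.0, (-1, -1): 0.0}

T_01 = threshold(loss_01, equal_prior)               # log(1) = 0
T_costly = threshold(loss_costly_miss, equal_prior)  # log(1/10) < 0
```

Raising the cost of a missed +1 lowers T, so the rule says +1 even when the evidence mildly favors -1.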

Two-State Case

- Detect “target” or “non-target”.
- Let loss function pay a penalty of 1 for misclassification, 0 otherwise.
- Risk becomes Error. Bayes Risk becomes Bayes Error.
- Error is the sum of false positives F+ (non-targets classified as targets) and false negatives F- (targets classified as non-targets).
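A small sketch of this decomposition on labeled data, where +1 stands for "target" and -1 for "non-target". The decisions and true labels below are invented for illustration.

```python
def error_rates(decisions, truths):
    """Return (F+, F-, error) as fractions of all examples.

    F+ counts non-targets (-1) classified as targets (+1);
    F- counts targets (+1) classified as non-targets (-1).
    """
    n = len(truths)
    fp = sum(1 for d, y in zip(decisions, truths) if d == +1 and y == -1) / n
    fn = sum(1 for d, y in zip(decisions, truths) if d == -1 and y == +1) / n
    return fp, fn, fp + fn

# Hypothetical decisions and ground truth for eight inputs.
decisions = [+1, +1, -1, -1, +1, -1, -1, +1]
truths    = [+1, -1, -1, +1, +1, -1, +1, -1]

fp, fn, error = error_rates(decisions, truths)
```

Here two of the eight examples are false positives and two are false negatives, so the 0-1 error is their sum, one half.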

Gaussian Example: 1

- Is a bright light flashing?
- n is the number of photons emitted by the dim or bright light.

Gaussian Example: 2

- P(n|dim) and P(n|bright) are Gaussians with means $\mu_D$, $\mu_B$ and s.d. $\sigma_D$, $\sigma_B$.

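- Under the 0-1 loss, the Bayes decision rule selects “dim” if P(dim|n) > P(bright|n);
- Errors: the probability mass of each Gaussian that falls in the other class's decision region.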
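A worked sketch of this example, under assumptions the transcript does not state: both Gaussians share the same standard deviation, the priors are equal, and the loss is 0-1. In that special case the decision threshold is the midpoint of the two means. All numeric values are hypothetical.

```python
import math

def normal_cdf(z, mean, sd):
    """P(N <= z) for a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))

# Hypothetical parameters (not from the slides).
mu_dim, mu_bright, sd = 10.0, 20.0, 4.0
p_dim = p_bright = 0.5

# Equal s.d., equal priors, 0-1 loss: the likelihoods cross at the midpoint.
threshold = 0.5 * (mu_dim + mu_bright)

def decide(n):
    return "dim" if n < threshold else "bright"

# The two kinds of error are the Gaussian tails on the wrong side of the threshold.
err_dim_as_bright = 1.0 - normal_cdf(threshold, mu_dim, sd)
err_bright_as_dim = normal_cdf(threshold, mu_bright, sd)
bayes_error = p_dim * err_dim_as_bright + p_bright * err_bright_as_dim
```

With these numbers the threshold sits 1.25 standard deviations from each mean, so each tail holds a little over 10% of its Gaussian. With unequal standard deviations the midpoint shortcut no longer applies.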

Example: Multidimensional Gaussian Distributions.

- Suppose the two classes have Gaussian distributions for P(x|y).
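- Different means $\mu_{+1}$, $\mu_{-1}$ but the same covariance $\Sigma$.
- The discriminant function is linear in x, so the decision boundary is a plane: $\log \frac{P(x|y=1)}{P(x|y=-1)} = (\mu_{+1} - \mu_{-1})^T \Sigma^{-1} x - \tfrac{1}{2}\left(\mu_{+1}^T \Sigma^{-1} \mu_{+1} - \mu_{-1}^T \Sigma^{-1} \mu_{-1}\right)$.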
- Alternatively, seek a planar decision rule without attempting to model the distributions.
- Only care about the data near the decision boundary.
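To illustrate the shared-covariance case, the sketch below builds the linear discriminant for two 2-D Gaussians and lets it be compared with the direct difference of log-densities. The means and covariance are hypothetical, and the helpers are written out by hand for the 2x2 case to keep the example dependency-free.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Hypothetical class means and shared covariance.
mu_pos, mu_neg = [2.0, 0.0], [0.0, 1.0]
cov = [[1.0, 0.3], [0.3, 2.0]]
cov_inv = inv2(cov)

# Log-likelihood ratio g(x) = w . x + b, linear in x.
w = matvec(cov_inv, [mu_pos[0] - mu_neg[0], mu_pos[1] - mu_neg[1]])
b = -0.5 * (dot(mu_pos, matvec(cov_inv, mu_pos)) - dot(mu_neg, matvec(cov_inv, mu_neg)))

def discriminant(x):
    return dot(w, x) + b

def log_gauss_unnormalized(x, mu):
    """Log-density up to a constant; the constant cancels for a shared covariance."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    return -0.5 * dot(d, matvec(cov_inv, d))
```

The quadratic terms in x cancel because both classes use the same covariance, which is why the boundary g(x) = T is a straight line in 2-D (a plane in higher dimensions).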

Generative vs. Discriminant.

- The Generative approach will attempt to estimate the Gaussian distributions from data – and then derive the decision rule.
- The Discriminant approach will seek to estimate the decision rule directly by learning the discriminant plane.
- In practice, we will not know the form of the distributions or the form of the discriminant.

Gaussian.

- Gaussian Case with unequal covariance.
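When the covariances differ, the quadratic terms of the log-likelihood ratio no longer cancel, so the decision boundary is curved rather than planar. A 1-D sketch with hypothetical parameters (same mean, different standard deviations) shows the effect: one class wins near the mean and the other wins in both tails.

```python
import math

def log_gauss(x, mean, sd):
    """Log-density of a 1-D Gaussian."""
    return -math.log(sd * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mean) / sd) ** 2

# Hypothetical parameters: class "a" is narrow, class "b" is wide.
mean_a, sd_a = 0.0, 1.0
mean_b, sd_b = 0.0, 3.0

def discriminant(x):
    """Log-likelihood ratio log P(x|a) / P(x|b); quadratic in x."""
    return log_gauss(x, mean_a, sd_a) - log_gauss(x, mean_b, sd_b)

def decide(x, T=0.0):
    return "a" if discriminant(x) > T else "b"
```

Here `decide` returns "a" on an interval around 0 and "b" on both sides of it, so a single threshold on x cannot represent the rule. In higher dimensions the analogous boundary is a quadric surface.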

Discriminative Models & Features.

- In practice, the Discriminative methods are usually defined based on features extracted from the data. (E.g. length and brightness of fish).
- Calculate features z=h(x).
- Bayes Decision Theory says that this throws away information.
- Restrict to a sub-class of possible decision rules – those that can be expressed in terms of features z=h(x).

Bayes Decision Rule and Learning.

- Bayes Decision Theory assumes that we know, or can learn, the distributions P(x|y).
- This is often not practical, or extremely difficult.
- In real problems, you have a set of classified data $\{(x_i, y_i) : i = 1, \dots, N\}$.
- You can attempt to learn P(x|y=+1) & P(x|y=-1) from these (next few lectures).
- Parametric & Non-parametric approaches.
- Question: when do you have enough data to learn these probabilities accurately?
- Depends on the complexity of the model.
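As a minimal sketch of the parametric route, assuming each class-conditional P(x|y) is a 1-D Gaussian, the parameters can be estimated by maximum likelihood from the labeled examples and the prior from class frequencies. The data points below are invented for illustration.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood mean and standard deviation (variance divides by n)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

# Hypothetical labeled data (x_i, y_i).
data = [(4.0, +1), (5.0, +1), (6.0, +1),
        (9.0, -1), (10.0, -1), (11.0, -1), (12.0, -1)]

# One Gaussian per class for P(x | y), and class frequencies for P(y).
params = {y: fit_gaussian([x for x, label in data if label == y]) for y in (+1, -1)}
prior = {y: sum(1 for _, label in data if label == y) / len(data) for y in (+1, -1)}
```

A Gaussian has only two parameters per class here, so a handful of points already gives a usable estimate; richer models need correspondingly more data.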

Machine Learning.

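- Replace Risk by Empirical Risk, the average loss over the N training examples $(x_i, y_i)$: $R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} L(f(x_i), y_i)$.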
- How does minimizing the empirical risk relate to minimizing the true risk?
- Key Issue: When can we generalize? Be confident that the decision rule we have learnt on the training data will yield good results on unseen data?
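The empirical risk is simply the average loss over the labeled examples. A short sketch, with a hypothetical threshold rule, 0-1 loss, and made-up data:

```python
def empirical_risk(rule, data, loss):
    """Average loss of `rule` over labeled examples (x_i, y_i)."""
    return sum(loss(rule(x), y) for x, y in data) / len(data)

def zero_one_loss(decision, y):
    return 0.0 if decision == y else 1.0

def threshold_rule(x):
    """Hypothetical rule: call short fish (+1) salmon."""
    return +1 if x < 7.5 else -1

# Hypothetical training set.
data = [(4.0, +1), (5.0, +1), (8.0, +1), (9.0, -1), (7.0, -1), (11.0, -1)]

r_emp = empirical_risk(threshold_rule, data, zero_one_loss)
```

Two of the six examples fall on the wrong side of the threshold, so the empirical risk is one third. Whether the true risk is close to this value is the generalization question raised above.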

Machine Learning

- Vapnik’s theory gives a mathematically elegant way of answering these issues.
- It assumes that the data is sampled from an unknown distribution.
- Vapnik’s theory gives bounds for when we can generalize.
- Unfortunately these bounds are very conservative.
- In practice, train on part of dataset and test on other part(s).
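A minimal holdout sketch of this practice: learn a threshold on one half of the data by minimizing the empirical 0-1 risk, then measure the error on the other half. The synthetic data, the 50/50 split, and the search over training inputs as candidate thresholds are all assumptions made for illustration.

```python
import random

def empirical_risk(threshold, data):
    """0-1 empirical risk of the rule: +1 if x < threshold, else -1."""
    errors = sum(1 for x, y in data if (+1 if x < threshold else -1) != y)
    return errors / len(data)

# Synthetic 1-D data from two hypothetical Gaussians, with a fixed seed.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), +1) for _ in range(100)] + \
       [(rng.gauss(6.0, 1.0), -1) for _ in range(100)]
rng.shuffle(data)

# Hold out half of the data for testing.
split = len(data) // 2
train, test = data[:split], data[split:]

# Choose the threshold that minimizes the empirical risk on the training half.
candidates = [x for x, _ in train]
best_threshold = min(candidates, key=lambda t: empirical_risk(t, train))

train_error = empirical_risk(best_threshold, train)
test_error = empirical_risk(best_threshold, test)
```

The test error is computed on data the rule never saw, so it gives a direct, if noisy, estimate of how the rule generalizes.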

Extensions to Multiple Classes

Conceptually straightforward – see Duda, Hart & Stork.

The decision rule partitions the feature space into k subspaces.
