
Evaluating Hypotheses


- Introduction
- Estimating Accuracy
- Sampling Theory and Confidence Intervals
- Differences in Error
- Comparing Learning Algorithms

Introduction

Given some training data, a learning algorithm produces a hypothesis. The next step is to estimate the accuracy of that hypothesis on testing data:

[Diagram: Data → Learning Algorithm → Hypothesis → Performance Assessment]

- How do we know how precise our estimate is?
- There are two difficulties:
  - Bias in the estimate
  - Variance in the estimate

Bias and Variance

- Bias in the estimate. Estimates made on the training data are normally overoptimistic; to avoid this bias we measure accuracy on a separate set of data.
- Variance in the estimate. The estimate varies from sample to sample; the smaller the sample, the larger the variance.

[Figure: estimated accuracy vs. sample size, showing the bias (the gap between the estimate and the true accuracy) and the variance of the estimate.]


Estimating Accuracy

Examples in the input space X are randomly distributed according to some probability distribution D.

[Figure: a density p(X) over the input space X.]

- Questions:
  1. Given a hypothesis h and a dataset with n examples randomly obtained based on D, what is the accuracy of h on future examples?
  2. What is the error in this (accuracy) estimate?

Example

Classification of mushrooms.

Some mushrooms are more likely to show up than others.

[Figure: a frequency-vs-size distribution of mushrooms; mushrooms of some sizes are more likely to appear than others.]

Sample Error and True Error

Sample error:

errorS(h) = 1/n Σ δ(f(X), h(X))

where f is the true target function, h is the hypothesis, and δ(a,b) = 1 if a ≠ b, 0 otherwise.

True error:

errorD(h) = PD[ f(X) ≠ h(X) ]

How good is errorS(h) as an estimate of errorD(h)?

Example

There are 4 mushrooms in our dataset: {X1, X2, X3, X4}

out of a space of 6 possible mushrooms.

The probability distribution is such that

P(X1) = 0.2 P(X4) = 0.1

P(X2) = 0.1 P(X5) = 0.2

P(X3) = 0.3 P(X6) = 0.1

Our hypothesis classifies correctly X1, X2, and X3 but not X4.

The sample error is ¼ (0 + 0 + 0 + 1) = ¼ = 0.25

Our hypothesis also classifies correctly X6 but not X5.

The true error is

0.2(0) + 0.1(0) + 0.3(0) + 0.1(1) + 0.2(1) + 0.1(1) = 0.3
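As a check, both computations above can be reproduced in a few lines of Python (the dictionary encoding of the example is ours, just an illustrative representation):

```python
# Which instances h classifies correctly, and each instance's
# probability under the distribution D (from the example above).
correct = {"X1": True, "X2": True, "X3": True,
           "X4": False, "X5": False, "X6": True}
prob = {"X1": 0.2, "X2": 0.1, "X3": 0.3,
        "X4": 0.1, "X5": 0.2, "X6": 0.1}

# Sample error: fraction of misclassified instances in the 4-example dataset.
sample = ["X1", "X2", "X3", "X4"]
sample_error = sum(1 for x in sample if not correct[x]) / len(sample)
print(sample_error)  # 0.25

# True error: total probability mass of the misclassified instances.
true_error = sum(p for x, p in prob.items() if not correct[x])
print(round(true_error, 2))  # 0.3
```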


Confidence Intervals on Sample Error

- Assume the following conditions hold:
  - The sample has n examples drawn independently according to distribution D.
  - n > 30
  - Hypothesis h has made r errors on the n examples.
- Then with probability 95%, the true error lies in the interval:

  errorS(h) ± 1.96 √( errorS(h)(1 − errorS(h)) / n )

For example, if n = 40 and r = 12, then with 95% confidence the true error lies in the interval 0.30 ± 0.14.
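The interval computation can be sketched directly (the helper name `error_ci` is an illustrative choice, not from the slides):

```python
import math

def error_ci(r, n, z=1.96):
    """Return (sample error, half-width) of the approximate 95% interval."""
    e = r / n
    half = z * math.sqrt(e * (1 - e) / n)
    return e, half

e, half = error_ci(12, 40)       # the example from the text
print(f"{e:.2f} +- {half:.2f}")  # 0.30 +- 0.14
```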

Sampling Error and the Binomial Distribution

How much does the size of the dataset affect the difference between the sample error and the true error?

We have a sample of size n obtained according to distribution D, with instances drawn independently of each other. The probability of observing a given number of errors in such a sample can be modeled with the binomial distribution.

Combinations

- Combinations:
  - Assume we wish to select r objects from n objects, where we do not care about the order in which we select the r objects.
  - The number of possible combinations of r objects from n objects is

    n(n−1)(n−2) … (n−r+1) / r! = n! / ((n−r)! r!)

  - We denote this number as C(n,r).
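Python's standard library exposes this count directly as `math.comb`; a quick sketch comparing it against the falling-factorial form above:

```python
import math

# C(6, 2): ways to pick 2 objects from 6, ignoring order.
c = math.comb(6, 2)
print(c)  # 15

# The falling-factorial form n (n-1) ... (n-r+1) / r! agrees:
n, r = 6, 2
falling = 1
for i in range(r):
    falling *= n - i
print(falling // math.factorial(r))  # 15
```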

The Mean

- Let X be a discrete random variable taking the values x1, x2, x3, …, xn with respective probabilities P(x1), P(x2), P(x3), …, P(xn). Then the expected value of X, E(X), is defined as

  E(X) = x1 P(x1) + x2 P(x2) + x3 P(x3) + … + xn P(xn) = Σi xi P(xi)

The Variance

- Let X be a discrete random variable taking the values x1, x2, x3, …, xn, and let μ be its mean. Then the variance is:

  variance(X) = E[(X − μ)²]
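Both definitions translate directly into code; the fair six-sided die below is an illustrative distribution, not from the slides:

```python
def mean(xs, ps):
    """E(X) = sum_i x_i P(x_i)."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """variance(X) = E[(X - mu)^2]."""
    m = mean(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))

# Illustrative distribution: a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6
print(round(mean(xs, ps), 4))      # 3.5
print(round(variance(xs, ps), 4))  # 2.9167  (= 35/12)
```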

Binomial Distribution

- What is the probability of getting x successes in n trials?
- Assumption: all trials are independent and the probability of success remains the same.

Let p be the probability of success and let q = 1 − p. Then the binomial distribution is defined as

P(x) = C(n,x) pˣ qⁿ⁻ˣ, for x = 0, 1, 2, …, n

The mean equals np and the variance equals npq.
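A minimal sketch of the pmf, which also recovers the stated mean np and variance npq numerically (n = 40, p = 0.3 are chosen to match the dataset example in these slides):

```python
import math

def binom_pmf(x, n, p):
    """P(x) = C(n, x) p^x q^(n-x): probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 40, 0.3
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
print(round(mean, 4), round(var, 4))  # 12.0 8.4  (i.e. np and npq)
```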

The Sampling Error

The sampling error can be modeled using a binomial distribution. For example, suppose we have a dataset of size n = 40 and the true probability of error is 0.3. Then the expected number of errors is np = 40(0.3) = 12.

[Plot: the binomial distribution of the number of errors for x = 0, …, 40, centered on the expected value 12.]

Bias and Variance

If r is the number of errors in our dataset of size n, then

our estimation of the sample error is r/n.

The true error is p.

The bias of an estimator Y is E[Y] – p.

The sample error is an unbiased estimator of the true error

because the expected value of r/n is p.

The standard deviation is approximately

√( errorS(h) (1 − errorS(h)) / n )


Differences in Error

Suppose we are comparing two hypotheses from two algorithms, say a decision tree and a neural network:

[Diagram: hypotheses h1 and h2 with their respective error types (Type A, Type B).]

Differences in Error

We would like to compute the difference in error

d = error(h1) − error(h2).

The variable d can be considered approximately normal, because the difference of two normal random variables is also normal. Its variance is the sum of the variances of the two errors, so the confidence interval is defined as:

d ± zN √( error(h1)(1 − error(h1)) / n1 + error(h2)(1 − error(h2)) / n2 )
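A sketch of this interval, with hypothetical error rates and sample sizes plugged in (the values below are illustrative, not from the slides):

```python
import math

def diff_error_ci(e1, n1, e2, n2, z=1.96):
    """Approximate interval for the difference in true error of two hypotheses."""
    d = e1 - e2
    half = z * math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d, half

# Hypothetical values: h1 errs 30% on 100 examples, h2 errs 20% on 100.
d, half = diff_error_ci(0.30, 100, 0.20, 100)
print(f"{d:.2f} +- {half:.2f}")  # 0.10 +- 0.12
```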


Comparing Learning Algorithms

We have two algorithms, LA and LB. How do we know which one is better on average for learning some target function?

We want to compute the expected difference in their performance according to distribution D:

E[ error(LA) − error(LB) ]D

Comparing Learning Algorithms

With a limited sample, what we can do is the following:

For i = 1 to k
  Partition the data into a training set and a testing set
  Train LA and LB on the training set
  Compute the difference in their errors on the testing set:
    di = error(LA) − error(LB)
End

Return dmean = 1/k Σ di
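The loop above can be sketched as follows; `eval_a` and `eval_b` are hypothetical callables standing in for "train the learner, return its test error":

```python
import random

def paired_comparison(data, eval_a, eval_b, k=10, seed=0):
    """Repeat k random train/test splits, collecting di = error(LA) - error(LB).

    eval_a / eval_b are hypothetical callables: given (train, test), each
    trains its learner on `train` and returns its error rate on `test`.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(k):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(0.7 * len(shuffled))      # e.g. a 70/30 split
        train, test = shuffled[:cut], shuffled[cut:]
        diffs.append(eval_a(train, test) - eval_b(train, test))
    return sum(diffs) / k, diffs

# Toy stand-ins for real learners, just to exercise the loop:
d_mean, diffs = paired_comparison(list(range(100)),
                                  lambda tr, te: 0.30,
                                  lambda tr, te: 0.25)
print(round(d_mean, 2))  # 0.05
```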

Comparing Learning Algorithms

The standard deviation for this statistic is:

s = √( [1 / k(k−1)] Σi (di − dmean)² )

So the confidence interval is:

dmean ± t(N, k−1) · s

where t(N, k−1) is the t-distribution value for confidence level N with k−1 degrees of freedom.
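A sketch of the interval computation; the per-trial differences and the t value 2.262 (the 95% two-sided value for k − 1 = 9 degrees of freedom) are illustrative assumptions:

```python
import math

def t_interval(diffs, t):
    """Return dmean and t*s, with s = sqrt( 1/(k(k-1)) * sum (di - dmean)^2 )."""
    k = len(diffs)
    d_mean = sum(diffs) / k
    s = math.sqrt(sum((d - d_mean) ** 2 for d in diffs) / (k * (k - 1)))
    return d_mean, t * s

# Hypothetical per-trial differences from k = 10 train/test splits.
diffs = [0.02, -0.01, 0.03, 0.00, 0.01, 0.02, -0.02, 0.01, 0.00, 0.02]
d_mean, half = t_interval(diffs, t=2.262)
print(round(d_mean, 3), round(half, 3))  # 0.008 0.011
```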

Summary and Conclusions

- Estimating the accuracy of a hypothesis is itself subject to error.
- The two sources of estimation error are bias and variance.
- One can compute confidence intervals for the true error using statistical theory.
- The sampling error can be modeled using a binomial distribution.
- Differences in error between algorithms can be assessed using multiple subsampling.
