
# Evaluating Hypotheses






• Introduction

• Estimating Accuracy

• Sampling Theory and Confidence Intervals

• Differences in Error

• Comparing Learning Algorithms

## Introduction

Given some training data, a learning algorithm produces a hypothesis. The next step is to estimate the accuracy of the hypothesis on test data:

[Diagram: Data → Learning Algorithm → Hypothesis → Performance Assessment]

• How do we know how precise our estimate is?

• There are two difficulties:

• Bias in the estimate: the estimate is normally overoptimistic. To avoid this bias, we measure accuracy on a separate set of data.

• Variance in the estimate: the estimate varies from sample to sample. The smaller the sample, the larger the variance.

[Plot: estimated accuracy vs. sample size. The estimate fluctuates around the true accuracy; the variance band shrinks as the sample grows, while the bias is the systematic offset from the true accuracy.]
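The variance effect is easy to see in simulation. Below is a minimal sketch (not from the slides): the classifier and its assumed true accuracy of 0.7 are hypothetical, and each test example is modeled as an independent Bernoulli trial.

```python
# Minimal simulation (illustrative): how the spread of an accuracy estimate
# shrinks as the test-sample size grows. Assumes a hypothetical classifier
# whose true accuracy is 0.7.
import numpy as np

rng = np.random.default_rng(0)
true_accuracy = 0.7          # assumed true accuracy of the hypothesis
trials = 10_000              # number of simulated test sets per size

for n in (10, 100, 1000):
    # Each row is one simulated test set; each row mean is one estimate.
    correct = rng.random((trials, n)) < true_accuracy
    estimates = correct.mean(axis=1)
    print(f"n={n:5d}  mean estimate={estimates.mean():.3f}  "
          f"std of estimate={estimates.std():.3f}")
# The mean stays near 0.7, while the standard deviation of the estimate
# falls roughly like 1/sqrt(n).
```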

## Estimating Accuracy

Examples in the input space are randomly distributed according to some probability distribution D.

[Plot: probability density p(X) over the input space X]

• Questions:

• 1. Given a hypothesis h and a dataset with n examples drawn at random according to D, what is the accuracy of h on future examples?

• 2. What is the error in this (accuracy) estimate?

Example: classification of mushrooms. Some mushrooms are more likely to show up than others.

[Plot: frequency vs. mushroom size; some sizes are more likely to appear than others]

Sample error:

$$\mathrm{error}_S(h) = \frac{1}{n} \sum_{x \in S} \delta(f(x), h(x))$$

where f is the true target function, h is the hypothesis, and δ(a, b) = 1 if a ≠ b, 0 otherwise.

True error:

$$\mathrm{error}_D(h) = \Pr_{x \sim D}[f(x) \neq h(x)]$$

How good is $\mathrm{error}_S(h)$ as an estimate of $\mathrm{error}_D(h)$?

There are 4 mushrooms in our dataset, {X1, X2, X3, X4}, out of a space of 6 possible mushrooms. The probability distribution is:

P(X1) = 0.2   P(X2) = 0.1   P(X3) = 0.3
P(X4) = 0.1   P(X5) = 0.2   P(X6) = 0.1

Our hypothesis classifies X1, X2, and X3 correctly but not X4.

The sample error is (1/4)(0 + 0 + 0 + 1) = 1/4 = 0.25.

Our hypothesis also classifies X6 correctly but not X5.

The true error is

0.2(0) + 0.1(0) + 0.3(0) + 0.1(1) + 0.2(1) + 0.1(0) = 0.3
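The arithmetic above is easy to check in a few lines of Python; the probabilities and classification outcomes are taken directly from the example.

```python
# Reproducing the mushroom example: sample error vs. true error.
# misclassified[x] = 1 if the hypothesis gets x wrong, 0 otherwise.
p = {"X1": 0.2, "X2": 0.1, "X3": 0.3, "X4": 0.1, "X5": 0.2, "X6": 0.1}
misclassified = {"X1": 0, "X2": 0, "X3": 0, "X4": 1, "X5": 1, "X6": 0}

sample = ["X1", "X2", "X3", "X4"]            # the 4 observed mushrooms
sample_error = sum(misclassified[x] for x in sample) / len(sample)

# The true error is the expectation of the misclassification under D.
true_error = sum(p[x] * misclassified[x] for x in p)

print(sample_error)            # 0.25
print(f"{true_error:.2f}")     # 0.30  (0.1 for X4 + 0.2 for X5)
```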

## Sampling Theory and Confidence Intervals

• Assume the following conditions hold:

• The sample has n examples drawn according to probability distribution D.

• n > 30

• Hypothesis h makes r errors on the n examples.

• Then with probability 95%, the true error lies in the interval:

$$\mathrm{error}_S(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$

For example, if n = 40 and r = 12, then with 95% confidence the true error lies in the interval 0.30 ± 0.14.
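A short sketch of this computation (the 1.96 factor is the z value for a 95% interval):

```python
# The 95% confidence interval from the slide, computed directly.
from math import sqrt

n, r = 40, 12                  # sample size and number of errors
error_s = r / n                # sample error = 0.30
half_width = 1.96 * sqrt(error_s * (1 - error_s) / n)

print(f"{error_s:.2f} +- {half_width:.2f}")   # 0.30 +- 0.14
```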

How much does the size of the dataset affect the difference between the sample error and the true error?

We have a sample of size n obtained according to distribution D, with instances drawn independently of each other. The number of errors the hypothesis makes on such a sample can be modeled through the binomial distribution.

• Combinations:

• Assume we wish to select r objects from n objects, where we do not care about the order in which we select the r objects.

• The number of possible combinations of r objects from n objects is

$$\frac{n(n-1)(n-2)\cdots(n-r+1)}{r!} = \frac{n!}{(n-r)!\,r!}$$

• We denote this number as C(n, r).
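As a quick sanity check, Python's standard library computes the same quantity:

```python
# C(n, r) from the factorial formula vs. the built-in math.comb.
from math import comb, factorial

n, r = 6, 2
assert comb(n, r) == factorial(n) // (factorial(n - r) * factorial(r))
print(comb(n, r))   # 15 ways to choose 2 objects from 6, order ignored
```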

• Let X be a discrete random variable that takes the values x1, x2, x3, …, xn with respective probabilities P(x1), P(x2), P(x3), …, P(xn). Then the expected value of X, E(X), is defined as

$$E(X) = x_1 P(x_1) + x_2 P(x_2) + x_3 P(x_3) + \cdots + x_n P(x_n) = \sum_i x_i P(x_i)$$

• Let X be a discrete random variable that takes the values x1, x2, x3, …, xn, and let μ be its mean. Then the variance is:

$$\mathrm{Var}(X) = E[(X - \mu)^2]$$

• What is the probability of getting x successes in n trials?

• Assumption: all trials are independent and the probability of success remains the same.

Let p be the probability of success and let q = 1 - p. Then the binomial distribution is defined as

$$P(x) = C(n, x)\, p^x q^{\,n-x} \quad \text{for } x = 0, 1, 2, \ldots, n$$

The mean equals np. The variance equals npq.

The sample error can be modeled using a binomial distribution. For example, suppose we have a dataset of size n = 40 and that the true probability of error is 0.3. Then the expected number of errors is np = 40(0.3) = 12.

[Plot: binomial distribution of the number of errors for n = 40, p = 0.3; x-axis from 0 to 40, peaked near 12]
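A sketch of this example, assuming SciPy is available:

```python
# The number of errors in n = 40 examples with true error rate p = 0.3
# follows a binomial distribution with mean np = 12 and variance npq = 8.4.
from scipy.stats import binom

n, p = 40, 0.3
dist = binom(n, p)

print(dist.mean())    # 12.0  (n * p)
print(dist.var())     # 8.4   (n * p * q)
print(dist.pmf(12))   # probability of exactly 12 errors, near the peak
```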

If r is the number of errors in our dataset of size n, then our estimate of the sample error is r/n. The true error is p.

The bias of an estimator Y for p is E[Y] - p. The sample error is an unbiased estimator of the true error because the expected value of r/n is p.

The standard deviation is approximately

$$\sqrt{\frac{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h))}{n}}$$
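A small simulation sketch (illustrative only) confirms both claims for the n = 40, p = 0.3 example:

```python
# r/n is an unbiased estimator of p, and its standard deviation matches
# sqrt(p * q / n).
import numpy as np
from math import sqrt

rng = np.random.default_rng(1)
n, p = 40, 0.3
estimates = rng.binomial(n, p, size=100_000) / n   # many values of r/n

print(estimates.mean())          # close to p = 0.3 (unbiased)
print(estimates.std())           # close to the formula below
print(sqrt(p * (1 - p) / n))     # ~0.0725
```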

## Differences in Error

Suppose we are comparing two hypotheses from two algorithms, say a decision tree and a neural network:

[Diagram: hypothesis h1 from one learner (Type A) and hypothesis h2 from the other (Type B)]

We would like to compute the difference in error, d = error(h1) - error(h2).

The variable d can be considered approximately normal (because the difference of two normally distributed variables is also normally distributed). Its variance is the sum of the variances of the two errors. So the confidence interval is defined as:

$$d \pm z_N \sqrt{\frac{\mathrm{error}(h_1)(1 - \mathrm{error}(h_1))}{n_1} + \frac{\mathrm{error}(h_2)(1 - \mathrm{error}(h_2))}{n_2}}$$

where n1 and n2 are the sizes of the two test sets and zN is the constant for the desired confidence level (1.96 for 95%).
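A minimal sketch of this interval; the error rates and sample sizes below are made up for illustration:

```python
# 95% confidence interval for the difference in error between two
# hypotheses tested on independent samples of sizes n1 and n2.
from math import sqrt

e1, n1 = 0.25, 100   # hypothetical error of h1 and its test-set size
e2, n2 = 0.30, 120   # hypothetical error of h2 and its test-set size
z = 1.96             # z value for a 95% interval

d = e1 - e2
half_width = z * sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)

print(f"d = {d:.3f} +- {half_width:.3f}")
```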

## Comparing Learning Algorithms

We have two algorithms, LA and LB. How do we know which one is better on average for learning some target function?

We want to compute the expected difference in their performance according to distribution D:

$$E_D[\,\mathrm{error}(L_A) - \mathrm{error}(L_B)\,]$$

With a limited sample, what we can do is the following:

For i = 1 to k:
    Split the data into a training set and a test set
    Train LA and LB on the training set
    Compute the difference in error on the test set: di = error(LA) - error(LB)

Return dmean = (1/k) Σi di

The standard deviation for this statistic is:

$$s_{\bar{d}} = \sqrt{\frac{1}{k(k-1)} \sum_{i=1}^{k} (d_i - \bar{d})^2}$$

So the confidence interval is:

$$\bar{d} \pm t_{N,\,k-1}\; s_{\bar{d}}$$

where t_{N, k-1} is the t statistic for confidence level N with k - 1 degrees of freedom.
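A sketch of the whole procedure under stated assumptions: scikit-learn and SciPy are available, the two algorithms are stand-ins (a decision tree and a neural network, as in the earlier slide), and the dataset is synthetic.

```python
# k splits, paired differences in test error, and the t-based interval.
import numpy as np
from scipy.stats import t
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
k = 10
diffs = []
for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                 random_state=0).split(X):
    la = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    lb = MLPClassifier(max_iter=1000, random_state=0).fit(X[train_idx], y[train_idx])
    err_a = 1 - la.score(X[test_idx], y[test_idx])
    err_b = 1 - lb.score(X[test_idx], y[test_idx])
    diffs.append(err_a - err_b)          # d_i = error(LA) - error(LB)

d = np.asarray(diffs)
d_mean = d.mean()
s = np.sqrt(((d - d_mean) ** 2).sum() / (k * (k - 1)))  # std. dev. of d_mean
half_width = t.ppf(0.975, df=k - 1) * s                 # 95% interval

print(f"d_mean = {d_mean:.3f} +- {half_width:.3f}")
```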

## Summary

• Estimating the accuracy of a hypothesis is itself subject to error.

• The two sources of error in the estimate are bias and variance.

• One can compute confidence intervals for the estimate using statistical theory.

• The sample error can be modeled using a binomial distribution.

• Differences in error between two algorithms can be assessed using multiple subsamples of the data.