1 / 25

# Computational Learning Theory - PowerPoint PPT Presentation

Computational Learning Theory. Introduction The PAC Learning Framework Finite Hypothesis Spaces Examples of PAC Learnable Concepts. Introduction. Computational learning theory (or CLT): Provides a theoretical analysis of learning

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

• Introduction

• The PAC Learning Framework

• Finite Hypothesis Spaces

• Examples of PAC Learnable Concepts

• Computational learning theory (or CLT):

• Provides a theoretical analysis of learning

• Shows when a learning algorithm can be expected to succeed

• Shows when learning may be impossible

There are typically three areas comprised by CLT:

Sample Complexity. How many examples we need

to find a good hypothesis?

Computational Complexity. How much computational

power we need to find a good hypothesis?

Mistake Bound. How many mistakes we will make

before finding a good hypothesis?

• Introduction

• The PAC Learning Framework

• Finite Hypothesis Spaces

• Examples of PAC Learnable Concepts

Assume a two dimensional space with positive and

negative examples. Our goal is to find a rectangle that

includes the positive examples but not the negatives

(input space is R2):

true concept

-

-

+

-

+

-

+

+

-

+

-

+

-

-

-

• Definitions:

• DistributionD. Assume instances are generated at random

• from a distribution D.

• Class of ConceptsC. Let C be a class of concepts that we

• wish to learn. In our example C is the family of all rectangles

• in R2.

• Class of HypothesesH. The hypotheses our algorithm

• considers while learning the target concept.

• True error of a hypothesish

• errorD(h) = Pr[c(x) = h(x)]D

The true error:

true concept c

Region A

-

-

+

-

+

-

+

+

-

+

-

+

hypothesis h

-

-

-

Region B

True error is the probability of regions A and B.

Region A : false negatives

Region B : false positives

• Two considerations:

• We don’t want a hypothesis or approximation with

• zero error. There might be some error as long as it

• is small (bounded by a constant ε).

• We don’t need to always produce a good hypothesis

• for every sample of examples (the probability of failure

• will be bounded by a constant δ).

PAC learning stands for probably approximate correct.

Roughly, it tells us a class of concepts C (defined over an

input space with examples of size n) is PAC learnable by

a learning algorithm L, if for arbitrary small δ and ε, and

for all concepts c in C, and for all distributions D over the

input space, there is a 1-δ probability that the hypothesis h

selected from space H by learning algorithm L is

approximately correct (has error less than ε). L must run

in time polynomial in 1/ ε , 1/ δ, n, and the size of c.

true concept c

-

-

+

-

+

-

+

+

-

+

-

+

hypothesis h (most specific)

-

-

-

Assume a learning algorithm that outputs the rectangle

that is most specific (touches the positive examples at the

border of the rectangle).

Question: Is this class of problems (rectangles in R2) PAC

learnable by L?

error

true concept c

-

-

+

-

+

-

+

+

-

+

-

+

hypothesis h (most specific)

-

-

-

The error is the probability of the area between h and the

true target rectangle c. How many example do we need to

make this error less than ε?

error

true concept c

-

-

+

-

+

-

+

+

-

+

-

+

hypothesis h (most specific)

-

-

-

In general, the probability that m independent examples

have NOT fallen within the error region is (1- ε) mwhich we

want to be less than δ.

In other words, we want:

(1- ε) m <= δ

Since (1-x) <= e–xwe have that

e –εm <= δ or

m >= (1/ ε) ln (1/ δ)

The result grows linearly in 1/ ε andlogarithmically 1/ δ

error

true concept c

-

-

+

-

+

-

+

+

-

+

-

+

hypothesis h (most specific)

-

-

-

The analysis above can be applied by considering each of the

four stripes of the error region. Can you finish the analysis

considering each stripe separately and then joining their

probabilities?

• Introduction

• The PAC Learning Framework

• Finite Hypothesis Spaces

• Examples of PAC Learnable Concepts

Definition:

A consistent learner outputs the hypothesis h in H that

perfectly fits the training examples (if possible).

How many examples do we need to be approximately

correct in finding a hypothesis output by a consistent

learner that has low error?

Version Space ε-exhausted

This is the same as asking how many examples we need to

make the Version Space contain no hypothesis with error

greater than ε.

When a Version Space VS is such that no hypothesis has error

greater than ε, we say the version space is ε-exhausted.

How many examples do we need to make a version space VS

be ε-exhausted?

Probability of Version Space being ε-exhausted

The probability that the version space is not ε-exhausted

after seeing m examples is the same as asking the probability

than no hypothesis in VS has error greater than ε. Since

the size of the VS is less than the size of the whole hypothesis

space H, then that probability is clearly less than

|H|e –εm

If we make this less than δ , then we have that

m >= 1/ ε (ln |H| + ln (1/ δ ))

What happens if our hypothesis space H does not

contain the target concept c?;

Then clearly we can never find a hypothesis h with

zero error.

Here we want an algorithm that simply outputs

the hypothesis with minimum training error.

• Introduction

• The PAC Learning Framework

• Finite Hypothesis Spaces

• Examples of PAC Learnable Concepts

Can we say that concepts described by conjunctions of

Boolean literals are PAC learnable?

First, how large is the hypothesis space when we have n

Boolean attributes?

If we substitute this in our analysis of sample complexity

for finite hypothesis spaces we have that:

m >= 1/ ε (n ln 3 + ln (1/ δ) )

Thus the set of conjunctions of Boolean literals is

PAC learnable.

Consider now the class of functions of k-term DNF expressions.

These are expressions of the form

T1 V T2 V … V Tk where V stands for disjunction and

each term Ti is a conjunction of n Boolean attributes.

The size of |H| is k3n Using the equation for the sample

complexity of finite hypothesis spaces:

m >= 1/ ε (n ln 3 + ln (1/ δ) + ln k)

Although the sample complexity is polynomial in the

main parameters the problem is known to be NP complete.

But it is interesting to see that another family of

functions, the class of k-term CNF expressions is

PAC learnable.

This is interesting because the class of k-term CNF

expressions is strictly larger than the class of k-term

DNF expressions.