Computational Learning Theory



Computational Learning Theory

  • Introduction

  • The PAC Learning Framework

  • Finite Hypothesis Spaces

  • Examples of PAC Learnable Concepts



Introduction

  • Computational learning theory (or CLT):

  • Provides a theoretical analysis of learning

  • Shows when a learning algorithm can be expected to succeed

  • Shows when learning may be impossible



Introduction

CLT typically comprises three areas:

  • Sample Complexity: How many examples do we need to find a good hypothesis?
  • Computational Complexity: How much computational power do we need to find a good hypothesis?
  • Mistake Bound: How many mistakes will we make before finding a good hypothesis?



Computational Learning Theory

  • Introduction

  • The PAC Learning Framework

  • Finite Hypothesis Spaces

  • Examples of PAC Learnable Concepts



The PAC Learning Framework

Let’s start with a simple problem. Assume a two-dimensional space with positive and negative examples. Our goal is to find a rectangle that includes the positive examples but not the negatives (the input space is R2):

[Figure: positive (+) and negative (-) examples scattered in the plane; the true concept is a rectangle enclosing the positive examples.]



Definitions

  • Distribution D: assume instances are generated at random from a distribution D.
  • Class of concepts C: the class of concepts we wish to learn. In our example, C is the family of all rectangles in R2.
  • Class of hypotheses H: the hypotheses our algorithm considers while learning the target concept.
  • True error of a hypothesis h: errorD(h) = Pr_{x~D}[ c(x) ≠ h(x) ], the probability under D that h disagrees with the target concept c (see the sketch below).
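Because errorD(h) is defined with respect to the distribution D rather than the training sample, it can be approximated by drawing fresh examples from D. Below is a minimal sketch, assuming axis-aligned rectangles for both the target concept c and the hypothesis h and taking D to be uniform over the unit square; all function names here are illustrative, not from the slides.

```python
import random

def in_rect(rect, x, y):
    # rect = (x1, y1, x2, y2): an axis-aligned rectangle.
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2

def estimate_true_error(c, h, n_samples=100_000):
    # Monte Carlo estimate of errorD(h) = Pr_{x~D}[c(x) != h(x)],
    # with D assumed uniform over the unit square.
    disagreements = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if in_rect(c, x, y) != in_rect(h, x, y):
            disagreements += 1
    return disagreements / n_samples

# Example: h misses a thin strip of the target concept c.
c = (0.2, 0.2, 0.8, 0.8)
h = (0.25, 0.25, 0.8, 0.8)
print(estimate_true_error(c, h))  # roughly 0.06, the area between c and h
```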



True Error

The true error is illustrated below:

[Figure: the true concept c and the hypothesis h shown as overlapping rectangles. Region A lies inside c but outside h; Region B lies inside h but outside c.]

The true error is the probability of regions A and B: Region A contains the false negatives (points covered by c but not by h), and Region B contains the false positives (points covered by h but not by c).



Considerations

  • Two considerations:
  • We do not require a hypothesis with zero error; some error is acceptable as long as it is small (bounded by a constant ε).
  • We do not need to produce a good hypothesis for every sample of examples; the probability of failure will be bounded by a constant δ.



The PAC Learning Framework

PAC learning stands for probably approximately correct. Roughly, it says that a class of concepts C (defined over an input space with examples of size n) is PAC learnable by a learning algorithm L if, for arbitrarily small δ and ε, for all concepts c in C, and for all distributions D over the input space, there is probability at least 1-δ that the hypothesis h selected from space H by L is approximately correct (has error less than ε). L must run in time polynomial in 1/ε, 1/δ, n, and the size of c.



Example

[Figure: the true concept c and the most specific hypothesis h, the tightest rectangle touching the outermost positive examples.]

Assume a learning algorithm L that outputs the most specific rectangle, i.e., the tightest rectangle whose border touches the outermost positive examples.

Question: Is this class of problems (rectangles in R2) PAC learnable by L?
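As a concrete sketch of such a learner L (an illustration, not code from the slides): the most specific consistent rectangle is simply the bounding box of the positive training examples.

```python
def most_specific_rectangle(examples):
    # examples: list of ((x, y), label) pairs, with label True for positives.
    # Returns the tightest axis-aligned rectangle (x1, y1, x2, y2) that
    # encloses all positive examples, or None if there are no positives.
    xs = [x for (x, y), label in examples if label]
    ys = [y for (x, y), label in examples if label]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))

# Example with a few hypothetical points:
data = [((0.3, 0.4), True), ((0.6, 0.7), True), ((0.1, 0.9), False)]
print(most_specific_rectangle(data))  # (0.3, 0.4, 0.6, 0.7)
```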



Example

[Figure: the most specific hypothesis h inside the true concept c; the error region is the area between the two rectangles.]

The error is the probability of the region between h and the true target rectangle c. How many examples do we need to make this error less than ε?



Example

[Same figure as above: the error region between the most specific hypothesis h and the true concept c.]

In general, the probability that m independent examples have NOT fallen within the error region is (1 - ε)^m, which we want to be less than δ.



Example

In other words, we want:

(1 - ε)^m <= δ

Since (1 - x) <= e^(-x), it suffices that

e^(-εm) <= δ, or m >= (1/ε) ln(1/δ)

The result grows linearly in 1/ε and logarithmically in 1/δ.
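As a quick numerical check of this bound (the values of ε and δ below are illustrative, not taken from the slides):

```python
import math

def sample_bound(eps, delta):
    # m >= (1/eps) * ln(1/delta), rounded up to a whole number of examples.
    return math.ceil((1.0 / eps) * math.log(1.0 / delta))

print(sample_bound(0.1, 0.05))    # 30
print(sample_bound(0.01, 0.05))   # 300 -- linear growth in 1/eps
print(sample_bound(0.1, 0.005))   # 53  -- only logarithmic growth in 1/delta
```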



Example

[Same figure as above: the error region between the most specific hypothesis h and the true concept c.]

The analysis above can be applied by considering each of the four stripes of the error region. Can you finish the analysis by considering each stripe separately and then joining their probabilities?
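One standard way to complete the analysis (the classic argument for axis-aligned rectangles, sketched here rather than taken from the slides): it is enough that each of the four stripes has probability at most ε/4. The chance that m independent examples all miss one particular stripe of probability ε/4 is at most (1 - ε/4)^m <= e^(-εm/4), and a union bound over the four stripes gives 4 e^(-εm/4) <= δ, i.e. m >= (4/ε) ln(4/δ). A quick numerical check with illustrative values:

```python
import math

def rectangle_sample_bound(eps, delta):
    # m >= (4/eps) * ln(4/delta), from the union bound over the four stripes.
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

print(rectangle_sample_bound(0.1, 0.05))  # 176 examples
```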



Computational Learning Theory

  • Introduction

  • The PAC Learning Framework

  • Finite Hypothesis Spaces

  • Examples of PAC Learnable Concepts



Sample Complexity for Finite Hypothesis Spaces

Definition: a consistent learner outputs a hypothesis h in H that perfectly fits the training examples (whenever this is possible).

How many examples do we need to guarantee, with high probability, that the hypothesis output by a consistent learner has low error?



Version Space ε-exhausted

This is the same as asking how many examples we need so that the version space contains no hypothesis with error greater than ε.

When a version space VS is such that no hypothesis in it has error greater than ε, we say the version space is ε-exhausted.

How many examples do we need to make a version space VS ε-exhausted?



Probability of Version Space being ε-exhausted

The probability that the version space is not ε-exhausted after seeing m examples is the probability that some hypothesis with error greater than ε is still consistent with all m examples. Any single such hypothesis is consistent with one random example with probability at most (1 - ε), and with m independent examples with probability at most (1 - ε)^m <= e^(-εm). Since there are at most |H| such hypotheses, the probability is at most

|H| e^(-εm)

If we make this less than δ, then we have

m >= (1/ε) (ln |H| + ln(1/δ))
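A small numerical sketch of this bound (the hypothesis-space size, ε, and δ below are illustrative values):

```python
import math

def finite_h_sample_bound(h_size, eps, delta):
    # m >= (1/eps) * (ln|H| + ln(1/delta))
    return math.ceil((1.0 / eps) * (math.log(h_size) + math.log(1.0 / delta)))

print(finite_h_sample_bound(2**20, 0.1, 0.05))  # |H| = 2^20 requires 169 examples
```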



Agnostic Learning

What happens if our hypothesis space H does not contain the target concept c? Then clearly we can never find a hypothesis h with zero error. In that case we want an algorithm that simply outputs the hypothesis with minimum training error.



Computational Learning Theory

  • Introduction

  • The PAC Learning Framework

  • Finite Hypothesis Spaces

  • Examples of PAC Learnable Concepts



Examples of PAC-Learnable Concepts

Can we say that concepts described by conjunctions of Boolean literals are PAC learnable?

First, how large is the hypothesis space when we have n Boolean attributes?

Answer: |H| = 3^n, since each attribute can appear as a positive literal, appear as a negated literal, or be absent from the conjunction.



Examples of PAC-Learnable Concepts

If we substitute this into our analysis of sample complexity for finite hypothesis spaces, we have:

m >= (1/ε) (n ln 3 + ln(1/δ))

Thus the class of conjunctions of Boolean literals is PAC learnable.
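One simple consistent learner for this class is a FIND-S-style procedure (sketched here as an illustration, not taken from the slides): start with the most specific conjunction, containing every literal, and delete any literal contradicted by a positive example.

```python
def learn_conjunction(examples, n):
    # examples: list of (x, label) pairs, where x is a tuple of n Booleans.
    # A hypothesis is a set of literals (i, value) meaning "attribute i == value".
    # Start with every literal (the most specific conjunction) and drop the
    # literals contradicted by positive examples; negative examples are ignored.
    literals = {(i, v) for i in range(n) for v in (True, False)}
    for x, label in examples:
        if label:
            literals = {(i, v) for (i, v) in literals if x[i] == v}
    return literals

# Example: target concept is "attribute 0 AND NOT attribute 2" (n = 3).
data = [((True, True, False), True),
        ((True, False, False), True),
        ((False, True, False), False)]
print(sorted(learn_conjunction(data, 3)))  # [(0, True), (2, False)]
```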



K-Term DNF not PAC Learnable

Consider now the class of k-term DNF expressions. These are expressions of the form T1 V T2 V … V Tk, where V stands for disjunction and each term Ti is a conjunction of literals over n Boolean attributes.



K-Term DNF not PAC Learnable

The size of |H| is k·3^n. Using the equation for the sample complexity of finite hypothesis spaces:

m >= (1/ε) (n ln 3 + ln k + ln(1/δ))

Although the sample complexity is polynomial in the main parameters, the corresponding computational problem (finding a consistent k-term DNF hypothesis) is known to be NP-complete.



k-CNF is PAC Learnable

It is interesting to see that another family of functions, the class of k-CNF expressions (conjunctions of clauses with at most k literals each), is PAC learnable.

This is interesting because the class of k-CNF expressions is strictly larger than the class of k-term DNF expressions.
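The containment behind this claim can be made concrete: by distributivity, any k-term DNF can be rewritten as a CNF in which every clause contains at most k literals (one literal chosen from each term), so k-term DNF expressions form a subset of k-CNF expressions. A small sketch with hypothetical literal names:

```python
from itertools import product

def dnf_to_kcnf(terms):
    # terms: a k-term DNF given as a list of k terms, each term a list of
    # literals such as "x1" or "~x3". By distributivity, T1 v ... v Tk equals
    # the conjunction of all clauses (l1 v ... v lk) with li drawn from Ti,
    # so every clause has at most k literals, i.e. the result is a k-CNF.
    return [set(clause) for clause in product(*terms)]

# Example: (x1 AND x2) OR (NOT x3) becomes a 2-CNF with two clauses.
print(dnf_to_kcnf([["x1", "x2"], ["~x3"]]))
# e.g. [{'x1', '~x3'}, {'x2', '~x3'}] (set printing order may vary)
```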

