# Concept Learning and Version Spaces - PowerPoint PPT Presentation


Concept Learning and Version Spaces. Based on Ch. 2 of Tom Mitchell’s Machine Learning and lecture slides by Uffe Kjaerulff. Presentation overview: concept learning as boolean function approximation; ordering of hypotheses; version spaces and the candidate-elimination algorithm; the role of bias.

## Concept Learning and Version Spaces

Based on Ch. 2 of Tom Mitchell’s Machine Learning and lecture slides by Uffe Kjaerulff

### Presentation Overview

• Concept learning as boolean function approximation

• Ordering of hypotheses

• Version spaces and candidate-elimination algorithm

• The role of bias

### Concept Learning

• Inferring boolean-valued functions from training examples;

• Inductive learning.

• Example

• Given:

• Instances X: Possible days described by Sky, AirTemp, Humidity, Wind, Water, Forecast;

• Target concept c: Enjoy-Sport: Day → {Yes, No};

• Hypotheses H: each described by a conjunction of attribute constraints,

• e.g. Water=Warm ∧ Sky=Sunny;

• Training examples D: positive and negative examples of the target function,

• $\langle x_1, c(x_1) \rangle, \ldots, \langle x_m, c(x_m) \rangle$.

• Determine:

• A hypothesis h from H such that h(x)=c(x) for all x in X.
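The learning task above can be made concrete with a small sketch. The dictionary encoding, the helper names `make_instance` and `classify`, and the two sample days are assumptions chosen for illustration; only the attribute names and the hypothesis Water=Warm ∧ Sky=Sunny come from the slides.

```python
# Attribute names from the Enjoy-Sport task; the dict encoding is an assumption.
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

def make_instance(*values):
    """Build an instance x as a mapping from attribute name to value."""
    return dict(zip(ATTRIBUTES, values))

def classify(hypothesis, x):
    """Return True (Yes) if x satisfies every constraint in the conjunction."""
    return all(x[attr] == value for attr, value in hypothesis.items())

# The conjunctive hypothesis Water=Warm AND Sky=Sunny from the slide.
h = {"Water": "Warm", "Sky": "Sunny"}

# Two hypothetical days, used only to exercise the hypothesis.
x1 = make_instance("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x2 = make_instance("Rainy", "Cold", "High", "Strong", "Warm", "Change")

print(classify(h, x1))  # True: both constraints hold
print(classify(h, x2))  # False: Sky is Rainy
```

Attributes that the hypothesis does not mention are unconstrained, which plays the role of the "any value" constraint introduced in the next section.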

### The Inductive Learning Hypothesis

• Note: the only information available about c is c(x) for each <x, c(x)> in D.

• Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

### Concept Learning as Search

• Some notation for hypothesis representation:

• “?” means that any value is acceptable as an attribute;

• “0” means that no value is acceptable.

• In our example

• Sky ∈ {Sunny, Cloudy, Rainy};

• AirTemp ∈ {Warm, Cold};

• Humidity ∈ {Normal, High};

• Wind ∈ {Strong, Weak};

• Water ∈ {Warm, Cold};

• Forecast ∈ {Same, Change}.

• The instance space contains 3*2*2*2*2*2=96 distinct instances.

• The hypothesis space contains 5*4*4*4*4*4=5120 syntactically distinct hypotheses, because each attribute can also take the constraints “?” and “0”.

• More realistic learning tasks contain much larger H.

• Efficient strategies are crucial.
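The counts above can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative. The last figure is an addition not stated on the slide: every hypothesis that contains a "0" constraint rejects all instances, so all such hypotheses collapse into one semantically distinct hypothesis.

```python
from math import prod

# Number of values per attribute, as listed on the slide.
domain_sizes = {"Sky": 3, "AirTemp": 2, "Humidity": 2,
                "Wind": 2, "Water": 2, "Forecast": 2}

# Every combination of attribute values is one instance.
n_instances = prod(domain_sizes.values())

# Each attribute may also be "?" or "0": two extra constraints per attribute.
n_syntactic = prod(k + 2 for k in domain_sizes.values())

# Semantically: one all-negative hypothesis, plus every hypothesis built
# from attribute values and "?" only.
n_semantic = 1 + prod(k + 1 for k in domain_sizes.values())

print(n_instances, n_syntactic, n_semantic)  # 96 5120 973
```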

### More-General-Than

• Let $h_j$ and $h_k$ be boolean functions over $X$. Then $\text{More-General-Than-Or-Equal}(h_j, h_k) \iff (\forall x \in X)\,[h_k(x) \rightarrow h_j(x)]$

• This establishes a partial order on the hypothesis space.
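The definition can be tested directly by enumerating the instance space. The sketch below assumes hypotheses are tuples of constraints in the attribute order of the example; the hypotheses `h1`, `h2`, and `h3` are illustrative choices, not taken from the slides.

```python
from itertools import product

# Attribute domains from the example, in the order
# Sky, AirTemp, Humidity, Wind, Water, Forecast.
DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]

def covers(h, x):
    """h classifies x as positive; "?" accepts any value, "0" accepts none."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """Every instance covered by hk must also be covered by hj."""
    return all(covers(hj, x) for x in product(*DOMAINS) if covers(hk, x))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
h3 = ("Sunny", "?", "?", "?", "Cold", "?")

print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h1, h2))  # False
# h1 and h3 are incomparable, so the order is only partial.
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))  # False False
```

Enumerating all 96 instances is affordable here; for conjunctive hypotheses the same relation can be decided attribute by attribute without touching the instance space.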

### Find-S Algorithm

• Initialize h to the most specific hypothesis in H;

• For each positive training instance x

• For each attribute constraint $a_i$ in h

• If the constraint $a_i$ in h is not satisfied by x, then replace $a_i$ in h by the next more general constraint that is satisfied by x

• Output hypothesis h.

• Note: Assume that H contains c and that D contains no errors;

• Otherwise this technique does not work.

• Limitations:

• Can’t tell if it’s learned the concept:

• Are there other consistent hypotheses?

• Fails if training data is inconsistent;

• Picks maximally specific h;

• Depending on H there might be several.
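A minimal sketch of Find-S for the conjunctive representation follows. The training set is an assumption, written in the style of the Enjoy-Sport task, because the slides' own example table is not part of this text; "0" stands for the empty constraint and "?" for "any value".

```python
def find_s(examples, n_attributes):
    """Return the most specific conjunctive hypothesis covering all positives."""
    h = ["0"] * n_attributes          # most specific hypothesis: accepts nothing
    for x, positive in examples:
        if not positive:
            continue                  # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == "0":
                h[i] = value          # first positive example: copy its value
            elif h[i] != value:
                h[i] = "?"            # conflicting values: generalize to "?"
    return tuple(h)

# Hypothetical training data: (instance, is_positive).
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]

print(find_s(training, 6))
# ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The negative example never influences the result, which illustrates the first limitation above: the output says nothing about whether other consistent hypotheses exist.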

### Version Spaces

• A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x)=c(x) for each training example <x, c(x)> in D:

• $\text{Consistent}(h, D) \equiv (\forall \langle x, c(x) \rangle \in D)\,[h(x) = c(x)]$

• The version space $VS_{H,D}$ with respect to H and D is the subset of hypotheses from H consistent with all training examples in D:

• $VS_{H,D} \equiv \{h \in H : \text{Consistent}(h, D)\}$

### The List-Then-Eliminate Algorithm

• VersionSpace ← a list containing every hypothesis in H;

• For each training example <x, c(x)> in D

• Remove from VersionSpace any h for which h(x) ≠ c(x)

• Output the list of hypotheses in VersionSpace.

• Maintains a list of all hypotheses in $VS_{H,D}$.

• Unrealistic for most H.

• A more compact (regular) representation of $VS_{H,D}$ is needed.
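For the small Enjoy-Sport hypothesis space, List-Then-Eliminate is still feasible, as the sketch below suggests. The training set is hypothetical (the slides' example table is not part of this text), and all hypotheses containing "0" are represented by a single all-"0" hypothesis because they behave identically.

```python
from itertools import product

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]

def covers(h, x):
    """h classifies x as positive; "?" accepts any value, "0" accepts none."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def all_hypotheses():
    """Enumerate every semantically distinct conjunctive hypothesis."""
    yield ("0",) * len(DOMAINS)
    yield from product(*[values + ("?",) for values in DOMAINS])

def list_then_eliminate(examples):
    version_space = list(all_hypotheses())
    for x, label in examples:
        # Keep only hypotheses whose prediction matches the label c(x).
        version_space = [h for h in version_space if covers(h, x) == label]
    return version_space

# Hypothetical training data: (instance, c(x)).
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]

vs = list_then_eliminate(training)
print(len(list(all_hypotheses())), len(vs))  # 973 6
```

The full list is materialized before any example is seen, which is exactly why the approach does not scale to realistic hypothesis spaces.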

### Example Version Space

• Idea: $VS_{H,D}$ can be represented by the sets of its most general and most specific consistent hypotheses.

### Representing Version Spaces

• The general boundary G of version space $VS_{H,D}$ is the set of its most general members.

• The specific boundary S of version space $VS_{H,D}$ is the set of its most specific members.

• Version Space Representation Theorem

• Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let $c: X \to \{0,1\}$ be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples $\{\langle x, c(x) \rangle\}$. For all X, H, c, and D such that S and G are well defined, $VS_{H,D} = \{h \in H \mid (\exists s \in S)(\exists g \in G)\ g \ge h \ge s\}$, where $\ge$ denotes More-General-Than-Or-Equal.
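The theorem says that membership in the version space can be decided from the two boundary sets alone. The sketch below illustrates this under assumptions: the boundaries `S` and `G` are hypothetical values for an Enjoy-Sport-style training set, and the attribute-wise comparison is only valid for conjunctive hypotheses that do not contain the "0" constraint.

```python
from itertools import product

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]

def more_general_or_equal(hj, hk):
    """Attribute-wise test: each constraint of hj is "?" or equals hk's."""
    return all(a == "?" or a == b for a, b in zip(hj, hk))

# Hypothetical boundary sets, assumed for illustration.
S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]

def in_version_space(h):
    """h lies between some g in G and some s in S in the generality order."""
    return (any(more_general_or_equal(g, h) for g in G)
            and any(more_general_or_equal(h, s) for s in S))

candidates = product(*[values + ("?",) for values in DOMAINS])
members = [h for h in candidates if in_version_space(h)]
print(len(members))  # 6
```

Only S and G need to be stored; every other member is recovered on demand by the two comparisons.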

### Candidate-Elimination Algorithm

• G ← the set of maximally general hypotheses in H

• S ← the set of maximally specific hypotheses in H

• For each training example d

• If d is a positive example

• Remove from G any hypothesis that does not cover d

• For each hypothesis s in S that does not cover d

• Remove s from S

• Add to S all minimal generalizations h of s such that h covers d and some member of G is more general than h

• Remove from S any hypothesis that is more general than another hypothesis in S

• If d is a negative example

• Remove from S any hypothesis that covers d

• For each hypothesis g in G that covers d

• Remove g from G

• Add to G all minimal specializations h of g such that h does not cover d and some member of S is more specific than h

• Remove from G any hypothesis that is more specific than another hypothesis in G
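The steps above can be sketched for conjunctive hypotheses as follows. This is an illustrative implementation under assumptions, not the slides' own code: hypotheses are tuples, "0" is the empty constraint, the helper names are invented, and the training set is hypothetical.

```python
def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    if "0" in hk:                       # hk covers nothing at all
        return True
    return all(a == "?" or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """The unique minimal generalization of conjunctive s that covers x."""
    return tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Replace one "?" in g by any value that differs from x's value."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, values in enumerate(domains) if g[i] == "?"
            for v in values if v != x[i]]

def candidate_elimination(examples, domains):
    S = {("0",) * len(domains)}
    G = {("?",) * len(domains)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            S = {s if covers(s, x) else min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            # Drop members that are more general than another member of S.
            S = {s for s in S
                 if not any(t != s and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                else:
                    new_G.update(
                        h for h in min_specializations(g, x, domains)
                        if any(more_general_or_equal(h, s) for s in S))
            # Drop members that are more specific than another member of G.
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)}
    return S, G

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cold"), ("Same", "Change")]
training = [  # hypothetical data: (instance, is_positive)
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]
S, G = candidate_elimination(training, DOMAINS)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # contains ('Sunny', '?', ...) and ('?', 'Warm', ...), in some order
```

For conjunctive hypotheses each s has exactly one minimal generalization, so S stays a singleton here; richer hypothesis languages may yield several.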

### Some Notes on Candidate-Elimination Algorithm

• Positive examples make S become increasingly general.

• Negative examples make G become increasingly specific.

• Candidate-Elimination algorithm will converge toward the hypothesis that correctly describes the target concept provided that

• There are no errors in the training examples;

• There is some hypothesis in H that correctly describes the target concept.

• The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.

• Under the above assumptions, new training data can be used to resolve ambiguity.

• The algorithm breaks down if

• the data is noisy (inconsistent);

• Inconsistency can eventually be detected, given sufficient training data: S and G converge to an empty version space.

• the target concept is a disjunction of attribute values.

### A Biased Hypothesis Space

• Bias: each h ∈ H is given by a conjunction of attribute values

• Unable to represent disjunctive concepts:

• Sky=Sunny ∨ Sky=Cloudy

• The most specific hypothesis that is consistent with examples 1 and 2 and representable in H is (?, Warm, Normal, Strong, Cool, Change).

• But it is too general:

• It also covers example 3.
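The failure can be reproduced in a few lines. The three examples below are assumptions reconstructed from the hypothesis quoted above (the slide's table is not part of this text): examples 1 and 2 differ only in Sky, and example 3 is taken to be a Rainy day labelled negative.

```python
def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def most_specific_generalization(positives):
    """Most specific conjunctive hypothesis covering every positive example."""
    h = list(positives[0])
    for x in positives[1:]:
        h = [c if c == v else "?" for c, v in zip(h, x)]
    return tuple(h)

# Assumed examples: only Sky differs between them.
x1 = ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")   # positive
x2 = ("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change")  # positive
x3 = ("Rainy", "Warm", "Normal", "Strong", "Cool", "Change")   # assumed negative

h = most_specific_generalization([x1, x2])
print(h)              # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(covers(h, x3))  # True: the hypothesis is forced to cover example 3
```

Because a conjunction cannot say "Sunny or Cloudy", the only way to cover both positives is to drop the Sky constraint, which admits the Rainy day as well.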

### Unbiased Learner

• Idea: Choose H that expresses every teachable concept;

• H is the power set of X;

• Allow disjunction and negation.

• For our example we get $2^{96}$ possible hypotheses.

• What are G and S?

• S becomes a disjunction of positive examples;

• G becomes a negated disjunction of negative examples.

• Only training examples will be unambiguously classified.

• The algorithm cannot generalize!
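A toy instance space makes the point visible. The sketch below assumes four abstract instances and two training examples, all hypothetical; each hypothesis is the set of instances it labels positive, so H is the power set of X.

```python
from itertools import combinations

X = ["x1", "x2", "x3", "x4"]  # toy instance space, assumed for illustration

def power_set(items):
    """Every subset of items; each subset is one hypothesis."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

H = power_set(X)

# Hypothetical training data: x1 is positive, x2 is negative.
D = [("x1", True), ("x2", False)]

# Version space: hypotheses that agree with every training example.
VS = [h for h in H if all((x in h) == label for x, label in D)]

# Count how many version-space members label each instance positive.
votes = {x: sum(x in h for h in VS) for x in X}
print(len(H), len(VS))  # 16 4
print(votes)            # {'x1': 4, 'x2': 0, 'x3': 2, 'x4': 2}
```

The unseen instances x3 and x4 are labelled positive by exactly half of the version space, so no vote can classify them. The same holds at full scale, where the unbiased space has `2**96` hypotheses (roughly 7.9e28).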

### Inductive Bias

• Let

• L be a concept learning algorithm;

• X be a set of instances;

• c be the target concept;

• $D_c = \{\langle x, c(x) \rangle\}$ be the set of training examples;

• $L(x_i, D_c)$ denote the classification assigned to the instance $x_i$ by L after training on $D_c$.

• The inductive bias of L is any minimal set of assertions B such that for the target concept c and corresponding training examples $D_c$:

• $\forall x_i \in X:\ (B \land D_c \land x_i) \vdash L(x_i, D_c)$

• Inductive bias of Candidate-Elimination algorithm:

• The target concept c is contained in the given hypothesis space H.

### Summary Points

• Concept learning as search through H

• Partial ordering of H

• Version space candidate elimination algorithm

• S and G characterize learner’s uncertainty

• Inductive leaps are possible only if the learner is biased