
Concept Learning and Version Spaces

Based on Ch. 2 of Tom Mitchell’s Machine Learning and lecture slides by Uffe Kjaerulff

- Concept learning as boolean function approximation
- Ordering of hypotheses
- Version spaces and candidate-elimination algorithm
- The role of bias

- Inferring boolean-valued functions from training examples;
- Inductive learning.

- Example
- Given:
- Instances X: Possible days described by Sky, AirTemp, Humidity, Wind, Water, Forecast;
- Target concept c: EnjoySport: Day $\to$ {Yes, No};
- Hypotheses H: each described by a conjunction of attribute constraints,
- e.g. Water=Warm $\land$ Sky=Sunny;

- Training examples D: positive and negative examples of the target function,
- <x1, c(x1)>, …, <xm, c(xm)>.

- Determine:
- A hypothesis h from H such that h(x)=c(x) for all x in X.

- Note: the only information available about c is c(x) for each <x, c(x)> in D.
- Inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

- Some notation for hypothesis representation:
- “?” means that any value is acceptable for the attribute;
- “0” means that no value is acceptable for the attribute.

- In our example
- Sky $\in$ {Sunny, Cloudy, Rainy};
- AirTemp $\in$ {Warm, Cold};
- Humidity $\in$ {Normal, High};
- Wind $\in$ {Strong, Weak};
- Water $\in$ {Warm, Cool};
- Forecast $\in$ {Same, Change}.

- The instance space contains 3*2*2*2*2*2=96 distinct instances.
- The hypothesis space contains 5*4*4*4*4*4=5120 syntactically distinct hypotheses (each attribute also admits “?” and “0”).
- More realistic learning tasks involve much larger hypothesis spaces.
- Efficient strategies are crucial.
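The counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic. The third figure, the number of semantically distinct hypotheses, does not appear on the slides; it follows from the observation (made in Mitchell's Ch. 2) that every hypothesis containing "0" denotes the same empty concept.

```python
from math import prod

# Number of values per attribute: Sky has 3, the other five have 2.
domain_sizes = [3, 2, 2, 2, 2, 2]

instances = prod(domain_sizes)                     # one value per attribute
syntactic = prod(n + 2 for n in domain_sizes)      # each attribute also allows "?" and "0"
# All hypotheses containing "0" cover no instance, so they count once;
# the remaining ones use only values and "?".
semantic = 1 + prod(n + 1 for n in domain_sizes)

print(instances, syntactic, semantic)              # 96 5120 973
```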

- Let $h_j$ and $h_k$ be boolean functions over X; then $\text{More-General-Than-Or-Equal}(h_j, h_k) \Leftrightarrow (\forall x \in X)\,[h_k(x) = 1 \rightarrow h_j(x) = 1]$
- This relation establishes a partial order on the hypothesis space.
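The definition can be tested directly by brute force over the 96 instances. The sketch below assumes the tuple encoding with "?" and "0" and the attribute values of the running example; `DOMAINS`, `covers` and `more_general_or_equal` are illustrative names.

```python
from itertools import product

# Attribute values of the running example (assumed order:
# Sky, AirTemp, Humidity, Wind, Water, Forecast).
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]


def covers(h, x):
    # "0" never equals an attribute value, so it rejects every instance.
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    """hj >= hk iff every instance covered by hk is also covered by hj."""
    return all(covers(hj, x) for x in product(*DOMAINS) if covers(hk, x))


h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
h3 = ("Sunny", "?", "?", "?", "Cool", "?")

print(more_general_or_equal(h2, h1))   # True
print(more_general_or_equal(h1, h2))   # False
# h1 and h3 are incomparable, which is why the order is only partial.
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))   # False False
```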

- Initialize h to the most specific hypothesis in H;
- For each positive training instance x
  - For each attribute constraint ai in h
    - If the constraint ai in h is not satisfied by x, then replace ai in h by the next more general constraint that is satisfied by x;
- Output hypothesis h.
- Note: Negative examples are ignored. Assume that H contains c and that D contains no errors;
- Otherwise this technique does not work.
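The procedure above (called Find-S in Mitchell's Ch. 2) can be sketched in Python as follows. The tuple encoding and the four training examples are assumptions made for illustration; the data is modeled on Mitchell's EnjoySport example and is not part of this text.

```python
def find_s(examples, n_attrs=6):
    """Return the maximally specific conjunctive hypothesis covering all positives."""
    h = ["0"] * n_attrs                     # most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue                        # negative examples are ignored
        for i, value in enumerate(x):
            if h[i] == "0":
                h[i] = value                # "0" generalizes to the observed value
            elif h[i] != "?" and h[i] != value:
                h[i] = "?"                  # conflicting values generalize to "?"
    return tuple(h)


# Illustrative training set (assumed), as (instance, is_positive) pairs.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The result depends only on the positive examples, which is why the method cannot notice when a negative example contradicts its output.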

- Limitations:
- Cannot tell whether it has learned the concept:
- there may be other consistent hypotheses;

- Fails if the training data is inconsistent;
- Picks a maximally specific h;
- Depending on H there might be several.

- A hypothesis h is consistent with a set of training examples D of the target concept c if and only if h(x)=c(x) for each training example <x, c(x)> in D:
- $\mathit{Consistent}(h, D) \equiv (\forall \langle x, c(x) \rangle \in D)\; h(x) = c(x)$

- The version space $VS_{H,D}$ with respect to H and D is the subset of hypotheses from H consistent with all training examples in D:
- $VS_{H,D} \equiv \{ h \in H : \mathit{Consistent}(h, D) \}$

- VersionSpace $\leftarrow$ a list containing every hypothesis in H;
- For each training example <x, c(x)> in D
  - Remove from VersionSpace any h for which h(x) $\neq$ c(x);
- Output the list of hypotheses in VersionSpace.
- This maintains a list of all hypotheses in $VS_{H,D}$.
- Unrealistic for most H.
- A more compact (regular) representation of $VS_{H,D}$ is needed.
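For the small conjunctive hypothesis space of the running example, this enumerate-and-filter procedure is still feasible. The sketch below assumes the tuple encoding with "?" and "0" and an illustrative training set modeled on Mitchell's EnjoySport example; none of the names or data come from this text.

```python
from itertools import product

DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]


def covers(h, x):
    # "0" never equals an attribute value, so it rejects every instance.
    return all(c == "?" or c == v for c, v in zip(h, x))


def list_then_eliminate(domains, examples):
    # Start from every syntactically distinct hypothesis in H.
    version_space = list(product(*[d + ("?", "0") for d in domains]))
    for x, label in examples:
        # Keep only the hypotheses with h(x) = c(x).
        version_space = [h for h in version_space if covers(h, x) == label]
    return version_space


# Illustrative training set (assumed), as (instance, c(x)) pairs.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

vs = list_then_eliminate(DOMAINS, D)
print(len(vs))            # 6
for h in sorted(vs):
    print(h)
```

The initial list holds 5120 hypotheses here. For richer hypothesis languages the same list cannot be built at all, which motivates the boundary-set representation.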

- Idea: $VS_{H,D}$ can be represented by the sets of its most general and most specific consistent hypotheses.

- The general boundary G of version space $VS_{H,D}$ is the set of its most general members.
- The specific boundary S of version space $VS_{H,D}$ is the set of its most specific members.
- Version Space Representation Theorem
- Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let $c : X \to \{0,1\}$ be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined, $VS_{H,D} = \{ h \in H \mid (\exists s \in S)(\exists g \in G)\; g \geq_g h \geq_g s \}$, where $\geq_g$ denotes More-General-Than-Or-Equal.
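As an illustration of the theorem, the sketch below recovers a whole version space from its two boundary sets. The S and G used here are assumed values (they are the boundaries Mitchell reports for his four EnjoySport training examples), and the ordering test is a syntactic shortcut that is valid only for conjunctive hypotheses without "0".

```python
from itertools import product

DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]


def more_general_or_equal(hj, hk):
    # Syntactic test for conjunctive hypotheses that contain no "0":
    # every constraint of hj is "?" or identical to the one in hk.
    return all(a == "?" or a == b for a, b in zip(hj, hk))


# Assumed boundary sets for illustration.
S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]

# Candidate hypotheses: all conjunctions over values and "?".
H = product(*[d + ("?",) for d in DOMAINS])

version_space = [
    h for h in H
    if any(more_general_or_equal(h, s) for s in S)
    and any(more_general_or_equal(g, h) for g in G)
]

print(len(version_space))   # 6
```

Three hypotheses are enough to describe all six members, which is the compactness the boundary representation is meant to provide.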

- G $\leftarrow$ the set of maximally general hypotheses in H
- S $\leftarrow$ the set of maximally specific hypotheses in H
- For each training example d
  - If d is a positive example
    - Remove from G any hypothesis that does not cover d
    - For each hypothesis s in S that does not cover d
      - Remove s from S
      - Add to S all minimal generalizations h of s such that h covers d and some member of G is more general than h
      - Remove from S any hypothesis that is more general than another hypothesis in S
  - If d is a negative example
    - Remove from S any hypothesis that covers d
    - For each hypothesis g in G that covers d
      - Remove g from G
      - Add to G all minimal specializations h of g such that h does not cover d and some member of S is more specific than h
      - Remove from G any hypothesis that is more specific than another hypothesis in G
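The algorithm can be sketched in Python for conjunctive hypotheses as follows. This is one possible rendering under stated assumptions, not a reference implementation: hypotheses are tuples using "?" and "0", the helper names are hypothetical, and the training data is an illustrative set modeled on Mitchell's EnjoySport example.

```python
def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    if "0" in hk:                 # hk covers nothing, so anything is >= hk
        return True
    return all(a == "?" or a == b for a, b in zip(hj, hk))


def min_generalization(s, x):
    # For conjunctions there is exactly one minimal generalization covering x.
    return tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))


def min_specializations(g, x, domains):
    # Replace one "?" by any value that differs from x's value.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]


def candidate_elimination(examples, domains):
    G = {("?",) * len(domains)}
    S = {("0",) * len(domains)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            # Drop members that are more general than another member of S.
            S = {s for s in new_S if not any(
                s != t and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                cands = min_specializations(g, x, domains) if covers(g, x) else [g]
                new_G.update(h for h in cands
                             if any(more_general_or_equal(h, s) for s in S))
            # Drop members that are more specific than another member of G.
            G = {g for g in new_G if not any(
                g != t and more_general_or_equal(t, g) for t in new_G)}
    return S, G


DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
D = [  # illustrative (instance, is_positive) pairs, assumed for this sketch
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(D, DOMAINS)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(sorted(G))
```

On this data G ends with two members, (Sunny, ?, ?, ?, ?, ?) and (?, Warm, ?, ?, ?, ?), so the version space has not yet converged to a single hypothesis.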

- Positive examples make S become increasingly general.
- Negative examples make G become increasingly specific.
- The Candidate-Elimination algorithm will converge toward the hypothesis that correctly describes the target concept provided that
  - there are no errors in the training examples;
  - there is some hypothesis in H that correctly describes the target concept.

- The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.
- Under the above assumptions, new training data can be used to resolve ambiguity.
- The algorithm breaks down if
  - the data is noisy (inconsistent);
    - inconsistency can eventually be detected given sufficient training data: S and G converge to an empty version space;
  - the target concept is a disjunction of attribute values, which a conjunctive H cannot represent.

- Bias: each h $\in$ H is given by a conjunction of attribute values
- Unable to represent disjunctive concepts such as:
- Sky=Sunny $\lor$ Sky=Cloudy

- The most specific hypothesis consistent with training examples 1 and 2 and representable in H is (?, Warm, Normal, Strong, Cool, Change).
- But it is too general:
- It also covers example 3, which is negative.
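The failure can be reproduced in a few lines. The example table is not present in this text, so the three instances below are an assumption: they differ only in Sky, with examples 1 and 2 positive and example 3 negative, which matches the hypothesis quoted above.

```python
def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def generalize(h, x):
    # Minimal conjunctive generalization of h that also covers x.
    return tuple(c if c == v else "?" for c, v in zip(h, x))


rest = ("Warm", "Normal", "Strong", "Cool", "Change")
ex1 = ("Sunny",) + rest     # assumed positive
ex2 = ("Cloudy",) + rest    # assumed positive
ex3 = ("Rainy",) + rest     # assumed negative

s = generalize(ex1, ex2)
print(s)                    # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(covers(s, ex3))       # True: the negative example is wrongly covered
```

Since Sky is the only attribute that separates the positives from the negative, no conjunction can cover ex1 and ex2 while excluding ex3.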

- Idea: Choose H that expresses every teachable concept;
- H is the power set of X;
- Allow disjunction and negation.
- For our example we get $2^{96}$ possible hypotheses.
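To give a sense of scale, the following sketch compares the size of the unbiased hypothesis space with the 973 semantically distinct conjunctive hypotheses. The figure 973 is not on the slides; it is computed here as 1 + 4·3·3·3·3·3.

```python
n_instances = 3 * 2 * 2 * 2 * 2 * 2        # 96 distinct instances
n_concepts = 2 ** n_instances              # every subset of X is a concept
n_conjunctive = 1 + 4 * 3 ** 5             # semantically distinct conjunctions

print(n_concepts)                          # exact integer
print(f"{n_concepts:.2e}")                 # 7.92e+28
print(n_conjunctive / n_concepts)          # fraction expressible as a conjunction
```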

- What are G and S?
- S becomes a disjunction of positive examples;
- G becomes a negated disjunction of negative examples.

- Only training examples will be unambiguously classified.
- The algorithm cannot generalize!
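The effect is easy to see on a toy instance space, since enumerating all $2^{96}$ concepts of the running example is infeasible. The four-instance space and the two training examples below are assumptions chosen only for illustration.

```python
from itertools import combinations

X = ["x1", "x2", "x3", "x4"]               # toy instance space (assumed)

# Unbiased H: every subset of X, read as "the instances labeled positive".
H = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

D = [("x1", True), ("x2", False)]          # toy training examples (assumed)
VS = [h for h in H if all((x in h) == label for x, label in D)]

votes = {x: sum(x in h for h in VS) for x in X}
for x in X:
    print(x, votes[x], "of", len(VS), "hypotheses vote positive")
```

The training instances x1 and x2 get unanimous votes (4 of 4 and 0 of 4), while each unseen instance is labeled positive by exactly half of the version space, so no classification of new instances is justified.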

- Let
- L be a concept learning algorithm;
- X be a set of instances;
- c be the target concept;
- $D_c = \{\langle x, c(x) \rangle\}$ be the set of training examples;
- $L(x_i, D_c)$ denote the classification assigned to the instance $x_i$ by L after training on $D_c$.

- The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples $D_c$:
- $(\forall x_i \in X)\,[(B \land D_c \land x_i) \vdash L(x_i, D_c)]$

- Inductive bias of Candidate-Elimination algorithm:
- The target concept c is contained in the given hypothesis space H.

- Concept learning as search through H
- Partial ordering of H
- Version spaces and the Candidate-Elimination algorithm
- S and G characterize the learner’s uncertainty
- Inductive leaps are possible only if the learner is biased