
Machine Learning (机器学习)


Presentation Transcript


  1. Machine Learning. Chen Yu, Institute of Computer Science and Technology, Peking University; Information Security Engineering Research Center

  2. Course Information • Lecturer: Chen Yu, chen_yu@pku.edu.cn, Tel: 82529680 • TA: Cheng Zaixing, Tel: 62763742, wataloo@hotmail.com • Course page: http://www.icst.pku.edu.cn/course/jiqixuexi/jqxx2011.mht

  3. Ch2 Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  4. Types of learning • Based on the type of feedback: • Supervised learning: the correct answer is given for each training example (labeled examples) • Unsupervised learning: answers are not given (unlabeled examples) • Semi-supervised learning: a mixture of labeled and unlabeled examples • Reinforcement learning: the teacher provides a reward or penalty

  5. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  6. Definition & Example • Def. Concept learning is the task of inferring a boolean-valued function from labeled training examples • Example: learning the concept “days on which my friend Aldo enjoys his favorite water sport” from a set of training examples:

  7. Example (contd): Representing hypotheses • One way is to represent a hypothesis as a conjunction of constraints on the attributes. Each constraint can be: • A specific value (e.g. Water=Warm) • Don't care (e.g. Water=?) • No value allowed (e.g. Water=Ø) • An example hypothesis in EnjoySport: <Sunny ? ? Strong ? Same>

  8. Example (contd) • The most general hypothesis, under which every day is a positive example, is represented by <? ? ? ? ? ?> • The most specific hypotheses, under which every day is a negative example, are those with some attribute constraint equal to Ø
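
This representation is easy to make concrete. Below is a minimal Python sketch (not from the slides; the helper name matches is mine) encoding a hypothesis as a tuple, with '?' for "don't care" and None for the empty constraint Ø:

```python
# Hypotheses as tuples over (Sky, AirTemp, Humidity, Wind, Water, Forecast):
# '?' = don't care, None = Ø (no value allowed), otherwise a specific value.

def matches(hypothesis, instance):
    """True iff the instance satisfies every attribute constraint."""
    # A None constraint equals neither '?' nor any value, so a hypothesis
    # containing Ø classifies every instance as negative.
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')      # <Sunny ? ? Strong ? Same>
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(matches(h, x))           # True
print(matches(('?',) * 6, x))  # most general hypothesis: always True
print(matches((None,) * 6, x)) # most specific hypothesis: always False
```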

  9. Prototypical Concept Learning Task • Given: • Instance space X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast • Sky (Sunny, Cloudy, Rainy); AirTemp (Warm, Cold); Humidity (Normal, High); Wind (Strong, Weak); Water (Warm, Cool); Forecast (Same, Change) • Target function EnjoySport, c: X→{0,1} • Hypothesis space H: conjunctions of literals • Set D of training examples: positive or negative examples <x, c(x)> of the target function • Determine: a hypothesis h in H s.t. h(x)=c(x) for all x in D (a kind of inductive learning)

  10. Inductive Learning: A Brief Overview • Simplest form: learn a function from examples. Let f be the target function; then an example is a pair (x, f(x)) • Statement of an inductive-learning problem: given a collection of examples of f, return a function h that approximates f (h is called a hypothesis) • The fundamental problem of induction is the predictive power of the learned h

  11. Philosophical Foundation • One motivation behind inductive learning is the attempt to establish the source of knowledge • Aristotle (384-322 B.C.) was the first to formulate a precise set of laws governing the rational part of the mind • The empiricism movement, starting with Francis Bacon's (1561-1626) Novum Organum ("new instrument" in English), is characterized by a dictum of John Locke (1632-1704): "Nothing is in the understanding, which was not first in the senses"

  12. An Example: Curve Fitting • (a) Examples (x, f(x)) and a consistent linear hypothesis • (b) A consistent degree-7 polynomial for the same data set • (c) A different data set that admits an exact degree-6 polynomial fit or an approximate linear fit • (d) A simple, exact sinusoidal fit to the data set in (c) • A learning problem is realizable if the hypothesis space contains the true function
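
The contrast between the simple and the flexible fit is easy to reproduce. A small sketch with NumPy on invented data (the book's actual data points are not given in the transcript):

```python
# Fitting the same 8 points with a line and with a degree-7 polynomial.
# The data is made up for illustration: roughly linear plus a little noise.
import numpy as np

x = np.arange(8, dtype=float)
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9, 6.2, 6.9])

line  = np.polyfit(x, y, deg=1)  # simple hypothesis, small residual error
poly7 = np.polyfit(x, y, deg=7)  # 8 coefficients: interpolates all 8 points

# Both fit the data well, but they extrapolate very differently:
print(np.polyval(line, 9.0), np.polyval(poly7, 9.0))
```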

  13. Ockham's razor • Q: How do we choose from among multiple consistent hypotheses? • Ockham's razor: prefer the simplest hypothesis consistent with the data: "Entities are not to be multiplied beyond necessity" (William of Ockham (1280-1349), the most influential philosopher of his century)

  14. Inductive Learning Hypothesis • There is a fundamental assumption underlying any learned hypothesis, the so-called inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples

  15. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  16. An Example: EnjoySport • EnjoySport: • Instance space X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast • Sky (Sunny, Cloudy, Rainy); AirTemp (Warm, Cold); Humidity (Normal, High); Wind (Strong, Weak); Water (Warm, Cool); Forecast (Same, Change) • Target function EnjoySport, c: X→{0,1} • Hypothesis space H: conjunctions of literals • Size of its instance space: 3×2×2×2×2×2=96 • Size of its hypothesis space: 4×3×3×3×3×3+1=973 semantically distinct hypotheses • Q: Is there a systematic way to search the hypothesis space?
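
The two counts are straightforward to reproduce; a quick sketch:

```python
# Attribute domain sizes for EnjoySport.
sizes = [3, 2, 2, 2, 2, 2]   # Sky, AirTemp, Humidity, Wind, Water, Forecast

instances = 1
for n in sizes:
    instances *= n           # 3*2*2*2*2*2 = 96

# Per attribute a hypothesis may use any of the n values or '?' (n+1 choices);
# every hypothesis containing Ø classifies all instances negative, so all of
# them together count as a single extra semantically distinct hypothesis.
hypotheses = 1
for n in sizes:
    hypotheses *= n + 1
hypotheses += 1              # 4*3*3*3*3*3 + 1 = 973

print(instances, hypotheses) # 96 973
```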

  17. General-to-Specific Ordering of Hypotheses • An illustration (figure from the slides, not reproduced in this transcript)

  18. "More General Than" Relationship • Def. Let hj and hk be boolean-valued functions defined over X. Then hj is more_general_than_or_equal_to hk (written hj ≥g hk) iff (∀x∈X) [(hk(x)=1) → (hj(x)=1)] • Note: "≥g" is independent of the target concept • Property: "≥g" is a partial order
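
For the conjunctive representation, the relation can be tested constraint by constraint; a small sketch (the helper name is mine):

```python
# hj >=_g hk iff every instance satisfying hk also satisfies hj.
# '?' = don't care, None = Ø. If hk contains Ø it satisfies no instance,
# so the implication holds vacuously for any hj.

def more_general_or_equal(hj, hk):
    if any(c is None for c in hk):
        return True
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 covers everything h1 covers
print(more_general_or_equal(h1, h2))  # False: ">=g" is only a partial order
```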

  19. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  20. FIND-S Algorithm FIND-S: finding a maximally specific hypothesis • Initialize h to the most specific hypothesis in H • For each positive training example x: • For each attribute constraint ai in h, if it is satisfied by x, do nothing; otherwise replace ai by the next more general constraint that is satisfied by x • Output hypothesis h
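
A direct Python sketch of the algorithm for the conjunctive representation ('?' = don't care, None = Ø). The generalization step is unique here, so no search is needed; the training data assumed below is the textbook's four EnjoySport examples (shown as a table in the original slides):

```python
def find_s(examples, n_attrs=6):
    """FIND-S: maximally specific conjunctive hypothesis consistent with
    the positive examples. Negative examples are simply ignored."""
    h = [None] * n_attrs                 # most specific hypothesis <Ø,...,Ø>
    for x, positive in examples:
        if not positive:
            continue
        for i, v in enumerate(x):
            if h[i] is None:             # first positive example: copy value
                h[i] = v
            elif h[i] != v:              # constraint violated: generalize
                h[i] = '?'
    return tuple(h)

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(train))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```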

  21. An Illustration of Find-S • Note: If we assume the target concept c is in H, and training examples are noise-free, then the h found via Find-S must also be consistent with c on negative training examples.

  22. Complaints about FIND-S • Has the learned h converged to the true target concept? Not sure! • Why prefer the most specific hypothesis? • Are the training examples consistent? We would prefer an algorithm that can detect when training examples are inconsistent, or, even better, correct the error • What if there are several maximally specific consistent hypotheses?

  23. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  24. Version Space • The version space is the set of hypotheses consistent with the training data, i.e. VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}, where Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)

  25. List-Then-Eliminate Algorithm • A "brute force" way of computing the version space: the LIST-THEN-ELIMINATE algorithm • Initialize VS to contain every hypothesis in H • For each training example <x, c(x)>, eliminate any h in VS that is not consistent with c on x • Output the resulting VS
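
A brute-force sketch (names such as DOMAINS and list_then_eliminate are mine). Enumeration is feasible only because EnjoySport has just 973 semantically distinct hypotheses:

```python
from itertools import product

DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # The all-Ø hypothesis plus every combination of "a value or '?'".
    vs = [(None,) * 6] + list(product(*[d + ('?',) for d in DOMAINS]))
    for x, positive in examples:
        vs = [h for h in vs if matches(h, x) == positive]
    return vs

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(len(list_then_eliminate(train)))   # 6 hypotheses survive
```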

  26. Version Space with Boundary Sets • We need a more compact representation of VS in order to compute it efficiently • One approach: delimit VS by its general and specific boundary sets, together with the partial order between hypotheses • Example: the VS of EnjoySport has six elements, which can be ordered as follows:

  27. VS Representation Theorem • Def. The general boundary G w.r.t. hypothesis space H and training data D is the set of maximally general members of H consistent with D • Def. The specific boundary S w.r.t. hypothesis space H and training data D is the set of minimally general (i.e. maximally specific) members of H consistent with D

  28. VS Representation Theorem (2) • Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c be an arbitrary boolean-valued target concept over X, and let D be a set of training examples <x, c(x)>. For all X, H, c, and D s.t. S and G are well-defined, VS_{H,D} = {h ∈ H | (∃s∈S)(∃g∈G) (g ≥g h ≥g s)}

  29. CANDIDATE-ELIMINATION Algorithm • Initialize G to the set of maximally general hypotheses in H • Initialize S to the set of maximally specific hypotheses in H • For each training example d, do: • If d is a positive example: • Remove from G any hypothesis inconsistent with d • For each hypothesis s in S that is inconsistent with d: • Remove s from S • Add to S all minimal generalizations h of s s.t. h is consistent with d and some member of G is more general than h • Remove from S any hypothesis that is more general than another hypothesis in S

  30. CANDIDATE-ELIMINATION Algorithm (contd) • If d is a negative example: • Remove from S any hypothesis inconsistent with d • For each hypothesis g in G that is inconsistent with d: • Remove g from G • Add to G all minimal specializations h of g s.t. h is consistent with d and some member of S is more specific than h • Remove from G any hypothesis that is more specific than another hypothesis in G
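
Below is a compact Python sketch of the algorithm for the conjunctive representation. It exploits the simplification noted in the remarks that follow: S stays a single hypothesis, and its minimal generalization toward a positive example is unique. Helper names and the training data (the textbook's four EnjoySport examples) are assumptions of the sketch, not from the slides:

```python
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general(hj, hk):              # hj >=_g hk; None = Ø covers nothing
    if any(c is None for c in hk):
        return True
    return all(a == '?' or a == b for a, b in zip(hj, hk))

def generalize(s, x):
    """Unique minimal generalization of s covering the positive instance x."""
    return tuple(v if c is None else (c if c == v else '?')
                 for c, v in zip(s, x))

def specializations(g, x):
    """Minimal specializations of g excluding the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    S = (None,) * 6                    # single maximally specific hypothesis
    G = [('?',) * 6]
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = generalize(S, x)
        else:
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)    # g already consistent with x
                else:                  # replace g by specializations above S
                    new_G.extend(h for h in specializations(g, x)
                                 if more_general(h, S))
            # keep only maximally general members
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(train)
print(S)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```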

  31. An Illustrative Example • Find VS of EnjoySport via Candidate-Elimination Algorithm

  32. An Illustrative Example (2)

  33. An Illustrative Example (3)

  34. An Illustrative Example (4)

  35. An Illustrative Example (5) • Final VS learned from those 4 examples:

  36. Remarks • CANDIDATE-ELIMINATION works whenever the conditions of the version space representation theorem hold. Moreover, if • every instance can be represented as a fixed-length attribute vector with each attribute taking a finite number of possible values, and • the hypothesis space is restricted to conjunctions of constraints on attributes as defined earlier, • then the operations on S in the algorithm simplify to FIND-S (throughout the process, S remains a single-element set)

  37. Remarks (2) • Will the algorithm converge to the correct hypothesis? It converges if there are no errors in the training examples and the true target concept is in H • What if some training example contains a wrong target value? The true target concept will no longer be in the VS • What if the true target concept is not in H? The VS may become empty

  38. Remarks (3) • What training example should the learner request next? Consider the case where the learner proposes the next instance and obtains the answer from the teacher. E.g.: • What query should be presented next? • One such instance is <Sunny, Warm, ?, Light, ?, ?>. In general, try generating queries that satisfy exactly half of the hypotheses
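
A sketch of that heuristic: enumerate candidate instances and score each by how evenly it splits the current version space (here the six EnjoySport hypotheses; all names are mine):

```python
from itertools import product

DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?',      '?', '?'),
      ('?',     'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', '?',      '?', '?'),
      ('?',     'Warm', '?', '?',      '?', '?')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def best_query(vs):
    """Instance whose positive-vote count is closest to |VS|/2; whichever
    answer the teacher gives then eliminates about half of the VS."""
    return min(product(*DOMAINS),
               key=lambda x: abs(sum(matches(h, x) for h in vs) - len(vs) / 2))

print(best_query(VS))  # an instance satisfied by exactly 3 of the 6 hypotheses
```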

  39. Remarks (4) • How can a partially learned concept be used? Consider the VS learned previously. Suppose there are no more training examples, and the learner must classify a new instance not observed during training. Consider the following 4 instances: • <Sunny Warm Normal Strong Cool Change> • <Rainy Cold Normal Light Warm Same> • <Sunny Warm Normal Light Warm Same> • <Sunny Cold Normal Strong Warm Same> • Assuming the target concept is in the VS, the labels of the above 4 instances are (using the partial order): instance 1 is "+"; instance 2 is "-"; instances 3 and 4 are ambiguous, and might be assigned a value by voting (see the sketch below)
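
The voting scheme is a short loop over the version space. A sketch classifying the four instances above (VS as computed earlier; names are mine):

```python
VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?',      '?', '?'),
      ('?',     'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', '?',      '?', '?'),
      ('?',     'Warm', '?', '?',      '?', '?')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def classify(vs, x):
    votes = sum(matches(h, x) for h in vs)   # hypotheses voting "+"
    if votes == len(vs):
        return '+'                           # unanimous positive
    if votes == 0:
        return '-'                           # unanimous negative
    return f'ambiguous ({votes}/{len(vs)} vote +)'

for x in [('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
          ('Rainy', 'Cold', 'Normal', 'Light',  'Warm', 'Same'),
          ('Sunny', 'Warm', 'Normal', 'Light',  'Warm', 'Same'),
          ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')]:
    print(classify(VS, x))   # '+', '-', 'ambiguous (3/6)', 'ambiguous (2/6)'
```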

  40. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  41. A Biased Hypothesis Space • Consider EnjoySport: if we restrict H to conjunctions of attribute constraints, it cannot represent even a simple disjunctive concept such as "Sky=Sunny or Sky=Cloudy" • E.g. given the following three training examples: • <Sunny Warm Normal Strong Cool Change> Yes • <Cloudy Warm Normal Strong Cool Change> Yes • <Rainy Warm Normal Strong Cool Change> No • the Candidate-Elimination algorithm (indeed, any algorithm over this H) will output an empty VS

  42. An Unbiased Learner • One obvious approach to an unbiased learner is to propose a hypothesis space H' capable of representing every teachable concept over X, i.e. the power set of X • Compare a couple of numbers in EnjoySport: |X|=96, while the number of semantically distinct conjunctive hypotheses is only 973 (vs. 2^96 subsets of X) • Apply the CANDIDATE-ELIMINATION algorithm to H' and training set D, and the learning algorithm completely loses its generalization power: every new instance unseen in D will be classified ambiguously!

  43. Futility of Bias-Free Learning • Fundamental property of inductive inference: a learner that makes no a priori assumption (i.e. has no inductive bias) regarding the identity of the target concept has no rational basis for classifying unseen instances • An interesting idea: characterize various learning approaches by the inductive bias they employ. However, we need to define inductive bias more precisely first

  44. Inductive Bias • Let L(xi, Dc) denote the classification that L assigns to xi after learning from training set Dc. We describe the inductive inference step performed by L as follows: (Dc ∧ xi) ≻ L(xi, Dc), where y ≻ z denotes that z is inductively inferred from y • What additional assumptions could be added to Dc ∧ xi s.t. L(xi, Dc) would follow deductively? We define the inductive bias of L as this set of additional assumptions

  45. Inductive Bias (2) • Def. The inductive bias of L is any minimal set of assertions B s.t. for any target concept c and training examples Dc we have (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)], where "y ⊢ z" indicates that z follows deductively from y • If we define L(xi, Dc) as the unanimous vote of the elements of the VS found (undefined if the vote is not unanimous), then the inductive bias of the CANDIDATE-ELIMINATION algorithm is "the target concept c is in H"

  46. Inductive Bias of Various Learners • Rote learner: learns by simply storing training examples in memory. No inductive bias • CANDIDATE-ELIMINATION: new instances are classified only when all members of the VS make the same decision. Inductive bias: the target concept is contained in H • FIND-S: has an even stronger inductive bias than CANDIDATE-ELIMINATION

  47. Inductive→Deductive

  48. Summary • Concept learning as search through H • General-to-specific ordering over H • Candidate-Elimination algorithm • Learner can make useful queries • Inductive leaps possible only if learner is biased

  49. More on Concept Learning • Bruner et al. (1956) did a pioneering study of concept learning in human beings. Concept learning, also known as category learning or concept attainment, was defined in their book as "the search for and listing of attributes that can be used to distinguish exemplars from non-exemplars of various categories" • Simply put, concepts are the mental categories that help us classify objects, events, or ideas, and each object, event, or idea has a set of common relevant features (Wikipedia)

  50. On Bruner et al.’s book • Editorial Reviews (1986 ed.): “A Study of Thinking” is a pioneering account of how human beings achieve a measure of rationality in spite of the constraints imposed by bias, limited attention and memory, and the risks of error imposed by pressures of time and ignorance. First published in 1956 and hailed at its appearance as a groundbreaking study, it is still read three decades later as a major contribution to our understanding of the mind. In their insightful new introduction, the authors relate the book to the cognitive revolution and its handmaiden, artificial intelligence.
