
Machine Learning (机器学习)


Presentation Transcript


  1. Machine Learning. Chen Yu, Institute of Computer Science and Technology, Peking University; Information Security Engineering Research Center

  2. Course Information • Lecturer: Chen Yu, chen_yu@pku.edu.cn, Tel: 82529680 • TA: Cheng Zaixing, Tel: 62763742, wataloo@hotmail.com • Course page: http://www.icst.pku.edu.cn/course/jiqixuexi/jqxx2011.mht

  3. Ch2 Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  4. Types of learning • Based on the type of feedback: • Supervised learning: the correct answer is given for each training example (labeled examples) • Unsupervised learning: answers are not given (unlabeled examples) • Semi-supervised learning: a mixture of labeled and unlabeled examples • Reinforcement learning: the teacher provides a reward or penalty

  5. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  6. Definition & Example • Def. Concept learning is the task of inferring a boolean-valued function from labeled training examples • Example: learning the concept “days on which my friend Aldo enjoys his favorite water sport” from a set of training examples:

  7. Example (contd): Representing hypotheses • One way is to represent a hypothesis as a conjunction of constraints on the attributes. Each constraint can be: • A specific value (e.g. Water=Warm) • Don't care (e.g. Water=?) • No value allowed (e.g. Water=Ø) • An example hypothesis in EnjoySport: <Sunny ? ? Strong ? Same>

  8. Example (contd) • The most general hypothesis, under which every day is a positive example, is represented by <? ? ? ? ? ?> • The most specific hypotheses, under which every day is a negative example, are those with some attribute constraint equal to Ø
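
This representation is easy to make concrete. Below is a minimal Python sketch (not from the slides; the helper name matches is mine) encoding a hypothesis as a tuple, with '?' for "don't care" and None for the empty constraint Ø:

```python
# Hypotheses as tuples over (Sky, AirTemp, Humidity, Wind, Water, Forecast):
# '?' = don't care, None = Ø (no value allowed), otherwise a specific value.

def matches(hypothesis, instance):
    """True iff the instance satisfies every attribute constraint."""
    # A None constraint equals neither '?' nor any value, so a hypothesis
    # containing Ø classifies every instance as negative.
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')      # <Sunny ? ? Strong ? Same>
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(matches(h, x))           # True
print(matches(('?',) * 6, x))  # most general hypothesis: always True
print(matches((None,) * 6, x)) # most specific hypothesis: always False
```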

  9. Prototypical Concept Learning Task • Given: • Instance space X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast • Sky (Sunny, Cloudy, Rainy); AirTemp (Warm, Cold); Humidity (Normal, High); Wind (Strong, Weak); Water (Warm, Cool); Forecast (Same, Change) • Target function EnjoySport, c: X→{0,1} • Hypothesis space H: conjunctions of literals • Set D of training examples: positive or negative examples <x, c(x)> of the target function • Determine: a hypothesis h in H s.t. h(x)=c(x) for all x in D (a kind of inductive learning)

  10. Inductive Learning: A Brief Overview • Simplest form: learn a function from examples. Let f be the target function; then an example is a pair (x, f(x)) • Statement of an inductive-learning problem: given a collection of examples of f, return a function h that approximates f (h is called a hypothesis) • The fundamental problem of induction is the predictive power of the learned h

  11. Philosophical Foundation • One motivation behind inductive learning is the attempt to establish the source of knowledge • Aristotle (384-322 B.C.) was the first to formulate a precise set of laws governing the rational part of the mind • The empiricism movement, starting with Francis Bacon's (1561-1626) Novum Organum ("new instrument" in English), is characterized by a dictum of John Locke (1632-1704): "Nothing is in the understanding, which was not first in the senses"

  12. An Example: Curve Fitting • (a) Examples (x, f(x)) and a consistent linear hypothesis • (b) A consistent degree-7 polynomial for the same data set • (c) A different data set that admits an exact degree-6 polynomial fit or an approximate linear fit • (d) A simple, exact sinusoidal fit to the data set in (c) • A learning problem is realizable if the hypothesis space contains the true function
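
The contrast between the simple and the flexible fit is easy to reproduce. A small sketch with NumPy on invented data (the book's actual data points are not given in the transcript):

```python
# Fitting the same 8 points with a line and with a degree-7 polynomial.
# The data is made up for illustration: roughly linear plus a little noise.
import numpy as np

x = np.arange(8, dtype=float)
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9, 6.2, 6.9])

line  = np.polyfit(x, y, deg=1)  # simple hypothesis, small residual error
poly7 = np.polyfit(x, y, deg=7)  # 8 coefficients: interpolates all 8 points

# Both fit the data well, but they extrapolate very differently:
print(np.polyval(line, 9.0), np.polyval(poly7, 9.0))
```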

  13. Ockham's razor • Q: How do we choose from among multiple consistent hypotheses? • Ockham's razor: prefer the simplest hypothesis consistent with the data: "Entities are not to be multiplied beyond necessity" (William of Ockham (1280-1349), the most influential philosopher of his century)

  14. Inductive Learning Hypothesis • There is a fundamental assumption underlying any learned hypothesis, the so-called inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples

  15. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  16. An Example: EnjoySport • EnjoySport: • Instance space X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast • Sky (Sunny, Cloudy, Rainy); AirTemp (Warm, Cold); Humidity (Normal, High); Wind (Strong, Weak); Water (Warm, Cool); Forecast (Same, Change) • Target function EnjoySport, c: X→{0,1} • Hypothesis space H: conjunctions of literals • Size of its instance space: 3×2×2×2×2×2=96 • Size of its hypothesis space: 4×3×3×3×3×3+1=973 semantically distinct hypotheses • Q: Is there a systematic way to search the hypothesis space?
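
The two counts are straightforward to reproduce; a quick sketch:

```python
# Attribute domain sizes for EnjoySport.
sizes = [3, 2, 2, 2, 2, 2]   # Sky, AirTemp, Humidity, Wind, Water, Forecast

instances = 1
for n in sizes:
    instances *= n           # 3*2*2*2*2*2 = 96

# Per attribute a hypothesis may use any of the n values or '?' (n+1 choices);
# every hypothesis containing Ø classifies all instances negative, so all of
# them together count as a single extra semantically distinct hypothesis.
hypotheses = 1
for n in sizes:
    hypotheses *= n + 1
hypotheses += 1              # 4*3*3*3*3*3 + 1 = 973

print(instances, hypotheses) # 96 973
```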

  17. General-to-Specific Ordering of Hypotheses • An illustration (figure from the slides, not reproduced in this transcript)

  18. "More General Than" Relationship • Def. Let hj and hk be boolean-valued functions defined over X. Then hj is more_general_than_or_equal_to hk (written hj ≥g hk) iff (∀x∈X) [(hk(x)=1) → (hj(x)=1)] • Note: "≥g" is independent of the target concept • Property: "≥g" is a partial order
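
For the conjunctive representation, the relation can be tested constraint by constraint; a small sketch (the helper name is mine):

```python
# hj >=_g hk iff every instance satisfying hk also satisfies hj.
# '?' = don't care, None = Ø. If hk contains Ø it satisfies no instance,
# so the implication holds vacuously for any hj.

def more_general_or_equal(hj, hk):
    if any(c is None for c in hk):
        return True
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 covers everything h1 covers
print(more_general_or_equal(h1, h2))  # False: ">=g" is only a partial order
```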

  19. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  20. FIND-S Algorithm FIND-S: finding a maximally specific hypothesis • Initialize h to the most specific hypothesis in H • For each positive training example x: • For each attribute constraint ai in h, if it is satisfied by x, do nothing; otherwise replace ai by the next more general constraint that is satisfied by x • Output hypothesis h
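
A direct Python sketch of the algorithm for the conjunctive representation ('?' = don't care, None = Ø). The generalization step is unique here, so no search is needed; the training data assumed below is the textbook's four EnjoySport examples (shown as a table in the original slides):

```python
def find_s(examples, n_attrs=6):
    """FIND-S: maximally specific conjunctive hypothesis consistent with
    the positive examples. Negative examples are simply ignored."""
    h = [None] * n_attrs                 # most specific hypothesis <Ø,...,Ø>
    for x, positive in examples:
        if not positive:
            continue
        for i, v in enumerate(x):
            if h[i] is None:             # first positive example: copy value
                h[i] = v
            elif h[i] != v:              # constraint violated: generalize
                h[i] = '?'
    return tuple(h)

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(train))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```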

  21. An Illustration of Find-S • Note: If we assume the target concept c is in H, and training examples are noise-free, then the h found via Find-S must also be consistent with c on negative training examples.

  22. Complaints about FIND-S • Has the learned h converged to the true target concept? Not sure! • Why prefer the most specific hypothesis? • Are the training examples consistent? We would prefer an algorithm that can detect when training examples are inconsistent, or, even better, correct the error • What if there are several maximally specific consistent hypotheses?

  23. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  24. Version Space • The version space is the set of hypotheses consistent with the training data, i.e. VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}, where Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)

  25. List-Then-Eliminate Algorithm • A "brute force" way of computing the version space: the LIST-THEN-ELIMINATE algorithm • Initialize VS to contain every hypothesis in H • For each training example <x, c(x)>, eliminate any h in VS that is not consistent with c on x • Output the resulting VS
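
A brute-force sketch (names such as DOMAINS and list_then_eliminate are mine). Enumeration is feasible only because EnjoySport has just 973 semantically distinct hypotheses:

```python
from itertools import product

DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # The all-Ø hypothesis plus every combination of "a value or '?'".
    vs = [(None,) * 6] + list(product(*[d + ('?',) for d in DOMAINS]))
    for x, positive in examples:
        vs = [h for h in vs if matches(h, x) == positive]
    return vs

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(len(list_then_eliminate(train)))   # 6 hypotheses survive
```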

  26. Version Space with Boundary Sets • We need a more compact representation of VS in order to compute it efficiently • One approach: delimit VS by its general and specific boundary sets, together with the partial order between hypotheses • Example: the VS of EnjoySport has six elements, which can be ordered as follows:

  27. VS Representation Theorem • Def. The general boundary G w.r.t. hypothesis space H and training data D is the set of maximally general members of H consistent with D • Def. The specific boundary S w.r.t. hypothesis space H and training data D is the set of minimally general (i.e. maximally specific) members of H consistent with D

  28. VS Representation Theorem (2) • Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c be an arbitrary boolean-valued target concept over X, and let D be a set of training examples <x, c(x)>. For all X, H, c, and D s.t. S and G are well-defined, VS_{H,D} = {h ∈ H | (∃s∈S)(∃g∈G) (g ≥g h ≥g s)}

  29. CANDIDATE-ELIMINATION Algorithm • Initialize G to the set of maximally general hypotheses in H • Initialize S to the set of maximally specific hypotheses in H • For each training example d, do: • If d is a positive example: • Remove from G any hypothesis inconsistent with d • For each hypothesis s in S that is inconsistent with d: • Remove s from S • Add to S all minimal generalizations h of s s.t. h is consistent with d and some member of G is more general than h • Remove from S any hypothesis that is more general than another hypothesis in S

  30. CANDIDATE-ELIMINATION Algorithm (contd) • If d is a negative example: • Remove from S any hypothesis inconsistent with d • For each hypothesis g in G that is inconsistent with d: • Remove g from G • Add to G all minimal specializations h of g s.t. h is consistent with d and some member of S is more specific than h • Remove from G any hypothesis that is more specific than another hypothesis in G
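
Below is a compact Python sketch of the algorithm for the conjunctive representation. It exploits the simplification noted in the remarks that follow: S stays a single hypothesis, and its minimal generalization toward a positive example is unique. Helper names and the training data (the textbook's four EnjoySport examples) are assumptions of the sketch, not from the slides:

```python
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general(hj, hk):              # hj >=_g hk; None = Ø covers nothing
    if any(c is None for c in hk):
        return True
    return all(a == '?' or a == b for a, b in zip(hj, hk))

def generalize(s, x):
    """Unique minimal generalization of s covering the positive instance x."""
    return tuple(v if c is None else (c if c == v else '?')
                 for c, v in zip(s, x))

def specializations(g, x):
    """Minimal specializations of g excluding the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    S = (None,) * 6                    # single maximally specific hypothesis
    G = [('?',) * 6]
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = generalize(S, x)
        else:
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)    # g already consistent with x
                else:                  # replace g by specializations above S
                    new_G.extend(h for h in specializations(g, x)
                                 if more_general(h, S))
            # keep only maximally general members
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(train)
print(S)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```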

  31. An Illustrative Example • Find VS of EnjoySport via Candidate-Elimination Algorithm

  32. An Illustrative Example (2)

  33. An Illustrative Example (3)

  34. An Illustrative Example (4)

  35. An Illustrative Example (5) • Final VS learned from those 4 examples:

  36. Remarks • CANDIDATE-ELIMINATION works whenever the conditions of the version space representation theorem hold. Moreover, if • every instance can be represented as a fixed-length attribute vector with each attribute taking a finite number of possible values, and • the hypothesis space is restricted to conjunctions of constraints on attributes as defined earlier, • then the operations on S in the algorithm simplify to FIND-S (throughout the process, S remains a single-element set)

  37. Remarks (2) • Will the algorithm converge to the correct hypothesis? It converges if there are no errors in the training examples and the true target concept is in H • What if some training example contains a wrong target value? The true target concept will no longer be in the VS • What if the true target concept is not in H? The VS may become empty

  38. Remarks (3) • What training example should the learner request next? Consider the case where the learner proposes the next instance and obtains the answer from the teacher. E.g.: • What query should be presented next? • One such instance is <Sunny, Warm, ?, Light, ?, ?>. In general, try generating queries that satisfy exactly half of the hypotheses
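
A sketch of that heuristic: enumerate candidate instances and score each by how evenly it splits the current version space (here the six EnjoySport hypotheses; all names are mine):

```python
from itertools import product

DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?',      '?', '?'),
      ('?',     'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', '?',      '?', '?'),
      ('?',     'Warm', '?', '?',      '?', '?')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def best_query(vs):
    """Instance whose positive-vote count is closest to |VS|/2; whichever
    answer the teacher gives then eliminates about half of the VS."""
    return min(product(*DOMAINS),
               key=lambda x: abs(sum(matches(h, x) for h in vs) - len(vs) / 2))

print(best_query(VS))  # an instance satisfied by exactly 3 of the 6 hypotheses
```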

  39. Remarks (4) • How can a partially learned concept be used? Consider the VS learned previously. Suppose there are no more training examples, and the learner must classify a new instance not observed during training. Consider the following 4 instances: • <Sunny Warm Normal Strong Cool Change> • <Rainy Cold Normal Light Warm Same> • <Sunny Warm Normal Light Warm Same> • <Sunny Cold Normal Strong Warm Same> • Assuming the target concept is in the VS, the labels of the above 4 instances are (using the partial order): instance 1 is "+"; instance 2 is "-"; instances 3 and 4 are ambiguous, and might be assigned a value by voting (see the sketch below)
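
The voting scheme is a short loop over the version space. A sketch classifying the four instances above (VS as computed earlier; names are mine):

```python
VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?',      '?', '?'),
      ('?',     'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?',    '?', '?',      '?', '?'),
      ('?',     'Warm', '?', '?',      '?', '?')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def classify(vs, x):
    votes = sum(matches(h, x) for h in vs)   # hypotheses voting "+"
    if votes == len(vs):
        return '+'                           # unanimous positive
    if votes == 0:
        return '-'                           # unanimous negative
    return f'ambiguous ({votes}/{len(vs)} vote +)'

for x in [('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
          ('Rainy', 'Cold', 'Normal', 'Light',  'Warm', 'Same'),
          ('Sunny', 'Warm', 'Normal', 'Light',  'Warm', 'Same'),
          ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')]:
    print(classify(VS, x))   # '+', '-', 'ambiguous (3/6)', 'ambiguous (2/6)'
```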

  40. Concept Learning & General-to-Specific Ordering • Introduction to concept learning • Concept learning as search • FIND-S algorithm • Version space and CANDIDATE-ELIMINATION algorithm • Inductive bias

  41. A Biased Hypothesis Space • Consider EnjoySport: if we restrict H to conjunctions of attribute constraints, it cannot represent even a simple disjunctive concept such as "Sky=Sunny or Sky=Cloudy" • E.g. given the following three training examples: • <Sunny Warm Normal Strong Cool Change> Yes • <Cloudy Warm Normal Strong Cool Change> Yes • <Rainy Warm Normal Strong Cool Change> No • the Candidate-Elimination algorithm (indeed, any algorithm over this H) will output an empty VS

  42. An Unbiased Learner • One obvious approach to an unbiased learner is to propose a hypothesis space H' capable of representing every teachable concept over X, i.e. the power set of X • Compare a couple of numbers in EnjoySport: |X|=96, while the number of semantically distinct conjunctive hypotheses is only 973 (vs. 2^96 subsets of X) • Apply the CANDIDATE-ELIMINATION algorithm to H' and training set D, and the learning algorithm completely loses its generalization power: every new instance unseen in D will be classified ambiguously!

  43. Futility of Bias-Free Learning • Fundamental property of inductive inference: a learner that makes no a priori assumption (i.e. has no inductive bias) regarding the identity of the target concept has no rational basis for classifying unseen instances • An interesting idea: characterize various learning approaches by the inductive bias they employ. However, we need to define inductive bias more precisely first

  44. Inductive Bias • Let L(xi, Dc) denote the classification that L assigns to xi after learning from training set Dc. We describe the inductive inference step performed by L as follows: (Dc ∧ xi) ≻ L(xi, Dc), where y ≻ z denotes that z is inductively inferred from y • What additional assumptions could be added to Dc ∧ xi s.t. L(xi, Dc) would follow deductively? We define the inductive bias of L as this set of additional assumptions

  45. Inductive Bias (2) • Def. The inductive bias of L is any minimal set of assertions B s.t. for any target concept c and training examples Dc we have (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)], where "y ⊢ z" indicates that z follows deductively from y • If we define L(xi, Dc) as the unanimous vote of the elements of the VS found (undefined if the vote is not unanimous), then the inductive bias of the CANDIDATE-ELIMINATION algorithm is "the target concept c is in H"

  46. Inductive Bias of Various Learners • Rote learner: learns by simply storing training examples in memory. No inductive bias • CANDIDATE-ELIMINATION: new instances are classified only when all members of the VS make the same decision. Inductive bias: the target concept is contained in H • FIND-S: has an even stronger inductive bias than CANDIDATE-ELIMINATION

  47. Inductive→Deductive

  48. Summary • Concept learning as search through H • General-to-specific ordering over H • Candidate-Elimination algorithm • Learner can make useful queries • Inductive leaps possible only if learner is biased

  49. More on Concept Learning • Bruner et al. (1956) did a pioneering study of concept learning in human beings. Concept learning, also known as category learning or concept attainment, was defined in their book as "the search for and listing of attributes that can be used to distinguish exemplars from non-exemplars of various categories" • Simply put, concepts are the mental categories that help us classify objects, events, or ideas, and each object, event, or idea has a set of common relevant features (Wikipedia)

  50. On Bruner et al.’s book • Editorial Reviews (1986 ed.): “A Study of Thinking” is a pioneering account of how human beings achieve a measure of rationality in spite of the constraints imposed by bias, limited attention and memory, and the risks of error imposed by pressures of time and ignorance. First published in 1956 and hailed at its appearance as a groundbreaking study, it is still read three decades later as a major contribution to our understanding of the mind. In their insightful new introduction, the authors relate the book to the cognitive revolution and its handmaiden, artificial intelligence.
