

1. Lecture 3: Data Mining Basics
Monday, January 22, 2001
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell

2. Lecture Outline
• Read Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell
• Homework 1: due Friday, February 2, 2001 (before 12 AM CST)
• Paper Commentary 1: due this Friday (in class)
  – U. Fayyad, "From Data Mining to Knowledge Discovery"
  – See guidelines in course notes
• Supervised Learning (continued)
  – Version spaces
  – Candidate elimination algorithm: derivation and examples
• The Need for Inductive Bias
  – Representations (hypothesis languages): a worst-case scenario
  – Change of representation
• Computational Learning Theory

3. Representing Version Spaces
• Hypothesis Space
  – A finite meet semilattice under the partial ordering Less-Specific-Than (the most general hypothesis <?, …, ?> precedes all others)
  – Every pair of hypotheses has a greatest lower bound (GLB)
  – VS_H,D ≡ the consistent poset (partially-ordered subset of H)
• Definition: General Boundary
  – General boundary G of version space VS_H,D: set of most general members
  – Most general ≡ minimal elements of VS_H,D ≡ "set of necessary conditions"
• Definition: Specific Boundary
  – Specific boundary S of version space VS_H,D: set of most specific members
  – Most specific ≡ maximal elements of VS_H,D ≡ "set of sufficient conditions"
• Version Space
  – Every member of the version space lies between S and G
  – VS_H,D ≡ { h ∈ H | ∃ s ∈ S . ∃ g ∈ G . g ≤_P h ≤_P s }, where ≤_P denotes Less-Specific-Than
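The boundary-set definition above translates directly into a membership test: h lies in the version space iff it sits between some member of G and some member of S. Below is a minimal Python sketch for the conjunctive hypothesis language used in these slides, representing hypotheses as tuples in which '?' matches any value and None stands in for Ø (matches nothing); the function names are illustrative, not from the lecture.

```python
def more_general_or_equal(h1, h2):
    """True iff h1 is Less-Specific-Than-or-equal-to h2,
    i.e., h1 covers every instance that h2 covers."""
    return all(a1 == '?' or a2 is None or a1 == a2
               for a1, a2 in zip(h1, h2))

def in_version_space(h, S, G):
    """h is in VS_H,D iff g <=_P h <=_P s for some g in G, s in S."""
    return (any(more_general_or_equal(g, h) for g in G) and
            any(more_general_or_equal(h, s) for s in S))
```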

4. Candidate Elimination Algorithm [1]
1. Initialization
   G ← (singleton) set containing the most general hypothesis in H, denoted {<?, …, ?>}
   S ← set of most specific hypotheses in H, denoted {<Ø, …, Ø>}
2. For each training example d
   If d is a positive example (Update-S)
     Remove from G any hypotheses inconsistent with d
     For each hypothesis s in S that is not consistent with d
       Remove s from S
       Add to S all minimal generalizations h of s such that
         1. h is consistent with d
         2. Some member of G is more general than h
       (these are the greatest lower bounds, or meets, s ∧ d, in VS_H,D)
     Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominated elements)

5. Candidate Elimination Algorithm [2] (continued)
   If d is a negative example (Update-G)
     Remove from S any hypotheses inconsistent with d
     For each hypothesis g in G that is not consistent with d
       Remove g from G
       Add to G all minimal specializations h of g such that
         1. h is consistent with d
         2. Some member of S is more specific than h
       (these are the least upper bounds, or joins, g ∨ d, in VS_H,D)
     Remove from G any hypothesis that is less general than another hypothesis in G (remove any dominating elements)
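A hedged sketch of how Update-S and Update-G might be realized for this conjunctive language, reusing more_general_or_equal from above. Minimal generalization here works attribute-wise (replace a mismatched value with '?'), and minimal specialization substitutes a concrete value for a '?'; both suffice for conjunctions of attribute constraints, though not for richer languages.

```python
def covers(h, x):
    """h classifies instance x as positive iff every attribute constraint matches."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def candidate_elimination(examples, attr_values):
    n = len(attr_values)
    G = {('?',) * n}                       # most general hypothesis <?, ..., ?>
    S = {(None,) * n}                      # most specific hypothesis <Ø, ..., Ø>
    for x, positive in examples:
        if positive:                       # Update-S
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                if covers(s, x):
                    new_S.add(s)
                else:                      # minimal generalization: the meet s ∧ d
                    h = tuple(v if a is None else (a if a == v else '?')
                              for a, v in zip(s, x))
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.add(h)
            S = {s for s in new_S          # drop members more general than another
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
        else:                              # Update-G
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                else:                      # minimal specializations: the joins g ∨ d
                    for i, a in enumerate(g):
                        if a == '?':
                            for v in attr_values[i]:
                                if v != x[i]:
                                    h = g[:i] + (v,) + g[i + 1:]
                                    if any(more_general_or_equal(h, s) for s in S):
                                        new_G.add(h)
            G = {g for g in new_G          # drop members less general than another
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)}
    return S, G
```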

6. Example Trace
Training examples:
  d1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
  d2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
  d3: <Rainy, Cold, High, Strong, Warm, Change>, No
  d4: <Sunny, Warm, High, Strong, Cool, Change>, Yes
Boundary sets after each example:
  S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}
  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
  S2 = S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
  G0 = G1 = G2: {<?, ?, ?, ?, ?, ?>}
  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  (version-space members between S4 and G4: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
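Assuming the candidate_elimination sketch above, this trace can be reproduced directly. The attribute value sets below follow Mitchell's EnjoySport task (slide 9 lists {None, Mild, Strong} for Wind; either choice yields the same boundaries here, since the Wind attribute is never generalized).

```python
attr_values = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
               ('Strong', 'Light'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),    # d1
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),    # d2
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),  # d3
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),   # d4
]
S, G = candidate_elimination(examples, attr_values)
# S4: {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G4: {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```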

7. What Next Training Example?
Final version space:
  S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (intermediate members: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
• What query should the learner make next?
• How should these be classified?
  – <Sunny, Warm, Normal, Strong, Cool, Change>
  – <Rainy, Cold, Normal, Light, Warm, Same>
  – <Sunny, Warm, Normal, Light, Warm, Same>
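One way to answer the classification question in code, under the assumption (valid for this conjunctive language) that agreement between the S and G boundaries implies agreement of every version-space member; classify is an illustrative name:

```python
def classify(x, S, G):
    """Unanimous-vote classification by the version space."""
    if all(covers(s, x) for s in S):       # every s covers x => every h in VS does
        return True
    if not any(covers(g, x) for g in G):   # no g covers x => no h in VS does
        return False
    return None                            # members disagree: "don't know"

# For the three instances on the slide:
#   <Sunny, Warm, Normal, Strong, Cool, Change> -> True  (all members agree)
#   <Rainy, Cold, Normal, Light, Warm, Same>    -> False (no member covers it)
#   <Sunny, Warm, Normal, Light, Warm, Same>    -> None  (the boundaries disagree)
```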

8. What Justifies This Inductive Leap?
• Example: Inductive Generalization
  – Positive example: <Sunny, Warm, Normal, Strong, Cool, Change>, Yes
  – Positive example: <Sunny, Warm, Normal, Light, Warm, Same>, Yes
  – Induced S: <Sunny, Warm, Normal, ?, ?, ?>
• Why Believe We Can Classify the Unseen?
  – e.g., <Sunny, Warm, Normal, Strong, Warm, Same>
  – When is there enough information (in a new case) to make a prediction?

9. An Unbiased Learner
• Example of a Biased H
  – Conjunctive concepts with don't-cares
  – What concepts can H not express? (Hint: what are its syntactic limitations?)
• Idea
  – Choose an H' that expresses every teachable concept, i.e., H' is the power set of X
  – Recall: | A → B | = | B |^| A | (here A = X, B = {labels}, H' = A → B)
  – H' = X → {0, 1}, where X = {Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} × {Cool, Warm} × {Same, Change}
• An Exhaustive Hypothesis Language
  – Consider: H' = disjunctions (∨), conjunctions (∧), negations (¬) over the previous H
  – | H' | = 2^(2 · 2 · 2 · 3 · 2 · 2) = 2^96; | H | = 1 + (3 · 3 · 3 · 4 · 3 · 3) = 973
• What Are S, G for the Hypothesis Language H'?
  – S ← disjunction of all positive examples
  – G ← conjunction of all negated negative examples
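The counts on this slide can be checked in a few lines; a quick sketch (the 973 counts the semantically distinct conjunctive hypotheses: one value or '?' per attribute, plus the single all-Ø hypothesis):

```python
from math import prod

attr_sizes = [2, 2, 2, 3, 2, 2]       # Sky, AirTemp, Humidity, Wind, Water, Forecast

X = prod(attr_sizes)                  # |X| = 96 distinct instances
H_prime = 2 ** X                      # |H'| = 2^96: every subset of X is a concept
H = 1 + prod(k + 1 for k in attr_sizes)   # 1 + 3*3*3*4*3*3 = 973

print(X, H, H_prime)                  # 96 973 79228162514264337593543950336
```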

10. Inductive Bias
• Components of an Inductive Bias Definition
  – Concept learning algorithm L
  – Instances X, target concept c
  – Training examples D_c = {<x, c(x)>}
  – L(x_i, D_c) = classification assigned to instance x_i by L after training on D_c
• Definition
  – The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples D_c: ∀ x_i ∈ X . [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)], where A ⊢ B means A logically entails B
  – Informal idea: preference for (i.e., restriction to) certain hypotheses by structural (syntactic) means
• Rationale
  – Prior assumptions regarding the target concept
  – Basis for inductive generalization

11. Inductive Systems and Equivalent Deductive Systems
• Inductive system: training examples and a new instance go into the candidate elimination algorithm (using hypothesis space H), which outputs a classification of the new instance (or "don't know")
• Equivalent deductive system: the same training examples and new instance, plus the explicit assertion "c ∈ H", go into a theorem prover, which outputs the same classification (or "don't know")
• The inductive bias is thereby made explicit

12. Three Learners with Different Biases
• Rote Learner
  – Weakest bias: anything seen before, i.e., no bias
  – Store examples; classify x if and only if it matches a previously observed example
• Version Space Candidate Elimination Algorithm
  – Stronger bias: concepts belonging to conjunctive H
  – Store extremal generalizations and specializations
  – Classify x if and only if it "falls within" the S and G boundaries (all members agree)
• Find-S
  – Even stronger bias: most specific hypothesis
  – Prior assumption: any instance not observed to be positive is negative
  – Classify x based on the S set
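As a contrast with the version-space learner sketched earlier, the rote learner's "no bias" behavior fits in a few lines; a minimal sketch (the class name is illustrative):

```python
class RoteLearner:
    """Weakest bias: memorize examples; answer only on exact matches."""
    def __init__(self):
        self.memory = {}

    def train(self, examples):
        for x, y in examples:
            self.memory[x] = y

    def classify(self, x):
        # None means "don't know": no inductive leap is ever made
        return self.memory.get(x)
```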

13. Hypothesis Space: A Syntactic Restriction
[Figure: unknown function y = f(x1, x2, x3, x4) with inputs x1, x2, x3, x4]
• Recall: 4-variable concept learning problem
• Bias: Simple Conjunctive Rules
  – Only 16 simple conjunctive rules of the form y = x_i ∧ x_j ∧ x_k
  – y = Ø, x1, …, x4, x1 ∧ x2, …, x3 ∧ x4, x1 ∧ x2 ∧ x3, …, x2 ∧ x3 ∧ x4, x1 ∧ x2 ∧ x3 ∧ x4
  – Example above: no simple rule explains the data (counterexamples?)
  – Similarly for simple clauses (conjunction and disjunction allowed)
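The count of 16 is just the number of subsets of {x1, x2, x3, x4} (each variable is in the conjunction or not), with the empty conjunction written y = Ø. A sketch that enumerates them:

```python
from itertools import combinations

variables = ['x1', 'x2', 'x3', 'x4']
rules = [' ∧ '.join(subset) if subset else 'Ø'
         for k in range(len(variables) + 1)
         for subset in combinations(variables, k)]

print(len(rules))   # 16 = 2^4 simple conjunctive rules
print(rules[:6])    # ['Ø', 'x1', 'x2', 'x3', 'x4', 'x1 ∧ x2']
```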

14. Hypothesis Space: m-of-n Rules
• m-of-n Rules
  – 32 possible rules of the form: "y = 1 iff at least m of the following n variables are 1"
  – Found a consistent hypothesis!
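The count of 32 comes from choosing a nonempty subset of the 4 variables (the "n" part) and a threshold m between 1 and the subset's size: C(4,1)·1 + C(4,2)·2 + C(4,3)·3 + C(4,4)·4 = 4 + 12 + 12 + 4 = 32. A sketch, with the consistency check shown against a hypothetical two-row dataset (the slide's actual table is not reproduced here):

```python
from itertools import combinations

def m_of_n(m, idxs):
    """Classifier: y = 1 iff at least m of the variables at positions idxs are 1."""
    return lambda x: sum(x[i] for i in idxs) >= m

rules = [(m, idxs)
         for n in range(1, 5)
         for idxs in combinations(range(4), n)
         for m in range(1, n + 1)]
print(len(rules))   # 4 + 12 + 12 + 4 = 32

# Filter for consistency against labeled data (these two rows are hypothetical):
data = [((0, 1, 1, 0), 1), ((1, 0, 0, 0), 0)]
consistent = [(m, idxs) for m, idxs in rules
              if all(m_of_n(m, idxs)(x) == y for x, y in data)]
```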

15. Views of Learning
• Removal of (Remaining) Uncertainty
  – Suppose the unknown function were known to be an m-of-n Boolean function
  – Could use the training data to infer which one
• Learning and Hypothesis Languages
  – Possible approach to guessing a good, small hypothesis language: start with a very small language, then enlarge it until it contains a hypothesis that fits the data
  – Inductive bias: a preference for certain languages
  – Analogous to data compression (removal of redundancy)
  – Later: coding the "model" versus coding the "uncertainty" (error)
• We Could Be Wrong!
  – Prior knowledge could be wrong (e.g., y = x4 ∧ one-of(x1, x3) is also consistent)
  – If the guessed language was wrong, errors will occur on new cases

16. Two Strategies for Machine Learning
• Develop Ways to Express Prior Knowledge
  – Role of prior knowledge: guides search for hypotheses / hypothesis languages
  – Expression languages for prior knowledge: rule grammars, stochastic models, etc.
  – Restrictions on computational models; other (formal) specification methods
• Develop Flexible Hypothesis Spaces
  – Structured collections of hypotheses: agglomeration (nested hierarchies); partitioning (decision trees, lists, rules); neural networks; cases, etc.
  – Hypothesis spaces of adaptive size
• In Either Case: Develop Algorithms for Finding a Hypothesis That Fits Well
  – Ideally, one that will also generalize well
  – Later: bias optimization (meta-learning, wrappers)

17. Terminology
• The Version Space Algorithm
  – Version space: constructive definition
  – S and G boundaries characterize the learner's uncertainty
  – The version space can be used to make predictions over unseen cases
  – Algorithms: Find-S, List-Then-Eliminate, candidate elimination
  – Consistent hypothesis: one that correctly predicts the observed examples
  – Version space: the space of all currently consistent (or satisfiable) hypotheses
• Inductive Bias
  – Strength of inductive bias: how few hypotheses remain?
  – Specific biases: based on specific languages
• Hypothesis Language
  – "Searchable subset" of the space of possible descriptors
  – m-of-n, conjunctive, disjunctive, clauses
  – Ability to represent a concept

18. Summary Points
• Introduction to Supervised Concept Learning
• Inductive Leaps Possible Only if the Learner Is Biased
  – Futility of learning without bias
  – Strength of inductive bias: proportional to the restrictions on hypotheses
• Modeling Inductive Learners with Equivalent Deductive Systems
  – Representing inductive learning as theorem proving
  – Equivalent learning and inference problems
• Syntactic Restrictions
  – Example: m-of-n concepts
  – Other examples?
• Views of Learning and Strategies
  – Removing uncertainty ("data compression")
  – Role of knowledge
• Next Lecture: More on Knowledge Discovery in Databases (KDD)
