

1. Lecture 3: Data Mining Basics
Monday, January 22, 2001
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell

2. Lecture Outline
• Read Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell
• Homework 1: due Friday, February 2, 2001 (before 12 AM CST)
• Paper Commentary 1: due this Friday (in class)
  – U. Fayyad, "From Data Mining to Knowledge Discovery"
  – See guidelines in course notes
• Supervised Learning (continued)
  – Version spaces
  – Candidate elimination algorithm: derivation and examples
• The Need for Inductive Bias
  – Representations (hypothesis languages): a worst-case scenario
  – Change of representation
• Computational Learning Theory

3. Representing Version Spaces
• Hypothesis Space
  – A finite meet semilattice under the partial ordering Less-Specific-Than (the most general hypothesis <?, …, ?> precedes all others)
  – Every pair of hypotheses has a greatest lower bound (GLB)
  – VS_H,D ≡ the consistent poset (partially-ordered subset of H)
• Definition: General Boundary
  – General boundary G of version space VS_H,D: set of most general members
  – Most general ≡ minimal elements of VS_H,D ≡ "set of necessary conditions"
• Definition: Specific Boundary
  – Specific boundary S of version space VS_H,D: set of most specific members
  – Most specific ≡ maximal elements of VS_H,D ≡ "set of sufficient conditions"
• Version Space
  – Every member of the version space lies between S and G
  – VS_H,D ≡ { h ∈ H | ∃ s ∈ S . ∃ g ∈ G . g ≤_P h ≤_P s }, where ≤_P denotes Less-Specific-Than
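The boundary-set definition above translates directly into a membership test: h lies in the version space iff it sits between some member of G and some member of S. Below is a minimal Python sketch for the conjunctive hypothesis language used in these slides, representing hypotheses as tuples in which '?' matches any value and None stands in for Ø (matches nothing); the function names are illustrative, not from the lecture.

```python
def more_general_or_equal(h1, h2):
    """True iff h1 is Less-Specific-Than-or-equal-to h2,
    i.e., h1 covers every instance that h2 covers."""
    return all(a1 == '?' or a2 is None or a1 == a2
               for a1, a2 in zip(h1, h2))

def in_version_space(h, S, G):
    """h is in VS_H,D iff g <=_P h <=_P s for some g in G, s in S."""
    return (any(more_general_or_equal(g, h) for g in G) and
            any(more_general_or_equal(h, s) for s in S))
```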

4. Candidate Elimination Algorithm [1]
1. Initialization
   G ← (singleton) set containing the most general hypothesis in H, denoted {<?, …, ?>}
   S ← set of most specific hypotheses in H, denoted {<Ø, …, Ø>}
2. For each training example d
   If d is a positive example (Update-S)
     Remove from G any hypotheses inconsistent with d
     For each hypothesis s in S that is not consistent with d
       Remove s from S
       Add to S all minimal generalizations h of s such that
         1. h is consistent with d
         2. Some member of G is more general than h
       (these are the greatest lower bounds, or meets, s ∧ d, in VS_H,D)
     Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominated elements)

5. Candidate Elimination Algorithm [2] (continued)
   If d is a negative example (Update-G)
     Remove from S any hypotheses inconsistent with d
     For each hypothesis g in G that is not consistent with d
       Remove g from G
       Add to G all minimal specializations h of g such that
         1. h is consistent with d
         2. Some member of S is more specific than h
       (these are the least upper bounds, or joins, g ∨ d, in VS_H,D)
     Remove from G any hypothesis that is less general than another hypothesis in G (remove any dominating elements)
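A hedged sketch of how Update-S and Update-G might be realized for this conjunctive language, reusing more_general_or_equal from above. Minimal generalization here works attribute-wise (replace a mismatched value with '?'), and minimal specialization substitutes a concrete value for a '?'; both suffice for conjunctions of attribute constraints, though not for richer languages.

```python
def covers(h, x):
    """h classifies instance x as positive iff every attribute constraint matches."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def candidate_elimination(examples, attr_values):
    n = len(attr_values)
    G = {('?',) * n}                       # most general hypothesis <?, ..., ?>
    S = {(None,) * n}                      # most specific hypothesis <Ø, ..., Ø>
    for x, positive in examples:
        if positive:                       # Update-S
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                if covers(s, x):
                    new_S.add(s)
                else:                      # minimal generalization: the meet s ∧ d
                    h = tuple(v if a is None else (a if a == v else '?')
                              for a, v in zip(s, x))
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.add(h)
            S = {s for s in new_S          # drop members more general than another
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
        else:                              # Update-G
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                else:                      # minimal specializations: the joins g ∨ d
                    for i, a in enumerate(g):
                        if a == '?':
                            for v in attr_values[i]:
                                if v != x[i]:
                                    h = g[:i] + (v,) + g[i + 1:]
                                    if any(more_general_or_equal(h, s) for s in S):
                                        new_G.add(h)
            G = {g for g in new_G          # drop members less general than another
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)}
    return S, G
```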

6. Example Trace
Training examples:
  d1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
  d2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
  d3: <Rainy, Cold, High, Strong, Warm, Change>, No
  d4: <Sunny, Warm, High, Strong, Cool, Change>, Yes
Boundary sets after each example:
  S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}
  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
  S2 = S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
  G0 = G1 = G2: {<?, ?, ?, ?, ?, ?>}
  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  (version-space members between S4 and G4: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
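Assuming the candidate_elimination sketch above, this trace can be reproduced directly. The attribute value sets below follow Mitchell's EnjoySport task (slide 9 lists {None, Mild, Strong} for Wind; either choice yields the same boundaries here, since the Wind attribute is never generalized).

```python
attr_values = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
               ('Strong', 'Light'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),    # d1
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),    # d2
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),  # d3
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),   # d4
]
S, G = candidate_elimination(examples, attr_values)
# S4: {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G4: {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```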

7. What Next Training Example?
Final version space:
  S: {<Sunny, Warm, ?, Strong, ?, ?>}
  (intermediate members: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>)
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
• What query should the learner make next?
• How should these be classified?
  – <Sunny, Warm, Normal, Strong, Cool, Change>
  – <Rainy, Cold, Normal, Light, Warm, Same>
  – <Sunny, Warm, Normal, Light, Warm, Same>
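One way to answer the classification question in code, under the assumption (valid for this conjunctive language) that agreement between the S and G boundaries implies agreement of every version-space member; classify is an illustrative name:

```python
def classify(x, S, G):
    """Unanimous-vote classification by the version space."""
    if all(covers(s, x) for s in S):       # every s covers x => every h in VS does
        return True
    if not any(covers(g, x) for g in G):   # no g covers x => no h in VS does
        return False
    return None                            # members disagree: "don't know"

# For the three instances on the slide:
#   <Sunny, Warm, Normal, Strong, Cool, Change> -> True  (all members agree)
#   <Rainy, Cold, Normal, Light, Warm, Same>    -> False (no member covers it)
#   <Sunny, Warm, Normal, Light, Warm, Same>    -> None  (the boundaries disagree)
```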

8. What Justifies This Inductive Leap?
• Example: Inductive Generalization
  – Positive example: <Sunny, Warm, Normal, Strong, Cool, Change>, Yes
  – Positive example: <Sunny, Warm, Normal, Light, Warm, Same>, Yes
  – Induced S: <Sunny, Warm, Normal, ?, ?, ?>
• Why Believe We Can Classify the Unseen?
  – e.g., <Sunny, Warm, Normal, Strong, Warm, Same>
  – When is there enough information (in a new case) to make a prediction?

9. An Unbiased Learner
• Example of a Biased H
  – Conjunctive concepts with don't-cares
  – What concepts can H not express? (Hint: what are its syntactic limitations?)
• Idea
  – Choose an H' that expresses every teachable concept, i.e., H' is the power set of X
  – Recall: | A → B | = | B |^| A | (here A = X, B = {labels}, H' = A → B)
  – H' = X → {0, 1}, where X = {Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} × {Cool, Warm} × {Same, Change}
• An Exhaustive Hypothesis Language
  – Consider: H' = disjunctions (∨), conjunctions (∧), negations (¬) over the previous H
  – | H' | = 2^(2 · 2 · 2 · 3 · 2 · 2) = 2^96; | H | = 1 + (3 · 3 · 3 · 4 · 3 · 3) = 973
• What Are S, G for the Hypothesis Language H'?
  – S ← disjunction of all positive examples
  – G ← conjunction of all negated negative examples
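The counts on this slide can be checked in a few lines; a quick sketch (the 973 counts the semantically distinct conjunctive hypotheses: one value or '?' per attribute, plus the single all-Ø hypothesis):

```python
from math import prod

attr_sizes = [2, 2, 2, 3, 2, 2]       # Sky, AirTemp, Humidity, Wind, Water, Forecast

X = prod(attr_sizes)                  # |X| = 96 distinct instances
H_prime = 2 ** X                      # |H'| = 2^96: every subset of X is a concept
H = 1 + prod(k + 1 for k in attr_sizes)   # 1 + 3*3*3*4*3*3 = 973

print(X, H, H_prime)                  # 96 973 79228162514264337593543950336
```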

10. Inductive Bias
• Components of an Inductive Bias Definition
  – Concept learning algorithm L
  – Instances X, target concept c
  – Training examples D_c = {<x, c(x)>}
  – L(x_i, D_c) = classification assigned to instance x_i by L after training on D_c
• Definition
  – The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples D_c: ∀ x_i ∈ X . [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)], where A ⊢ B means A logically entails B
  – Informal idea: preference for (i.e., restriction to) certain hypotheses by structural (syntactic) means
• Rationale
  – Prior assumptions regarding the target concept
  – Basis for inductive generalization

11. Inductive Systems and Equivalent Deductive Systems
• Inductive system: training examples and a new instance go into the candidate elimination algorithm (using hypothesis space H), which outputs a classification of the new instance (or "don't know")
• Equivalent deductive system: the same training examples and new instance, plus the explicit assertion "c ∈ H", go into a theorem prover, which outputs the same classification (or "don't know")
• The inductive bias is thereby made explicit

12. Three Learners with Different Biases
• Rote Learner
  – Weakest bias: anything seen before, i.e., no bias
  – Store examples; classify x if and only if it matches a previously observed example
• Version Space Candidate Elimination Algorithm
  – Stronger bias: concepts belonging to conjunctive H
  – Store extremal generalizations and specializations
  – Classify x if and only if it "falls within" the S and G boundaries (all members agree)
• Find-S
  – Even stronger bias: most specific hypothesis
  – Prior assumption: any instance not observed to be positive is negative
  – Classify x based on the S set
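As a contrast with the version-space learner sketched earlier, the rote learner's "no bias" behavior fits in a few lines; a minimal sketch (the class name is illustrative):

```python
class RoteLearner:
    """Weakest bias: memorize examples; answer only on exact matches."""
    def __init__(self):
        self.memory = {}

    def train(self, examples):
        for x, y in examples:
            self.memory[x] = y

    def classify(self, x):
        # None means "don't know": no inductive leap is ever made
        return self.memory.get(x)
```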

13. Hypothesis Space: A Syntactic Restriction
[Figure: unknown function y = f(x1, x2, x3, x4) with inputs x1, x2, x3, x4]
• Recall: 4-variable concept learning problem
• Bias: Simple Conjunctive Rules
  – Only 16 simple conjunctive rules of the form y = x_i ∧ x_j ∧ x_k
  – y = Ø, x1, …, x4, x1 ∧ x2, …, x3 ∧ x4, x1 ∧ x2 ∧ x3, …, x2 ∧ x3 ∧ x4, x1 ∧ x2 ∧ x3 ∧ x4
  – Example above: no simple rule explains the data (counterexamples?)
  – Similarly for simple clauses (conjunction and disjunction allowed)
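The count of 16 is just the number of subsets of {x1, x2, x3, x4} (each variable is in the conjunction or not), with the empty conjunction written y = Ø. A sketch that enumerates them:

```python
from itertools import combinations

variables = ['x1', 'x2', 'x3', 'x4']
rules = [' ∧ '.join(subset) if subset else 'Ø'
         for k in range(len(variables) + 1)
         for subset in combinations(variables, k)]

print(len(rules))   # 16 = 2^4 simple conjunctive rules
print(rules[:6])    # ['Ø', 'x1', 'x2', 'x3', 'x4', 'x1 ∧ x2']
```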

14. Hypothesis Space: m-of-n Rules
• m-of-n Rules
  – 32 possible rules of the form: "y = 1 iff at least m of the following n variables are 1"
  – Found a consistent hypothesis!
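The count of 32 comes from choosing a nonempty subset of the 4 variables (the "n" part) and a threshold m between 1 and the subset's size: C(4,1)·1 + C(4,2)·2 + C(4,3)·3 + C(4,4)·4 = 4 + 12 + 12 + 4 = 32. A sketch, with the consistency check shown against a hypothetical two-row dataset (the slide's actual table is not reproduced here):

```python
from itertools import combinations

def m_of_n(m, idxs):
    """Classifier: y = 1 iff at least m of the variables at positions idxs are 1."""
    return lambda x: sum(x[i] for i in idxs) >= m

rules = [(m, idxs)
         for n in range(1, 5)
         for idxs in combinations(range(4), n)
         for m in range(1, n + 1)]
print(len(rules))   # 4 + 12 + 12 + 4 = 32

# Filter for consistency against labeled data (these two rows are hypothetical):
data = [((0, 1, 1, 0), 1), ((1, 0, 0, 0), 0)]
consistent = [(m, idxs) for m, idxs in rules
              if all(m_of_n(m, idxs)(x) == y for x, y in data)]
```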

15. Views of Learning
• Removal of (Remaining) Uncertainty
  – Suppose the unknown function were known to be an m-of-n Boolean function
  – Could use the training data to infer which one
• Learning and Hypothesis Languages
  – Possible approach to guessing a good, small hypothesis language: start with a very small language, then enlarge it until it contains a hypothesis that fits the data
  – Inductive bias: a preference for certain languages
  – Analogous to data compression (removal of redundancy)
  – Later: coding the "model" versus coding the "uncertainty" (error)
• We Could Be Wrong!
  – Prior knowledge could be wrong (e.g., y = x4 ∧ one-of(x1, x3) is also consistent)
  – If the guessed language was wrong, errors will occur on new cases

16. Two Strategies for Machine Learning
• Develop Ways to Express Prior Knowledge
  – Role of prior knowledge: guides search for hypotheses / hypothesis languages
  – Expression languages for prior knowledge: rule grammars, stochastic models, etc.
  – Restrictions on computational models; other (formal) specification methods
• Develop Flexible Hypothesis Spaces
  – Structured collections of hypotheses: agglomeration (nested hierarchies); partitioning (decision trees, lists, rules); neural networks; cases, etc.
  – Hypothesis spaces of adaptive size
• In Either Case: Develop Algorithms for Finding a Hypothesis That Fits Well
  – Ideally, one that will also generalize well
  – Later: bias optimization (meta-learning, wrappers)

17. Terminology
• The Version Space Algorithm
  – Version space: constructive definition
  – S and G boundaries characterize the learner's uncertainty
  – The version space can be used to make predictions over unseen cases
  – Algorithms: Find-S, List-Then-Eliminate, candidate elimination
  – Consistent hypothesis: one that correctly predicts the observed examples
  – Version space: the space of all currently consistent (or satisfiable) hypotheses
• Inductive Bias
  – Strength of inductive bias: how few hypotheses remain?
  – Specific biases: based on specific languages
• Hypothesis Language
  – "Searchable subset" of the space of possible descriptors
  – m-of-n, conjunctive, disjunctive, clauses
  – Ability to represent a concept

18. Summary Points
• Introduction to Supervised Concept Learning
• Inductive Leaps Possible Only if the Learner Is Biased
  – Futility of learning without bias
  – Strength of inductive bias: proportional to the restrictions on hypotheses
• Modeling Inductive Learners with Equivalent Deductive Systems
  – Representing inductive learning as theorem proving
  – Equivalent learning and inference problems
• Syntactic Restrictions
  – Example: m-of-n concepts
  – Other examples?
• Views of Learning and Strategies
  – Removing uncertainty ("data compression")
  – Role of knowledge
• Next Lecture: More on Knowledge Discovery in Databases (KDD)
