

  1. Announcements • List of 5 sources for research paper • Homework 5 due Tuesday, October 30 • Book Review due Tuesday, October 30 CS 484 – Artificial Intelligence

  2. Classification problems and Machine Learning Lecture 10

  3. EnjoySport concept learning task
  • Given
    • Instances X: possible days, each described by the attributes
      • Sky (with possible values Sunny, Cloudy, and Rainy)
      • AirTemp (with values Warm and Cold)
      • Humidity (with values Normal and High)
      • Wind (with values Strong and Weak)
      • Water (with values Warm and Cool), and
      • Forecast (with values Same and Change)
    • Hypotheses H: each hypothesis is a conjunction of constraints on the attributes; each constraint is "?" (any value is acceptable), "Ø" (no value is acceptable), or a specific value (one possible encoding is sketched below)
    • Target concept c: EnjoySport : X → {0,1}
    • Training examples D: positive or negative examples of the target function
  • Determine
    • A hypothesis h in H such that h(x) = c(x) for all x in X
  CS 484 – Artificial Intelligence
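
One possible encoding of instances and conjunctive hypotheses, as a minimal Python sketch: attribute values are strings, "?" stands for "any value", and None stands for the empty constraint "Ø". The names ATTRIBUTES and matches are illustrative, not taken from the slides.

# Illustrative encoding of EnjoySport instances and conjunctive hypotheses.
# "?" means any value is acceptable; None stands for the "Ø" (no value) constraint.
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

def matches(hypothesis, instance):
    """Return True if the conjunctive hypothesis classifies the instance as positive."""
    return all(c == "?" or (c is not None and c == v)
               for c, v in zip(hypothesis, instance))

# Example: h = <Sunny, Warm, ?, Strong, ?, ?> applied to one day
h = ("Sunny", "Warm", "?", "Strong", "?", "?")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(matches(h, x))  # True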

  4. Find-S: Finding a Maximally Specific Hypothesis (review)
  • Initialize h to the most specific hypothesis in H
  • For each positive training instance x
    • For each attribute constraint ai in h
      • If the constraint ai is satisfied by x, then do nothing
      • Else replace ai in h by the next more general constraint that is satisfied by x
  • Output hypothesis h
  • Begin: h ← <Ø, Ø, Ø, Ø, Ø, Ø> (a code sketch of the procedure follows)
  CS 484 – Artificial Intelligence
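
A minimal sketch of Find-S under the tuple encoding above (Ø is represented as None; the function name find_s is illustrative).

def find_s(positive_examples, n_attributes=6):
    """Find-S: start from the most specific hypothesis and minimally
    generalize it to cover each positive example in turn."""
    h = [None] * n_attributes                # <Ø, Ø, ..., Ø>: most specific hypothesis
    for x in positive_examples:
        for i, value in enumerate(x):
            if h[i] is None:                 # Ø constraint: adopt the observed value
                h[i] = value
            elif h[i] != value:              # conflicting values: generalize to "?"
                h[i] = "?"
    return tuple(h)

positives = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High", "Strong", "Cool", "Change"),
]
print(find_s(positives))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')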

  5. Candidate Elimination
  • Candidate elimination computes the version space: the set of all hypotheses consistent with the training data (including the negative examples), represented by its specific boundary S and its general boundary G
  • For the EnjoySport training data:
    • S: {<Sunny, Warm, ?, Strong, ?, ?>}
    • Other members of the version space: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
    • G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  CS 484 – Artificial Intelligence

  6. Candidate-Elimination Learning Algorithm
  • Initialize G to the set of maximally general hypotheses in H
  • Initialize S to the set of maximally specific hypotheses in H
  • For each training example d, do
    • If d is a positive example
      • Remove from G any hypothesis inconsistent with d
      • For each hypothesis s in S that is not consistent with d
        • Remove s from S
        • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
        • Remove from S any hypothesis that is more general than another hypothesis in S
    • If d is a negative example
      • Remove from S any hypothesis inconsistent with d
      • For each hypothesis g in G that is not consistent with d
        • Remove g from G
        • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
        • Remove from G any hypothesis that is less general than another hypothesis in G
  • (A simplified code sketch of the algorithm follows)
  CS 484 – Artificial Intelligence
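
A simplified sketch of Candidate-Elimination for conjunctive hypotheses (so S stays a singleton), using the tuple encoding above: "?" = any value, None = the empty constraint Ø. The helper names (VALUES, matches, more_general_or_equal) are illustrative. Run on the four EnjoySport examples, it reproduces the S and G boundaries traced on the following slides.

VALUES = [                                   # attribute domains for EnjoySport
    ["Sunny", "Cloudy", "Rainy"],            # Sky
    ["Warm", "Cold"],                        # AirTemp
    ["Normal", "High"],                      # Humidity
    ["Strong", "Weak"],                      # Wind
    ["Warm", "Cool"],                        # Water
    ["Same", "Change"],                      # Forecast
]

def matches(h, x):
    """True if hypothesis h classifies instance x as positive (None matches nothing)."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == "?" or b is None or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    S = [tuple([None] * len(VALUES))]        # most specific boundary
    G = [tuple(["?"] * len(VALUES))]         # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                # minimal generalization of s that covers x
                h = tuple(v if c is None else (c if c == v else "?")
                          for c, v in zip(s, x))
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            S = [s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # minimal specializations of g that exclude x
                for i, c in enumerate(g):
                    if c != "?":
                        continue
                    for v in VALUES[i]:
                        if v != x[i]:
                            h = g[:i] + (v,) + g[i + 1:]
                            if any(more_general_or_equal(h, s) for s in S):
                                new_G.append(h)
            G = [g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
    return S, G

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]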

  7. Example
  • E1 = <Sunny, Warm, Normal, Strong, Warm, Same> (positive)
  • E2 = <Sunny, Warm, High, Strong, Warm, Same> (positive)
  • S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø>}
  • S1 ← {<Sunny, Warm, Normal, Strong, Warm, Same>}
  • S2 ← {<Sunny, Warm, ?, Strong, Warm, Same>}
  • G0 ← {<?, ?, ?, ?, ?, ?>}
  • G1 ← {<?, ?, ?, ?, ?, ?>} (unchanged)
  • G2 ← {<?, ?, ?, ?, ?, ?>} (unchanged)
  CS 484 – Artificial Intelligence

  8. Example (cont. 2)
  • E3 = <Rainy, Cold, High, Strong, Warm, Change> (negative)
  • S2 ← {<Sunny, Warm, ?, Strong, Warm, Same>}
  • S3 ← {<Sunny, Warm, ?, Strong, Warm, Same>} (unchanged)
  • G2 ← {<?, ?, ?, ?, ?, ?>}
  • G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
  CS 484 – Artificial Intelligence

  9. Example (cont. 3)
  • E4 = <Sunny, Warm, High, Strong, Cool, Change> (positive)
  • S3 ← {<Sunny, Warm, ?, Strong, Warm, Same>}
  • S4 ← {<Sunny, Warm, ?, Strong, ?, ?>}
  • G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
  • G4 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
  CS 484 – Artificial Intelligence

  10. Decision Tree Learning
  • Two major benefits over Find-S and Candidate Elimination
    • Can cope with noisy data
    • Capable of learning disjunctive expressions
  • Limitation
    • There may be many valid decision trees given the training data
    • The method prefers small trees over large trees
  • Applies to a broad range of learning tasks
    • Classify medical patients by their disease
    • Classify equipment malfunctions by their cause
    • Classify loan applicants by their likelihood of defaulting on payments
  CS 484 – Artificial Intelligence

  11. Decision Tree Example: days on which to play tennis (the tree is sketched in code below)
  • Outlook = Sunny → test Humidity
    • Humidity = High → No
    • Humidity = Normal → Yes
  • Outlook = Overcast → Yes
  • Outlook = Rain → test Wind
    • Wind = Strong → No
    • Wind = Weak → Yes
  CS 484 – Artificial Intelligence
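
A minimal sketch of how this tree might be represented and used to classify a day, assuming a nested-dict encoding; the structure mirrors the tree above, and the names play_tennis_tree and classify are illustrative.

# The example tree, encoded as nested structures: internal nodes are
# (attribute, {value: subtree}) pairs; leaves are "Yes"/"No" strings.
play_tennis_tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(tree, instance):
    """Walk the tree, following the branch that matches the instance's attribute value."""
    while not isinstance(tree, str):          # descend until a leaf label is reached
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

day = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Weak"}
print(classify(play_tennis_tree, day))  # No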

  12. Decision Tree Induction (1) • Decision tree induction involves creating, from a set of training data, a decision tree that correctly classifies the training data. • ID3 is an example of a decision tree learning algorithm. • ID3 builds the decision tree from the top down, selecting at each stage the attribute that provides the most information about the training data. CS 484 – Artificial Intelligence

  13. Decision Tree Induction (2)
  • ID3 selects attributes based on information gain
  • Information gain is the reduction in entropy caused by a decision
  • Entropy is defined as: H(S) = -p1 log2 p1 - p0 log2 p0
    • p1 is the proportion of the training data which are positive examples
    • p0 is the proportion which are negative examples
  • Intuition about H(S)
    • Zero (minimum value) when all the examples are the same (all positive or all negative)
    • One (maximum value) when half are positive and half are negative
  CS 484 – Artificial Intelligence

  14. Example – Training Data (the PlayTennis training set: 14 days D1–D14, each described by Outlook, Temperature, Humidity, and Wind, with 9 positive and 5 negative examples) CS 484 – Artificial Intelligence

  15. Calculate Information Gain
  • Initial entropy
    • All 14 examples taken together: 9 positive, 5 negative
    • H(init) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = -.643 log2 .643 - .357 log2 .357 = 0.940
  • Calculate the entropy of each subset an attribute produces, then combine them as a weighted sum
  • Entropy of "Outlook"
    • Sunny: 5 examples, 2 positive, 3 negative
      • H(Sunny) = -(2/5) log2 (2/5) - (3/5) log2 (3/5) = 0.971
    • Overcast: 4 examples, 4 positive, 0 negative
      • H(Overcast) = -1 log2 (1) - 0 log2 (0) = 0 (0 log2 0 is defined as 0)
    • Rain: 5 examples, 3 positive, 2 negative
      • H(Rain) = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.971
    • H(Outlook) = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) = .357(0.971) + .286(0) + .357(0.971) = 0.694
  • Information gain = H(init) - H(Outlook) = 0.940 - 0.694 = 0.246 (the sketch below reproduces these numbers)
  CS 484 – Artificial Intelligence
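
A minimal sketch that reproduces the calculation above from the class counts given on the slide (the function name entropy is illustrative).

from math import log2

def entropy(pos, neg):
    """Entropy of a set containing pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:                      # 0 log2 0 is defined as 0
            h -= p * log2(p)
    return h

# Counts from the slide: 9+/5- overall; Outlook splits the data into
# Sunny 2+/3-, Overcast 4+/0-, Rain 3+/2-.
h_init = entropy(9, 5)
subsets = [(2, 3), (4, 0), (3, 2)]     # Sunny, Overcast, Rain
h_outlook = sum((p + n) / 14 * entropy(p, n) for p, n in subsets)
gain_outlook = h_init - h_outlook
print(round(h_init, 3), round(h_outlook, 3), round(gain_outlook, 3))
# 0.94 0.694 0.247 (the slide's 0.246 comes from subtracting the rounded entropies)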

  16. Maximize Information Gain
  • Gain of each attribute
    • Gain(Outlook) = 0.246
    • Gain(Humidity) = 0.151
    • Gain(Wind) = 0.048
    • Gain(Temperature) = 0.029
  • Outlook has the highest gain, so it becomes the root of the tree
    • Root: {D1, D2, …, D14} [9+,5-], split on Outlook
    • Sunny → {D1, D2, D8, D9, D11} [2+,3-] → still to be resolved (?)
    • Overcast → {D3, D7, D12, D13} [4+,0-] → Yes
    • Rain → {D4, D5, D6, D10, D14} [3+,2-] → still to be resolved (?)
  CS 484 – Artificial Intelligence

  17. Unbiased Learner
  • Provide a hypothesis space capable of representing every teachable concept
    • Every possible subset of the instances X (the power set of X)
  • How large is this space?
    • For EnjoySport, there are 3·2·2·2·2·2 = 96 instances in X
    • The power set has 2^|X| members, so EnjoySport has 2^96 ≈ 10^28 distinct target concepts
  • Allows disjunctions, conjunctions, and negations
  • But the learner can no longer generalize beyond the observed examples
  CS 484 – Artificial Intelligence

  18. Inductive Bias • All learning methods have an inductive bias. • The inductive bias of a learning method is the set of assumptions (restrictions on, or preferences among, hypotheses) it uses to generalize beyond the training data. • Without an inductive bias, a learning method could not generalize at all: a learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances. CS 484 – Artificial Intelligence

  19. Bias in Learning Algorithms • Rote-Learner: If the instance is found in memory, the stored classification is returned; otherwise, the system refuses to classify the new instance (no inductive bias). • Find-S: Finds the most specific hypothesis consistent with the training examples and uses this hypothesis to classify all subsequent instances (bias: the target concept is contained in H). CS 484 – Artificial Intelligence

  20. Candidate-Elimination Bias • Candidate-Elimination will converge to the true target concept provided the training examples are accurate and its initial hypothesis space contains the true target concept. • It only considers conjunctions of attribute values, so it cannot represent "Sky = Sunny or Sky = Cloudy". • What if the target concept is not contained in the hypothesis space? CS 484 – Artificial Intelligence

  21. Bias of ID3 • Chooses the first acceptable tree it encounters in its simple-to-complex, hill-climbing search. • Favors shorter trees over longer ones. • Selects trees that place the attributes with the highest information gain closest to the root. • The interaction between the attribute-selection heuristic and the training examples makes it difficult to characterize its bias precisely. CS 484 – Artificial Intelligence

  22. ID3 vs. Candidate Elimination
  • The two algorithms differ in the type of inductive bias
  • Hypothesis space
    • ID3 searches a complete hypothesis space (one able to represent any target concept) but searches it incompletely
      • Its inductive bias is a consequence of the ordering of hypotheses by its search strategy
    • Candidate-Elimination searches an incomplete hypothesis space (one able to represent only some target concepts) but searches it completely
      • Its inductive bias is a consequence of the expressive power of its hypothesis representation
  CS 484 – Artificial Intelligence

  23. Why Prefer Short Hypotheses?
  • Occam's razor
    • Prefer the simplest hypothesis that fits the data
  • Applying Occam's razor
    • There are fewer short hypotheses than long ones, so it is less likely that a short hypothesis coincidentally fits the training data
    • A 5-node tree that fits the data is less likely to be a statistical coincidence than a 500-node tree, so we prefer the 5-node hypothesis
  • Problems with this argument
    • By the same argument, you could single out other small hypothesis classes by adding arbitrary qualifications to the decision trees; would preferring those be better?
    • Size is determined by the particular representation used internally by the learner
  • Don't reject Occam's razor altogether
    • Evolution will create internal representations that make the learning algorithm's inductive bias a self-fulfilling prophecy, simply because it can alter the representation more easily than it can alter the learning algorithm
  CS 484 – Artificial Intelligence

  24. The Problem of Overfitting Black dots represent positive examples, white dots negative. The two lines represent two different hypotheses. In the first diagram, there are just a few items of training data, correctly classified by the hypothesis represented by the darker line. In the second and third diagrams we see the complete set of data, and the simpler hypothesis, which matched the training data less well, matches the rest of the data better than the more complex hypothesis, which overfits. CS 484 – Artificial Intelligence

  25. The Nearest Neighbor Algorithm (1) • This is an example of instance-based learning. • Instance-based learning involves storing training data and using it to attempt to classify new data as it arrives. • The nearest neighbor algorithm works with data that consists of vectors of numeric attributes. • Each vector represents a point in n-dimensional space. CS 484 – Artificial Intelligence

  26. The Nearest Neighbor Algorithm (2)
  • When an unseen data item is to be classified, the Euclidean distance is calculated between this item and all training data
    • The distance between <x1, y1> and <x2, y2> is sqrt((x1 - x2)^2 + (y1 - y2)^2)
  • The classification for the unseen data is usually the most common classification amongst its k nearest neighbors
  • Shepard's method instead allows all training data to contribute to the classification, with each contribution weighted inversely to its distance from the data item to be classified
  • (A sketch of the nearest-neighbor rule follows)
  CS 484 – Artificial Intelligence
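
A minimal sketch of the k-nearest-neighbor rule with Euclidean distance; the value of k, the sample data, and the function name knn_classify are illustrative.

from collections import Counter
from math import dist   # Euclidean distance between two points (Python 3.8+)

def knn_classify(query, training_data, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `training_data` is a list of (vector, label) pairs; vectors are numeric tuples."""
    neighbors = sorted(training_data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"), ((4.5, 3.9), "B"), ((3.8, 4.0), "B"),
]
print(knn_classify((1.1, 1.0), training))  # A
print(knn_classify((4.2, 4.1), training))  # B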
