
Overview of Today’s Lecture


zohar

Presentation Transcript


  1. Overview of Today’s Lecture • Last Time: course introduction • Reading assignment posted to class webpage • Don’t get discouraged • Today: introduction to “Supervised Machine Learning” • Our first ML algorithm: K-nearest neighbor • HW 0 out online • Create a dataset of “fixed-length feature vectors” • Due next Tuesday Sept 19 (4 PM) • Instructions for handing in HW0 coming soon

  2. Supervised Learning: Overview • [Figure: Real World → Digital Representation (feature space) → classification rules] • Humans select features (HW 0) • Machine constructs classifier (HW 1-2) • Example rule: If feature 2 = X then APPLY BREAK = TRUE

  3. Supervised Learning: Task Definition • Given • A collection of positive examples of some concept/class/category (i.e., members of the class) and, possibly, a collection of the negative examples (i.e., non-members) • Produce • A description that covers (includes) all (most) of the positive examples and none (few) of the negative examples (which, hopefully, properly categorizes most future examples! ← The Key Point!) • Note: one can easily extend this definition to handle more than two classes

  4. Example • [Figure: positive examples, negative examples, and an unlabeled symbol] • How does this symbol classify? • Concept • Solid Red Circle in a Regular Polygon • What about? • Figure with red solid circles not in larger red circle • Figures on left side of page • etc.

  5. HW0 – Your “Personal Concept” • Step 1: Choose a Boolean (true/false) concept • Subjective judgment (can’t articulate) • Books I like/dislike • Movies I like/dislike • www pages I like/dislike • “time will tell” concepts • Stocks to buy • Medical treatment (at time t, predict outcome at time (t +∆t)) • Sensory interpretation • Face recognition (See text) • Handwritten digit recognition • Sound recognition • Hard to program functions

  6. HW0 – Your “Personal Concept” • Step 2: Choose a feature space • We will use fixed-length feature vectors • Choose N features (defines a space) • Each feature has V_i possible values • Each example is represented by a vector of N feature values (i.e., is a point in the feature space), e.g.: <red, 50, round> for the features <color, weight, shape> • Feature Types • Boolean • Nominal • Ordered • Hierarchical (we will not use hierarchical features) • Step 3: Collect examples (“I/O” pairs)
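
As a rough illustration of Step 2, the sketch below stores one example as a fixed-length vector of N = 3 feature values plus a Boolean label. The names `FEATURES` and `make_example`, and the specific sets of legal values, are assumptions made up for illustration; only the <red, 50, round> example comes from the slide.

```python
# Hypothetical feature space: N = 3 features, each with its own set of legal values.
FEATURES = [
    ("color", {"red", "blue", "green"}),   # nominal
    ("weight", range(0, 501)),             # ordered, integers 0..500
    ("shape", {"round", "square"}),        # treated as nominal here
]

def make_example(values, label):
    """Check a feature vector against the feature space and pair it with its label."""
    if len(values) != len(FEATURES):
        raise ValueError("every example must have exactly N feature values")
    for (name, allowed), value in zip(FEATURES, values):
        if value not in allowed:
            raise ValueError(f"{value!r} is not a legal value for {name}")
    return (tuple(values), label)

# One "I/O" pair: the feature vector is the input, the Boolean label is the output.
example = make_example(["red", 50, "round"], True)
print(example)
```

Because every example has the same length and the same feature order, each one can be treated as a point in the feature space.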

  7. Standard Feature Types for representing training examples – source of “domain knowledge” • Nominal (Boolean is a special case) • No relationship among possible values, e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz) • Linear (or Ordered) • Possible values of the feature are totally ordered, e.g., size ∈ {small, medium, large} ← discrete; weight ∈ [0…500] ← continuous • Hierarchical • Possible values are partially ordered in an ISA hierarchy, e.g., for shape: [Figure: ISA hierarchy over the values closed, polygon, continuous, square, triangle, circle, ellipse]
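
One practical consequence of the nominal/ordered distinction is which comparisons make sense. The sketch below is an assumption-laden illustration, not something from the slides: nominal values support only an equal/not-equal test, while ordered values also support a "how far apart" measure (here, the difference in rank). The helper names and `SIZE_ORDER` are hypothetical.

```python
# Total order assumed for the discrete "size" feature from the slide.
SIZE_ORDER = ["small", "medium", "large"]

def nominal_differs(a, b):
    """Nominal values have no ordering, so the only comparison is equality."""
    return a != b

def ordered_gap(a, b, order=SIZE_ORDER):
    """Ordered values can also be compared by how many ranks apart they are."""
    return abs(order.index(a) - order.index(b))

print(nominal_differs("red", "blue"))   # True: different colors, nothing more to say
print(ordered_gap("small", "large"))    # 2: two ranks apart
print(ordered_gap("small", "medium"))   # 1: adjacent ranks
```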

  8. Example Hierarchy (KDD* Journal, Vol 5, No. 1-2, 2001, page 17) • [Figure: Product → 99 Product Classes (e.g., Pet Foods, Tea) → 2302 Product Subclasses (e.g., Dried Cat Food, Canned Cat Food) → ~30k Products (e.g., Friskies Liver, 250g)] • Structure of one feature! • “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.” - From eds. Intro to special issue (on applications) of KDD journal, Vol 15, 2001 • * Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers

  9. Some Famous Examples • Car Steering (Pomerleau): Digitized camera image → Learned Function → Steering Angle • Medical Diagnosis (Quinlan): Medical record (age = 13, sex = M, wgt = 18) → Learned Function → ill vs healthy • DNA Categorization • TV-pilot rating • Chemical-plant control • Backgammon playing • WWW page scoring • Credit application scoring

  10. HW0: Creating your dataset • Choose a dataset • based on interest/familiarity • meets basic requirements • >1000 examples • category (function) learned should be binary valued • ~500 “true” and “false” examples → Internet Movie Database (IMDb)

  11. Example Database: IMDb • Studio (Made) • Name • Country • Movies • Director/Producer (Directed, Produced) • Name • Year of birth • Movies • Actor (Acted in) • Name • Year of birth • Gender • Oscars • Movies • Movie • Title • Genre • Year • Opening Weekend • BO receipts • List of actors/actresses • Release season

  12. HW0: Creating your dataset Choose Boolean target function (category) • Some examples: • Opening weekend box office receipts > $2 million • Movie is drama? (action, sci-fi,…) • Movies I like/dislike (e.g. Tivo)
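
A Boolean target function simply maps each record to true or false. The sketch below shows this for the first example, "opening weekend box office receipts > $2 million". The movie records, the field name `opening_weekend`, and the function name `target` are invented for illustration and do not come from IMDb.

```python
THRESHOLD = 2_000_000  # the $2 million cutoff from the slide

def target(movie):
    """Boolean target function: did the movie open above the threshold?"""
    return movie["opening_weekend"] > THRESHOLD

# Made-up records, for illustration only.
movies = [
    {"title": "Movie A", "opening_weekend": 3_500_000},
    {"title": "Movie B", "opening_weekend": 750_000},
]

labels = [target(m) for m in movies]
print(labels)  # [True, False]
```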

  13. HW0: Creating your dataset • Create your feature space • Movie • Average age of actors • Number of producers • Percent female actors • Studio • Number of movies made • Average movie gross • Percent movies released in US • Director/Producer • Years of experience • Most prevalent genre • Number of award winning movies • Average movie gross • Actor • Gender • Has previous Oscar award or nominations • Most prevalent genre

  14. HW0: Creating your dataset David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data: • Opening weekend box office receipts > $2 million • 25 attributes • Accuracy = 83.3% • Default accuracy = 56% • Movie is drama? • 12 attributes • Accuracy = 71.9% • Default accuracy = 51% • http://kdl.cs.umass.edu/proximity/about.html
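
The slide does not define “default accuracy”. A common reading, assumed here, is the accuracy obtained by always predicting the most frequent class, which gives a baseline that any learned classifier should beat. The label list below is synthetic, chosen only so that the majority class makes up 56% of the data.

```python
from collections import Counter

def default_accuracy(labels):
    """Accuracy of always guessing the most frequent label (assumed definition)."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)

# Synthetic labels: 56 positives and 44 negatives.
labels = [True] * 56 + [False] * 44
print(default_accuracy(labels))
```

Under this reading, a classifier reporting 83.3% accuracy against a 56% baseline is doing much better than guessing the majority class.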

  15. Back to Supervised Learning • One way learning systems differ is in how they represent concepts: • [Figure: Training Examples feed several learners, each producing a different representation] • Backpropagation → Neural Net • C4.5, CART → Decision Tree • AQ, FOIL → Rules (Φ <- X^Y, Φ <- Z) • . . . • SVMs → If 5x1 + 9x2 – 3x3 > 12 Then +
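
The threshold rule at the end of this slide, “If 5x1 + 9x2 – 3x3 > 12 Then +”, is one concrete concept representation: a weighted sum of feature values compared against a constant. A minimal sketch follows; the function name and the choice to return "-" when the test fails are assumptions, since the slide only states the "+" branch.

```python
def linear_rule(x1, x2, x3):
    """Label a point with the slide's rule: + if 5*x1 + 9*x2 - 3*x3 > 12."""
    score = 5 * x1 + 9 * x2 - 3 * x3
    return "+" if score > 12 else "-"  # the "-" branch is assumed

print(linear_rule(1, 1, 0))  # score 14, so "+"
print(linear_rule(1, 1, 1))  # score 11, so "-"
```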

  16. Feature Space • If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space. • [Figure: 3-D plot with axes Size, Color, and Weight; an unlabeled point “?” is marked with the values Big, Gray, and 2500] • A “concept” is then a (possibly disjoint) volume in this space.

  17. Supervised Learning = Learning from Labeled Examples • Most common & successful form of ML • [Figure: Venn diagram of points labeled + and -] • Examples – points in multi-dimensional “feature space” • Concepts – “function” that labels points in feature space (as +, -, and possibly ?)

  18. Brief Review • Conjunctive Concept (“and”) • Color(?obj1, red) ^ Size(?obj1, large) • Disjunctive Concept (“or”) • Color(?obj2, blue) v Size(?obj2, small) • [Figure: regions of instances, each labeled A]
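
The two concepts on this slide translate directly into Boolean tests. In the sketch below, objects are represented as dictionaries with `color` and `size` keys; that representation and the function names are assumptions for illustration.

```python
def conjunctive(obj):
    """Color(obj, red) AND Size(obj, large): both conditions must hold."""
    return obj["color"] == "red" and obj["size"] == "large"

def disjunctive(obj):
    """Color(obj, blue) OR Size(obj, small): either condition suffices."""
    return obj["color"] == "blue" or obj["size"] == "small"

big_red = {"color": "red", "size": "large"}
small_red = {"color": "red", "size": "small"}

print(conjunctive(big_red), conjunctive(small_red))   # True False
print(disjunctive(big_red), disjunctive(small_red))   # False True
```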

  19. Empirical Learning and Venn Diagrams • [Figure: Venn diagram over the feature space, with + points inside regions A and B and - points outside] • Concept = A or B (Disjunctive concept) • Examples = labeled points in feature space • Concept = a label for a set of points

  20. Aspects of an ML System • “Language” for representing examples (HW 0) • “Language” for representing “Concepts” (other HWs) • Technique for producing concept “consistent” with the training examples (other HWs) • Technique for classifying new instance (other HWs) • Each of these limits the expressiveness/efficiency of the supervised learning algorithm.

  21. Nearest-Neighbor Algorithms (aka. Exemplar models, instance-based learning (IBL), case-based learning) • Learning ≈ memorize training examples • Problem solving = find most similar example in memory; output its category • [Figure: Venn diagram of + and - points with an unlabeled query point “?”] • “Voronoi Diagrams” (pg 233)
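
The two bullets above map onto very little code: "learning" is storing the examples, and "problem solving" is a search for the closest stored example. The sketch below assumes numeric features and Euclidean distance, neither of which the slide specifies; the data points are made up.

```python
import math

def nearest_neighbor(memory, query):
    """Return the label of the stored example closest to query (Euclidean distance)."""
    _, label = min(memory, key=lambda ex: math.dist(ex[0], query))
    return label

# "Learning" = memorizing labeled points (made-up data).
memory = [
    ((1.0, 1.0), "+"),
    ((1.5, 2.0), "+"),
    ((5.0, 5.0), "-"),
    ((6.0, 4.5), "-"),
]

print(nearest_neighbor(memory, (2.0, 1.5)))  # closest stored point is (1.5, 2.0), so "+"
print(nearest_neighbor(memory, (5.5, 5.0)))  # closest stored point is (5.0, 5.0), so "-"
```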

  22. Sample Experimental Results Simple algorithm works quite well!

  23. Simple Example – 1-NN (1-NN ≡ one nearest neighbor) • Training Set • Ex 1: a=0, b=0, c=1 → + • Ex 2: a=0, b=1, c=0 → - • Ex 3: a=1, b=1, c=1 → - • Test Example • a=0, b=1, c=0 → ? • “Hamming Distance” to the test example • Ex 1 = 2 • Ex 2 = 0 (identical to the test example) • Ex 3 = 2 • So output -
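
The worked example can be checked mechanically. The sketch below computes the Hamming distance (the number of features on which two vectors disagree) from the test example to each training example, using the feature values exactly as transcribed on this slide. With those values, the second training example matches the test example on every feature, so its computed distance is 0; the nearest neighbor, and therefore the output, is still "-".

```python
def hamming(u, v):
    """Count the positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))

# (a, b, c) feature vectors and labels as transcribed on the slide.
training = [
    ((0, 0, 1), "+"),
    ((0, 1, 0), "-"),
    ((1, 1, 1), "-"),
]
test_example = (0, 1, 0)

distances = [hamming(x, test_example) for x, _ in training]
print(distances)  # [2, 0, 2]

# 1-NN: output the label of the training example with the smallest distance.
_, prediction = min(training, key=lambda ex: hamming(ex[0], test_example))
print(prediction)  # -
```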

  24. K-NN Algorithm • Collect K nearest neighbors, select majority classification (or somehow combine their classes) • What should K be? • It is probably problem dependent • Can use tuning sets (later) to select a good setting for K • [Figure: tuning-set error rate plotted against K = 1, 2, 3, 4, 5] Shouldn’t really “connect the dots” (Why?)
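
A minimal sketch of both ideas on this slide follows: majority voting among the K nearest neighbors, and picking K by its error rate on a held-out tuning set. Euclidean distance, odd candidate values of K (to avoid tied votes with two classes), and all data points are assumptions for illustration. The training set deliberately contains one noisy "+" point sitting among the "-" points.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training examples closest to query."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def tuning_error(train, tune, k):
    """Fraction of tuning-set examples that k-NN misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in tune)
    return wrong / len(tune)

def choose_k(train, tune, candidates=(1, 3, 5)):
    """Pick the candidate k with the lowest tuning-set error (first one wins ties)."""
    return min(candidates, key=lambda k: tuning_error(train, tune, k))

train = [
    ((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"),
    ((3, 3), "+"), ((3, 4), "+"), ((4, 3), "+"),
    ((1, 1), "+"),  # noisy example inside the "-" cluster
]
tune = [((0.9, 0.9), "-"), ((3.5, 3.5), "+"), ((0.2, 0.2), "-")]

for k in (1, 3, 5):
    print(k, tuning_error(train, tune, k))
print("chosen K:", choose_k(train, tune))
```

With K = 1 the noisy point captures the nearby tuning example, while larger K outvotes it, which is the usual motivation for using more than one neighbor.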

  25. Some Common Jargon (Discrete/Real Outputs) • Classification • Learning a discrete valued function • Regression • Learning a real valued function • IBL easily extended to regression tasks (and to multi-category classification)
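
The slide does not say how instance-based learning is extended to regression. One simple option, assumed here, is to replace the majority vote with the average of the K nearest neighbors' real-valued outputs. The data below are made up.

```python
import math

def knn_regress(train, query, k):
    """Predict a real value as the mean output of the k nearest training examples."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(y for _, y in neighbors) / k

# One-dimensional inputs (as 1-tuples) with real-valued outputs.
train = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_regress(train, (2.1,), 3))  # mean of 20.0, 30.0 and 10.0
```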

  26. Variations on a Theme (From Aha, Kibler and Albert in ML Journal) • IB1 – keep all examples • IB2 – keep next instance if incorrectly classified by using previous instances • Uses less storage • Order dependent • Sensitive to noisy data
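
The IB2 rule above can be sketched as a single pass over the training examples, keeping an example only when the examples stored so far would misclassify it. This is a simplified reading of the slide, not the published algorithm in full detail; storing the very first example (since nothing is available to classify it) and using Euclidean distance are assumptions, and the data are made up.

```python
import math

def nearest_label(memory, x):
    """Label of the stored example closest to x."""
    return min(memory, key=lambda ex: math.dist(ex[0], x))[1]

def ib2(stream):
    """Keep an example only if the examples kept so far misclassify it."""
    memory = []
    for x, y in stream:
        if not memory or nearest_label(memory, x) != y:
            memory.append((x, y))
    return memory

stream = [((0, 0), "-"), ((0, 1), "-"), ((5, 5), "+"), ((5, 6), "+"), ((1, 0), "-")]

kept = ib2(stream)
print(len(kept), "of", len(stream), "examples stored:", kept)

# Presenting the same examples in reverse order stores a different subset.
print(ib2(list(reversed(stream))))
```

The second call illustrates the "order dependent" bullet: both runs store two examples, but not the same two. It also hints at the noise sensitivity, since a mislabeled example is by construction one the current memory gets "wrong" and so is always kept.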

  27. Variations on a Theme (cont.) • IB3 – extend IB2 to more intelligently decide which examples to keep (see article) • Better handling of noisy data • Another Idea – cluster groups, keep “examples” from each (median/centroid)

  28. Next time • Finish K-NN • Begin linear separators • Naïve Bayes
