Overview of Today’s Lecture
  • Last Time: course introduction
    • Reading assignment posted to class webpage
    • Don’t get discouraged
  • Today: introduction to “Supervised Machine Learning”
    • Our first ML algorithm: K-nearest neighbor
  • HW 0 out online
    • Create a dataset of “fixed-length feature vectors”
    • Due next Tuesday Sept 19 (4 PM)
    • Instructions for handing in HW0 coming soon
Supervised Learning: Overview

[Diagram: Real World → (humans select features; HW 0) → Digital Representation (feature space) → (machine constructs a classifier; HW 1-2) → classification rules, e.g., “If feature 2 = X then APPLY BRAKE = TRUE”]

Supervised Learning: Task Definition
  • Given
    • A collection of positive examples of some concept/class/category (i.e., members of the class) and, possibly, a collection of negative examples (i.e., non-members)
  • Produce
    • A description that covers (includes) all (most) of the positive examples and none (few) of the negative examples

(which, hopefully, properly categorizes most future examples!) ← the key point!

Note: one can easily extend this definition to handle more than two classes.
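Here is a minimal sketch of this task definition in Python (not from the lecture; all names and the toy data are illustrative): a candidate concept description is just a Boolean predicate over examples, scored by how many positives it covers and how many negatives it excludes.

```python
def evaluate(description, positives, negatives):
    """Fraction of positives covered and fraction of negatives excluded."""
    covered = sum(description(ex) for ex in positives) / len(positives)
    excluded = sum(not description(ex) for ex in negatives) / len(negatives)
    return covered, excluded

# Toy concept description: "a solid red circle".
candidate = lambda ex: ex["color"] == "red" and ex["shape"] == "circle"

positives = [{"color": "red", "shape": "circle"},
             {"color": "red", "shape": "circle"}]
negatives = [{"color": "blue", "shape": "circle"},
             {"color": "red", "shape": "square"}]

print(evaluate(candidate, positives, negatives))   # (1.0, 1.0): covers all +, none of the -
```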

Example

[Figure: a set of positive example figures, a set of negative example figures, and a query symbol: “How does this symbol classify?”]

  • Concept
    • Solid Red Circle in a Regular Polygon
  • What about?
    • Figures with red solid circles not in a larger red circle
    • Figures on the left side of the page, etc.
HW0 – Your “Personal Concept”
  • Step 1: Choose a Boolean (true/false) concept
    • Subjective judgment (can’t articulate)
      • Books I like/dislike
      • Movies I like/dislike
      • www pages I like/dislike
    • “time will tell” concepts
      • Stocks to buy
      • Medical treatment (at time t, predict outcome at time (t +∆t))
    • Sensory interpretation
      • Face recognition (See text)
      • Handwritten digit recognition
      • Sound recognition
    • Hard-to-program functions
HW0 – Your “Personal Concept”
  • Step 2: Choose a feature space
    • We will use fixed-length feature vectors
      • Choose N features (this defines the space)
      • Each feature has Vi possible values
      • Each example is represented by a vector of N feature values (i.e., is a point in the feature space)
        e.g., <red, 50, round>  (color, weight, shape)
    • Feature Types
      • Boolean
      • Nominal
      • Ordered
      • Hierarchical (we will not use hierarchical features)
  • Step 3: Collect examples (“I/O” pairs)
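As a concrete illustration of the fixed-length feature-vector representation above (feature names and values are made up, not part of the assignment), here is how a few labeled examples might look in Python:

```python
# Each example is a fixed-length vector of N = 3 feature values plus a Boolean label.
FEATURES = ("color", "weight", "shape")   # nominal, continuous, nominal

examples = [                 # "I/O" pairs: feature vector -> label
    (("red",   50, "round"),  True),
    (("blue", 210, "square"), False),
    (("red",  480, "round"),  True),
]

for vector, label in examples:
    print(dict(zip(FEATURES, vector)), "->", label)
```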


Standard Feature Types for representing training examples – source of “domain knowledge”
  • Nominal (Boolean is a special case)
    • No relationship among possible values
      e.g., color ∈ {red, blue, green}  (vs. color = 1000 Hertz)
  • Linear (or Ordered)
    • Possible values of the feature are totally ordered
      e.g., size ∈ {small, medium, large}  ← discrete
            weight ∈ [0…500]  ← continuous
  • Hierarchical
    • Possible values are partially ordered in an ISA hierarchy
      e.g., for shape: closed → {polygon, continuous}; polygon → {square, triangle}; continuous → {circle, ellipse}
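One common way to turn such features into numbers (a sketch; the lecture does not prescribe any particular encoding) is to one-hot encode nominal values, since they have no order, and to map ordered values to their position or numeric value:

```python
COLORS = ("red", "blue", "green")       # nominal: no order among values
SIZES  = ("small", "medium", "large")   # ordered (discrete)

def encode(color, size, weight):
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return one_hot + [float(SIZES.index(size)), float(weight)]  # weight is continuous

print(encode("red", "medium", 125))   # [1.0, 0.0, 0.0, 1.0, 125.0]
```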


Example Hierarchy (KDD* Journal, Vol. 5, No. 1-2, 2001, page 17)
  • Structure of one feature!
  • “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.”
    - from the editors’ introduction to the special issue (on applications) of the KDD Journal, Vol. 5, 2001

[Figure: product hierarchy: Product → 99 Product Classes (e.g., Pet Foods, Tea) → 2302 Product Subclasses (e.g., Dried Cat Food, Canned Cat Food) → ~30k Products (e.g., Friskies Liver, 250g)]

* Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers


Some Famous Examples
  • Car Steering (Pomerleau): digitized camera image → learned function → steering angle
  • Medical Diagnosis (Quinlan): medical record (e.g., age = 13, sex = M, wgt = 18) → learned function → ill vs. healthy
  • DNA Categorization
  • TV-pilot rating
  • Chemical-plant control
  • Backgammon playing
  • WWW page scoring
  • Credit application scoring

HW0: Creating your dataset
  • Choose a dataset
    • based on interest/familiarity
    • meets basic requirements
      • >1000 examples
      • category (function) learned should be binary valued
      • ~500 “true” and “false” examples

→ Internet Movie Database (IMDb)

Example Database: IMDb

[Entity-relationship sketch of the IMDb data]
  • Studio (Name, Country, Movies) --“Made”--> Movie
  • Actor (Name, Year of birth, Gender, Oscars, Movies) --“Acted in”--> Movie
  • Director/Producer (Name, Year of birth, Movies) --“Directed”/“Produced”--> Movie
  • Movie: Title, Genre, Year, Opening Weekend, BO receipts, List of actors/actresses, Release season

HW0: Creating your dataset

Choose Boolean target function (category)

  • Some examples:
    • Opening weekend box office receipts > $2 million
    • Movie is drama? (action, sci-fi,…)
    • Movies I like/dislike (e.g., TiVo)
HW0: Creating your dataset

Create your feature space (a sketch of turning one record into a feature vector follows this list):
  • Movie
    • Average age of actors
    • Number of producers
    • Percent female actors
  • Studio
    • Number of movies made
    • Average movie gross
    • Percent movies released in US
  • Director/Producer
    • Years of experience
    • Most prevalent genre
    • Number of award-winning movies
    • Average movie gross
  • Actor
    • Gender
    • Has previous Oscar award or nominations
    • Most prevalent genre
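As referenced above, here is a sketch (all field names are hypothetical, not the actual IMDb schema) of computing a few of these features for one movie:

```python
from statistics import mean

def movie_features(movie, actors, current_year=2006):
    """Turn one movie record plus its actor records into a small feature vector."""
    ages = [current_year - a["birth_year"] for a in actors]
    return (
        mean(ages),                                               # average age of actors
        len(movie["producers"]),                                  # number of producers
        sum(a["gender"] == "F" for a in actors) / len(actors),    # percent female actors
    )

movie = {"title": "Some Movie", "producers": ["P1", "P2"]}
actors = [{"birth_year": 1970, "gender": "F"},
          {"birth_year": 1962, "gender": "M"}]
print(movie_features(movie, actors))   # (40, 2, 0.5)
```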
HW0: Creating your dataset

David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data:

  • Opening weekend box office receipts > $2 million
    • 25 attributes
    • Accuracy = 83.3%
    • Default accuracy = 56%
  • Movie is drama?
    • 12 attributes
    • Accuracy = 71.9%
    • Default accuracy = 51%
  • http://kdl.cs.umass.edu/proximity/about.html
Back to Supervised Learning

One way learning systems differ is in how they represent concepts:

[Diagram: Training Examples feed different learners, each with its own concept representation]
  • Neural Net (Backpropagation)
  • Decision Tree (C4.5, CART)
  • Rules (AQ, FOIL), e.g., Φ <- X ^ Y;  Φ <- Z
  • SVMs, e.g., “If 5x1 + 9x2 – 3x3 > 12 then +”
  • ...
Feature Space

If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space.

[Figure: a query point “?” plotted at Size = Big, Color = Gray, Weight = 2500 in a 3-D feature space]

A “concept” is then a (possibly disjoint) volume in this space.
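A toy illustration of “concept = a volume in feature space” (not from the slides; the box boundaries are invented): classify a point by testing whether it falls inside an axis-aligned region.

```python
# Concept = an axis-aligned box over two numeric features.
CONCEPT_BOX = {"size": (5.0, 10.0), "weight": (2000.0, 3000.0)}

def in_concept(point):
    return all(lo <= point[f] <= hi for f, (lo, hi) in CONCEPT_BOX.items())

print(in_concept({"size": 8.0, "weight": 2500.0}))   # True  -> label '+'
print(in_concept({"size": 2.0, "weight": 2500.0}))   # False -> label '-'
```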

Supervised Learning = Learning from Labeled Examples
  • Most common & successful form of ML

[Venn diagram: feature space with scattered points labeled + and -]

  • Examples – points in multi-dimensional “feature space”
  • Concepts – “function” that labels points in feature space
    • (as +, -, and possibly ?)
Brief Review

Instances [Figure: three example “A” figures]

  • Conjunctive Concept (“and”)
    • Color(?obj1, red) ^ Size(?obj1, large)
  • Disjunctive Concept (“or”)
    • Color(?obj2, blue) v Size(?obj2, small)
Empirical Learning and Venn Diagrams

Venn Diagram
  • Concept = A or B (disjunctive concept)
  • Examples = labeled points in feature space
  • Concept = a label for a set of points

[Venn diagram over the feature space: two regions A and B; points inside A or B are labeled +, points outside both are labeled -]

Aspects of an ML System
  • “Language” for representing examples
  • “Language” for representing “Concepts”
  • Technique for producing concept “consistent” with the training examples
  • Technique for classifying new instance

Each of these limits the expressiveness/efficiency of the supervised learning algorithm. (Slide annotation: HW 0 vs. the other HWs.)

Nearest-Neighbor Algorithms

(a.k.a. exemplar models, instance-based learning (IBL), case-based learning)

  • Learning ≈ memorize training examples
  • Problem solving = find most similar example in memory; output its category

[Figure: labeled + and - points in feature space with a query point “?”; the induced decision boundaries form “Voronoi Diagrams” (pg. 233)]
Sample Experimental Results

Simple algorithm works quite well!

Simple Example – 1-NN

(1-NN ≡ one nearest neighbor)

Training Set
  • Ex 1: a=0, b=0, c=1  →  +
  • Ex 2: a=0, b=1, c=0  →  -
  • Ex 3: a=1, b=1, c=1  →  -

Test Example
  • a=0, b=1, c=0  →  ?

“Hamming Distance” to each training example
  • Ex 1 = 2
  • Ex 2 = 0
  • Ex 3 = 2

So output - (the label of the nearest training example, Ex 2).
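A minimal 1-NN sketch (illustrative code, not from the lecture) that reproduces the computation above using Hamming distance over the Boolean features a, b, c:

```python
def hamming(u, v):
    return sum(ui != vi for ui, vi in zip(u, v))

train = [((0, 0, 1), "+"),    # Ex 1
         ((0, 1, 0), "-"),    # Ex 2
         ((1, 1, 1), "-")]    # Ex 3
test = (0, 1, 0)

dists = [(hamming(x, test), label) for x, label in train]
print(dists)          # [(2, '+'), (0, '-'), (2, '-')]
print(min(dists)[1])  # label of the nearest neighbor: '-'
```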

K-NN Algorithm

Collect K nearest neighbors, select majority classification (or somehow combine their classes)

  • What should K be?
    • It probably is problem dependent
    • Can use tuning sets (later) to select a good setting for K (a code sketch follows the plot below)

Shouldn’t really “connect the dots” (why?)

[Plot: tuning-set error rate as a function of K, for K = 1, 2, 3, 4, 5]
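As referenced above, a sketch of K-NN with majority voting plus choosing K on a held-out tuning set (the data and the candidate K values are made up):

```python
from collections import Counter

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def knn_predict(train, x, k):
    neighbors = sorted(train, key=lambda ex: hamming(ex[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def tune_k(train, tuning, candidates=(1, 2, 3, 4, 5)):
    errors = {k: sum(knn_predict(train, x, k) != y for x, y in tuning)
              for k in candidates}
    return min(errors, key=errors.get)   # K with the lowest tuning-set error

train  = [((0, 0, 1), "+"), ((0, 1, 1), "-"), ((1, 1, 1), "-"), ((1, 0, 0), "+")]
tuning = [((0, 0, 0), "+"), ((1, 1, 0), "-")]
best_k = tune_k(train, tuning)
print(best_k, knn_predict(train, (0, 1, 0), best_k))
```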

Some Common Jargon
  • Classification
    • Learning a discrete valued function
  • Regression
    • Learning a real valued function

IBL is easily extended to regression tasks (and to multi-category classification): discrete vs. real-valued outputs (a regression sketch follows).
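A sketch of that extension (toy 1-D data, invented values): for regression, output the average of the k nearest neighbors' real-valued outputs instead of a majority vote.

```python
def knn_regress(train, x, k=3):
    neighbors = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return sum(y for _, y in neighbors) / k

train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]
print(knn_regress(train, 2.5))   # mean of the outputs at x = 2, 3, and 1: ~4.07
```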

Variations on a Theme

(From Aha, Kibler, and Albert, in the Machine Learning journal)

  • IB1 – keep all examples
  • IB2 – keep the next instance only if it is misclassified by the previously stored instances (see the sketch below)
    • Uses less storage
    • Order dependent
    • Sensitive to noisy data
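A minimal sketch of that IB2 idea (illustrative code and data, not from the Aha et al. paper): scan the training stream in order and store an example only when the currently stored examples misclassify it under 1-NN.

```python
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def ib2(stream):
    memory = []
    for x, label in stream:
        if memory:
            _, predicted = min((hamming(mx, x), ml) for mx, ml in memory)
            if predicted == label:
                continue              # already classified correctly: don't store it
        memory.append((x, label))     # misclassified (or memory empty): store it
    return memory

stream = [((0, 0, 1), "+"), ((0, 0, 0), "+"), ((1, 1, 1), "-"), ((1, 1, 0), "-")]
print(ib2(stream))   # keeps only the first "+" and the first "-" example
```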
Variations on a Theme (cont.)
  • IB3 – extends IB2 to more intelligently decide which examples to keep (see the article)
    • Better handling of noisy data
  • Another idea: cluster the examples into groups and keep a representative “example” from each (median/centroid)
Next time
  • Finish K-NN
  • Begin linear separators
    • Naïve Bayes