Overview of Today's Lecture

Last Time:
- Course introduction
- Reading assignment posted to class webpage
- Don't get discouraged!

Today:
- Introduction to "Supervised Machine Learning"
- Our first ML algorithm: K-nearest neighbor
- HW 0 out online: create a dataset of "fixed-length feature vectors"
[Figure: the supervised learning pipeline]
Real World → (humans select features; HW 0) → Digital Representation (the feature space) → (machine constructs classifier; HW 1-2) → classification rules, e.g., "If feature 2 = X then APPLY BRAKE = TRUE"
The key point: the result is a learned classifier, which, hopefully, properly categorizes most future examples!
Note: one can easily extend this definition to handle more than two classes.
[Figure: positive and negative examples plotted in feature space, plus a new symbol to classify. How does this symbol classify?]
Each example is a point in the feature space, e.g.: <red, 50, round>
The features (color, weight, shape) define a space.

Standard Feature Types for representing training examples (a source of "domain knowledge"):
- discrete: e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz); size ∈ {small, medium, large}
- continuous: e.g., weight ∈ [0…500]
- hierarchical: e.g., for shape:

      shape
       ├─ polygon: square, triangle
       └─ continuous: circle, ellipse

(We will not use hierarchical features in this course.)
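As a concrete sketch (the feature names and domains below are illustrative, not from the course), these feature types can be declared in code and used to validate a fixed-length feature vector such as <red, 50, round>:

```python
# Hypothetical sketch: declaring feature-type domains and validating
# a fixed-length feature vector like <red, 50, round>.
FEATURES = {
    "color":  {"red", "blue", "green"},                     # discrete
    "weight": (0, 500),                                     # continuous range
    "shape":  {"square", "triangle", "circle", "ellipse"},  # leaves of the shape hierarchy
}

def is_valid_example(example):
    """Check that each feature value lies in its declared domain."""
    for (_name, domain), value in zip(FEATURES.items(), example):
        if isinstance(domain, tuple):        # continuous: (low, high) range
            low, high = domain
            if not (low <= value <= high):
                return False
        elif value not in domain:            # discrete: set of legal values
            return False
    return True

print(is_valid_example(("red", 50, "circle")))    # a legal example
print(is_valid_example(("red", 9999, "circle")))  # weight out of range
```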
Example Hierarchy (KDD* Journal, Vol. 5, No. 1/2, 2001, page 17):

    99 product classes          e.g., Pet Foods, Tea
      2302 product subclasses   e.g., Dried Cat Food, Canned Cat Food
        ~30k products           e.g., Friskies Liver, 250g

* Officially, "Data Mining and Knowledge Discovery", Kluwer Publishers
Some Famous Examples

- Autonomous driving: camera image → Learned Function → steering angle
- Medical: medical record (e.g., age = 13, sex = M, wgt = 18) → Learned Function → ill vs. healthy
Another example: the Internet Movie Database (IMDb)

[Figure: IMDb schema - a Studio "Made" a Movie; an Actor "Acted in" a Movie; a Director/Producer "Directed" or "Produced" a Movie]

- Choose a Boolean target function (category)
- Create your feature space

David Jensen's group at UMass used Naïve Bayes (NB) to predict the following, based on attributes they selected and a novel way of sampling from the data:
One way learning systems differ is in how they represent concepts. From the same training examples, different algorithms produce different representations:

    Training Examples →
      - Neural Net (Backpropagation)
      - Decision Tree (C4.5, CART)
      - Rules (AQ, FOIL), e.g., Φ ← X ∧ Y;  Φ ← Z
      - …
      - SVMs, e.g., "If 5x1 + 9x2 - 3x3 > 12 Then +"
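For instance, the SVM-style threshold rule above can be written directly as a function (the weights and threshold come from the slide's example; everything else is illustrative):

```python
# The linear threshold rule "If 5*x1 + 9*x2 - 3*x3 > 12 Then +"
# expressed as a classifier function.
def linear_threshold(x1, x2, x3, w=(5, 9, -3), b=12):
    """Output '+' when the weighted sum exceeds the threshold, else '-'."""
    score = w[0] * x1 + w[1] * x2 + w[2] * x3
    return "+" if score > b else "-"

print(linear_threshold(1, 1, 0))  # 5 + 9 = 14 > 12, so '+'
print(linear_threshold(1, 0, 1))  # 5 - 3 = 2 <= 12, so '-'
```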
If examples are described in terms of values of features, they can be plotted as points in an N-dimensional space.
[Figure: a 3-D feature space with axes Size, Color, and Weight; the point <Big, Gray, 2500> is plotted and marked "?"]

A "concept" is then a (possibly disjoint) volume in this space.
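A minimal sketch of measuring closeness between such points, assuming the features have been encoded numerically (the encoding below is made up for illustration):

```python
import math

# Examples as points in an N-dimensional feature space, compared with
# Euclidean distance (a common choice for nearest-neighbor methods).
def euclidean_distance(a, b):
    """Distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# e.g., <size-code, color-code, weight> encoded numerically (hypothetical)
p = (3.0, 1.0, 2500.0)
q = (3.0, 1.0, 2504.0)
print(euclidean_distance(p, q))  # 4.0
```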
Venn Diagram

[Figure: Venn diagram over the space of instances; + examples fall inside regions A and B, − examples fall outside. Concepts can be combined with ∧ ("and") and ∨ ("or").]

Concept = A or B (a disjunctive concept)
Examples = labeled points in feature space
Concept = a label for a set of points
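The set-of-points view can be sketched directly, with regions A and B as hypothetical finite point sets and the disjunctive concept as their union:

```python
# Sketch of the Venn-diagram view: a concept is a set of points in
# feature space, and "A or B" is simply set union (points are made up).
A = {(1, 1), (1, 2), (2, 1)}   # points covered by region A
B = {(5, 5), (5, 6)}           # points covered by region B

concept = A | B                # disjunctive concept: A or B

def label(point):
    """A concept is a label for a set of points."""
    return "+" if point in concept else "-"

print(label((1, 2)))  # in A, so '+'
print(label((9, 9)))  # in neither region, so '-'
```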
[Figure: Feature Space - + and − examples scattered as points; regions A and B each enclose a cluster of + examples.]
Each of these representations limits the expressiveness/efficiency of the supervised learning algorithm.
Nearest-Neighbor Algorithms (HW 0; also relevant to other HWs)
(a.k.a. exemplar models, instance-based learning (IBL), case-based learning)
[Figure: nearest-neighbor decision boundaries - the labeled + and − training points partition the feature space into cells, known as "Voronoi Diagrams" (pg 233); a query point "?" receives the label of the cell it falls in.]
1-NN (≡ one nearest neighbor): find the training-set example closest to the test example and output its label; here the nearest neighbor is negative, so output −. This simple algorithm works quite well!

K-NN: collect the K nearest neighbors, then select the majority classification (or somehow combine their classes).
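A minimal K-NN classifier following that recipe (Euclidean distance and the toy training set are my assumptions, not specified in the lecture):

```python
import math
from collections import Counter

# K-nearest-neighbor classification: collect the K nearest training
# examples and take a majority vote over their labels.
def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature_vector."""
    neighbors = sorted(
        train,
        key=lambda ex: math.dist(ex[0], query),  # distance to the query point
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]            # majority class

# Hypothetical training set: negatives near the origin, positives near (5, 5)
train = [((0, 0), "-"), ((1, 0), "-"), ((0, 1), "-"),
         ((5, 5), "+"), ((6, 5), "+"), ((5, 6), "+")]
print(knn_classify(train, (5.5, 5.2), k=3))  # '+'
print(knn_classify(train, (0.2, 0.1), k=3))  # '-'
```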
Aside: we shouldn't really "connect the dots" between training points when drawing these boundaries. (Why?)
[Figure: tuning-set error rate plotted as a function of K, for K = 1, 2, 3, 4, 5]
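Choosing K with a tuning set can be sketched as follows (the datasets and candidate K values below are made up; the idea is simply to keep the K with the lowest tuning-set error):

```python
import math
from collections import Counter

# K-NN majority-vote classifier (Euclidean distance, an illustrative choice).
def knn_classify(train, query, k):
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]

def error_rate(train, tune, k):
    """Fraction of tuning-set examples that K-NN misclassifies."""
    wrong = sum(knn_classify(train, x, k) != y for x, y in tune)
    return wrong / len(tune)

# Hypothetical train/tune split
train = [((0, 0), "-"), ((1, 1), "-"), ((0, 1), "-"),
         ((4, 4), "+"), ((5, 5), "+"), ((4, 5), "+")]
tune  = [((0.5, 0.5), "-"), ((4.5, 4.5), "+"), ((1, 0), "-")]

# Pick the K with the lowest tuning-set error rate
best_k = min((1, 2, 3, 4, 5), key=lambda k: error_rate(train, tune, k))
print(best_k, error_rate(train, tune, best_k))
```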
IBL is easily extended to regression tasks (and to multi-category classification), producing discrete or real-valued outputs.
(From Aha, Kibler, and Albert in the Machine Learning Journal)
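A regression variant per that note: average the K nearest neighbors' real-valued outputs instead of taking a majority vote (the data below is illustrative):

```python
import math

# K-NN for regression: output the mean of the K nearest neighbors'
# real-valued outputs rather than a majority class.
def knn_regress(train, query, k=2):
    """train: list of (feature_vector, real_output) pairs."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(y for _, y in neighbors) / k

# Hypothetical 1-D training data
train = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 20.0)]
print(knn_regress(train, (1.5,), k=2))  # mean of 2.0 and 3.0 -> 2.5
```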