Lecture 3: Introduction to Classification
CS 175, Fall 2007
Padhraic Smyth
Department of Computer Science, University of California, Irvine

Outline
- Overview of classification: examples and applications of classification
- Classification: a mapping from features to a class label
[Figure: a classifier takes feature values a, b, d, …, z (which are known, measured) and produces a predicted class value c; the true class is unknown to the classifier.]
We want a mapping or function that takes any combination of
values x = (a, b, d, …, z) and produces a prediction c,
i.e., a function c = f(a, b, d, …, z) producing a value c = 1, 2, …, m.
The problem is that we don’t know this mapping: we have to learn it from data!
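As a toy illustration of such a mapping (the features, weights, and threshold here are made up, not from the lecture), a hand-built f might look like:

```python
# A toy hand-built classifier: a fixed mapping f from feature values
# to a class label c in {1, 2}. The weights and threshold are invented
# for illustration; in practice f is learned from data.
def f(a, b, d):
    # predict class 2 when a weighted score exceeds a threshold
    score = 2.0 * a + 0.5 * b - d
    return 2 if score > 1.0 else 1

print(f(1.0, 0.0, 0.5))  # -> 2
print(f(0.0, 1.0, 0.5))  # -> 1
```

Learning a classifier amounts to choosing such an f from data rather than writing it by hand.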
[Figure: scatter plot of two classes in a two-dimensional feature space (Feature 1 vs. Feature 2): Class 1 = Control Group, Class 2 = Anemia Group.]
Each data point defines a "cell" of the space: the set of points closer to it than to any other data point (a Voronoi cell). Every point within that cell is assigned that data point's class.
The overall decision boundary is the union of the cell boundaries where the class decision differs on each side.
Where is the boundary? It consists of points that are equidistant from the nearest points of class 1 and class 2.
Note: locally the boundary is
(1) linear (because Euclidean distance is used),
(2) halfway between the two class points, and
(3) at right angles to the line connecting them.
[Figure: the feature space divided into a decision region for Class 1 and a decision region for Class 2.]
In general, the nearest-neighbor classifier produces piecewise linear decision boundaries.
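The local linearity can be checked directly: between any two training points of different classes, the set of points equidistant from both is the perpendicular bisector of the segment joining them. A small sketch (made-up coordinates) verifying this:

```python
import math

def dist(p, q):
    # Euclidean distance between two 2-D points (math.dist needs Python 3.8+)
    return math.dist(p, q)

# Two training points of different classes (coordinates invented for illustration):
p1 = (0.0, 0.0)   # class 1
p2 = (2.0, 0.0)   # class 2

# Every point on the vertical line x = 1 (the perpendicular bisector) is
# equidistant from p1 and p2, so locally the 1-NN boundary is a straight
# line halfway between the two points, at right angles to their connector.
for y in (-1.0, 0.0, 3.0):
    q = (1.0, y)
    assert abs(dist(q, p1) - dist(q, p2)) < 1e-12
print("boundary points verified")
```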
Training Accuracy = (1/n) Σ_{i ∈ Dtrain} I(o(i), c(i))

where I(o(i), c(i)) = 1 if o(i) = c(i), and 0 otherwise,
o(i) = the output of the classifier for training feature vector x(i), and
c(i) = the true class for training data vector x(i).

Let Dtest be a set of new data, unseen in the training process, but assume that Dtest is generated by the same "mechanism" that generated Dtrain:

Test Accuracy = (1/ntest) Σ_{j ∈ Dtest} I(o(j), c(j))
Test Accuracy is usually what we are really interested in: why?
Unfortunately, test accuracy is often lower on average than training accuracy.
Why is this so?
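The accuracy definitions above are just the fraction of examples where the classifier output o(i) matches the true class c(i). A minimal sketch (with made-up outputs and labels):

```python
# Training/test accuracy as defined above: the fraction of examples
# where the classifier output o(i) equals the true class c(i).
def accuracy(outputs, labels):
    assert len(outputs) == len(labels)
    return sum(1 for o, c in zip(outputs, labels) if o == c) / len(outputs)

# invented example outputs and true labels, for illustration only
train_out = [1, 2, 2, 1, 2]; train_true = [1, 2, 2, 1, 1]
test_out  = [1, 2, 1, 2];    test_true  = [1, 1, 1, 2]
print(accuracy(train_out, train_true))  # -> 0.8
print(accuracy(test_out, test_true))    # -> 0.75
```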
simdata1 =
      shortname: 'Simulated Data 1'
    numfeatures: 2
     classnames: [2x6 char]
     numclasses: 2
    description: [1x66 char]
       features: [200x2 double]
    classlabels: [200x1 double]
function classplot(data, x, y);
% function classplot(data, x, y);
%
% brief description of what the function does
% ......
% Your Name, CS 175, date
%
% Inputs
% data: a structure with the same fields as described above
%       (your comment header should describe the structure explicitly)
% Note that if you are only using certain fields of the structure
% in the function below, you need only describe those fields in the input comments
% -------- Your code goes here --------
function [class_predictions] = knn(traindata,trainlabels,k, testdata)
% function [class_predictions] = knn(traindata,trainlabels,k, testdata)
%
% a brief description of what the function does
% ......
% Your Name, CS 175, date
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: an N1 x 1 vector of class labels for traindata
% k: an odd positive integer indicating the number of neighbors to use
% testdata: an N2 x d matrix of feature data for testing the knn classifier
%
% Outputs
% class_predictions: N2 x 1 vector of predicted class values
% -------- Your code goes here --------
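The assignment itself is to be written in MATLAB; purely as an illustrative sketch of the logic the knn stub asks for (majority vote among the k nearest training points under Euclidean distance), here is a Python version with invented example data:

```python
import math
from collections import Counter

def knn(traindata, trainlabels, k, testdata):
    """Predict a class label for each test point by majority vote among
    its k nearest training points (Euclidean distance). A Python sketch
    of the MATLAB stub's logic, not the assignment solution."""
    predictions = []
    for x in testdata:
        # indices of training points sorted by distance to x; keep the k closest
        neighbors = sorted(range(len(traindata)),
                           key=lambda i: math.dist(x, traindata[i]))[:k]
        votes = Counter(trainlabels[i] for i in neighbors)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# invented toy data: two well-separated classes in 2-D
train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [1, 1, 2, 2]
print(knn(train, labels, 3, [(0.5, 0.5), (5.0, 5.5)]))  # -> [1, 2]
```

Choosing k odd (as the stub requires) avoids ties in the two-class majority vote.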
function knn_plot(traindata,trainlabels,k,testdata,testlabels);
% function knn_plot(traindata,trainlabels,k,testdata,testlabels);
%
% Predicts class-labels for the data in testdata using the k nearest
% neighbors in traindata, and then plots the data (using the first
% 2 dimensions or first 2 features), displaying the data from each
% class in different colors, and overlaying circles on the points
% that were incorrectly classified.
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: an N1 x 1 vector of class labels for traindata
% k: an odd positive integer indicating the number of neighbors to use
% testdata: an N2 x d matrix of feature data for testing the knn classifier
% testlabels: an N2 x 1 vector of class labels for testdata
function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
% function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
%
% a brief description of what the function does
% ......
% Your Name, CS 175, date
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: an N1 x 1 vector of class labels for traindata
% testdata: an N2 x d matrix of feature data for testing the knn classifier
% testlabels: an N2 x 1 vector of class labels for testdata
% kmax: an odd positive integer indicating the maximum number of neighbors
% plotflag: (optional argument) if 1, the error rate versus k is plotted;
%           otherwise no plot is produced.
%
% Outputs
% errors: r x 1 vector of error-rates on testdata, where r is the
% number of values of k that are tested.
% -------- Your code goes here --------
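Again as an illustrative sketch only (the assignment is in MATLAB): the error-rates loop classifies every test point for each odd k up to kmax and records the fraction misclassified. A self-contained Python version with invented data:

```python
import math
from collections import Counter

def knn_error_rates(traindata, trainlabels, testdata, testlabels, kmax):
    """For each odd k = 1, 3, ..., kmax, classify every test point by a
    majority vote of its k nearest training points (Euclidean distance)
    and record the error rate. A sketch of the MATLAB stub's logic."""
    errors = []
    for k in range(1, kmax + 1, 2):
        wrong = 0
        for x, true_c in zip(testdata, testlabels):
            nearest = sorted(range(len(traindata)),
                             key=lambda i: math.dist(x, traindata[i]))[:k]
            votes = Counter(trainlabels[i] for i in nearest)
            if votes.most_common(1)[0][0] != true_c:
                wrong += 1
        errors.append(wrong / len(testdata))
    return errors

# invented toy data: both test points lie near the correct cluster
train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [1, 1, 2, 2]
print(knn_error_rates(train, labels, [(0.2, 0.2), (4.8, 5.2)], [1, 2], 3))  # -> [0.0, 0.0]
```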
[Figure: two-class data in a two-dimensional feature space, plotted as Feature 2 versus Feature 1, showing Decision Region 1, Decision Region 2, and the decision boundary separating them.]
[Figure: Y versus X fit with a high-order polynomial in X.]
[Figure: Y versus X fit with the linear model Y = aX + b + noise.]
[Figure: predictive error versus model complexity. Error on training data decreases steadily as model complexity increases, while error on test data first decreases and then rises again. The ideal range for model complexity lies between underfitting (models that are too simple) and overfitting (models that are too complex).]
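A minimal demonstration of the train/test gap, using 1-NN (which can memorize its training set) on an invented noisy two-class problem: training accuracy is 1.0 by construction, while accuracy on new points drawn from the same mechanism is lower, because the classifier has memorized the label noise.

```python
import math, random

random.seed(0)

def label(x, y):
    # true "mechanism" (invented for illustration): class 2 above the
    # line y = x, class 1 below, with 20% of labels randomly flipped
    c = 2 if y > x else 1
    if random.random() < 0.2:
        c = 3 - c  # flip 1 <-> 2
    return c

def sample(n):
    pts = [(random.random(), random.random()) for _ in range(n)]
    return pts, [label(x, y) for x, y in pts]

def nn1(train, labels, q):
    # 1-nearest-neighbor prediction for query point q
    i = min(range(len(train)), key=lambda j: math.dist(q, train[j]))
    return labels[i]

train, train_c = sample(50)
test, test_c = sample(200)

train_acc = sum(nn1(train, train_c, q) == c for q, c in zip(train, train_c)) / len(train)
test_acc  = sum(nn1(train, train_c, q) == c for q, c in zip(test, test_c)) / len(test)

print(train_acc)  # -> 1.0: each training point is its own nearest neighbor
print(test_acc)   # lower: the memorized noise does not generalize
```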