**1. **Lecture 3: Introduction to Classification CS 175, Fall 2007
Padhraic Smyth
Department of Computer Science
University of California, Irvine

**2. **Outline Overview of Classification:
examples and applications of classification
classification: mapping from features to a class label
decision boundaries
training and test data accuracy
the nearest-neighbor classifier
Assignments:
Assignment 2 due Wednesday next week
plotting classification data
k-nearest-neighbor classifiers

**3. **Classification Classification is an important component of intelligent systems
We have a special discrete-valued variable called the class, C
C takes values c, where c = 1, c = 2, ..., c = m
for now assume m=2, i.e., 2 classes: c = 1 or c = 2
Problem is to decide what class an object is
i.e., what value the class variable C is for a given object
given measurements on the object, e.g., A, B, ...
These measurements are called "features"
we wish to learn a mapping from Features -> Class
Notation:
C is the class
A, B, etc. (the measurements) are called the "features" (sometimes also called "attributes" or "input variables")
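As a minimal sketch of the Features -> Class mapping in MATLAB (the threshold and the meaning of the features are invented for illustration, not from the lecture):

```matlab
% hypothetical classifier: map two features (a, b) to a class label in {1, 2}
% class 2 is assigned whenever feature a exceeds a made-up threshold;
% feature b is ignored by this toy rule
classify = @(a, b) 1 + (a > 98.6);   % e.g., a = body temperature in Fahrenheit

classify(101.3, 42)   % returns 2
classify(98.2, 42)    % returns 1
```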

**4. ** Classification Functions

**5. **Applications of Classification Medical Diagnosis
classification of cancerous cells
Credit card and Loan approval
Most major banks
Speech recognition
IBM, Dragon Systems, AT&T, Microsoft, etc
Optical Character/Handwriting Recognition
Post Offices, Banks, Gateway, Motorola, Microsoft, Xerox, etc
Email classification
classify email as "junk" or "non-junk"
Many other applications
one of the most successful applications of AI technology

**6. ** Examples of Features and Classes

**7. ** Examples of Features and Classes

**8. ** Examples of Features and Classes

**9. **Classification of Galaxies

**12. **Feature Vectors and Feature Spaces Feature Vector:
Say we have 2 features: we can think of the features as a 2-component vector
i.e., a 2-dimensional vector, [a b]
So the features correspond to a 2-dimensional space
(clearly we can generalize to d-dimensional space)
this is called the "feature space"
Each feature vector represents the "coordinates" of a particular object in feature space
If the feature-space is 2-dimensional (for example), and the features a and b are real-valued
we can visually examine and plot the locations of the feature vectors
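For example, in MATLAB a 2-dimensional feature vector is just a 1 x 2 row vector, and plotting it shows its location in feature space (the numbers here are arbitrary):

```matlab
x = [3.5 1.2];              % a single 2-dimensional feature vector [a b]
plot(x(1), x(2), 'ko');     % its "coordinates" as a point in feature space
xlabel('feature a'); ylabel('feature b');
```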

**13. **Data with 2 Features

**14. **Data from Multiple Classes Now consider that we have data from m classes (e.g., m=2)
We can imagine the data from each class being in a "cloud" in feature space
data sets D1 and D2 (sets of points from classes 1 and 2)
data are of dimension d (i.e., d-dimensional vectors)
if d = 2 (2 features), we can plot the data
we should see two "clouds" of data points, one cloud per class
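A quick way to see such clouds is to simulate them; this sketch draws two Gaussian clouds with different means (all parameters invented for illustration):

```matlab
N = 50;
D1 = randn(N, 2);                          % class 1 cloud, centered at (0, 0)
D2 = randn(N, 2) + repmat([3 3], N, 1);    % class 2 cloud, centered at (3, 3)
plot(D1(:,1), D1(:,2), 'bo'); hold on;
plot(D2(:,1), D2(:,2), 'rx'); hold off;
legend('class 1', 'class 2');
```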

**15. **Example of Data from 2 Classes

**17. **Decision Boundaries What is a Classifier?
A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, ..., m}
Thus, a classifier partitions the feature space into m decision regions
The line or surface separating any 2 classes is the decision boundary
Linear Classifiers
a linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane)
it is one of the simplest classifiers we can imagine
in 2 dimensions the decision boundary is a straight line
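As a sketch, a linear classifier in 2 dimensions can be written as a thresholded linear function; the weights below are made up:

```matlab
% hypothetical linear classifier: class 1 if w1*a + w2*b + w0 > 0, else class 2
w = [1 -1]; w0 = 0.5;                     % made-up weights
linclass = @(x) 2 - (x * w' + w0 > 0);    % x is a 1 x 2 feature vector [a b]
% the decision boundary is the straight line w1*a + w2*b + w0 = 0
```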

**18. **2-Class Data with a Linear Decision Boundary

**19. **Class Overlap Consider the two-class case
data from D1 and D2 may overlap
features = {age, body temperature}, classes = {flu, not-flu}
features = {income, savings}, classes = {good/bad risk}
it is common in practice that the classes naturally overlap
this means that our features are usually not able to perfectly discriminate between the classes
note: with additional, more expensive or more detailed features (e.g., a specific test for the flu) we might be able to get perfect separation
if there is overlap, the classes are not linearly separable

**20. **Classification Problem with Overlap

**24. **Classification Accuracy Say we have N feature vectors
Say we know the true class label for each feature vector
We can measure how accurate a classifier is by how many feature vectors it classifies correctly
Accuracy = percentage of feature vectors correctly classified
training accuracy = accuracy on training data
test accuracy = accuracy on new data not used in training
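In MATLAB, given a vector of predicted labels and the true labels, accuracy is a one-liner (the variable names here are assumptions):

```matlab
% predicted and truelabels are both N x 1 vectors of class labels
accuracy = 100 * sum(predicted == truelabels) / N;   % percent correct
```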

**25. **Training Data and Test Data Training data
labeled data used to build a classifier
Test data
new data, not used in the training process, to evaluate how well a classifier does on new data
Memorization versus Generalization
better training accuracy: "memorizing" the training data
better test accuracy: "generalizing" to new data
in general, we would like our classifier to perform well on new test data, not just on training data,
i.e., we would like it to generalize well to new data
Test accuracy is more important than training accuracy
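One common way to obtain test data is to hold out part of a labeled data set; a minimal sketch (the 70/30 split and the variable names X, c, N are arbitrary choices):

```matlab
perm = randperm(N);                 % random ordering of the N examples
Ntrain = floor(0.7 * N);            % e.g., 70% for training
itrain = perm(1:Ntrain);
itest  = perm(Ntrain+1:end);        % remaining 30% held out for testing
Xtrain = X(itrain, :);  ctrain = c(itrain);
Xtest  = X(itest, :);   ctest  = c(itest);
```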

**26. **Examples of Training and Test Data Speech Recognition
Training data
words recorded and labeled in a laboratory
Test data
words recorded from new speakers, new locations
Zipcode Recognition
Training data
zipcodes manually selected, scanned, labeled
Test data
actual letters being scanned in a post office
Credit Scoring
Training data
historical database of loan applications with payment history or decision at that time
Test data
you

**27. **Some Notation
Training Data
Dtrain = { [x(1), c(1)], [x(2), c(2)], ..., [x(N), c(N)] }
N pairs of feature vectors and class labels
Feature Vectors and Class Labels:
x(i) is the ith training data feature vector
in MATLAB this could be the ith row of an N x d matrix
c(i) is the class label of the ith feature vector
in general, c(i) can take m different class values, e.g., c = 1, c = 2, ...
Let y be a new feature vector whose class label we do not know, i.e., we wish to classify it.
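Concretely, Dtrain can be stored in MATLAB as a feature matrix plus a label vector (the numbers below are invented):

```matlab
X = [1.2 0.7; 3.1 2.8; 0.4 1.9];   % N x d feature matrix: x(i) is row i (here N=3, d=2)
c = [1; 2; 1];                     % N x 1 labels: c(i) is the class of x(i)
y = [2.0 1.5];                     % a new feature vector we wish to classify
```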

**28. **Nearest Neighbor Classifier y is a new feature vector whose class label is unknown
Search Dtrain for the closest feature vector to y
let this "closest feature vector" be x(j)
Classify y with the same label as x(j), i.e.
y is assigned label c(j)
How are "closest" x vectors determined?
typically use minimum Euclidean distance
dE(x, y) = sqrt( Σi (xi - yi)^2 )
Side note: this produces a "Voronoi tessellation" of the d-dimensional space
each point "claims" a cell surrounding it
cell boundaries are polygons
Analogous to "memory-based" reasoning in humans
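A minimal MATLAB implementation of this rule, assuming the feature-matrix/label-vector representation shown earlier (the function and variable names are my own):

```matlab
function label = nn_classify(Xtrain, ctrain, y)
% 1-nearest-neighbor: assign y the label of the closest training vector
diffs = Xtrain - repmat(y, size(Xtrain, 1), 1);   % subtract y from every row
dists = sqrt(sum(diffs.^2, 2));                   % Euclidean distance to each x(i)
[mindist, j] = min(dists);                        % index j of the closest x(j)
label = ctrain(j);                                % y is assigned label c(j)
end
```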

**29. **Geometric Interpretation of Nearest Neighbor

**30. **Regions for Nearest Neighbors

**31. **Nearest Neighbor Decision Boundary

**32. **How should the new point be classified?

**33. **Local Decision Boundaries

**34. **Finding the Decision Boundaries

**35. **Finding the Decision Boundaries

**36. **Finding the Decision Boundaries

**37. **Overall Boundary = Piecewise Linear

**38. **Geometric Interpretation of kNN (k=1)

**39. **More Data Points

**40. **More Complex Decision Boundary

**41. **K-Nearest Neighbor (kNN) Classifier Find the k-nearest neighbors to y in Dtrain
i.e., rank the feature vectors according to Euclidean distance
select the k vectors which have the smallest distance to y
Classification
ranking yields k feature vectors and a set of k class labels
pick the class label which is most common in this set ("vote")
classify y as belonging to this class
Theoretical Considerations
as k increases
we are averaging over more neighbors
the effective decision boundary is more "smooth"
as N increases, the optimal k value tends to increase in proportion to log N
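A sketch of this ranking-and-voting procedure in MATLAB (names are assumptions; note that mode breaks ties by returning the smallest label):

```matlab
function label = knn_classify(Xtrain, ctrain, y, k)
% k-nearest-neighbor: majority vote among the k closest training vectors
diffs = Xtrain - repmat(y, size(Xtrain, 1), 1);
dists = sqrt(sum(diffs.^2, 2));        % Euclidean distance to each training point
[sorted, idx] = sort(dists);           % rank the feature vectors by distance
label = mode(ctrain(idx(1:k)));        % most common class label among the k nearest
end
```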

**42. **K-Nearest Neighbor (kNN) Classifier Notes:
In effect, the classifier uses the nearest k feature vectors from Dtrain to "vote" on the class label for y
the single-nearest-neighbor classifier is the special case k = 1
for two-class problems, if we choose k to be odd (i.e., k = 1, 3, 5, ...) then there will never be any "ties"
"training" is trivial for the kNN classifier, i.e., we just use Dtrain as a "lookup table" when we want to classify a new feature vector
Extensions of the Nearest Neighbor classifier
weighted distances
e.g., if some of the features are more important
e.g., if features are irrelevant
fast search techniques (indexing) to find k-nearest neighbors in d-space
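For example, a weighted Euclidean distance simply scales each squared feature difference; the weights below are hypothetical:

```matlab
w = [2 1];                                      % made-up weights: feature 1 counts twice as much
diffs = Xtrain - repmat(y, size(Xtrain, 1), 1);
dW = sqrt(diffs.^2 * w');                       % weighted distance: sqrt( Σi wi*(xi - yi)^2 )
% setting a weight to 0 effectively removes an irrelevant feature
```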

**43. **Accuracy on Training Data

**44. **Accuracy on Test Data

**45. **Assignment 2 Due Wednesday...
4 parts
Plot classification data in two-dimensions
Implement a nearest-neighbor classifier
Plot the errors of a k-nearest-neighbor classifier
Test the effect of the value k on the accuracy of the classifier

**46. **Data Structure simdata1 =
shortname: 'Simulated Data 1'
numfeatures: 2
classnames: [2x6 char]
numclasses: 2
description: [1x66 char]
features: [200x2 double]
classlabels: [200x1 double]
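The struct fields can be pulled out directly for plotting or classification; a sketch of typical usage:

```matlab
X = simdata1.features;       % 200 x 2 matrix of feature vectors
c = simdata1.classlabels;    % 200 x 1 vector of class labels (1 or 2)
plot(X(c == 1, 1), X(c == 1, 2), 'bo'); hold on;
plot(X(c == 2, 1), X(c == 2, 2), 'rx'); hold off;
xlabel('feature 1'); ylabel('feature 2');
```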

**47. **Plotting Function

**48. **First simulated data set, simdata1

**49. **Second simulated data set, simdata2

**50. **Nearest Neighbor Classifier

**51. **Plotting k-NN Errors

**52. **Accuracy of kNN Classifier as k is varied

**53. **Test Accuracy and Generalization The accuracy of our classifier on new, unseen data is a fair and honest assessment of its performance
Why is training accuracy not good enough?
Training accuracy is optimistic
a classifier like nearest-neighbor can construct boundaries which always separate all training data points, but which do not separate new points
e.g., what is the training accuracy of kNN, k = 1?
A flexible classifier can "overfit" the training data
in effect it just memorizes the training data, but does not learn the general relationship between x and C
Generalization
We are really interested in how our classifier generalizes to new data
test data accuracy is a good estimate of generalization performance
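This is easy to check numerically: scoring a 1-NN classifier on its own training set gives 100% accuracy (each point is its own nearest neighbor, assuming no duplicate points with different labels), so only held-out test accuracy is informative. A sketch using the hypothetical knn_classify function above:

```matlab
Ntest = size(Xtest, 1);
pred = zeros(Ntest, 1);
for i = 1:Ntest
    pred(i) = knn_classify(Xtrain, ctrain, Xtest(i, :), 1);   % k = 1
end
test_accuracy = 100 * sum(pred == ctest) / Ntest;
```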

**54. **Another Example

**55. **A More Complex Decision Boundary

**56. **Example: The Overfitting Phenomenon

**57. **A Complex Model

**58. **The True (simpler) Model

**59. **How Overfitting affects Prediction

**60. **How Overfitting affects Prediction

**61. **How Overfitting affects Prediction

**62. **Summary
Important Concepts
classification is an important component in intelligent systems
a classifier = a mapping from feature space to a class label
decision boundaries = boundaries between classes
classification learning
using training data to define a classifier
the nearest-neighbor classifier
training accuracy versus test accuracy