
Lecture 3: Introduction to Classification






Slide 1

Lecture 3: Introduction to Classification

CS 175, Fall 2007

Padhraic Smyth

Department of Computer Science

University of California, Irvine

Slide 2

Outline

  • Overview of Classification:

    • examples and applications of classification

    • classification: mapping from features to a class label

    • decision boundaries

    • training and test data accuracy

    • the nearest-neighbor classifier

  • Assignments:

    • Assignment 2 due Wednesday next week

      • plotting classification data

      • k-nearest-neighbor classifiers

Slide 3

Classification

  • Classification is an important component of intelligent systems

  • We have a special discrete-valued variable called the class, C

    • C takes values c, where c = 1, c = 2, …, c = m

    • for now assume m = 2, i.e., 2 classes: c = 1 or c = 2

  • The problem is to decide which class an object belongs to

    • i.e., what value the class variable C takes for a given object

    • given measurements on the object, e.g., A, B, …

    • these measurements are called “features”

    • we wish to learn a mapping from Features -> Class

  • Notation:

    • C is the class

    • A, B, etc (the measurements) are called the “features” (sometimes also called “attributes” or “input variables”)

Slide 4

Classification Functions

[Diagram: feature values a, b, d, …, z (known, measured) are the inputs to the Classifier, which outputs a predicted class value c (the true class is unknown to the classifier)]

We want a mapping or function which takes any combination of values x = (a, b, d, …, z) and produces a prediction c, i.e., a function c = f(a, b, d, …, z) which produces a value c = 1, c = 2, …, c = m

The problem is that we don’t know this mapping: we have to learn it from data!

Slide 5

Applications of Classification

  • Medical Diagnosis

    • classification of cancerous cells

  • Credit card and Loan approval

    • Most major banks

  • Speech recognition

    • IBM, Dragon Systems, AT&T, Microsoft, etc

  • Optical Character/Handwriting Recognition

    • Post Offices, Banks, Gateway, Motorola, Microsoft, Xerox, etc

  • Email classification

    • classify email as “junk” or “non-junk”

  • Many other applications

    • one of the most successful applications of AI technology

Slide 6

Examples of Features and Classes

Slide 7

Examples of Features and Classes

Slide 8

Examples of Features and Classes

Slide 9

Classification of Galaxies

[Figure: example galaxies labeled Class 1 and Class 2]

Slide 12

Feature Vectors and Feature Spaces

  • Feature Vector:

    • Say we have 2 features: we can think of the features as a 2-component vector

      • i.e., a 2-dimensional vector, [a b]

    • So the features correspond to a 2-dimensional space

      • (clearly we can generalize to d-dimensional space)

      • this is called the “feature space”

    • Each feature vector represents the “coordinates” of a particular object in feature space

    • If the feature-space is 2-dimensional (for example), and the features a and b are real-valued

      • we can visually examine and plot the locations of the feature vectors

Slide 13

Data with 2 Features

Slide 14

Data from Multiple Classes

  • Now consider that we have data from m classes (e.g., m=2)

  • We can imagine the data from each class being in a “cloud” in feature space

    • data sets D1 and D2 (sets of points from classes 1 and 2)

    • data are of dimension d (i.e., d-dimensional vectors)

    • if d = 2 (2 features), we can plot the data

      • we should see two “clouds” of data points, one cloud per class

Slide 15

Example of Data from 2 Classes

Slide 16

[Figure: data from two classes, labeled Control Group and Anemia Group]

Slide 17

Decision Boundaries

  • What is a Classifier?

    • A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, … m}

    • Thus, a classifier partitions the feature space into m decision regions

    • The line or surface separating any 2 classes is the decision boundary

  • Linear Classifiers

    • a linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane)

    • it is one of the simplest classifiers we can imagine

    • in 2 dimensions the decision boundary is a straight line
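As a rough illustration of the linear classifier described above, the sketch below assigns a class according to which side of a line a 2-dimensional feature vector falls on. The weights `w`, the offset `b`, and the function name `linear_classify` are hypothetical values chosen for this example; they are not from the lecture.

```python
def linear_classify(x, w, b):
    """Return class 1 or class 2 depending on the sign of w . x + b.

    The set of points where w . x + b == 0 is the decision boundary:
    a straight line in 2 dimensions, a hyperplane in d dimensions.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 2


# Hypothetical boundary: the line a + b_feature = 5 in a 2-feature space.
w, b = (1.0, 1.0), -5.0

print(linear_classify((4.0, 3.0), w, b))  # score = 2.0, so class 1
print(linear_classify((1.0, 2.0), w, b))  # score = -2.0, so class 2
```

Points with a non-negative score fall in one decision region and all other points fall in the other, so the feature space is partitioned into exactly two regions.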

Slide 18

2-Class Data with a Linear Decision Boundary

Slide 19

Class Overlap

  • Consider the two-class case

    • data from D1 and D2 may overlap

      • features = {age, body temperature}, classes = {flu, not-flu}

      • features = {income, savings}, classes = {good/bad risk}

    • common in practice that the classes will naturally overlap

      • this means that our features are usually not able to perfectly discriminate between the classes

      • note: with more expensive/more detailed additional features (e.g., a specific test for the flu) we might be able to get perfect separation

    • if there is overlap => classes are not linearly separable

Slide 20

Classification Problem with Overlap

Slide 24

Classification Accuracy

  • Say we have N feature vectors

  • Say we know the true class label for each feature vector

  • We can measure how accurate a classifier is by how many feature vectors it classifies correctly

  • Accuracy = percentage of feature vectors correctly classified

    • training accuracy = accuracy on training data

    • test accuracy = accuracy on new data not used in training

Slide 25

Training Data and Test Data

  • Training data

    • labeled data used to build a classifier

  • Test data

    • new data, not used in the training process, to evaluate how well a classifier does on new data

  • Memorization versus Generalization

    • better training accuracy

      • “memorizing” the training data

    • better test accuracy

      • “generalizing” to new data

    • in general, we would like our classifier to perform well on new test data, not just on training data,

      • i.e., we would like it to generalize well to new data

      • Test accuracy is more important than training accuracy

Slide 26

Examples of Training and Test Data

  • Speech Recognition

    • Training data

      • words recorded and labeled in a laboratory

    • Test data

      • words recorded from new speakers, new locations

  • Zipcode Recognition

    • Training data

      • zipcodes manually selected, scanned, labeled

    • Test data

      • actual letters being scanned in a post office

  • Credit Scoring

    • Training data

      • historical database of loan applications with payment history or decision at that time

    • Test data

      • you

    Slide 27

    Some Notation

    • Training Data

      • Dtrain = { [x(1), c(1)], [x(2), c(2)], …, [x(N), c(N)] }

      • N pairs of feature vectors and class labels

    • Feature Vectors and Class Labels:

      • x(i) is the ith training data feature vector

      • in MATLAB this could be the ith row of an N x d matrix

      • c(i) is the class label of the ith feature vector

      • in general, c(i) can take m different class values, e.g., c = 1, c = 2, ...

      • Let y be a new feature vector whose class label we do not know, i.e., we wish to classify it.

    Slide 28

    Nearest Neighbor Classifier

    • y is a new feature vector whose class label is unknown

    • Search Dtrain for the closest feature vector to y

      • let this “closest feature vector” be x(j)

    • Classify y with the same label as x(j), i.e.

      • y is assigned label c(j)

    • How are “closest x” vectors determined?

      • typically use minimum Euclidean distance

        • $d_E(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$, where x_i and y_i are the ith components of x and y

    • Side note: this produces a “Voronoi tessellation” of the d-space

      • each point “claims” a cell surrounding it

      • cell boundaries are polygons

    • Analogous to “memory-based” reasoning in humans
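The nearest-neighbor rule above can be sketched in a few lines. This is a minimal illustration rather than the assignment solution: it uses Python instead of MATLAB, stores Dtrain as a list of (feature vector, label) pairs, and the function names and toy data are assumptions made for the example.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest_neighbor(d_train, y):
    """Classify y with the label c(j) of the closest training vector x(j)."""
    _, label = min(d_train, key=lambda pair: euclidean(pair[0], y))
    return label


# Toy training set (made up): two points per class in a 2-feature space.
d_train = [((1.0, 1.0), 1), ((2.0, 1.5), 1), ((5.0, 4.0), 2), ((6.0, 5.0), 2)]

print(nearest_neighbor(d_train, (1.5, 1.0)))  # closest to (1.0, 1.0): class 1
print(nearest_neighbor(d_train, (5.5, 4.0)))  # closest to (5.0, 4.0): class 2
```

Note that nothing is fitted in advance: every query scans the stored training data, which is why the method is sometimes described as memory-based.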

    Slide 29

    Geometric Interpretation of Nearest Neighbor

    [Figure: training points labeled 1 and 2 plotted in feature space; axes: Feature 1 and Feature 2]

    Slide 30

    Regions for Nearest Neighbors

    Each data point defines a “cell” of space that is closest to it. All points within that cell are assigned that class

    [Figure: training points labeled 1 and 2, each with its surrounding cell; axes: Feature 1 and Feature 2]

    Slide 31

    Nearest Neighbor Decision Boundary

    Overall decision boundary = union of cell boundaries where class decision is different on each side

    [Figure: training points labeled 1 and 2 with the overall decision boundary; axes: Feature 1 and Feature 2]

    Slide 32

    How should the new point be classified?

    [Figure: training points labeled 1 and 2 and a new point marked “?”; axes: Feature 1 and Feature 2]

    Slide 33

    Local Decision Boundaries

    Boundary? Points that are equidistant between points of class 1 and 2

    Note: locally the boundary is

    (1) linear (because of Euclidean distance)

    (2) halfway between the 2 class points

    (3) at right angles to connector

    [Figure: training points labeled 1 and 2 and a new point marked “?”, with a local decision boundary; axes: Feature 1 and Feature 2]
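The three properties of the local boundary follow from a short calculation. The derivation below is a sketch using symbols that are not on the slide: $x^{(1)}$ and $x^{(2)}$ denote one training point from each class, and $y$ is any point on the boundary between them.

```latex
% y is on the boundary when it is equidistant from x^{(1)} and x^{(2)}:
\|y - x^{(1)}\|^2 = \|y - x^{(2)}\|^2
% Expand both sides; the quadratic term y \cdot y cancels:
-2\, x^{(1)} \cdot y + x^{(1)} \cdot x^{(1)} = -2\, x^{(2)} \cdot y + x^{(2)} \cdot x^{(2)}
% Rearranged, this is a linear equation in y:
\left(x^{(2)} - x^{(1)}\right) \cdot \left(y - \frac{x^{(1)} + x^{(2)}}{2}\right) = 0
```

The final equation is linear in $y$ (property 1), is satisfied by the midpoint $(x^{(1)} + x^{(2)})/2$ (property 2), and says that the boundary is perpendicular to the connector $x^{(2)} - x^{(1)}$ (property 3).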

    Slide 34

    Finding the Decision Boundaries

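    [Figure: training points labeled 1 and 2 and the new point “?”, with decision boundaries being constructed; axes: Feature 1 and Feature 2]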

    Slide 35

    Finding the Decision Boundaries

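    [Figure: training points labeled 1 and 2 and the new point “?”, with decision boundaries being constructed; axes: Feature 1 and Feature 2]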

    Slide 36

    Finding the Decision Boundaries

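    [Figure: training points labeled 1 and 2 and the new point “?”, with decision boundaries being constructed; axes: Feature 1 and Feature 2]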

    Slide 37

    Overall Boundary = Piecewise Linear

    [Figure: training points labeled 1 and 2 and the new point “?”, with the Decision Region for Class 1 and the Decision Region for Class 2 marked; axes: Feature 1 and Feature 2]

    Slide 38

    Geometric Interpretation of kNN (k=1)

    [Figure: training points labeled 1 and 2 and a new point marked “?”; axes: Feature 1 and Feature 2]

    Slide 39

    More Data Points

    [Figure: a larger set of training points labeled 1 and 2; axes: Feature 1 and Feature 2]

    Slide 40

    More Complex Decision Boundary

    In general: the nearest-neighbor classifier produces piecewise linear decision boundaries

    [Figure: a larger set of training points labeled 1 and 2 with a more complex decision boundary; axes: Feature 1 and Feature 2]

    Slide 41

    K-Nearest Neighbor (kNN) Classifier

    • Find the k-nearest neighbors to y in Dtrain

      • i.e., rank the feature vectors according to Euclidean distance

      • select the k vectors which have the smallest distance to y

    • Classification

      • ranking yields k feature vectors and a set of k class labels

      • pick the class label which is most common in this set (“vote”)

      • classify y as belonging to this class

    • Theoretical Considerations

      • as k increases

        • we are averaging over more neighbors

        • the effective decision boundary is more “smooth”

      • as N increases, the optimal k value tends to increase in proportion to log N
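The ranking and voting steps above can be sketched as follows. This is an illustrative Python version, not the MATLAB `knn` function required by the assignment; the name `knn_classify`, the representation of Dtrain as (feature vector, label) pairs, and the toy data are assumptions. It relies on `math.dist`, which is available in Python 3.8 and later.

```python
import math
from collections import Counter


def knn_classify(d_train, y, k):
    """Classify y by majority vote among its k nearest training vectors."""
    if not 1 <= k <= len(d_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank the training vectors by Euclidean distance to y.
    ranked = sorted(d_train, key=lambda pair: math.dist(pair[0], y))
    # Keep the labels of the k closest vectors and pick the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (made up) in a 2-feature space.
d_train = [((1.0, 1.0), 1), ((2.0, 1.5), 1), ((3.0, 3.0), 2),
           ((5.0, 4.0), 2), ((6.0, 5.0), 2)]
y = (2.6, 2.6)

for k in (1, 3, 5):
    print(k, knn_classify(d_train, y, k))
```

On this toy data the predicted label for `y` changes with k (class 2 for k = 1, class 1 for k = 3, class 2 for k = 5), which illustrates why the choice of k is worth testing empirically.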

    Slide 42

    K-Nearest Neighbor (kNN) Classifier

    • Notes:

      • In effect, the classifier uses the nearest k feature vectors from Dtrain to “vote” on the class label for y

      • the single-nearest neighbor classifier is the special case of k=1

      • for two-class problems, if we choose k to be odd (i.e., k=1, 3, 5,…) then there will never be any “ties”

      • “training” is trivial for the kNN classifier, i.e., we just use Dtrain as a “lookup table” when we want to classify a new feature vector

    • Extensions of the Nearest Neighbor classifier

      • weighted distances

        • e.g., if some of the features are more important

        • e.g., if some of the features are irrelevant

      • fast search techniques (indexing) to find k-nearest neighbors in d-space
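One simple way to realize the weighted-distance extension is to scale each squared feature difference by a non-negative weight. The sketch below is one possible form, assumed for illustration; the lecture does not specify a formula, and the function name and example values are hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Euclidean distance with a non-negative weight on each feature.

    A weight of 1.0 on every feature gives the ordinary Euclidean distance;
    a weight of 0.0 removes a feature from the comparison entirely.
    """
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))


x, y = (1.0, 10.0), (4.0, 50.0)

print(weighted_distance(x, y, (1.0, 1.0)))  # both features count equally
print(weighted_distance(x, y, (1.0, 0.0)))  # second feature ignored: 3.0
```

With equal weights the second feature dominates the distance because its values are on a larger scale, so down-weighting (or rescaling) features can change which neighbors count as nearest.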

    Slide 43

    Accuracy on Training Data

    Training Accuracy = $\frac{1}{N} \sum_{i=1}^{N} I(o(i), c(i))$, summing over the N examples in Dtrain

    where I( o(i), c(i) ) = 1 if o(i) = c(i), and 0 otherwise

    o(i) is the output of the classifier for training feature vector x(i)

    c(i) is the true class for training feature vector x(i)
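The accuracy formula amounts to counting agreements between classifier outputs and true labels. A minimal sketch, with the function name `accuracy` and the example label lists made up for illustration:

```python
def accuracy(outputs, true_labels):
    """Fraction of examples where the classifier output o(i) equals c(i)."""
    if not outputs or len(outputs) != len(true_labels):
        raise ValueError("outputs and true_labels must be non-empty and equal in length")
    # The indicator I(o(i), c(i)) is 1 when the two labels match, 0 otherwise.
    correct = sum(1 for o, c in zip(outputs, true_labels) if o == c)
    return correct / len(true_labels)


outputs = [1, 2, 2, 1, 1]      # hypothetical classifier outputs o(i)
true_labels = [1, 2, 1, 1, 2]  # hypothetical true classes c(i)

print(accuracy(outputs, true_labels))  # 3 of 5 agree: 0.6
```

The same function gives training accuracy or test accuracy depending on which data set the outputs and labels come from.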

    Slide 44

    Accuracy on Test Data

    Let Dtest be a set of new data, unseen in the training process, but assume that Dtest is generated by the same “mechanism” that generated Dtrain:

    Test Accuracy = $\frac{1}{N_{test}} \sum_{j=1}^{N_{test}} I(o(j), c(j))$, summing over the N_test examples in Dtest

    Test accuracy is usually what we are really interested in: why?

    Unfortunately, test accuracy is often lower on average than training accuracy

    Why is this so?

    Slide 45

    Assignment 2

    • Due Wednesday…..

    • 4 parts

      • Plot classification data in two-dimensions

      • Implement a nearest-neighbor classifier

      • Plot the errors of a k-nearest-neighbor classifier

      • Test the effect of the value k on the accuracy of the classifier

    Slide 46

    Data Structure

    simdata1 =

    shortname: 'Simulated Data 1'

    numfeatures: 2

    classnames: [2x6 char]

    numclasses: 2

    description: [1x66 char]

    features: [200x2 double]

    classlabels: [200x1 double]

    Slide 47

    Plotting Function

    function classplot(data, x, y);

    % function classplot(data, x, y);

    %

    % brief description of what the function does

    % ......

    % Your Name, CS 175, date

    %

    % Inputs

    % data: (a structure with the same fields as described above:

    % your comment header should describe the structure explicitly)

    % Note that if you are only using certain fields in the structure

    % in the function below, you need only define these fields in the input comments

    -------- Your code goes here -------

    Slide 48

    First simulated data set, simdata1

    Slide 49

    Second simulated data set, simdata2

    Slide 50

    Nearest Neighbor Classifier

    function [class_predictions] = knn(traindata,trainlabels,k, testdata)

    % function [class_predictions] = knn(traindata,trainlabels,k, testdata)

    %

    % a brief description of what the function does

    % ......

    % Your Name, CS 175, date

    %

    % Inputs

    % traindata: an N1 x d matrix of feature data (the "memory" for kNN)

    % trainlabels: an N1 x 1 vector of class labels for traindata

    % k: an odd positive integer indicating the number of neighbors to use

    % testdata: an N2 x d matrix of feature data for testing the kNN classifier

    %

    % Outputs

    % class_predictions: N2 x 1 vector of predicted class values

    -------- Your code goes here -------

    Slide 51

    Plotting k-NN Errors

    function knn_plot(traindata,trainlabels,k,testdata,testlabels);

    % function knn_plot(traindata,trainlabels,k,testdata,testlabels);

    %

    % Predicts class-labels for the data in testdata using the k nearest

    % neighbors in traindata, and then plots the data (using the first

    % 2 dimensions or first 2 features), displaying the data from each

    % class in different colors, and overlaying circles on the points

    % that were incorrectly classified.

    %

    % Inputs

    % traindata: an N1 x d matrix of feature data (the "memory" for kNN)

    % trainlabels: an N1 x 1 vector of class labels for traindata

    % k: an odd positive integer indicating the number of neighbors to use

    % testdata: an N2 x d matrix of feature data for testing the kNN classifier

    % testlabels: an N2 x 1 vector of class labels for testdata

    Slide 52

    Accuracy of kNN Classifier as k is varied

    function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)

      % function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)

      %

      %  a brief description of what the function does

      %  ......

      %                            Your Name, CS 175, date

      %

      %  Inputs

      %     traindata: an N1 x d matrix of feature data (the "memory" for kNN)

      %     trainlabels: an N1 x 1 vector of class labels for traindata

      %     testdata: an N2 x d matrix of feature data for testing the kNN classifier

      %     testlabels: an N2 x 1 vector of class labels for testdata

      %     kmax: an odd positive integer indicating the maximum number of neighbors 

      %     plotflag: (optional argument) if 1, the error-rates versus k is plotted,

      %                                   otherwise no plot.

      %

      %  Outputs

      %     errors: r x 1 vector of error-rates on testdata, where r is the

      %                 number of values of k that are tested.

          --------   Your code goes here -------

    Slide 53

    Test Accuracy and Generalization

    • The accuracy of our classifier on new unseen data is a fair/honest assessment of the performance of our classifier

    • Why is training accuracy not good enough?

      • Training accuracy is optimistic

      • a classifier like nearest-neighbor can construct boundaries which always separate all training data points, but which do not separate new points

        • e.g., what is the training accuracy of kNN, k = 1?

      • A flexible classifier can “overfit” the training data

        • in effect it just memorizes the training data, but does not learn the general relationship between x and C

    • Generalization

      • We are really interested in how our classifier generalizes to new data

      • test data accuracy is a good estimate of generalization performance
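The gap between training and test accuracy is easy to see with a 1-nearest-neighbor classifier. The sketch below uses made-up toy data in which the two classes overlap; the function names and data are assumptions for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_neighbor(d_train, y):
    """Return the label of the training vector closest to y (kNN with k = 1)."""
    _, label = min(d_train, key=lambda pair: math.dist(pair[0], y))
    return label


def accuracy(d_train, data):
    """Fraction of (vector, label) pairs in data that 1-NN labels correctly."""
    correct = sum(1 for x, c in data if nearest_neighbor(d_train, x) == c)
    return correct / len(data)


# Toy data (made up): the classes overlap in the middle of the feature space.
d_train = [((1.0, 1.0), 1), ((2.0, 2.0), 1), ((3.5, 2.5), 1),
           ((2.5, 1.0), 2), ((3.0, 3.0), 2), ((4.0, 4.0), 2)]
d_test = [((1.2, 0.9), 1), ((2.6, 1.2), 1), ((3.9, 4.2), 2), ((3.4, 2.6), 2)]

training_accuracy = accuracy(d_train, d_train)
test_accuracy = accuracy(d_train, d_test)
print(training_accuracy, test_accuracy)
```

Because every training point is its own nearest neighbor (at distance zero, assuming no duplicate feature vectors), the training accuracy here is 1.0 regardless of how much the classes overlap, while the accuracy on the held-out points is only 0.5.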

    Slide 54

    Another Example

    Slide 55

    A More Complex Decision Boundary

    [Figure: two-class data in a two-dimensional feature space, showing Decision Region 1, Decision Region 2, and the decision boundary between them; axes: Feature 1 and Feature 2]

    Slide 56

    Example: The Overfitting Phenomenon

    [Figure: plot of Y versus X]

    Slide 57

    A Complex Model

    Y = high-order polynomial in X

    [Figure: plot of Y versus X]

    Slide 58

    The True (simpler) Model

    Y = a X + b + noise

    [Figure: plot of Y versus X]
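The contrast between the complex model and the simpler true model can be reproduced numerically. The sketch below is an illustration under stated assumptions: the data are made up (y = 2x + 1 plus hand-picked noise), the "complex model" is the polynomial that passes through every training point (written in Lagrange form), and the "simple model" is a least-squares straight line. None of these specifics come from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_interpolating_polynomial(xs, ys):
    """Polynomial of degree len(xs) - 1 through every training point."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0  # Lagrange basis polynomial for training point j
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: y = 2x + 1 plus small hand-picked noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.3, 2.6, 5.4, 6.7, 9.5, 10.8]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2.1, 3.9, 6.2, 7.8, 10.1]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

for name, model in (("line", line), ("polynomial", poly)):
    print(name,
          "train error:", mean_squared_error(model, train_x, train_y),
          "test error:", mean_squared_error(model, test_x, test_y))
```

On this data the polynomial has zero training error because it reproduces every training point, yet its test error is larger than that of the straight line: it has fitted the noise rather than the underlying linear relationship.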

    Slide 59

    How Overfitting affects Prediction

    [Figure: predictive error versus model complexity; curve shown: error on training data]

    Slide 60

    How Overfitting affects Prediction

    [Figure: predictive error versus model complexity; curves shown: error on test data and error on training data]

    Slide 61

    How Overfitting affects Prediction

    [Figure: predictive error versus model complexity; curves shown: error on test data and error on training data; regions marked: Underfitting, Ideal Range for Model Complexity, Overfitting]

    Slide 62

    Summary

    • Important Concepts

      • classification is an important component in intelligent systems

      • a classifier = a mapping from feature space to a class label

        • decision boundaries = boundaries between classes

      • classification learning

        • using training data to define a classifier

      • the nearest-neighbor classifier

      • training accuracy versus test accuracy

