Lecture 3: Introduction to Classification


1. Lecture 3: Introduction to Classification. CS 175, Fall 2007. Padhraic Smyth, Department of Computer Science, University of California, Irvine.

2. Outline. Overview of classification: examples and applications of classification; classification as a mapping from features to a class label; decision boundaries; training and test data accuracy; the nearest-neighbor classifier. Assignments: Assignment 2, due Wednesday next week, covers plotting classification data and k-nearest-neighbor classifiers.

3. Classification. Classification is an important component of intelligent systems. We have a special discrete-valued variable called the class, C. C takes values c, where c = 1, c = 2, ..., c = m; for now, assume m = 2, i.e., 2 classes: c = 1 or c = 2. The problem is to decide what class an object belongs to, i.e., what value the class variable C takes for a given object, given measurements on the object, e.g., A, B, .... These measurements are called "features", and we wish to learn a mapping from features to class. Notation: C is the class; A, B, etc. (the measurements) are called the "features" (sometimes also called "attributes" or "input variables").

4. Classification Functions

5. Applications of Classification. Medical diagnosis: classification of cancerous cells. Credit card and loan approval: most major banks. Speech recognition: IBM, Dragon Systems, AT&T, Microsoft, etc. Optical character/handwriting recognition: post offices, banks, Gateway, Motorola, Microsoft, Xerox, etc. Email classification: classify email as "junk" or "non-junk". Many other applications: classification is one of the most successful applications of AI technology.

6. Examples of Features and Classes

7. Examples of Features and Classes

8. Examples of Features and Classes

9. Classification of Galaxies

12. Feature Vectors and Feature Spaces. Feature vector: say we have 2 features. We can think of the features as a 2-component vector, i.e., a 2-dimensional vector [a b]. So the features correspond to a 2-dimensional space (clearly we can generalize to a d-dimensional space); this is called the "feature space". Each feature vector represents the "coordinates" of a particular object in feature space. If the feature space is 2-dimensional, for example, and the features a and b are real-valued, we can visually examine and plot the locations of the feature vectors.

13. Data with 2 Features

14. Data from Multiple Classes. Now consider that we have data from m classes (e.g., m = 2). We can imagine the data from each class lying in a "cloud" in feature space: data sets D1 and D2 are the sets of points from classes 1 and 2, and the data are of dimension d (i.e., d-dimensional vectors). If d = 2 (2 features), we can plot the data, and we should see two "clouds" of data points, one cloud per class.
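The slides' own simulated data sets are not reproduced here. The following Python sketch therefore generates a hypothetical two-class, 2-feature data set, one "cloud" per class. The Gaussian shape, the class means, the spread `sd`, and the helper name `make_two_clouds` are all illustrative assumptions, not the course's data.

```python
import random

def make_two_clouds(n_per_class, mean1, mean2, sd=1.0, seed=0):
    """Return (features, labels): two hypothetical Gaussian clouds in 2-D, labeled 1 and 2."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    features, labels = [], []
    for label, (mx, my) in ((1, mean1), (2, mean2)):
        for _ in range(n_per_class):
            # each point is a 2-component feature vector [a, b]
            features.append([rng.gauss(mx, sd), rng.gauss(my, sd)])
            labels.append(label)
    return features, labels

X, c = make_two_clouds(100, mean1=(0.0, 0.0), mean2=(3.0, 3.0))
print(len(X), len(X[0]), c.count(1), c.count(2))  # 200 2 100 100
```

Plotting `X` colored by `c` should show the two clouds. Moving the means closer together makes the clouds overlap.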

15. Example of Data from 2 Classes

17. Decision Boundaries. What is a classifier? A classifier is a mapping from feature space (a d-dimensional vector) to the class labels {1, 2, ..., m}. Thus, a classifier partitions the feature space into m decision regions. The line or surface separating any 2 classes is the decision boundary. Linear classifiers: a linear classifier is a mapping that partitions feature space using a linear function (a straight line, or a hyperplane). It is one of the simplest classifiers we can imagine; in 2 dimensions the decision boundary is a straight line.
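One common way to write a two-class linear classifier is to take the sign of a weighted sum of the features plus an offset. The decision boundary is then the set of points where that sum equals zero: a straight line in 2 dimensions, a hyperplane in d dimensions. The sketch below assumes this form. The weights `w`, the offset `w0`, and the rule that a score of exactly zero goes to class 2 are illustrative choices.

```python
def linear_classify(x, w, w0):
    """Assign class 1 if w . x + w0 > 0, else class 2 (points on the boundary go to class 2)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return 1 if score > 0 else 2

# Hypothetical boundary in 2-D feature space: the line a + b = 3
w, w0 = [-1.0, -1.0], 3.0
print(linear_classify([0.0, 0.0], w, w0))  # 1 (below the line)
print(linear_classify([3.0, 3.0], w, w0))  # 2 (above the line)
```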

18. 2-Class Data with a Linear Decision Boundary

19. Class Overlap. Consider the two-class case: data from D1 and D2 may overlap, e.g., features = {age, body temperature} with classes = {flu, not-flu}, or features = {income, savings} with classes = {good risk, bad risk}. In practice it is common for the classes to overlap naturally. This means that our features are usually not able to discriminate perfectly between the classes. Note: with more expensive or more detailed additional features (e.g., a specific test for the flu) we might be able to get perfect separation. If there is overlap, the classes are not linearly separable.

20. Classification Problem with Overlap

24. Classification Accuracy. Say we have N feature vectors, and we know the true class label for each one. We can measure how accurate a classifier is by how many feature vectors it classifies correctly. Accuracy = percentage of feature vectors correctly classified. Training accuracy = accuracy on the training data; test accuracy = accuracy on new data not used in training.
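The accuracy definition above is short enough to write out directly. This minimal Python sketch assumes the true and predicted labels arrive as two equal-length lists. The function name `accuracy` is illustrative.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of feature vectors whose predicted label matches the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty label lists of the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

print(accuracy([1, 1, 2, 2], [1, 2, 2, 2]))  # 75.0
```

The same function gives training accuracy or test accuracy, depending only on which data the predictions were made for.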

25. Training Data and Test Data. Training data: labeled data used to build a classifier. Test data: new data, not used in the training process, used to evaluate how well a classifier does on new data. Memorization versus generalization: better training accuracy reflects "memorizing" the training data, while better test accuracy reflects "generalizing" to new data. In general, we would like our classifier to perform well on new test data, not just on training data, i.e., we would like it to generalize well to new data. Test accuracy is more important than training accuracy.
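When no separate test set exists, one common approach is to hold out a random subset of the labeled data so that it is never used in training. The sketch below assumes a random shuffle with a 25% test fraction. The function name, the fraction, and the seed are illustrative, and this is not necessarily how the course's data were split.

```python
import random

def train_test_split(features, labels, test_fraction=0.25, seed=0):
    """Randomly partition labeled data into disjoint training and test sets."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(test_fraction * len(indices)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def subset(idx):
        # keep each feature vector paired with its own class label
        return [features[i] for i in idx], [labels[i] for i in idx]

    return subset(train_idx), subset(test_idx)

X = [[float(i), float(i)] for i in range(8)]
c = [1, 1, 1, 1, 2, 2, 2, 2]
(X_train, c_train), (X_test, c_test) = train_test_split(X, c)
print(len(X_train), len(X_test))  # 6 2
```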

26. Examples of Training and Test Data. Speech recognition: the training data are words recorded and labeled in a laboratory; the test data are words recorded from new speakers in new locations. Zipcode recognition: the training data are zipcodes manually selected, scanned, and labeled; the test data are actual letters being scanned in a post office. Credit scoring: the training data are a historical database of loan applications with payment history or the decision made at that time; the test data are you.

27. Some Notation. Training data: Dtrain = { [x(1), c(1)], [x(2), c(2)], ..., [x(N), c(N)] }, i.e., N pairs of feature vectors and class labels. Feature vectors and class labels: x(i) is the ith training feature vector (in MATLAB this could be the ith row of an N x d matrix), and c(i) is the class label of the ith feature vector. In general, c(i) can take m different class values, e.g., c = 1, c = 2, .... Let y be a new feature vector whose class label we do not know, i.e., we wish to classify it.

28. Nearest Neighbor Classifier. y is a new feature vector whose class label is unknown. Search Dtrain for the feature vector closest to y, and let this "closest feature vector" be x(j). Classify y with the same label as x(j), i.e., y is assigned label c(j). How is the "closest x" determined? Typically we use the minimum Euclidean distance, $d_E(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$. Side note: this produces a "Voronoi tessellation" of the d-dimensional space. Each point "claims" a cell surrounding it, and in 2 dimensions the cell boundaries are polygons. This is analogous to "memory-based" reasoning in humans.
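Here is a minimal Python sketch of the nearest-neighbor rule with Euclidean distance. It assumes Dtrain is stored as a list of `(x, c)` pairs. The function names and the toy points are illustrative. If two training vectors are equally close, `min` keeps the first one it encounters.

```python
import math

def euclidean(x, y):
    """d_E(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def nearest_neighbor(d_train, y):
    """Return c(j) for the training pair [x(j), c(j)] whose x(j) is closest to y."""
    _, c_j = min(d_train, key=lambda pair: euclidean(pair[0], y))
    return c_j

d_train = [([0.0, 0.0], 1), ([1.0, 0.0], 1), ([5.0, 5.0], 2), ([6.0, 5.0], 2)]
print(nearest_neighbor(d_train, [0.8, 0.4]))  # 1
print(nearest_neighbor(d_train, [4.0, 4.5]))  # 2
```

This brute-force search costs one distance computation per training vector for every new y. The indexing techniques mentioned later for kNN exist to reduce that cost.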

29. Geometric Interpretation of Nearest Neighbor

30. Regions for Nearest Neighbors

31. Nearest Neighbor Decision Boundary

32. How should the new point be classified?

33. Local Decision Boundaries

34. Finding the Decision Boundaries

35. Finding the Decision Boundaries

36. Finding the Decision Boundaries

37. Overall Boundary = Piecewise Linear

38. Geometric Interpretation of kNN (k=1)

39. More Data Points

40. More Complex Decision Boundary

41. K-Nearest Neighbor (kNN) Classifier. Find the k nearest neighbors to y in Dtrain, i.e., rank the feature vectors according to Euclidean distance and select the k vectors that have the smallest distance to y. Classification: the ranking yields k feature vectors and a set of k class labels; pick the class label that is most common in this set (the "vote") and classify y as belonging to that class. Theoretical considerations: as k increases we are averaging over more neighbors, so the effective decision boundary becomes "smoother"; as N increases, the optimal k value tends to increase in proportion to log N.
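The ranking-and-voting procedure above can be sketched in a few lines of Python. This assumes Dtrain is a list of `(x, c)` pairs and uses `math.dist` (Python 3.8+) for the Euclidean distance. The function name and the toy data are illustrative. `Counter.most_common` breaks ties in favor of the label encountered first, i.e., the label of the nearer neighbors. For two classes with odd k, no tie can occur.

```python
import math
from collections import Counter

def knn_classify(d_train, y, k):
    """Classify y by majority vote among its k nearest training vectors."""
    # rank all training pairs by Euclidean distance to y
    ranked = sorted(d_train, key=lambda pair: math.dist(pair[0], y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

d_train = [
    ([0.5, 0.0], 2),   # a lone class-2 point very close to y
    ([0.0, 0.0], 1),
    ([0.4, 0.4], 1),
    ([0.0, 0.6], 1),
    ([5.0, 5.0], 2),
]
y = [0.45, 0.0]
print(knn_classify(d_train, y, k=1))  # 2: the single nearest neighbor decides
print(knn_classify(d_train, y, k=3))  # 1: outvoted by the next two neighbors
```

The two calls show the smoothing effect of larger k. A single nearby point of the other class no longer decides the label on its own.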

42. K-Nearest Neighbor (kNN) Classifier. Notes: in effect, the classifier uses the nearest k feature vectors from Dtrain to "vote" on the class label for y. The single-nearest-neighbor classifier is the special case k = 1. For two-class problems, if we choose k to be odd (i.e., k = 1, 3, 5, ...), there will never be any ties. "Training" is trivial for the kNN classifier: we just use Dtrain as a "lookup table" when we want to classify a new feature vector. Extensions of the nearest-neighbor classifier: weighted distances (e.g., if some of the features are more important, or if some features are irrelevant); fast search techniques (indexing) to find the k nearest neighbors in d-space.
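The slide does not say how distances should be weighted. One common choice, assumed here, multiplies each squared feature difference by a non-negative weight. A larger weight makes a feature more important, and a weight of 0 ignores an irrelevant feature entirely. The function name and the weights are illustrative.

```python
import math

def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2); w_i = 0 drops feature i from the distance."""
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))

x, y = [1.0, 10.0], [4.0, 0.0]
print(weighted_euclidean(x, y, [1.0, 1.0]))  # ordinary Euclidean distance, sqrt(109)
print(weighted_euclidean(x, y, [1.0, 0.0]))  # 3.0: second feature treated as irrelevant
```

Substituting this function for the plain Euclidean distance in a kNN search changes which training vectors count as "nearest".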

43. Accuracy on Training Data

44. Accuracy on Test Data

45. Assignment 2. Due Wednesday. 4 parts: plot classification data in two dimensions; implement a nearest-neighbor classifier; plot the errors of a k-nearest-neighbor classifier; test the effect of the value of k on the accuracy of the classifier.

46. Data Structure
simdata1 =
    shortname: 'Simulated Data 1'
    numfeatures: 2
    classnames: [2x6 char]
    numclasses: 2
    description: [1x66 char]
    features: [200x2 double]
    classlabels: [200x1 double]
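The structure above is a MATLAB struct. For readers working in Python, a hypothetical equivalent is a dataclass with the same field names plus checks that the sizes agree. It assumes features are stored as N rows of `numfeatures` values and labels as a length-N list. The class name `ClassificationData` and the toy contents below are placeholders, not the actual simdata1 data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ClassificationData:
    """Hypothetical Python mirror of the MATLAB struct simdata1 (same field names)."""
    shortname: str
    numfeatures: int
    classnames: List[str]
    numclasses: int
    description: str
    features: List[List[float]]  # N x numfeatures, one row per feature vector
    classlabels: List[int]       # length N, values in 1..numclasses

    def __post_init__(self):
        # consistency checks matching the sizes shown in the MATLAB struct
        if len(self.features) != len(self.classlabels):
            raise ValueError("features and classlabels must have the same number of rows")
        if any(len(row) != self.numfeatures for row in self.features):
            raise ValueError("each feature vector must have numfeatures components")
        if len(self.classnames) != self.numclasses:
            raise ValueError("need one class name per class")

demo = ClassificationData(
    shortname="Toy Data",
    numfeatures=2,
    classnames=["classA", "classB"],  # placeholder names
    numclasses=2,
    description="Hypothetical 4-point example",
    features=[[0.0, 0.0], [1.0, 0.5], [3.0, 3.0], [3.5, 2.5]],
    classlabels=[1, 1, 2, 2],
)
print(len(demo.features), demo.numfeatures)  # 4 2
```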

47. Plotting Function

48. First simulated data set, simdata1

49. Second simulated data set, simdata2

50. Nearest Neighbor Classifier

51. Plotting k-NN Errors

52. Accuracy of kNN Classifier as k is varied

53. Test Accuracy and Generalization. The accuracy of our classifier on new, unseen data is a fair and honest assessment of its performance. Why is training accuracy not good enough? Training accuracy is optimistic: a classifier like nearest-neighbor can construct boundaries that always separate all the training data points but do not separate new points. For example, what is the training accuracy of kNN with k = 1? A flexible classifier can "overfit" the training data: in effect it just memorizes the training data but does not learn the general relationship between x and C. Generalization: we are really interested in how our classifier generalizes to new data, and accuracy on test data is a good estimate of generalization performance.
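The question about k = 1 can be checked empirically. When kNN is evaluated on its own training data, each training vector's nearest neighbor is itself, at distance 0. So with k = 1 the training accuracy is 100%, provided no two identical feature vectors carry different labels. The sketch below compares training and test accuracy for a few values of k on hypothetical overlapping Gaussian clouds. The data generator, the class means, and the chosen k values are assumptions for illustration only.

```python
import math
import random
from collections import Counter

def knn_classify(train, y, k):
    """Majority vote among the k training pairs (x, c) nearest to y."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], y))[:k]
    return Counter(c for _, c in nearest).most_common(1)[0][0]

def knn_accuracy(train, data, k):
    """Percentage of (x, c) pairs in data that kNN built from train labels correctly."""
    correct = sum(knn_classify(train, x, k) == c for x, c in data)
    return 100.0 * correct / len(data)

def overlapping_clouds(n, seed):
    """Hypothetical data: two overlapping 2-D Gaussian clouds, labels 1 and 2."""
    rng = random.Random(seed)
    return [([rng.gauss(m, 1.0), rng.gauss(m, 1.0)], label)
            for label, m in ((1, 0.0), (2, 1.5)) for _ in range(n)]

train = overlapping_clouds(50, seed=1)
test = overlapping_clouds(50, seed=2)
for k in (1, 5, 15):
    # training accuracy for k = 1 is 100.0; the test column shows how well kNN generalizes
    print(k, knn_accuracy(train, train, k), knn_accuracy(train, test, k))
```

Because the clouds overlap, no classifier can reach 100% on the test data. The gap between the two columns at k = 1 is the optimism of training accuracy described above.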

54. Another Example

55. A More Complex Decision Boundary

56. Example: The Overfitting Phenomenon

57. A Complex Model

58. The True (simpler) Model

59. How Overfitting affects Prediction

60. How Overfitting affects Prediction

61. How Overfitting affects Prediction

62. Summary. Important concepts: classification is an important component in intelligent systems; a classifier is a mapping from feature space to a class label; decision boundaries are the boundaries between classes; classification learning uses training data to define a classifier; the nearest-neighbor classifier; training accuracy versus test accuracy.
