
Classification I






Presentation Transcript


  1. Classification I Lecturer: Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn

  2. Overview • K-Nearest Neighbor Algorithm • Naïve Bayes Classifier Thomas Bayes

  3. Classification

  4. Definition • Classification is one of the fundamental skills for survival. • Food vs. Predator • A kind of supervised learning • Techniques for deducing a function from data • <Input, Output> • Input: a vector of features • Output: a Boolean value (binary classification) or integer (multiclass) • “Supervised” means: • A teacher or oracle is needed to label each data sample. • We will talk about unsupervised learning later.

  5. Classifiers (figure: children such as Peter, Sam, Jack, Jane, Tom, Lisa, Helen and Mary plotted by Height and Weight, with a decision function z = f(x, y) separating the two classes {boy, girl})

  6. Training a Classifier (figure: learning a classifier from labeled training data)

  7. Lazy Learners (figure: example images of a Truck and a Car)

  8. Neighborhood

  9. K-Nearest Neighbor Algorithm • The algorithm procedure: • Given a set of n training samples in the form of <x, y>. • Given an unknown sample x′. • Calculate the distance d(x′, xi) for i = 1 … n. • Select the K samples with the shortest distances. • Assign x′ the label that dominates the K samples (majority vote), as sketched below. • It is the simplest classifier you will ever meet (I mean it!). • No training (literally): • A memory of the training data is maintained. • All computation is deferred until classification. • Produces satisfactory results in many cases. • Should give it a go whenever possible.
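A minimal sketch of this procedure in Python (assuming numeric feature vectors and Euclidean distance; knn_classify and the toy data are illustrative, not from the lecture):

import math
from collections import Counter

def knn_classify(train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples.

    train: list of (x, y) pairs, where x is a tuple of numeric features.
    """
    # Calculate the distance d(x', xi) for every training sample
    dists = [(math.dist(x_new, x), y) for x, y in train]
    # Select the K samples with the shortest distances
    dists.sort(key=lambda pair: pair[0])
    top_k = [y for _, y in dists[:k]]
    # Assign x' the label that dominates the K samples
    return Counter(top_k).most_common(1)[0][0]

# Example: two classes in 2-D
train = [((1, 1), 'A'), ((1, 2), 'A'), ((5, 5), 'B'), ((6, 5), 'B')]
print(knn_classify(train, (2, 2), k=3))  # -> 'A'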

  10. Properties of KNN • Instance-based learning • No explicit description of the target function • Can handle complicated situations

  11. Properties of KNN • (figure: the same query point classified differently by a K=1 neighborhood and a K=7 neighborhood) • Results are dependent on the data distribution. • Can make mistakes at boundaries.

  12. Challenges of KNN • The Value of K • Non-monotonic impact on accuracy (figure: accuracy plotted against K) • Too big vs. too small • Rules of thumb • Weights • Different features may have different impacts on the distance. • Distance • There are many different ways to measure the distance: Euclidean, Manhattan … • Complexity • Need to calculate the distance between x′ and all training data. • Proportional to the size of the training set.

  13. Distance Metrics

  14. Distance Metrics The shortest path between two points …
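A small sketch of the two metrics named on slide 12 (Euclidean and Manhattan); the function names are illustrative:

def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared differences
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # City-block distance: sum of absolute differences along each axis
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7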

  15. Mahalanobis Distance Distance from a point to a point set

  16. Mahalanobis Distance • d(x) = sqrt( (x − μ)ᵀ S⁻¹ (x − μ) ), where μ is the mean of the point set and S is its covariance matrix. • For the identity matrix S: it reduces to the Euclidean distance sqrt( Σi (xi − μi)² ). • For a diagonal matrix S: it reduces to the normalized Euclidean distance sqrt( Σi (xi − μi)² / σi² ).
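A minimal numpy sketch of this distance from a point to a point set (the point set supplies the mean μ and covariance S; the data and function name are illustrative):

import numpy as np

def mahalanobis(x, points):
    """Mahalanobis distance from x to the distribution of a point set."""
    mu = points.mean(axis=0)
    # Inverse covariance matrix S^-1 estimated from the point set
    S_inv = np.linalg.inv(np.cov(points, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ S_inv @ d))

points = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [4.0, 5.0], [5.0, 5.5]])
print(mahalanobis(np.array([3.0, 4.0]), points))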

  17. Voronoi Diagram

  18. Voronoi Diagram

  19. Structured Data (figure: axis ticks 0, 0.5, 1 with an unlabeled query point marked '?')

  20. KD-Tree Point Set: {(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)}

  21. KD-Tree (the slide's pseudocode, cleaned up as runnable Python; Node is a small helper record)

from collections import namedtuple

Node = namedtuple('Node', ['location', 'left_child', 'right_child'])

def kdtree(point_list, depth=0):
    # Base case: no points left
    if not point_list:
        return None
    # Select axis based on depth so that axis cycles through all valid values
    k = len(point_list[0])
    axis = depth % k
    # Sort point list and choose median as pivot element
    point_list = sorted(point_list, key=lambda point: point[axis])
    median = len(point_list) // 2
    # Create node and construct subtrees
    return Node(location=point_list[median],
                left_child=kdtree(point_list[:median], depth + 1),
                right_child=kdtree(point_list[median + 1:], depth + 1))
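Building the tree on the point set from slide 20 makes (7, 2) the root, since it is the median when sorting on the x-axis first:

tree = kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.location)             # (7, 2)
print(tree.left_child.location)  # (5, 4), the y-axis median of the left half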

  22. KD-Tree

  23. Evaluation • Accuracy • Recall what we have learned in the first lecture … • Confusion Matrix • ROC Curve • Training Set vs. Test Set • N-fold Cross Validation

  24. LOOCV • Leave-One-Out Cross Validation • An extreme case of N-fold cross validation • N = number of available samples • Usually very time consuming, but affordable for KNN, which has no training phase to repeat (see the sketch below). • Now, let's try KNN + LOOCV … • Each student in this class is given a label: • Gender: Male vs. Female • Major: CS vs. EE vs. Automation
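A minimal sketch of LOOCV, reusing the knn_classify function sketched under slide 9 (toy data; illustrative only, not the in-class exercise):

def loocv_accuracy(data, k=3):
    """Leave-one-out: each sample is classified using all the others."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]   # leave sample i out
        correct += (knn_classify(rest, x, k) == y)
    return correct / len(data)

data = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
        ((5, 5), 'B'), ((6, 5), 'B'), ((5, 6), 'B')]
print(loocv_accuracy(data, k=3))  # 1.0 on this well-separated toy set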

  25. 10 Minutes …

  26. Bayes Theorem • (figure: Venn diagram of events A and B) • P(A|B) = P(B|A) P(A) / P(B)

  27. Fish Example • Salmon (ω1) vs. Tuna (ω2) • If P(ω1) = P(ω2): the priors alone cannot separate the classes. • If P(ω1) > P(ω2): always choosing ω1 minimizes error when no other evidence is available. • Additional information (measured features) refines the decision via Bayes theorem.

  28. Shooting Example • Probability of kill: • P(A) = 0.6 • P(B) = 0.5 • The target is killed with: • One shot from A • One shot from B • C: the target is killed. • What is the probability that it was shot down by A?
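The slide's worked equations did not survive the transcript; under the usual reading (A and B fire once each, independently, and "shot down by A" means A's shot hit), a standard solution is: P(C) = 1 − (1 − 0.6)(1 − 0.5) = 0.8, so P(A's shot hit | C) = P(A) / P(C) = 0.6 / 0.8 = 0.75.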

  29. Cancer Example • ω1: Cancer; ω2: Normal • P(ω1) = 0.008; P(ω2) = 0.992 • Lab test outcomes: + vs. − • P(+|ω1) = 0.98; P(−|ω1) = 0.02 • P(+|ω2) = 0.03; P(−|ω2) = 0.97 • Now someone has a positive test result … • Is he/she doomed?

  30. Cancer Example • P(ω1|+) = P(+|ω1) P(ω1) / [ P(+|ω1) P(ω1) + P(+|ω2) P(ω2) ] = (0.98 × 0.008) / (0.98 × 0.008 + 0.03 × 0.992) = 0.00784 / 0.0376 ≈ 0.21 • Despite the positive test, the probability of cancer is only about 21%.

  31. Headache & Flu Example • H=“Having a headache” • F=“Coming down with flu” • P(H)=1/10; P(F)=1/40; P(H|F)=1/2 • What does this mean? • One day you wake up with a headache … • Since 50% of flus are associated with headaches … • I must have a 50-50 chance of coming down with flu!

  32. Headache and Flu Example • The truth is: P(F|H) = P(H|F) P(F) / P(H) = (1/2 × 1/40) / (1/10) = 1/8 • (figure: Venn diagram with Flu a small subset of Headache)

  33. Naïve Bayes Classifier • Assumes the features are conditionally independent given the class: P(α1, α2, …, αn | ωi) = Πj P(αj | ωi) • The decision rule is MAP: Maximum A Posteriori • ωMAP = argmax over ωi of P(ωi) Πj P(αj | ωi)

  34. Independence • X and Y are independent: P(X, Y) = P(X) P(Y) • X and Y are conditionally independent given Z: P(X, Y | Z) = P(X | Z) P(Y | Z)

  35. Conditional Independence

  36. Independent ≠ Uncorrelated • Cov(X, Y) = 0 ⇒ X and Y are uncorrelated. • However, Y can be completely determined by X (e.g., Y = X² with X symmetric about 0): uncorrelated does not imply independent.

  37. Estimating P(αj|ωi) • Count frequencies in the training data; with Laplace smoothing: P(αj|ωi) = (nc + 1) / (n + number of possible values of αj), where n is the number of training samples of class ωi and nc the number of those with value αj. • How about continuous variables? A common choice is to assume a per-class Gaussian distribution.

  38. Tennis Example

  39. Tennis Example
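A minimal sketch of a Naïve Bayes classifier with Laplace smoothing over categorical features (the toy data below is illustrative, in the spirit of the tennis example, not the slide's table; for simplicity it assumes at most 3 possible values per feature):

from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (features, label); features is a tuple of categories."""
    labels = Counter(y for _, y in samples)
    counts = defaultdict(Counter)   # (feature index, label) -> value counts
    for x, y in samples:
        for j, v in enumerate(x):
            counts[(j, y)][v] += 1
    return labels, counts

def classify_nb(labels, counts, x, n_values=3):
    """MAP decision: pick the label maximizing P(label) * prod P(value|label)."""
    best, best_score = None, 0.0
    total = sum(labels.values())
    for y, ny in labels.items():
        score = ny / total                      # prior P(label)
        for j, v in enumerate(x):
            # Laplace smoothing: (count + 1) / (n + number of values)
            score *= (counts[(j, y)][v] + 1) / (ny + n_values)
        if best is None or score > best_score:
            best, best_score = y, score
    return best

# Toy data: (Outlook, Wind) -> PlayTennis
data = [(('Sunny', 'Weak'), 'No'), (('Sunny', 'Strong'), 'No'),
        (('Overcast', 'Weak'), 'Yes'), (('Rain', 'Weak'), 'Yes'),
        (('Rain', 'Strong'), 'No'), (('Overcast', 'Strong'), 'Yes')]
labels, counts = train_nb(data)
print(classify_nb(labels, counts, ('Overcast', 'Weak')))  # -> 'Yes'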

  40. Text Classification Example Interesting? Boring? Politics? Entertainment? Sports?

  41. Text Representation • We need to estimate probabilities such as P(αj = Vk | ωi): the probability that the word in position j is Vk, given class ωi. • However, there are 2 × n × |Vocabulary| terms in total. • For n = 100 word positions and a vocabulary of 50,000 distinct words, it adds up to 10 million terms!

  42. Text Representation • By only considering the probability of encountering a specific word, instead of the specific word position, we can reduce the number of probabilities to be estimated. • We only count the frequency of each word. • Now, only 2 × 50,000 = 100,000 terms need to be estimated: P(Vk|ωi) = (nk + 1) / (n + |Vocabulary|) • n: the total number of word positions in all training samples whose target value is ωi. • nk: the number of times word Vk is found among these n positions.
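A minimal sketch of this estimate, i.e. Laplace smoothing over word counts for one class (the data and function name are illustrative):

from collections import Counter

def word_probs(docs, vocabulary):
    """docs: list of word lists, all belonging to one class w_i."""
    counts = Counter(word for doc in docs for word in doc)
    n = sum(counts.values())   # total word positions across all docs of w_i
    V = len(vocabulary)
    # P(word | class) = (n_k + 1) / (n + |Vocabulary|)
    return {w: (counts[w] + 1) / (n + V) for w in vocabulary}

docs = [['the', 'match', 'was', 'great'], ['great', 'goal']]
print(word_probs(docs, {'the', 'match', 'was', 'great', 'goal', 'boring'}))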

  43. Case Study: Newsgroups • Classification (Joachims, 1996) • 20 newsgroups, 20,000 documents • Random guess: 5% accuracy; Naïve Bayes: 89% • Recommendation (Lang, 1995) • NewsWeeder, trained on user-rated articles (interesting vs. uninteresting) • Among the top 10% of articles it selected, 59% were interesting, versus 16% overall.

  44. Reading Materials • C. C. Aggarwal, A. Hinneburg and D. A. Keim, “On the Surprising Behavior of Distance Metrics in High Dimensional Space,” Proc. the 8th International Conference on Database Theory, LNCS 1973, pp. 420-434, London, UK, 2001. • J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An Algorithm for Finding Best Matches in Logarithmic Expected Time,” ACM Transactions on Mathematical Software, 3(3):209–226, 1977. • S. M. Omohundro, “Bumptrees for Efficient Function, Constraint, and Classification Learning,” Advances in Neural Information Processing Systems 3, pp. 693-699, Morgan Kaufmann, 1991. • Tom Mitchell, Machine Learning (Chapter 6), McGraw-Hill. • Additional reading about Naïve Bayes Classifier • http://www-2.cs.cmu.edu/~tom/NewChapters.html • Software for text classification using Naïve Bayes Classifier • http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

  45. Review • What is classification? • What is supervised learning? • What does KNN stand for? • What are the major challenges of KNN? • How to accelerate KNN? • What is N-fold cross validation? • What does LOOCV stand for? • What is Bayes Theorem? • What is the key assumption in Naïve Bayes Classifiers?

  46. Next Week’s Class Talk • Volunteers are required for next week’s class talk. • Topic 1: Efficient KNN Implementations • Hints: • Ball Trees • Metric Trees • R Trees • Topic 2: Bayesian Belief Networks • Length: 20 minutes plus question time
