
Naive Bayes model



  1. Naive Bayes model Comp221 tutorial 4 (assignment 1) TA: Zhang Kai

  2. Outline • Bayes probability model • Naive Bayes classifier • Text classification • Digit classification • Assignment specifications

  3. Naive Bayes classifier • A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions; more specifically, it assumes an independent feature model.

  4. Naive Bayes probability model • Graphical illustration • a class node C at the root; we want P(C|F1,…,Fn) • evidence nodes F1,…,Fn: observed features as leaves • conditional independence between all evidence nodes given C [Diagram: root node C with child feature nodes F1, F2, …, Fn]

  5. Naive Bayes probability model • The classifier is a conditional model P(C | F1,…,Fn) • Following Bayes' rule strictly, we have P(C | F1,…,Fn) = P(C) P(F1,…,Fn | C) / P(F1,…,Fn) • Simplify this through conditional independence: P(F1,…,Fn | C) = P(F1|C) P(F2|C) … P(Fn|C) • So the conditional distribution over the class C is P(C | F1,…,Fn) = (1/Z) P(C) P(F1|C) … P(Fn|C), where Z = P(F1,…,Fn) is constant given the features
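
  As a toy numerical illustration of these formulas (a minimal Matlab sketch; the prior and likelihood tables below are made-up values, not part of the tutorial code):

  % Minimal sketch: naive Bayes posterior for one observation.
  % lik(i, f+1, c) = P(Fi = f | C = c) for binary features f in {0,1}.
  prior = [0.6 0.4];                              % P(C), two classes
  lik = zeros(3, 2, 2);                           % 3 features, 2 values, 2 classes
  lik(:, :, 1) = [0.8 0.2; 0.7 0.3; 0.1 0.9];     % P(Fi = 0 | C=1), P(Fi = 1 | C=1)
  lik(:, :, 2) = [0.3 0.7; 0.5 0.5; 0.6 0.4];     % P(Fi = 0 | C=2), P(Fi = 1 | C=2)

  x = [1 0 1];                                    % observed features F1..Fn
  post = prior;                                   % start from P(C)
  for i = 1:numel(x)
      for c = 1:numel(prior)
          post(c) = post(c) * lik(i, x(i)+1, c);  % multiply in P(Fi | C=c)
      end
  end
  post = post / sum(post);                        % divide by Z = P(F1,...,Fn)
  disp(post)                                      % P(C | F1,...,Fn)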

  6. Naive Bayes classifier • The naive Bayes classifier combines the naive Bayes probability model with a decision rule, such as the maximum a posteriori (MAP) rule: pick the class with the largest P(C | F1,…,Fn). • If there are k classes and a model for each p(Fi | C = c) can be expressed with r parameters, then the naive Bayes model has (k − 1) + nrk parameters.
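
  As a worked example of the parameter count (assuming each pixel likelihood is modelled with a single Bernoulli parameter, i.e. r = 1): the USPS setting of slide 16 has k = 10 classes and n = 256 features, giving (k − 1) + nrk = 9 + 256 × 1 × 10 = 2569 parameters.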

  7. Text Classification • Task: classify text documents into one of the pre-defined classes, such as sports, recreation, politics, war, economy, etc. • Given • K groups of training texts • Each group with a label, containing a number of text documents

  8. Procedures • Computing a priori class probabilities • Count the number of text documents ni in each directory/class Ci • Total number of training text documents n • Prior probability P(Ci) = ni / n
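
  A minimal Matlab sketch of this step (the per-class document counts below are illustrative numbers; in practice they come from counting the files in each class directory):

  % Sketch: prior class probabilities from document counts.
  % nDocs(i) = number of training documents in class Ci.
  nDocs = [120 80];                 % example counts, one entry per class
  n     = sum(nDocs);               % total number of training documents
  prior = nDocs / n;                % P(Ci) = ni / n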

  9. Computing class conditional word likelihoods • Suppose we have chosen m key words, denoted as w1, w2,…,wm • Count the number of times cji that word wj occurs in text class Ci • Count the total number of words ni in class Ci • Class conditional probability: P(wj | Ci) = cji / ni
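
  A sketch of the likelihood estimation in Matlab (the count matrix below is a made-up toy example; here ni is taken as the total key-word count per class):

  % Sketch: class-conditional word probabilities.
  % wordCount(j, i) = c_ji, number of times key word wj occurs in class Ci.
  wordCount = [30 5; 2 40; 10 12];                              % m=3 key words, 2 classes (toy numbers)
  ni        = sum(wordCount, 1);                                % word total per class
  condProb  = wordCount ./ repmat(ni, size(wordCount, 1), 1);   % P(wj | Ci) = c_ji / ni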

  10. Classifying a new message d • Compute the features of d, i.e., the number of times each word wj occurs in d • P(Ci|d) = P(Ci|w1,w2,…,wd) ∝ P(Ci) P(w1|Ci) P(w2|Ci) … P(wd|Ci) • Assign d to the class Ci that has the maximum posterior probability
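
  A sketch of the classification step, reusing prior and condProb from the sketches above and working with log probabilities (as recommended on slide 13); the feature vector for d is a made-up example:

  % Sketch: assign a new document d to the MAP class.
  % featCount(j) = number of times key word wj occurs in d.
  featCount = [4 0 2];                              % toy feature vector for d
  logPost   = log(prior);                           % start from log P(Ci)
  for i = 1:numel(prior)
      logPost(i) = logPost(i) + featCount * log(condProb(:, i));   % sum_j count_j * log P(wj | Ci)
  end
  [~, bestClass] = max(logPost);                    % class with maximum posterior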

  11. Points to note • Preprocessing • eliminating punctuation • eliminating numerals • converting all characters to lowercase • eliminating all words with fewer than 4 letters
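
  A sketch of these preprocessing steps in Matlab, applied to a literal example string (in practice the text would be read from a file, e.g. with fileread):

  % Sketch: preprocess one document into a list of words.
  txt   = 'The Quick, brown fox jumped over 2 lazy dogs!';
  txt   = lower(txt);                               % convert all characters to lowercase
  txt   = regexprep(txt, '[^a-z\s]', ' ');          % eliminate punctuation and numerals
  words = regexp(txt, '\s+', 'split');              % split into words
  words = words(cellfun(@length, words) >= 4);      % drop words with fewer than 4 letters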

  12. You need to build a large vocabulary and separately count how often each word is encountered. The vocabulary can be built using a hash table. • How to choose the key words wi? • For each class, you can pick out the k words that occur most frequently • Or, over all the training data, pick out the k words that appear most frequently • Take the union of all these words as the key words/features
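
  One possible way to do the counting and selection in Matlab, using containers.Map as the hash table (a sketch; it assumes the preprocessed word list words from the sketch above, and k = 100 is an arbitrary choice):

  % Sketch: build a vocabulary with occurrence counts, then pick the top-k words.
  vocab = containers.Map('KeyType', 'char', 'ValueType', 'double');
  for j = 1:numel(words)                    % words: preprocessed word list (see above)
      w = words{j};
      if isKey(vocab, w)
          vocab(w) = vocab(w) + 1;
      else
          vocab(w) = 1;
      end
  end
  allWords  = keys(vocab);
  counts    = cell2mat(values(vocab));
  [~, idx]  = sort(counts, 'descend');
  k         = min(100, numel(allWords));    % keep the k most frequent words
  keyWords  = allWords(idx(1:k));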

  13. Zero probabilities must be avoided (why?) • This occurs when a word has been encountered in one class but not in others • In that case the class conditional probability is zero, and a single such word zeroes out the whole product • To prevent this, re-estimate the conditional probability as P(wj|Ci) = ε/ni, with ε a small, tunable number • Convert all probabilities to log probabilities (log-likelihoods) to avoid exceeding the dynamic range of the computer's representation of real numbers
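
  A sketch of both fixes, reusing condProb, ni and prior from the earlier sketches (the value of the small constant is only illustrative):

  % Sketch: avoid zero probabilities, then work in log space.
  smoothEps = 0.1;                                             % small, tunable constant
  floorProb = repmat(smoothEps ./ ni, size(condProb, 1), 1);   % epsilon / ni per class
  condProb(condProb == 0) = floorProb(condProb == 0);          % replace zero entries
  logCondProb = log(condProb);                                 % log-likelihoods
  logPrior    = log(prior);                                    % log prior probabilities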

  14. Digit Classification (assignment 1) • USPS data set contains normalized handwritten digits, scanned by the U.S. Postal Service. • 16 x 16 grayscale images • 7291 training and 2007 test observations • Format: each line consists of the digit id (0-9) followed by the 256 grayscale values. • The test set is notoriously "difficult" • Download it from here

  15. USPS digits

  16. Setting • Classes: 0~9 • Features: each pixel is used as a feature, so there are 16 by 16, i.e., 256 features • Rather than raw pixel gray values, you can use more informative features, such as (detected) corners, crosses, slope, gravity center, etc. • Think about how to quantize the real-valued features • Task: classify new digits into one of the classes
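
  One simple quantization is to threshold each pixel into on/off and fit one Bernoulli parameter per pixel and class; a sketch, assuming digit and label as returned by read_usps (slide 18) and a threshold you would tune yourself:

  % Sketch: binary pixel features and per-class pixel likelihoods for USPS digits.
  % digit: 16-by-16-by-n array, label: n-by-1 vector with values 0..9 (see read_usps).
  thr = 0;                                      % quantization threshold (tune for your data)
  X   = double(reshape(digit, 256, []) > thr);  % 256 binary features, one column per image
  pixelProb = zeros(256, 10);
  for c = 0:9
      cols = (label == c);
      pixelProb(:, c+1) = (sum(X(:, cols), 2) + 1) ./ (sum(cols) + 2);  % smoothed P(pixel on | class), cf. slide 13
  end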

  17. Specifications • (preliminary; assignment 1 will come out on Friday) • You can use either Matlab or C++ for programming • If you use C++, you should create the class and its members/functions as required • If you use Matlab, you should write the functions as required • Input and output formats will also be fixed in the assignment

  18. Files • Matlab file to read the USPS data • >[n, digit, label] = read_usps(path, file); • path = ‘c:\...’; file = ‘usps_train.txt’; • n: number of digits/images obtained • digit: a 16 by 16 by n array • label: the label of each image • You may want to use it to read the USPS data
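
  A possible way to use it together with the priors of slide 8 (a sketch; the path below is only illustrative):

  % Sketch: load the training data and compute per-class priors.
  [n, digit, label] = read_usps('c:\data\', 'usps_train.txt');   % illustrative path and file
  classPrior = zeros(1, 10);
  for c = 0:9
      classPrior(c+1) = sum(label == c) / n;    % P(Ci) = ni / n, as in slide 8
  end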

  19. Matlab file to output a series of files • >output(str,i1,i2); • str: the common string part; • i1 and i2 are the starting and ending integers • You may want to use it to write the digits into separate files with the naming scheme you like
