Naive Bayes Classifiers, an Overview



  1. Naive Bayes Classifiers, an Overview. By Roozmehr Safi

  2. What is Naive Bayes Classifier (NBC)? • NBC is a probabilistic classification method. • Classification (A.K.A. discrimination, or supervised learning) is the task of assigning new cases to one of the pre-defined classes, given a sample of cases for which the true classes are known. • NBC is one of the oldest and simplest classification methods.

  3. Some NBC Applications • Credit scoring • Marketing applications • Employee selection • Image processing • Speech recognition • Search engines…

  4. How does NBC Work? • NBC applies Bayes’ theorem with (naive) independence assumptions. • A more descriptive term for it would be "independent feature model".

  5. How does NBC work, Cntd. • Let X1,…, Xm denote our features (height, weight, foot size…), let Y denote the class label (1 for men, 2 for women), and let C be the number of classes (2). The problem consists of classifying the case (x1,…, xm) to the class c maximizing P(Y=c | X1=x1,…, Xm=xm) over c=1,…, C. Applying Bayes’ rule gives: P(Y=c | X1=x1,…, Xm=xm) = P(X1=x1,…, Xm=xm | Y=c) P(Y=c) / P(X1=x1,…, Xm=xm). • Under NB’s assumption of conditional independence, P(X1=x1,…, Xm=xm | Y=c) is replaced by the product P(X1=x1 | Y=c) × … × P(Xm=xm | Y=c). • NB thus reduces the original problem to choosing the class c that maximizes P(Y=c) × P(X1=x1 | Y=c) × … × P(Xm=xm | Y=c); the denominator is the same for every class and can be dropped.
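The decision rule above can be sketched in a few lines of code. The example below is a minimal illustration, assuming Gaussian class-conditional densities for each feature; the two-class height/weight/foot-size setup and the training numbers are made up for illustration and are not data from the presentation.

```python
# Minimal sketch of the NB rule: pick the class c maximizing
# P(Y=c) * prod_i P(Xi = xi | Y=c), with Gaussian per-feature densities.
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x, used as P(Xi = xi | Y = c)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical training sample: (height, weight, foot size) per class.
train = {
    "male":   [(6.0, 180, 12), (5.9, 190, 11), (5.6, 170, 12), (5.9, 165, 10)],
    "female": [(5.0, 100, 6), (5.5, 150, 8), (5.4, 130, 7), (5.8, 150, 9)],
}

def fit(train):
    """Estimate per-class priors and per-feature means/variances."""
    total = sum(len(rows) for rows in train.values())
    model = {}
    for label, rows in train.items():
        n = len(rows)
        stats = []
        for i in range(len(rows[0])):
            vals = [r[i] for r in rows]
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / (n - 1)
            stats.append((mean, var))
        model[label] = (n / total, stats)   # (prior P(Y=c), feature stats)
    return model

def classify(model, x):
    """Return the class maximizing P(Y=c) * prod_i P(Xi = xi | Y=c)."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for xi, (mean, var) in zip(x, stats):
            score *= gaussian_pdf(xi, mean, var)
        scores[label] = score
    return max(scores, key=scores.get), scores

model = fit(train)
print(classify(model, (6.0, 130, 8)))   # classify a new case
```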

  6. An example: • P(Observed Height|Male) = a • P(Observed Weight|Male) = b • P(Observed Foot size|Male) = c P(Male|observed case) ≈ P(Male) × a × b × c • P(Observed Height|Female) = d • P(Observed Weight|Female) = e • P(Observed Foot size|Female) = f P(Female|observed case) ≈ P(Female) × d × e × f * Pick the class whose score is larger
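Plugging illustrative numbers into this comparison (the priors and the values of a through f below are made up):

```python
# Hypothetical likelihoods a..f and priors; real values would come from the
# training data.
p_male, a, b, c = 0.5, 0.10, 0.20, 0.30
p_female, d, e, f = 0.5, 0.20, 0.10, 0.05

score_male = p_male * a * b * c          # P(Male) * a * b * c
score_female = p_female * d * e * f      # P(Female) * d * e * f
print("Male" if score_male > score_female else "Female")   # pick the larger
```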

  7. NBC advantage • Despite its unrealistic assumption of independence, NBC is remarkably successful even when that assumption is violated. • Due to its simple structure, NBC is appealing when the set of variables is large. • NBC requires only a small amount of training data: • It only needs to estimate the means and variances of the variables • No need to form the covariance matrix. • Computationally inexpensive.

  8. A Demonstration Data: From an online B2B exchange (1220 cases). Purpose: To distinguish cheaters from good sellers. Predictors: • Member type: enterprise, personal, other • Years since joining: 1 to 10 years. • No. of months since last membership renewal • Membership renewal duration. • Type of service bought: standard, limited edition… • Whether the member has a registered company. • Whether the company page is decorated. • Number of days on which the member logged in during the past 60 days. • Industry: production, distribution, investment… Target: to predict whether a seller is likely to cheat buyers, based on data from past sellers.

  9. Issues involved: Prob. distribution • With discrete (categorical) features, the probabilities can be estimated with simple frequency counts. • With continuous features, one can assume a particular parametric probability distribution (e.g., Gaussian). • There is evidence that discretizing the data before applying NB is effective. • Equal Frequency Discretization (EFD) divides the sorted values of a continuous variable into k equally populated bins, as in the sketch below.
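A minimal sketch of EFD, assuming numpy is available; the synthetic heights are made-up data used only to show the binning:

```python
# Equal Frequency Discretization: cut a continuous feature at its quantiles
# so that each of the k bins holds roughly the same number of observations.
import numpy as np

def equal_frequency_discretize(values, k=5):
    """Map each value to a bin index 0..k-1 with ~equal bin populations."""
    values = np.asarray(values, dtype=float)
    # Interior cut points at the 1/k, 2/k, ... quantiles of the data.
    cuts = np.quantile(values, [i / k for i in range(1, k)])
    return np.searchsorted(cuts, values, side="right")

heights = np.random.default_rng(0).normal(170, 10, size=100)  # synthetic data
bins = equal_frequency_discretize(heights, k=4)
print(np.bincount(bins))   # roughly 25 observations per bin
```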

  10. Issues involved: Zero probabilities • When a class and a feature value never occur together in the training set, the corresponding frequency-based estimate is zero, and a single zero factor forces the whole product to evaluate to zero. • The zero probability can instead be replaced by a small constant, such as 0.5/n, where n is the number of observations in the training set.
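A minimal sketch of this fix, using plain frequency counts and the 0.5/n replacement mentioned above (the toy feature values and labels are made up):

```python
# Estimate P(feature = value | class) by frequency counts, but replace a zero
# count with the small constant 0.5 / n (n = number of training cases).
from collections import Counter

def conditional_prob(feature_values, labels, value, label, n_total):
    """P(X = value | Y = label), with zero counts replaced by 0.5 / n_total."""
    in_class = [v for v, y in zip(feature_values, labels) if y == label]
    count = Counter(in_class)[value]
    if count == 0:
        return 0.5 / n_total          # avoid a zero factor in the product
    return count / len(in_class)

# Tiny made-up example: the value "other" never occurs with class 1.
x = ["enterprise", "personal", "personal", "enterprise", "other"]
y = [1, 1, 1, 0, 0]
print(conditional_prob(x, y, "other", 1, n_total=len(x)))   # 0.5 / 5 = 0.1
```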

  11. Issues involved: Missing values • In some applications, values are not missing at random and can themselves be informative; in such cases missing values are treated as a separate category. • If one does not want to treat missing values as a separate category, they should be handled before applying NBC, either by imputing the missing values or by excluding the cases in which they occur.
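A brief sketch of these options in pandas; the column names and values are hypothetical, not the actual B2B data:

```python
import pandas as pd

# Tiny made-up frame with a missing member type.
df = pd.DataFrame({
    "member_type": ["enterprise", None, "personal", "enterprise"],
    "cheater": [0, 1, 0, 1],
})

# Option 1: keep missing values as a category of their own.
df["member_type_cat"] = df["member_type"].fillna("missing")

# Option 2: impute, e.g. with the most frequent value.
df["member_type_imputed"] = df["member_type"].fillna(df["member_type"].mode()[0])

# Option 3: exclude the cases in which the value is missing.
complete_cases = df.dropna(subset=["member_type"])
print(df, len(complete_cases), sep="\n")
```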

  12. Thank you
