
Bayesian Classifier



Presentation Transcript


  1. Bayesian Classifier

  2. Review: Decision Tree
     [decision-tree figure] Classify <Jeff, age: 30, income: medium, student: yes, credit: fair> for buys_computer. The tree splits first on Age? (<=30, 31…40, >40); the remaining internal nodes are Student? (no / yes) and Credit? (excellent / fair), and the leaves are labeled YES or NO.

  3. Bayesian Classification
     • Bayesian classifier vs. decision tree
       • Decision tree: predicts the class label
       • Bayesian classifier: statistical classifier; predicts class membership probabilities
     • Based on Bayes' theorem; estimates the posterior probability
     • Naïve Bayesian classifier:
       • Simple classifier that assumes attribute independence
       • High speed when applied to large databases
       • Comparable in performance to decision trees

  4. Bayes Theorem
     • Let X be a data sample whose class label is unknown
     • Let Hi be the hypothesis that X belongs to a particular class Ci
     • P(Hi) is the class prior probability that X belongs to a particular class Ci
       • Can be estimated by ni/n from the training data samples
       • n is the total number of training data samples
       • ni is the number of training data samples of class Ci
     • Formula of Bayes Theorem: P(Hi|X) = P(X|Hi) · P(Hi) / P(X)

  5. Example: class prior probabilities
     • H1: buys_computer = yes → P(H1) = 6/10 = 0.6
     • H0: buys_computer = no → P(H0) = 4/10 = 0.4
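As a concrete illustration, here is a minimal Python sketch of estimating the class priors P(Hi) = ni/n by counting; the list of ten training labels below is a hypothetical stand-in matching the 6/4 split on this slide:

```python
from collections import Counter

# Hypothetical class labels of the 10 training examples from the slides:
# 6 buy a computer ("yes"), 4 do not ("no").
labels = ["yes"] * 6 + ["no"] * 4

def class_priors(labels):
    """Estimate P(Hi) = ni / n from the training labels."""
    n = len(labels)
    return {c: count / n for c, count in Counter(labels).items()}

print(class_priors(labels))   # {'yes': 0.6, 'no': 0.4}
```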

  6. Bayes Theorem
     • P(Hi|X) is the class posterior probability (of Hi conditioned on X)
       • Probability that data example X belongs to class Ci given the attribute values of X
       • e.g., given X = (age: 31…40, income: medium, student: yes, credit: fair), what is the probability that X buys a computer?
     • To classify means to determine the highest P(Hi|X) among all classes C1, …, Cm
       • If P(H1|X) > P(H0|X), then X buys a computer
       • If P(H0|X) > P(H1|X), then X does not buy a computer
     • Calculate P(Hi|X) using Bayes' theorem
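A small sketch of this decision rule, assuming the prior P(Hi), the descriptor posterior P(X|Hi), and the descriptor prior P(X) have already been estimated for each class (the function names are illustrative, not from the slides):

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(Hi|X) = P(X|Hi) * P(Hi) / P(X)."""
    return likelihood * prior / evidence

def predict(priors, likelihoods, evidence):
    """Return the class Ci with the highest posterior P(Hi|X)."""
    posteriors = {c: posterior(priors[c], likelihoods[c], evidence)
                  for c in priors}
    return max(posteriors, key=posteriors.get)
```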

  7. Bayes Theorem
     • P(X) is the descriptor prior probability of X
       • Probability of observing the attribute values of X
       • Suppose X = (x1, x2, …, xn) and the attributes are independent; then P(X) = P(x1) P(x2) … P(xn)
       • P(xj) = nj/n, where
         • nj is the number of training examples having value xj for attribute Aj
         • n is the total number of training examples
       • Constant for all classes

  8. X = (age: 31…40, income: medium, student: yes, credit: fair)
     • P(age=31…40) = 3/10, P(income=medium) = 3/10, P(student=yes) = 5/10, P(credit=fair) = 7/10
     • P(X) = P(age=31…40) × P(income=medium) × P(student=yes) × P(credit=fair) = 0.3 × 0.3 × 0.5 × 0.7 = 0.0315
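A quick sketch reproducing this arithmetic, assuming the per-attribute value counts are read off the 10-example training table behind these slides:

```python
# Counts of X's attribute values among the n = 10 training examples
# (from the slide): age=31…40 → 3, income=medium → 3,
# student=yes → 5, credit=fair → 7.
n = 10
value_counts = {"age=31…40": 3, "income=medium": 3,
                "student=yes": 5, "credit=fair": 7}

# Under the independence assumption, P(X) is the product of P(xj) = nj / n.
p_x = 1.0
for nj in value_counts.values():
    p_x *= nj / n

print(round(p_x, 4))   # 0.0315
```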

  9. Bayes Theorem
     • P(X|Hi) is the descriptor posterior probability
       • Probability of observing X in class Ci
     • Assume X = (x1, x2, …, xn) and the attributes are independent; then P(X|Hi) = P(x1|Hi) P(x2|Hi) … P(xn|Hi)
     • P(xj|Hi) = ni,j/ni, where
       • ni,j is the number of training examples in class Ci having value xj for attribute Aj
       • ni is the number of training examples in Ci

  10. X = (age: 31…40, income: medium, student: yes, credit: fair)
     • H1 = X buys a computer
     • n1 = 6, n11 = 2, n21 = 2, n31 = 4, n41 = 5
     • P(X|H1) = 2/6 × 2/6 × 4/6 × 5/6 ≈ 0.0617

  11. X = (age: 31…40, income: medium, student: yes, credit: fair)
     • H0 = X does not buy a computer
     • n0 = 4, n10 = 1, n20 = 1, n30 = 1, n40 = 2
     • P(X|H0) = 1/4 × 1/4 × 1/4 × 2/4 ≈ 0.0078
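A sketch reproducing both descriptor posteriors from the counts on slides 10 and 11 (the attribute counts and class sizes are copied from the slides; the helper name is illustrative):

```python
def descriptor_posterior(counts, class_size):
    """P(X|Hi) = product over attributes of n_ji / ni (independence assumed)."""
    p = 1.0
    for n_ji in counts:
        p *= n_ji / class_size
    return p

# H1 (buys a computer): n1 = 6, attribute counts 2, 2, 4, 5.
p_x_h1 = descriptor_posterior([2, 2, 4, 5], 6)
# H0 (does not buy):    n0 = 4, attribute counts 1, 1, 1, 2.
p_x_h0 = descriptor_posterior([1, 1, 1, 2], 4)

print(round(p_x_h1, 4), round(p_x_h0, 4))   # 0.0617 0.0078
```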

  12. Bayesian Classifier – Basic Equation
     • P(Hi|X) = P(X|Hi) · P(Hi) / P(X)
       • P(Hi|X): class posterior probability
       • P(X|Hi): descriptor posterior probability
       • P(Hi): class prior probability
       • P(X): descriptor prior probability
     • To classify means to determine the highest P(Hi|X) among all classes C1, …, Cm
     • P(X) is constant for all classes, so we only need to compare P(Hi) · P(X|Hi)
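Putting the pieces together, here is a minimal naïve Bayesian classifier sketch that compares P(Hi) · P(X|Hi) directly, as the slide suggests. The toy training set and attribute names below are hypothetical stand-ins, not the slides' own 10-example table:

```python
from collections import Counter, defaultdict

def train(examples, labels):
    """Estimate P(Hi) and every P(xj|Hi) by counting."""
    priors = {c: cnt / len(labels) for c, cnt in Counter(labels).items()}
    cond = defaultdict(Counter)          # (class, attribute) -> value counts
    class_sizes = Counter(labels)
    for x, c in zip(examples, labels):
        for attr, value in x.items():
            cond[(c, attr)][value] += 1
    return priors, cond, class_sizes

def classify(x, priors, cond, class_sizes):
    """Return the class maximizing P(Hi) * prod_j P(xj|Hi)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in x.items():
            score *= cond[(c, attr)][value] / class_sizes[c]
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set in the spirit of the slides' example.
examples = [
    {"age": "<=30",  "student": "no"},
    {"age": "<=30",  "student": "yes"},
    {"age": "31…40", "student": "yes"},
    {"age": ">40",   "student": "no"},
]
labels = ["no", "yes", "yes", "no"]

model = train(examples, labels)
print(classify({"age": "31…40", "student": "yes"}, *model))   # yes
```

In practice a smoothing correction (e.g., add-one/Laplace smoothing) is usually applied to the conditional counts so that a single unseen attribute value does not drive the whole product to zero.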

  13. Weather dataset example
     [table of the 14-example weather dataset: 9 examples of class p (play) and 5 of class n (don't play)]
     X = <rain, hot, high, false>

  14. Weather dataset example: classifying X
     • An unseen sample X = <rain, hot, high, false>
     • P(p) P(X|p) = P(p) P(rain|p) P(hot|p) P(high|p) P(false|p) = 9/14 · 3/9 · 2/9 · 3/9 · 6/9 = 0.010582
     • P(n) P(X|n) = P(n) P(rain|n) P(hot|n) P(high|n) P(false|n) = 5/14 · 2/5 · 2/5 · 4/5 · 2/5 = 0.018286
     • Sample X is classified in class n (don't play)
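A quick numeric check of the two products above (the fractions are copied from the slide):

```python
from math import prod

# P(p)·P(X|p) and P(n)·P(X|n) for X = <rain, hot, high, false>
play     = [9/14, 3/9, 2/9, 3/9, 6/9]   # P(p), P(rain|p), P(hot|p), P(high|p), P(false|p)
not_play = [5/14, 2/5, 2/5, 4/5, 2/5]   # P(n), P(rain|n), P(hot|n), P(high|n), P(false|n)

print(round(prod(play), 6), round(prod(not_play), 6))   # 0.010582 0.018286
```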

  15. The independence hypothesis…
     • …makes computation possible
     • …yields optimal classifiers when satisfied
     • …but is seldom satisfied in practice, as attributes (variables) are often correlated
     • Attempts to overcome this limitation:
       • Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
       • Decision trees, which reason on one attribute at a time, considering the most important attributes first
