
Bayesian classifiers


Presentation Transcript


  1. Bayesian classifiers

  2. Bayesian Classification: Why?
  • Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
  • Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, and prior knowledge can be combined with observed data
  • Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
  • Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured

  3. Bayesian Theorem
  • Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes' theorem
  • MAP (maximum a posteriori) hypothesis
  • Practical difficulty: requires initial knowledge of many probabilities, and carries significant computational cost
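The formulas this slide refers to (likely shown as images in the original deck) are the standard ones; written out in the slide's notation:

```latex
% Bayes' theorem for a hypothesis h given training data D
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}

% MAP hypothesis: P(D) does not depend on h, so it can be dropped
h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)
```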

  4. Naïve Bayesian Classification
  • If the i-th attribute is categorical: P(di|C) is estimated as the relative frequency of samples having value di for the i-th attribute in class C
  • If the i-th attribute is continuous: P(di|C) is estimated through a Gaussian density function
  • Computationally easy in both cases
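A sketch of the estimates the slide describes, using the usual Gaussian form for the continuous case (the slide's own formula image is not reproduced in the transcript); here μ_C and σ_C denote the mean and standard deviation of the i-th attribute within class C:

```latex
% Naive Bayes: attributes assumed conditionally independent given the class
P(C \mid d_1, \dots, d_n) \propto P(C) \prod_{i=1}^{n} P(d_i \mid C)

% Continuous attribute: per-class Gaussian density
P(d_i \mid C) = \frac{1}{\sqrt{2\pi}\,\sigma_C}
                \exp\!\left(-\frac{(d_i - \mu_C)^2}{2\sigma_C^2}\right)
```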

  5. Play-tennis example: estimating P(xi|C)
  Class priors: P(p) = 9/14, P(n) = 5/14
  outlook: P(sunny|p) = 2/9, P(sunny|n) = 3/5; P(overcast|p) = 4/9, P(overcast|n) = 0; P(rain|p) = 3/9, P(rain|n) = 2/5
  temperature: P(hot|p) = 2/9, P(hot|n) = 2/5; P(mild|p) = 4/9, P(mild|n) = 2/5; P(cool|p) = 3/9, P(cool|n) = 1/5
  humidity: P(high|p) = 3/9, P(high|n) = 4/5; P(normal|p) = 6/9, P(normal|n) = 2/5
  windy: P(true|p) = 3/9, P(true|n) = 3/5; P(false|p) = 6/9, P(false|n) = 2/5

  6. Play-tennis example: classifying X
  • An unseen sample X = <rain, hot, high, false>
  • P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
  • P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
  • Sample X is classified in class n (don't play)
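A minimal Python sketch of this calculation, using the counts estimated on the previous slide (Fraction keeps the arithmetic exact before converting to a float):

```python
from fractions import Fraction as F

# Conditional probabilities P(value | class) from the previous slide
cond = {
    "p": {"rain": F(3, 9), "hot": F(2, 9), "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5), "high": F(4, 5), "false": F(2, 5)},
}
prior = {"p": F(9, 14), "n": F(5, 14)}

x = ["rain", "hot", "high", "false"]  # unseen sample X

for c in ("p", "n"):
    score = prior[c]
    for value in x:
        score *= cond[c][value]   # naive conditional-independence assumption
    print(c, float(score))        # p: 0.010582..., n: 0.018286...

# n has the larger score, so X is classified as "don't play"
```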

  7. The independence hypothesis…
  • … makes computation possible
  • … yields optimal classifiers when satisfied
  • … but is seldom satisfied in practice, as attributes (variables) are often correlated
  • Attempts to overcome this limitation: Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes

  8. Bayesian Belief Networks (I)
  Example network over the variables Age, FamilyH (family history), Diabetes, Mass, Insulin and Glucose. The conditional probability table for the variable Mass, given FamilyH (FH) and Age (A):

        (FH, A)   (FH, ~A)   (~FH, A)   (~FH, ~A)
  M      0.7       0.8        0.5        0.1
  ~M     0.3       0.2        0.5        0.9

  9. Applying Bayesian nets
  • When all but one variable is known, the remaining one can be queried, e.g. P(D|A,F,M,G,I)
  (From Jiawei Han's slides)
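Spelled out (assuming D here stands for Diabetes and the other letters for the remaining network variables), the query reduces to two evaluations of the joint distribution:

```latex
P(D \mid A,F,M,G,I) \;=\; \frac{P(D,A,F,M,G,I)}{\sum_{d \in \{D,\, \neg D\}} P(d,A,F,M,G,I)}
```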

  10. Bayesian belief network
  • Find the joint probability over a set of variables, making use of conditional independence whenever it is known
  [Figure: example network over variables a, b, c, d, e with conditional probability tables (values 0.1/0.2/0.3/0.4 and 0.3/0.2/0.1/0.5); variable e is independent of d given b]
  (From Jiawei Han's slides)
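The general factorization this slide relies on is the standard one for Bayesian networks, where Parents(X_i) denotes the parents of node X_i in the graph:

```latex
P(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{Parents}(X_i)\bigr)
```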

  11. Bayesian Belief Networks (II)
  • A Bayesian belief network allows a subset of the variables to be conditionally independent
  • A graphical model of causal relationships
  • Several cases of learning Bayesian belief networks:
  • Given both the network structure and all the variables: easy
  • Given the network structure but only some of the variables: use gradient descent / EM algorithms
  • When the network structure is not known in advance: learning the structure of the network is harder
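A minimal Python sketch of how a factored joint probability can be evaluated, using the CPT for Mass from slide 8; the priors for FamilyH and Age are made-up values for illustration only, since the slides do not give them:

```python
# Hypothetical priors for the parent nodes (NOT given in the slides)
p_fh = {True: 0.3, False: 0.7}    # P(FamilyH)
p_age = {True: 0.4, False: 0.6}   # P(Age)

# CPT for Mass given (FamilyH, Age), from slide 8; P(~M | ...) is the complement
p_mass = {
    (True, True): 0.7, (True, False): 0.8,
    (False, True): 0.5, (False, False): 0.1,
}

def joint(fh, age, mass):
    """P(FH, A, Mass) = P(FH) * P(A) * P(Mass | FH, A)."""
    p_m = p_mass[(fh, age)]
    return p_fh[fh] * p_age[age] * (p_m if mass else 1 - p_m)

# Example: probability of family history, the given age value, and high mass
print(joint(True, True, True))  # 0.3 * 0.4 * 0.7 ≈ 0.084
```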

  12. The k-Nearest Neighbor Algorithm
  • All instances correspond to points in the n-dimensional space
  • The nearest neighbors are defined in terms of Euclidean distance
  • The target function can be discrete- or real-valued
  • For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq (see the sketch below)
  • Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
  [Figure: 2-D scatter of + and − training points around a query point xq]
  (From Jiawei Han's slides)
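A minimal sketch of k-NN classification for a discrete-valued target, assuming plain Euclidean distance and a majority vote; names such as knn_classify are illustrative, not from the slides:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, xq, k=3):
    """training: list of (point, label) pairs; xq: query point."""
    # Take the k training examples closest to the query point
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
    # Return the most common label among them
    return Counter(label for _, label in nearest).most_common(1)[0][0]

data = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((4.0, 4.2), "-"), ((4.1, 3.9), "-")]
print(knn_classify(data, (1.1, 1.0), k=3))  # "+"
```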

  13. Discussion on the k-NN Algorithm
  • The k-NN algorithm for continuous-valued target functions: return the mean value of the k nearest neighbors
  • Distance-weighted nearest neighbor algorithm: weight the contribution of each of the k neighbors according to its distance to the query point xq, giving greater weight to closer neighbors; the same idea applies to real-valued target functions (see the sketch after this list)
  • Robust to noisy data, since it averages over the k nearest neighbors
  • Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
  • To overcome it, stretch the axes or eliminate the least relevant attributes
  (From Jiawei Han's slides)
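A sketch of the distance-weighted variant for a discrete target, reusing the euclidean helper from the previous sketch; the 1/d² weighting is a common choice but an assumption here, since the slide only says to give greater weight to closer neighbors:

```python
from collections import defaultdict

def weighted_knn_classify(training, xq, k=3):
    """training: list of (point, label); weights each neighbor by 1 / distance^2."""
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = euclidean(point, xq)   # euclidean() as defined in the k-NN sketch above
        if d == 0:
            return label           # exact match with a training point
        votes[label] += 1.0 / d**2 # closer neighbors contribute more
    return max(votes, key=votes.get)
```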
