
Statistical Approach to Classification



Presentation Transcript


  1. Statistical Approach to Classification Naïve Bayes Classifier

  2. Remember… Sensors, scales, etc… Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 Apple Bayesian Classifier

  3. Redness • Let’s look at one dimension • For a given redness value, which is the most probable fruit? Bayesian Classifier

  4. Redness • What if we wanted to ask the question “what is the probability that some fruit with a given redness value is an apple?” Could we just look at how far away it is from the apple peak? Is it the highest PDF above the X-value in question? Bayesian Classifier

  5. Probability it’s an apple • If a fruit has a redness of 4.05 do we know the probability that it’s an apple? • What do we know? • We know the total number of fruit at that redness (10 + 25) • We know the number of apples at that redness (10) • The probability that a fruit with a redness value of 4.05 is an apple is 10/(10 + 25) • If it is a histogram of counts then it is straightforward • Probability it’s an apple: 28.57% • Probability it’s an orange: 71.43% • Getting the probability is simple Bayesian Classifier
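
A minimal sketch in Python of the count-based calculation above; the counts 10 and 25 are from the slide, and the function name is just for illustration:

```python
# Probability a fruit is an apple, from counts in a single redness bin.
# The counts (10 apples, 25 oranges at redness 4.05) come from the slide.
def prob_from_counts(apple_count, orange_count):
    total = apple_count + orange_count
    return apple_count / total, orange_count / total

p_apple, p_orange = prob_from_counts(10, 25)
print(f"P(apple)  = {p_apple:.4f}")   # 0.2857 -> 28.57%
print(f"P(orange) = {p_orange:.4f}")  # 0.7143 -> 71.43%
```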

  6. But what if we are working with a PDF? • Probability density function • Continuous • Probability, not count • Might be tempted to use the same approach • Parametric (e.g. μ and σ parameters) vs. non-parametric • P(a fruit with redness 4.05 is an apple) = ? Bayesian Classifier

  7. Problem • Wouldn’t change the PDFs, but… • What if we had a trillion oranges and only 100 apples? • 4.05 might be the most common apple redness and have a higher PDF value there than the orange PDF, even though the universe would have way more oranges at that value Bayesian Classifier

  8. Let’s revisit but using probabilities instead of counts • 2506 apples • 2486 oranges • If a fruit has a redness of 4.05 do we know the probability that it’s an apple if we don’t have specific counts at 4.05? • Conditional probability: if we know it is an apple, then the probability of seeing that redness is P(redness = 4.05 | apple) • But what we want is P(apple | redness = 4.05) Bayesian Classifier

  9. Bayes Theorem • P(h|D) = P(D|h) P(h) / P(D) (from the book) • h is the hypothesis, D is the training data • Does this make sense? Bayesian Classifier

  10. Make Sense? • 2506 apples • 2486 oranges • Probability that redness would be 4.05 if we know it is an apple: about 10/2506 • P(apple)? 2506/(2506 + 2486) • P(redness = 4.05)? About (10 + 25)/(2506 + 2486) Bayesian Classifier
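
The same numbers run through Bayes’ theorem, as a small sketch; the counts are the ones on this slide:

```python
# Bayes' theorem with the counts from the slide:
# P(apple | redness=4.05) = P(redness=4.05 | apple) * P(apple) / P(redness=4.05)
n_apples, n_oranges = 2506, 2486
apples_in_bin, oranges_in_bin = 10, 25
n_total = n_apples + n_oranges

p_redness_given_apple = apples_in_bin / n_apples        # about 10/2506
p_apple = n_apples / n_total                            # 2506/(2506 + 2486)
p_redness = (apples_in_bin + oranges_in_bin) / n_total  # about 35/4992

p_apple_given_redness = p_redness_given_apple * p_apple / p_redness
print(f"P(apple | redness=4.05) = {p_apple_given_redness:.4f}")  # 10/35, about 0.2857
```

Note how the result matches the earlier count-based answer of 28.57%.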

  11. Can find the probability • Whether we have counts or a PDF • How do we classify? • Simply find the most probable class Bayesian Classifier

  12. Bayes • I think of the ratio of P(h) to P(D) as an adjustment to the easily determined P(D|h) in order to account for differences in sample size Posterior Probability Prior Probabilities or Priors Bayesian Classifier

  13. MAP • Maximum a posteriori hypothesis (MAP) • ä-(ˌ)pō-ˌstir-ē-ˈȯr-ē • Relating to or derived by reasoning from observed facts; inductive • A priori: relating to or derived by reasoning from self-evident propositions; deductive • Approach: Brute-force MAP learning algorithm Bayesian Classifier
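
A minimal sketch of the brute-force MAP idea named on this slide: score every candidate hypothesis by P(D|h)·P(h) and keep the maximum. The function and argument names are illustrative placeholders, not anything from the course:

```python
# Brute-force MAP: evaluate the (unnormalized) posterior for every candidate
# hypothesis and return the one with the highest score.
def map_hypothesis(hypotheses, likelihood, prior):
    # likelihood(h) plays the role of P(D | h) and prior(h) of P(h).
    # P(D) is the same for every h, so it can be dropped from the argmax.
    return max(hypotheses, key=lambda h: likelihood(h) * prior(h))
```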

  14. More is better • More dimensions can be helpful • [Figure: scatter plot of Red Intensity (normalized) vs. Mass (normalized); the classes are linearly separable] Bayesian Classifier

  15. What if some of the dimensions disagree? • Color (red and yellow) says apple but mass and volume say orange? • Take a vote? • How do we handle multiple dimensions? Bayesian Classifier

  16. Can cheat • Assume each dimension is independent (doesn’t co-vary with any other dimension) • Can use the product rule • The probability that a fruit is an apple given a set of measurements (dimensions) is proportional to P(apple) · P(a1|apple) · P(a2|apple) · … · P(an|apple) Bayesian Classifier

  17. Naïve Bayes Classifier • Known as a Naïve Bayes Classifier • vNB = argmax over vj of P(vj) ∏i P(ai | vj), where vj is a class and ai is an attribute • Derivation: where is the denominator? It is the same for every class, so it drops out of the argmax Bayesian Classifier
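
A small, generic sketch of the rule on this slide applied to histogram-bin counts. The data layout (dicts of counts) is an assumption made for illustration, not the course’s code:

```python
# Naive Bayes from binned counts: v_NB = argmax_v P(v) * prod_i P(a_i | v)
def naive_bayes_classify(class_counts, bin_counts, instance_bins):
    """
    class_counts:  {class: total number of training examples of that class}
    bin_counts:    {class: [one dict per attribute mapping bin -> count]}
    instance_bins: [bin index of the instance, one entry per attribute]
    """
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for v, n_v in class_counts.items():
        score = n_v / total                      # prior P(v)
        for i, b in enumerate(instance_bins):
            n_c = bin_counts[v][i].get(b, 0)     # class-v count in the instance's bin
            score *= n_c / n_v                   # P(a_i | v)
        if score > best_score:
            best_class, best_score = v, score
    return best_class, best_score
```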

  18. Example • You wish to classify an instance with the following attributes: 1.649917, 5.197862, 134.898820, 16.137695 • The first column is redness, then yellowness, followed by mass, then volume • In the redness histogram bin in which the instance falls the training data has 0 apples, 0 peaches, 9 oranges, and 22 lemons • In the bin for yellowness there are 235, 262, 263, and 239 • In the bin for mass there are 106, 176, 143, and 239 • In the bin for volume there are 3, 57, 7, and 184 • What are each of the probabilities that it is an • Apple • Peach • Orange • Lemon Bayesian Classifier

  19. Solution Bayesian Classifier
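
One way to work the example, shown only as a sketch: the per-bin counts are taken from the previous slide, but the per-class training totals are not given there, so `class_totals` below is a placeholder that would have to be replaced with the real counts before the numbers mean anything:

```python
# Unnormalized Naive Bayes scores for the instance on the previous slide.
# Bin counts per attribute (redness, yellowness, mass, volume) are from the slide;
# class_totals is a PLACEHOLDER -- substitute the real per-class training counts.
classes = ["apple", "peach", "orange", "lemon"]
bin_counts = {
    "redness":    [0, 0, 9, 22],
    "yellowness": [235, 262, 263, 239],
    "mass":       [106, 176, 143, 239],
    "volume":     [3, 57, 7, 184],
}
class_totals = {"apple": 2500, "peach": 2500, "orange": 2500, "lemon": 2500}  # placeholder

grand_total = sum(class_totals.values())
for i, v in enumerate(classes):
    score = class_totals[v] / grand_total      # prior P(v)
    for attr, counts in bin_counts.items():
        score *= counts[i] / class_totals[v]   # P(attribute bin | v)
    print(f"{v:>6}: {score:.3e}")
# The apple score comes out exactly 0 because its redness-bin count is 0 --
# which is the problem the next two slides address.
```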

  20. Zeros Is it really a zero percent chance that it’s an apple? Are these really probabilities (hint: 0.0005 + 0.0044 not equal to 1)? What of the bin size? Bayesian Classifier

  21. Zeros • The observed fraction nc/n is only an estimate of the probability • m-estimate: (nc + m·p) / (n + m) • The choice of m is often some upper bound on n, and p is often 1/m • This ensures the numerator is at least 1 (never zero), since m·p = 1 • The denominator starts at the upper bound and goes up to at most twice that • No loss of ordering; the would-be zeros just become very small probabilities Bayesian Classifier
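
A small sketch of the m-estimate described above; the example numbers in the calls are made up, and the default p = 1/m mirrors the slide’s suggestion:

```python
# m-estimate of P(attribute value | class): (n_c + m*p) / (n + m)
#   n_c: number of class examples with this attribute value
#   n:   total number of examples of the class
#   m:   equivalent sample size (often an upper bound on n)
#   p:   prior estimate of the probability (often 1/m)
def m_estimate(n_c, n, m, p=None):
    if p is None:
        p = 1.0 / m
    return (n_c + m * p) / (n + m)

print(m_estimate(0, 2506, m=3000))    # never zero, just very small
print(m_estimate(10, 2506, m=3000))   # shrinks the raw 10/2506 toward p
```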

  22. Curse of dimensionality Do too many dimensions hurt? What if only some dimensions contribute to ability to classify? What would the other dimensions do to the probabilities? Bayesian Classifier
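
A quick experiment one could run to see the effect. The setup (one informative Gaussian dimension plus pure-noise dimensions, scikit-learn’s GaussianNB) is invented for illustration and is not from the slides:

```python
# Does piling on uninformative dimensions change Naive Bayes accuracy?
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def make_data(n, n_noise):
    # One informative dimension (class means differ) plus pure-noise dimensions.
    y = rng.integers(0, 2, size=n)
    informative = rng.normal(loc=y * 1.5, scale=1.0, size=(n,))
    noise = rng.normal(size=(n, n_noise))
    return np.column_stack([informative, noise]), y

for n_noise in (0, 10, 100, 1000):
    X_train, y_train = make_data(2000, n_noise)
    X_test, y_test = make_data(2000, n_noise)
    acc = GaussianNB().fit(X_train, y_train).score(X_test, y_test)
    print(f"{n_noise:4d} noise dimensions: test accuracy = {acc:.3f}")
```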

  23. All about representation • With imagination and innovation you can learn to classify many things you wouldn’t expect • What if you wanted to learn to classify documents, how might you go about it? Bayesian Classifier

  24. Example • Learning to classify text • Collect all words in examples • Calculate P(vj) and P(wk|vj) • Each instance will be a vector of size |vocabulary| • Classes (v’s) (category) • Each word (w) is a dimension Bayesian Classifier
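
A compact sketch of that recipe using scikit-learn’s bag-of-words tools; the four-document corpus is made up purely for illustration:

```python
# Text classification with Naive Bayes: each word in the vocabulary is a dimension.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["the team won the game", "stocks fell sharply today",
              "a late goal decided the match", "the market rallied after earnings"]
train_labels = ["sports", "finance", "sports", "finance"]

vectorizer = CountVectorizer()                  # builds the |vocabulary|-sized feature space
X = vectorizer.fit_transform(train_docs)        # word-count vector per document
model = MultinomialNB().fit(X, train_labels)    # estimates P(v_j) and P(w_k | v_j)

print(model.predict(vectorizer.transform(["the market fell today"])))
```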

  25. Paper • 20 Newsgroups • 1000 training documents from each group • The groups were the classes • 89% classification accuracy: 89 out of every 100 times it could tell which newsgroup a document came from Bayesian Classifier

  26. Another example: RNA • Rift Valley fever virus • Its genome is RNA (like DNA but with an extra oxygen – the D in DNA is deoxy) • Encapsulated in a protein sheath • An important protein involved in the encapsulation process: the nucleocapsid Bayesian Classifier

  27. SELEX SELEX (Systematic Evolution of Ligands by Exponential Enrichment) Identify RNA segments that have a high affinity for nucleocapsid (aptamer vs. non-aptamer) Bayesian Classifier

  28. Could we build a classifier? • Each known aptamer was 30 nucleotides long • A 30-character string • 4 nucleotides (ACGU) • What would the data look like? • How would we “bin” the data? Bayesian Classifier
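
One plausible representation, sketched as an assumption rather than the course’s actual approach: treat each of the 30 positions as its own categorical attribute whose value is A, C, G, or U, so “binning” is just counting nucleotides per position and per class. The example sequences are invented:

```python
# Each 30-nucleotide sequence becomes 30 categorical attributes (one per position).
# Per-class counts of each nucleotide at each position give the P(a_i | class)
# table a Naive Bayes classifier needs.
from collections import Counter

def position_counts(sequences):
    length = len(sequences[0])                  # e.g. 30 for these aptamers
    return [Counter(seq[i] for seq in sequences) for i in range(length)]

aptamers = ["ACGUACGUACGUACGUACGUACGUACGUAC",
            "ACGGACGUACGAACGUACGUACGUACGUAC"]
counts = position_counts(aptamers)
print(counts[3])   # nucleotide counts at the 4th position, e.g. Counter({'U': 1, 'G': 1})
```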

  29. Discrete or real valued? • Have seen: the fruit example, documents, RNA (nucleotides) • Which is best for a Bayesian classifier: integers, strings, or floating point? Bayesian Classifier

  30. Results Bayesian Classifier

  31. Gene Expression Experiments • The brighter the spot, the greater the mRNA concentration Bayesian Classifier

  32. Can we use expression profiles to detect disease Thousands of genes (dimensions) Many genes not affected (distributions for disease and normal same in that dimension) Bayesian Classifier

  33. RareMoss Growth Conditions • Perhaps at good growth locations: pH, average temperature, average sunlight exposure, salinity, average length of day • What else? • What would the data look like? Bayesian Classifier

  34. Proof • Taken from “Pattern Recognition”, third edition, Sergios Theodoridis and Konstantinos Koutroumbas • The Bayesian classifier is optimal with respect to minimizing the classification error probability • Proof: let R1 be the region of the feature space in which we decide in favor of w1 and R2 be the corresponding region for w2. Then an error is made if x falls in R1 although it belongs to w2, or if x falls in R2 although it belongs to w1. Bayesian Classifier

  35. Proof • The joint probability of error is Pe = P(x ∈ R1, w2) + P(x ∈ R2, w1) = ∫R1 p(x, w2) dx + ∫R2 p(x, w1) dx • Using Bayes rule: Pe = ∫R1 P(w2|x) p(x) dx + ∫R2 P(w1|x) p(x) dx • It is now easy to see that the error is minimized if the partitioning regions R1 and R2 of the feature space are chosen so that: R1: P(w1|x) > P(w2|x) and R2: P(w2|x) > P(w1|x) Bayesian Classifier

  36. Proof • Indeed, since the union of the regions R1, R2 covers all the space, from the definition of a probability density function we have that ∫R1 P(w1|x) p(x) dx + ∫R2 P(w1|x) p(x) dx = P(w1) • Combining this with the error expression gives Pe = P(w1) − ∫R1 [P(w1|x) − P(w2|x)] p(x) dx • This suggests that the probability of error is minimized if R1 is the region of space in which P(w1|x) > P(w2|x). Then R2 becomes the region where the reverse is true. Bayesian Classifier

  37. Bayesian Classifier
