Ch5 Stochastic Methods Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2011
Outline • Introduction • Intro to Probability • Bayes' Theorem • Naïve Bayes' Theorem • Applications of the Stochastic Methods
Introduction • Chapter 4 introduced heuristic search as an approach to problem solving in domains where • A problem does not have an exact solution, or • The full state space may be too costly to calculate
Introduction • Important application domains for the use of stochastic methods are • Diagnostic reasoning, where cause/effect relationships are not always captured in a purely deterministic fashion • Gambling
Outline • Introduction • Intro to Probability • Bayes' Theorem • Naïve Bayes' Theorem • Applications of the Stochastic Methods
Elements of Probability Theory • Elementary Event • An elementary or atomic event is a happening or occurrence that cannot be made up of other events • Event E • An event is a set of elementary events • Sample Space, S • The set of all possible outcomes of an event E • Probability, p • The probability of an event E in a sample space is the ratio of the number of elements in E to the total number of possible outcomes
Elements of Probability Theory • For example, what is the probability that a 7 or an 11 is the result of the roll of two fair dice? • Elementary event: one particular pair of outcomes of the two dice, e.g. (1, 6) • Event: rolling a 7 or an 11 • Sample space: each die has 6 outcomes, so the total set of outcomes of the two dice is 36
Elements of Probability Theory • The combinations of the two dice that can give a 7 are 1,6; 2,5; 3,4; 4,3; 5,2 and 6,1 • So the probability of rolling a 7 is 6/36 • The combinations of the two dice that can give an 11 are 5,6 and 6,5 • So the probability of rolling an 11 is 2/36 • Since the two events are disjoint, the probability of the answer is 6/36 + 2/36 = 8/36 = 2/9
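The dice calculation above can be checked by enumerating the sample space directly; a minimal sketch in Python:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

sevens = [o for o in outcomes if sum(o) == 7]
elevens = [o for o in outcomes if sum(o) == 11]

p_7 = Fraction(len(sevens), len(outcomes))    # 6/36
p_11 = Fraction(len(elevens), len(outcomes))  # 2/36

# The two events are disjoint, so their probabilities add.
p_7_or_11 = p_7 + p_11
print(p_7_or_11)  # 2/9
```

Counting elements of the event set against the whole sample space is exactly the ratio definition of probability given above.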
Probability Reasoning • Suppose you are driving on the interstate highway and realize you are gradually slowing down because of increased traffic congestion • You then access the state highway statistics and download the relevant statistical information
Probability Reasoning • In this situation, we have three parameters • Slowing down (S): T or F • Whether or not there is an accident (A): T or F • Whether or not there is road construction (C): T or F
Probability Reasoning • We may also present it in the traditional Venn diagram
Elements of Probability Theory • Two events A and B are independent if and only if the probability of their both occurring is equal to the product of their occurring individually • P(A ∩ B) = P(A) * P(B)
Elements of Probability Theory • Consider the situation where bit strings of length 4 are randomly generated • We want to know whether the event of the bit string containing an even number of 1s is independent of the event where the bit string ends with 0 • We know the total space is 2^4 = 16
Elements of Probability Theory • There are 8 bit strings of length four that end with 0 • There are 8 bit strings of length four that have an even number of 1s • The number of bit strings that have both an even number of 1s and end with 0 is 4: {1100, 1010, 0110, 0000}
Elements of Probability Theory • P({even number of 1s} ∩ {end with 0}) = p({even number of 1s}) * p({end with 0}) • 4/16 = 8/16 * 8/16, so the two events are independent
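The independence check on the bit strings can likewise be verified by brute-force enumeration; a short sketch:

```python
from itertools import product
from fractions import Fraction

# All 2^4 = 16 bit strings of length four, each equally likely.
strings = [''.join(bits) for bits in product('01', repeat=4)]

even_ones = {s for s in strings if s.count('1') % 2 == 0}
ends_zero = {s for s in strings if s.endswith('0')}

n = len(strings)
p_even = Fraction(len(even_ones), n)              # 8/16
p_end0 = Fraction(len(ends_zero), n)              # 8/16
p_both = Fraction(len(even_ones & ends_zero), n)  # 4/16

# Independence: P(A ∩ B) == P(A) * P(B)
print(p_both == p_even * p_end0)  # True
```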
Probability Reasoning • Finally, the conditional probability of d given s • p(d|s) = |d ∩ s| / |s|
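The counting definition of conditional probability can be illustrated on the same bit-string sample space; a minimal sketch, computing p(even number of 1s | ends with 0):

```python
from itertools import product
from fractions import Fraction

strings = [''.join(bits) for bits in product('01', repeat=4)]
d = {s for s in strings if s.count('1') % 2 == 0}  # even number of 1s
s_ = {s for s in strings if s.endswith('0')}       # ends with 0

# p(d|s) = |d ∩ s| / |s| -- restrict the sample space to s, then count d.
p_d_given_s = Fraction(len(d & s_), len(s_))
print(p_d_given_s)  # 1/2 (4 of the 8 strings ending in 0)
```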
Outline • Introduction • Intro to Probability • Bayes' Theorem • Naïve Bayes' Theorem • Applications of the Stochastic Methods
Bayes’ Theorem • P(A|B) = P(B|A) P(A) / P(B) • P(A) and P(B) are the prior probabilities • P(A|B) is the conditional probability of A, given B. • P(B|A) is the conditional probability of B, given A.
Bayes’ Theorem • Suppose there is a school with 60% boys and 40% girls as its students. • The female students wear trousers (50%) or skirts (50%) in equal numbers; the boys all wear trousers. • An observer sees a (random) student from a distance, and all the observer can see is that this student is wearing trousers. • What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem
Bayes’ Theorem • P(B|A), or the probability of the student wearing trousers given that the student is a girl. Since girls are as likely to wear skirts as trousers, this is 0.5. • P(A), or the probability that the student is a girl regardless of any other information; this probability equals 0.4. • P(B), or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since half of the girls and all of the boys are wearing trousers, this is 0.5×0.4 + 1.0×0.6 = 0.8. • Putting these together, P(A|B) = P(B|A)×P(A)/P(B) = 0.5×0.4/0.8 = 0.25, so the student is a girl with probability 0.25
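The three quantities above plug straight into Bayes' theorem; a minimal sketch using exact fractions:

```python
from fractions import Fraction

# A = student is a girl, B = student wears trousers.
p_girl = Fraction(2, 5)                 # P(A)  = 0.4
p_boy = Fraction(3, 5)
p_trousers_given_girl = Fraction(1, 2)  # P(B|A) = 0.5
p_trousers_given_boy = Fraction(1, 1)   # boys all wear trousers

# Law of total probability: P(B) = 0.5*0.4 + 1.0*0.6 = 0.8
p_trousers = (p_trousers_given_girl * p_girl
              + p_trousers_given_boy * p_boy)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_girl_given_trousers = p_trousers_given_girl * p_girl / p_trousers
print(p_girl_given_trousers)  # 1/4
```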
Outline • Introduction • Intro to Probability • Bayes' Theorem • Naïve Bayes' Theorem • Applications of the Stochastic Methods
Naïve Bayesian Classifier: Training Dataset Class: C1: buys_computer = ‘yes’ C2: buys_computer = ‘no’ Data sample X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)
Bayesian Theorem: Basics Let X be a data sample Let H be a hypothesis (our prediction) that X belongs to class C Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X Example: customer X will buy a computer given that we know the customer’s age and income
Naïve Bayesian Classifier: An Example P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643 P(buys_computer = “no”) = 5/14 = 0.357 Compute P(X|Ci) for each class P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222 P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6 P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444 P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4 P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667 P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2 P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667 P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
Naïve Bayesian Classifier: An Example X = (age <= 30 , income = medium, student = yes, credit_rating = fair) P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044 P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019 P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028 P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007 Therefore, X belongs to class (“buys_computer = yes”)
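The Naïve Bayes scoring above multiplies the per-attribute conditional probabilities and the class prior; a minimal sketch using the fractions from the slides:

```python
# Conditional probabilities and priors taken from the slides above.
p_yes, p_no = 9/14, 5/14

# P(X|Ci): product of per-attribute conditionals (the "naive"
# conditional-independence assumption).
p_x_given_yes = (2/9) * (4/9) * (6/9) * (6/9)  # age<=30, medium, student, fair
p_x_given_no = (3/5) * (2/5) * (1/5) * (2/5)

# P(X|Ci) * P(Ci): the quantity actually compared between classes.
score_yes = p_x_given_yes * p_yes
score_no = p_x_given_no * p_no

print(round(p_x_given_yes, 3), round(p_x_given_no, 3))  # 0.044 0.019
print(round(score_yes, 3), round(score_no, 3))          # 0.028 0.007
print('yes' if score_yes > score_no else 'no')          # yes
```

Because only the comparison matters, dividing both scores by P(X) is unnecessary, which is why the classifier never computes it.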
Towards Naïve Bayesian Classifier The classifier chooses the class Ci that maximizes P(Ci|X) This can be derived from Bayes’ theorem: P(Ci|X) = P(X|Ci) P(Ci) / P(X) Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized
Naïve Bayesian Classifier: An Example Test on the following example: X = (age > 30, Income = Low, Student = yes, Credit_rating = Excellent)
Outline • Introduction • Intro to Probability • Bayes' Theorem • Naïve Bayes' Theorem • Applications of the Stochastic Methods
Tomato • You say [t ow m ey t ow] and I say [t ow m aa t ow] • Probabilistic finite state machine • A finite state machine where the next state function is a probability distribution over the full set of states of the machine • Probabilistic finite state acceptor • An acceptor, where one or more states are indicated as the start state and one or more as the accept states
So how is “Tomato” pronounced? • A probabilistic finite state acceptor for the pronunciation of “tomato”, adapted from Jurafsky and Martin (2000).
Natural Language Processing • In the second example, we consider the phoneme recognition problem, • often called decoding • Suppose a phoneme recognition algorithm has identified the phone ni (as in “knee”) that occurs just after the recognized word I
Natural Language Processing • We want to associate ni with either a word or the first part of a word • We then use the Switchboard Corpus, a 1.4M-word collection of telephone conversations, to assist us
Natural Language Processing • We next apply a form of Naïve Bayes’ theorem to analyze the phone ni following I
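The decoding decision is Bayes' rule again: choose the word w maximizing p(ni|w) * p(w). The figures below are illustrative stand-ins, not the actual Switchboard counts, so treat both the candidate list and the numbers as hypothetical:

```python
# Hypothetical figures for illustration only -- NOT the real
# Switchboard Corpus statistics.
candidates = {
    # word: (p(ni | word), p(word following "I"))
    'knee': (1.00, 0.000024),
    'the':  (0.0, 0.046),
    'neat': (0.52, 0.00013),
    'need': (0.11, 0.00056),
    'new':  (0.36, 0.001),
}

# Bayes: p(word | ni) is proportional to p(ni | word) * p(word);
# the shared denominator p(ni) can be ignored when comparing words.
scores = {w: lik * prior for w, (lik, prior) in candidates.items()}
best = max(scores, key=scores.get)
print(best)
```

With these made-up numbers the product favors 'new'; the point is only that the likelihood and the prior trade off against each other, exactly as in the classifier examples earlier.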