CSA4050: Advanced Topics in NLP

CSA4050: Advanced Topics in NLP. Probability I: Experiments/Outcomes/Events, Independence/Dependence, Bayes’ Rule, Conditional Probability/Chain Rule. Acknowledgement: much of this material is based on material by Mary Dalrymple, King’s College, London.


Presentation Transcript


  1. CSA4050: Advanced Topics in NLP • Probability I • Experiments/Outcomes/Events • Independence/Dependence • Bayes’ Rule • Conditional Probability/Chain Rule CSA4050: Crash Concepts in Probability

  2. Acknowledgement • Much of this material is based on material by Mary Dalrymple, King’s College, London

  3. Experiment, Basic Outcome, Sample Space • Probability theory is founded upon the notion of an experiment. • An experiment is a situation which can have one or more different basic outcomes. • Example: if we throw a die, there are six possible basic outcomes. • A Sample Space Ω is the set of all possible basic outcomes. For example, • If we toss a coin, Ω = {H,T} • If we toss a coin twice, Ω = {HT,TH,TT,HH} • If we throw a die, Ω = {1,2,3,4,5,6}
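The sample spaces on this slide can be enumerated mechanically. The following is a minimal sketch in Python; the helper name `sample_space` is an assumption made for illustration and does not come from the slides.

```python
from itertools import product


def sample_space(outcomes, repetitions=1):
    """Return the set of basic outcomes when an experiment with the given
    single-trial outcomes is repeated `repetitions` times (hypothetical helper)."""
    return {"".join(seq) for seq in product(outcomes, repeat=repetitions)}


coin_once = sample_space("HT")                  # one toss
coin_twice = sample_space("HT", repetitions=2)  # two tosses
die = set(range(1, 7))                          # one throw of a die

print(sorted(coin_once))
print(sorted(coin_twice))
print(sorted(die))
```

Repeating an experiment n times multiplies the size of the sample space: three tosses of a coin give 2 × 2 × 2 = 8 basic outcomes.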

  4. Event • An Event A ⊆ Ω is a set of basic outcomes, e.g. • tossing two heads, {HH} • throwing a 6, {6} • getting either a 2 or a 4, {2,4}. • Ω itself is the certain event, whilst { } is the impossible event. • Event Space ≠ Sample Space

  5. Probability distribution • A probability distribution of an experiment is a function that assigns a number (or probability) between 0 and 1 to each basic outcome such that the sum of all the probabilities = 1. • The probability p(E) of an event E is the sum of the probabilities of all the basic outcomes in E. • Uniform distribution is when each basic outcome is equally likely.

  6. Probability of an Event: die example • Sample space = set of basic outcomes = {1,2,3,4,5,6} • If the die is not loaded, the distribution is uniform. • Thus each basic outcome, e.g. 6 (throwing a six), is assigned the same probability, 1/6. • So p({3,6}) = p({3}) + p({6}) = 2/6 = 1/3
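The die example can be checked with exact arithmetic. This is a small sketch, assuming a fair die; the function names `uniform` and `prob` are illustrative, not part of the slides.

```python
from fractions import Fraction


def uniform(sample_space):
    """Assign the same probability to every basic outcome."""
    n = len(sample_space)
    return {outcome: Fraction(1, n) for outcome in sample_space}


def prob(event, dist):
    """p(E) is the sum of the probabilities of the basic outcomes in E."""
    return sum((dist[outcome] for outcome in event), Fraction(0))


die = uniform({1, 2, 3, 4, 5, 6})

print(prob({3, 6}, die))  # 1/3
```

The same `prob` function works for a loaded die: only the dictionary of outcome probabilities changes, provided its values still sum to 1.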

  7. Estimating Probability • Repeat the experiment T times and count the frequency of E. • Estimated p(E) = count(E)/T • This can be done over m runs, yielding estimates p1(E),...,pm(E). • The best estimate is the (possibly weighted) average of the individual pi(E)

  8. Tossing a coin 3 times • Ω = {HHH,HHT,HTH,HTT,THH,THT,TTH,TTT} • E = cases with exactly 2 tails = {HTT,THT,TTH} • Each experiment i = 1000 cases (3000 tosses). • c1(E) = 386, p1(E) = .386 • c2(E) = 375, p2(E) = .375 • pmean(E) = (.386+.375)/2 ≈ .381 • Uniform distribution is when each basic outcome is equally likely. • Assuming uniform distribution, p(E) = 3/8 = .375
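The estimation procedure can be simulated. The sketch below assumes a fair coin and a fixed random seed; the counts it produces will differ from the 386 and 375 reported on the slide, which came from the original experiment.

```python
import random


def estimate_two_tails(trials, rng):
    """Estimate p(exactly 2 tails in 3 tosses) as count(E) / trials."""
    hits = 0
    for _ in range(trials):
        tosses = [rng.choice("HT") for _ in range(3)]
        if tosses.count("T") == 2:
            hits += 1
    return hits / trials


rng = random.Random(0)  # seed chosen arbitrarily, only for repeatability
estimates = [estimate_two_tails(1000, rng) for _ in range(2)]  # m = 2 runs
mean_estimate = sum(estimates) / len(estimates)                # unweighted average

print(estimates, mean_estimate)
```

With more runs or more trials per run, the averaged estimate should settle closer to the theoretical value 3/8 = .375.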

  9. Word Probability • General Problem: What is the probability of the next word/character/phoneme in a sequence, given the first N words/characters/phonemes? • To approach this problem we study an experiment whose sample space is the set of possible words. • N.B. The same approach could be used to study the probability of the next character or phoneme.

  10. Word Probability • Approximation 1: all words are equally probable • Then the probability of each word = 1/N, where N is the number of word types. • But not all words are equally probable • Approximation 2: the probability of each word is the same as its relative frequency of occurrence in a corpus.

  11. Word Probability • Estimate p(w), the probability of word w: • Given corpus C, p(w) ≈ count(w)/size(C) • Example • Brown corpus: 1,000,000 tokens • "the": 69,971 tokens • Probability of "the": 69,971/1,000,000 ≈ .07 • "rabbit": 11 tokens • Probability of "rabbit": 11/1,000,000 ≈ .00001 • Conclusion: the next word is most likely to be "the" • Is this correct?
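The relative-frequency estimate p(w) ≈ count(w)/size(C) is easy to compute. The sketch below uses a tiny made-up corpus in place of the Brown corpus, so its numbers are purely illustrative; `word_probabilities` is a hypothetical helper name.

```python
from collections import Counter


def word_probabilities(tokens):
    """Estimate p(w) as count(w) / size(C) for every word type in the corpus."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


# Toy corpus invented for this example (11 tokens), not real data.
corpus = "look at the cute rabbit the rabbit looks at the dog".split()
p = word_probabilities(corpus)

most_likely = max(p, key=p.get)
print(most_likely, p[most_likely])
```

As on the slide, this context-free estimate always predicts the most frequent word, whatever the preceding words are.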

  12. A counterexample • Given the context: Look at the cute ... • Is "the" more likely than "rabbit"? • Context matters in determining what word comes next. • What is the probability of the next word in a sequence, given the first N words?

  13. Independent Events [Figure: sample space with events A: eggs and B: monday]

  14. Sample Space • (eggs,mon) (cereal,mon) (nothing,mon) • (eggs,tue) (cereal,tue) (nothing,tue) • (eggs,wed) (cereal,wed) (nothing,wed) • (eggs,thu) (cereal,thu) (nothing,thu) • (eggs,fri) (cereal,fri) (nothing,fri) • (eggs,sat) (cereal,sat) (nothing,sat) • (eggs,sun) (cereal,sun) (nothing,sun)

  15. Independent Events • Two events, A and B, are independent if the fact that A occurs does not affect the probability of B occurring. • When two events, A and B, are independent, the probability of both occurring, p(A,B), is the product of the prior probabilities of each, i.e. p(A,B) = p(A) · p(B)
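The product rule can be checked on the breakfast/day sample space from the preceding slides. The sketch assumes, for illustration only, that all 21 (breakfast, day) pairs are equally likely; the slides do not state a distribution.

```python
from fractions import Fraction
from itertools import product

breakfasts = ["eggs", "cereal", "nothing"]
days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
omega = set(product(breakfasts, days))  # 21 basic outcomes


def prob(event):
    """Probability under the assumed uniform distribution over omega."""
    return Fraction(len(event), len(omega))


A = {o for o in omega if o[0] == "eggs"}  # event A: eggs
B = {o for o in omega if o[1] == "mon"}   # event B: monday

print(prob(A & B), prob(A) * prob(B))  # 1/21 1/21
```

Under this assumption p(A) = 1/3 and p(B) = 1/7, and their product equals p(A,B) = 1/21, so A and B are independent.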

  16. Dependent Events • Two events, A and B, are dependent if the occurrence of one affects the probability of the occurrence of the other.

  17. Dependent Events [Figure: sample space containing events A and B and their intersection A ∩ B]

  18. Conditional Probability • The conditional probability of an event A given that event B has already occurred is written p(A|B) • In general p(A|B) ≠ p(B|A)

  19. Dependent Events: p(A|B) ≠ p(B|A) [Figure: sample space containing events A and B and their intersection A ∩ B]

  20. Example Dependencies • Consider the fair die example with • A = outcome divisible by 2 • B = outcome divisible by 3 • C = outcome divisible by 4 • p(A|B) = p(A ∩ B)/p(B) = (1/6)/(1/3) = 1/2 • p(A|C) = p(A ∩ C)/p(C) = (1/6)/(1/6) = 1
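These two conditional probabilities can be reproduced directly from the definition p(A|B) = p(A ∩ B)/p(B). A minimal sketch for the fair die, with illustrative function names:

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die


def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(a, b):
    """p(a|b) = p(a ∩ b) / p(b); only defined when p(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)


A = {n for n in OMEGA if n % 2 == 0}  # divisible by 2: {2, 4, 6}
B = {n for n in OMEGA if n % 3 == 0}  # divisible by 3: {3, 6}
C = {n for n in OMEGA if n % 4 == 0}  # divisible by 4: {4}

print(cond_prob(A, B), cond_prob(A, C))  # 1/2 1
print(cond_prob(B, A))                   # 1/3
```

The last line illustrates that conditioning is not symmetric: p(A|B) = 1/2 whereas p(B|A) = 1/3.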

  21. Conditional Probability • Intuitively, after B has occurred, event A is replaced by A ∩ B, the sample space Ω is replaced by B, and probabilities are renormalised accordingly • The conditional probability of an event A given that B has occurred (p(B) > 0) is thus given by p(A|B) = p(A ∩ B)/p(B). • If A and B are independent, p(A ∩ B) = p(A) · p(B), so p(A|B) = p(A) · p(B)/p(B) = p(A).

  22. Bayesian Inversion • For A and B to occur, either A must occur first, then B, or vice versa. We get the following possibilities: • p(A|B) = p(A ∩ B)/p(B) • p(B|A) = p(A ∩ B)/p(A) • Hence p(A|B) p(B) = p(B|A) p(A) • We can thus express p(A|B) in terms of p(B|A) • p(A|B) = p(B|A) p(A)/p(B) • This equivalence, known as Bayes’ Theorem, is useful when one or other quantity is difficult to determine
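The inversion can be verified numerically. The sketch below reuses the fair die events A (divisible by 2) and B (divisible by 3) as an assumed test case and compares p(A|B) computed from the definition with the value obtained from p(B|A).

```python
from fractions import Fraction

OMEGA = set(range(1, 7))  # fair die


def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event), len(OMEGA))


A = {2, 4, 6}  # outcome divisible by 2
B = {3, 6}     # outcome divisible by 3

p_a_given_b = prob(A & B) / prob(B)        # definition of p(A|B)
p_b_given_a = prob(A & B) / prob(A)        # definition of p(B|A)
inverted = p_b_given_a * prob(A) / prob(B)  # Bayes: p(B|A) p(A) / p(B)

print(p_a_given_b, inverted)  # 1/2 1/2
```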

  23. Bayes’ Theorem • p(B|A) = p(B ∩ A)/p(A) = p(A|B) p(B)/p(A) • The denominator p(A) can be ignored if we are only interested in which event out of some set is most likely. • Typically we are interested in the value of B that is most probable given an observation A, i.e. • argmax_B p(B|A) = argmax_B p(A|B) p(B)/p(A) = argmax_B p(A|B) p(B)
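The claim that p(A) can be dropped when only the most likely B is wanted can be illustrated as follows. The candidate names and all numbers are invented for this sketch, and the candidates are assumed to be mutually exclusive and exhaustive so that p(A) is the sum of p(A|B) p(B) over them.

```python
from fractions import Fraction

# Hypothetical priors p(B) and likelihoods p(A|B) for three candidates.
prior = {"b1": Fraction(1, 2), "b2": Fraction(3, 10), "b3": Fraction(1, 5)}
likelihood = {"b1": Fraction(1, 10), "b2": Fraction(1, 2), "b3": Fraction(2, 5)}

# Unnormalised scores p(A|B) p(B).
score = {b: likelihood[b] * prior[b] for b in prior}

# p(A), valid under the exclusivity/exhaustiveness assumption above.
p_a = sum(score.values())

# Full posteriors p(B|A) = p(A|B) p(B) / p(A).
posterior = {b: s / p_a for b, s in score.items()}

best_unnormalised = max(score, key=score.get)
best_posterior = max(posterior, key=posterior.get)

print(best_unnormalised, best_posterior)  # b2 b2
```

Dividing every score by the same positive constant p(A) rescales the values but cannot change their ranking, so both maximisations select the same candidate.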

  24. The Chain Rule • We can extend the definition of conditional probability to more than two events • p(A1 ∩ ... ∩ An) = p(A1) · p(A2|A1) · p(A3|A1 ∩ A2) · ... · p(An|A1 ∩ ... ∩ An-1) • The chain rule allows us to talk about the probability of sequences of events p(A1,...,An).
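The chain rule can be checked on a small case. The sketch below assumes a fair die and three events chosen for illustration (even, greater than 2, divisible by 3); `chain_rule` is a hypothetical helper that multiplies the successive conditional probabilities.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die


def prob(event):
    """Probability of an event under the uniform distribution on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def chain_rule(events):
    """Multiply p(A1) * p(A2|A1) * ... * p(An|A1 ∩ ... ∩ An-1).

    Assumes every conditioning intersection has non-zero probability.
    """
    result = Fraction(1)
    seen = set(OMEGA)  # intersection of the events processed so far
    for event in events:
        result *= prob(event & seen) / prob(seen)  # p(event | seen)
        seen &= event
    return result


events = [{2, 4, 6}, {3, 4, 5, 6}, {3, 6}]
direct = prob(set.intersection(*events))  # p(A1 ∩ A2 ∩ A3) computed directly

print(chain_rule(events), direct)  # 1/6 1/6
```

Here the factors are p(A1) = 1/2, p(A2|A1) = 2/3 and p(A3|A1 ∩ A2) = 1/2, whose product 1/6 matches the probability of the intersection {6}.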
