
Statistical NLP Course for Master in Computational Linguistics 2nd Year 2013-2014


Presentation Transcript


  1. Statistical NLP Course for Master in Computational Linguistics, 2nd Year, 2013-2014. Diana Trandabat

  2. Intro to probabilities • Probability deals with prediction: • Which word will follow in this …? • How can the parses of a sentence be ranked? • Which meaning is more likely? • Which grammar is more linguistically plausible? • Seeing the phrase “more lies ahead”, how likely is it that “lies” is a noun? • Seeing “Le chien est noir”, how likely is it that the correct translation is “The dog is black”? • Any rational decision can be described probabilistically.

  3. Notations • Experiment (or trial) – a repeatable process by which observations are made • e.g. tossing 3 coins • We observe a basic outcome from the sample space Ω, the set of all possible basic outcomes • Examples of sample spaces: • one coin toss, Ω = {H, T} • three coin tosses, Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} • part of speech of a word, Ω = {N, V, Adj, …} • next word in a Shakespeare play, |Ω| = size of the vocabulary • number of words in your MSc thesis, Ω = {0, 1, 2, …}
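A minimal Python sketch (illustrative only) that enumerates the coin-toss sample spaces above with the standard library:

    from itertools import product

    # Sample space for one coin toss.
    one_toss = {"H", "T"}

    # Sample space for three coin tosses: all length-3 sequences of H/T.
    three_tosses = {"".join(seq) for seq in product("HT", repeat=3)}
    print(sorted(three_tosses))  # ['HHH', 'HHT', 'HTH', ..., 'TTT']
    print(len(three_tosses))     # 8 basic outcomes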

  4. Notation • An event A is a set of basic outcomes, i.e., a subset of the sample space Ω. Example: – Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} – e.g. basic outcome = THH – e.g. event A = “has exactly 2 H’s”, A = {THH, HHT, HTH} – A = Ω is the certain event, P(Ω) = 1 – A = ∅ is the impossible event, P(∅) = 0 – For “not A”, we write Ā

  5. Intro to probabilities

  6. Intro to probabilities • The true probability of an event is hard to compute. • It is easy to compute an estimate of the probability, written p̂(x), typically the relative frequency of x in the observed data. • As the number of observations |X| → ∞, p̂(x) → P(x).
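A small simulation sketch (assuming a fair coin, simulated with Python's random module) shows the relative-frequency estimate approaching the true probability as the number of observations grows:

    import random

    def estimate_p_heads(n_trials: int) -> float:
        """Relative-frequency estimate: p^(H) = count(H) / n_trials."""
        heads = sum(random.choice("HT") == "H" for _ in range(n_trials))
        return heads / n_trials

    for n in (10, 100, 10_000, 1_000_000):
        print(n, estimate_p_heads(n))  # tends toward the true P(H) = 0.5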

  7. Intro to probabilities • “A coin is tossed 3 times. What is the likelihood of 2 heads?” – Experiment: toss a coin three times – Sample space Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} – Event: the basic outcomes that have exactly 2 H’s, A = {THH, HTH, HHT} – The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
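The same answer can be obtained by exhaustive enumeration; a minimal sketch:

    from itertools import product
    from fractions import Fraction

    # Sample space: all outcomes of three coin tosses.
    omega = ["".join(seq) for seq in product("HT", repeat=3)]

    # Event A: basic outcomes with exactly 2 H's.
    A = [o for o in omega if o.count("H") == 2]

    print(A)                             # ['HHT', 'HTH', 'THH']
    print(Fraction(len(A), len(omega)))  # 3/8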

  8. Probability distribution • A probability distribution is an assignment of probabilities over a set of outcomes. • A uniform distribution assigns the same probability to every outcome (e.g. a fair coin). • A Gaussian distribution assigns a bell curve over outcomes. • Many others exist. • Uniform and Gaussian distributions are popular in statistical NLP.
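A brief sketch of the two distributions mentioned above (the uniform case as an explicit table, the Gaussian via Python's random.gauss):

    import random

    # Uniform distribution over a fair die: every outcome gets probability 1/6.
    uniform = {face: 1/6 for face in range(1, 7)}
    assert abs(sum(uniform.values()) - 1.0) < 1e-9  # probabilities sum to 1

    # Gaussian distribution: a bell curve over a continuous outcome
    # (here with mean 0 and standard deviation 1).
    samples = [random.gauss(0, 1) for _ in range(5)]
    print(samples)  # values cluster around the mean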

  9. Joint probabilities • The joint probability p(x, y) is the probability that events x and y both occur.

  10.–13. Independent events • Two events are independent if p(a, b) = p(a) · p(b). • Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6) comes up with probability 1/6. • Consider the event X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”. • X = {2, 4, 6}, Y = {3, 6} • p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2 • p(Y) = p(3) + p(6) = 2/6 = 1/3 • p(X, Y) = p(6) = 1/6 = 1/2 · 1/3 = p(X) · p(Y) • ==> X and Y are independent
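A quick check of this computation by enumeration (a sketch using exact fractions):

    from fractions import Fraction

    # Fair die: each of the six faces has probability 1/6.
    p_face = Fraction(1, 6)

    X = {n for n in range(1, 7) if n % 2 == 0}  # divisible by 2 -> {2, 4, 6}
    Y = {n for n in range(1, 7) if n % 3 == 0}  # divisible by 3 -> {3, 6}

    pX = p_face * len(X)       # 1/2
    pY = p_face * len(Y)       # 1/3
    pXY = p_face * len(X & Y)  # joint event X ∩ Y = {6} -> 1/6

    print(pXY == pX * pY)      # True -> X and Y are independent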

  14. Conditioned events • Events that are not independent are called conditioned (dependent) events. • p(X|Y) = “the probability of X given that Y occurred”. • p(X|Y) = p(X, Y) / p(Y) • p(X) is the a priori probability (the prior). • p(X|Y) is the posterior probability.

  15. Conditioned events

  16. Are X and Y independent? p(X) = 1/2, p(Y) = 1/3, p(X, Y) = 1/6, p(X|Y) = (1/6) / (1/3) = 1/2 = p(X) ==> independent. • Consider Z, the event “the number on the die is divisible by 4”. Are X and Z independent? p(Z) = p(4) = 1/6, p(X, Z) = p(4) = 1/6, p(X|Z) = p(X, Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ==> not independent.
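The same checks via the conditional-probability definition p(X|Y) = p(X, Y) / p(Y); a minimal sketch:

    from fractions import Fraction

    X, Y, Z = {2, 4, 6}, {3, 6}, {4}  # divisible by 2, 3, 4 on a fair die

    def cond_prob(A, B):
        """p(A|B) = p(A, B) / p(B); with a uniform die this is |A ∩ B| / |B|."""
        return Fraction(len(A & B), len(B))

    p_X = Fraction(len(X), 6)      # 1/2
    print(cond_prob(X, Y) == p_X)  # True: p(X|Y) = 1/2 = p(X) -> independent
    print(cond_prob(X, Z), p_X)    # 1 vs 1/2                  -> dependent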

  17. Bayes’ Theorem • Bayes’ Theorem lets us swap the order of dependence between events. • We saw that p(X|Y) = p(X, Y) / p(Y). • Bayes’ Theorem: p(X|Y) = p(Y|X) · p(X) / p(Y)

  18. Example • S: stiff neck, M: meningitis • P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20 • I have a stiff neck; should I worry?

  19. Example • S: stiff neck, M: meningitis • P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20 • P(M|S) = P(S|M) · P(M) / P(S) = 0.5 · (1/50,000) / (1/20) = 1/5,000 = 0.0002 • So a stiff neck alone is only weak evidence of meningitis.
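Plugging the numbers into Bayes' Theorem directly (a sketch of the arithmetic above):

    # Bayes' Theorem: P(M|S) = P(S|M) * P(M) / P(S)
    p_S_given_M = 0.5
    p_M = 1 / 50_000
    p_S = 1 / 20

    p_M_given_S = p_S_given_M * p_M / p_S
    print(p_M_given_S)  # 0.0002 -> a stiff neck alone is weak evidence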

  20. Other useful relations: • p(x) = Σ_{y∈Y} p(x|y) · p(y), or equivalently p(x) = Σ_{y∈Y} p(x, y) • Chain rule: p(x1, x2, …, xn) = p(x1) · p(x2|x1) · p(x3|x1, x2) · … · p(xn|x1, x2, …, xn−1) • The demonstration is easy, through successive reductions. Consider the event y as the co-occurrence of events x1, x2, …, xn−1: p(x1, x2, …, xn) = p(y, xn) = p(y) · p(xn|y) = p(x1, x2, …, xn−1) · p(xn|x1, x2, …, xn−1) • Similarly for an event z: p(x1, x2, …, xn−1) = p(z, xn−1) = p(z) · p(xn−1|z) = p(x1, x2, …, xn−2) · p(xn−1|x1, x2, …, xn−2) • … • p(x1, x2, …, xn) = p(x1) · p(x2|x1) · p(x3|x1, x2) · … · p(xn|x1, x2, …, xn−1) • p(x1) is the prior; the conditional factors correspond to bigram, trigram, …, n-gram probabilities.
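A minimal sketch of the chain rule in code; the conditional-probability table here is hypothetical, with made-up numbers purely for illustration:

    def chain_rule(tokens, cond_prob):
        """p(x1, ..., xn) = p(x1) * p(x2|x1) * ... * p(xn|x1, ..., xn-1)."""
        p = 1.0
        for i, word in enumerate(tokens):
            p *= cond_prob(word, tuple(tokens[:i]))  # p(xi | x1..xi-1)
        return p

    # Hypothetical conditional probabilities (illustrative values only).
    table = {
        ("the", ()): 0.2,
        ("dog", ("the",)): 0.1,
        ("barks", ("the", "dog")): 0.3,
    }
    print(chain_rule(["the", "dog", "barks"], lambda w, h: table[(w, h)]))  # ~0.006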

  21. Objections • People don’t compute probabilities. • Why would computers? • Or do they? • John went to … – candidate continuations: “the market”, “go”, “red”, “if”, “number”

  22. Objections • Statistics only count words and co-occurrences. • These are two different concepts: the statistical model and the statistical method. • The first does not need the second. • A person who uses intuition to reason is using a statistical model without statistical methods. • Objections refer mainly to the accuracy of statistical models.
