
Statistical NLP Course for Master in Computational Linguistics 2nd Year 2013-2014


Presentation Transcript


  1. Statistical NLP Course for Master in Computational Linguistics, 2nd Year, 2013-2014. Diana Trandabat

  2. Intro to probabilities • Probability deals with prediction: • Which word will follow in this …? • How can the parses of a sentence be ranked? • Which meaning is more likely? • Which grammar is more linguistically plausible? • Seeing the phrase “more lies ahead”, how likely is it that “lies” is a noun? • Seeing “Le chien est noir”, how likely is it that the correct translation is “The dog is black”? • Any rational decision can be described probabilistically.

  3. Notations • Experiment (or trial) – a repeatable process by which observations are made • e.g. tossing 3 coins • We observe a basic outcome from the sample space Ω, the set of all possible basic outcomes • Examples of sample spaces: • one coin toss, Ω = {H, T} • three coin tosses, Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} • part of speech of a word, Ω = {N, V, Adj, …} • next word in a Shakespeare play, |Ω| = size of the vocabulary • number of words in your MSc thesis, Ω = {0, 1, 2, …}
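A minimal Python sketch (illustrative only) that enumerates the coin-toss sample spaces above with the standard library:

    from itertools import product

    # Sample space for one coin toss.
    one_toss = {"H", "T"}

    # Sample space for three coin tosses: all length-3 sequences of H/T.
    three_tosses = {"".join(seq) for seq in product("HT", repeat=3)}
    print(sorted(three_tosses))  # ['HHH', 'HHT', 'HTH', ..., 'TTT']
    print(len(three_tosses))     # 8 basic outcomes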

  4. Notation • An event A is a set of basic outcomes, i.e., a subset of the sample space Ω. Example: – Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} – e.g. basic outcome = THH – e.g. event A = “has exactly 2 H’s”, A = {THH, HHT, HTH} – A = Ω is the certain event, P(Ω) = 1 – A = ∅ is the impossible event, P(∅) = 0 – For “not A”, we write Ā

  5. Intro to probabilities

  6. Intro to probabilities • The true probability of an event is hard to compute. • It is easy to compute an estimate of the probability, written p̂(x), typically the relative frequency of x in the observed data. • As the number of observations |X| → ∞, p̂(x) → P(x).
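A small simulation sketch (assuming a fair coin, simulated with Python's random module) shows the relative-frequency estimate approaching the true probability as the number of observations grows:

    import random

    def estimate_p_heads(n_trials: int) -> float:
        """Relative-frequency estimate: p^(H) = count(H) / n_trials."""
        heads = sum(random.choice("HT") == "H" for _ in range(n_trials))
        return heads / n_trials

    for n in (10, 100, 10_000, 1_000_000):
        print(n, estimate_p_heads(n))  # tends toward the true P(H) = 0.5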

  7. Intro to probabilities • “A coin is tossed 3 times. What is the likelihood of 2 heads?” – Experiment: toss a coin three times – Sample space Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} – Event: the basic outcomes that have exactly 2 H’s, A = {THH, HTH, HHT} – The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
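The same answer can be obtained by exhaustive enumeration; a minimal sketch:

    from itertools import product
    from fractions import Fraction

    # Sample space: all outcomes of three coin tosses.
    omega = ["".join(seq) for seq in product("HT", repeat=3)]

    # Event A: basic outcomes with exactly 2 H's.
    A = [o for o in omega if o.count("H") == 2]

    print(A)                             # ['HHT', 'HTH', 'THH']
    print(Fraction(len(A), len(omega)))  # 3/8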

  8. Probability distribution • A probability distribution is an assignment of probabilities over a set of outcomes. • A uniform distribution assigns the same probability to every outcome (e.g. a fair coin). • A Gaussian distribution assigns a bell curve over outcomes. • Many others exist. • Uniform and Gaussian distributions are popular in statistical NLP.
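A brief sketch of the two distributions mentioned above (the uniform case as an explicit table, the Gaussian via Python's random.gauss):

    import random

    # Uniform distribution over a fair die: every outcome gets probability 1/6.
    uniform = {face: 1/6 for face in range(1, 7)}
    assert abs(sum(uniform.values()) - 1.0) < 1e-9  # probabilities sum to 1

    # Gaussian distribution: a bell curve over a continuous outcome
    # (here with mean 0 and standard deviation 1).
    samples = [random.gauss(0, 1) for _ in range(5)]
    print(samples)  # values cluster around the mean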

  9. Joint probabilities • The joint probability p(x, y) is the probability that events x and y both occur.

  10.–13. Independent events • Two events are independent if p(a, b) = p(a) · p(b). • Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6) comes up with probability 1/6. • Consider the event X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”. • X = {2, 4, 6}, Y = {3, 6} • p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2 • p(Y) = p(3) + p(6) = 2/6 = 1/3 • p(X, Y) = p(6) = 1/6 = 1/2 · 1/3 = p(X) · p(Y) • ==> X and Y are independent
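A quick check of this computation by enumeration (a sketch using exact fractions):

    from fractions import Fraction

    # Fair die: each of the six faces has probability 1/6.
    p_face = Fraction(1, 6)

    X = {n for n in range(1, 7) if n % 2 == 0}  # divisible by 2 -> {2, 4, 6}
    Y = {n for n in range(1, 7) if n % 3 == 0}  # divisible by 3 -> {3, 6}

    pX = p_face * len(X)       # 1/2
    pY = p_face * len(Y)       # 1/3
    pXY = p_face * len(X & Y)  # joint event X ∩ Y = {6} -> 1/6

    print(pXY == pX * pY)      # True -> X and Y are independent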

  14. Conditioned events • Events that are not independent are called conditioned (dependent) events. • p(X|Y) = “the probability of X given that Y occurred”. • p(X|Y) = p(X, Y) / p(Y) • p(X) is the a priori probability (the prior). • p(X|Y) is the posterior probability.

  15. Conditioned events

  16. Are X and Y independent? p(X) = 1/2, p(Y) = 1/3, p(X, Y) = 1/6, p(X|Y) = (1/6) / (1/3) = 1/2 = p(X) ==> independent. • Consider Z, the event “the number on the die is divisible by 4”. Are X and Z independent? p(Z) = p(4) = 1/6, p(X, Z) = p(4) = 1/6, p(X|Z) = p(X, Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ==> not independent.
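The same checks via the conditional-probability definition p(X|Y) = p(X, Y) / p(Y); a minimal sketch:

    from fractions import Fraction

    X, Y, Z = {2, 4, 6}, {3, 6}, {4}  # divisible by 2, 3, 4 on a fair die

    def cond_prob(A, B):
        """p(A|B) = p(A, B) / p(B); with a uniform die this is |A ∩ B| / |B|."""
        return Fraction(len(A & B), len(B))

    p_X = Fraction(len(X), 6)      # 1/2
    print(cond_prob(X, Y) == p_X)  # True: p(X|Y) = 1/2 = p(X) -> independent
    print(cond_prob(X, Z), p_X)    # 1 vs 1/2                  -> dependent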

  17. Bayes’ Theorem • Bayes’ Theorem lets us swap the order of dependence between events. • We saw that p(X|Y) = p(X, Y) / p(Y). • Bayes’ Theorem: p(X|Y) = p(Y|X) · p(X) / p(Y)

  18. Example • S: stiff neck, M: meningitis • P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20 • I have a stiff neck; should I worry?

  19. Example • S: stiff neck, M: meningitis • P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20 • P(M|S) = P(S|M) · P(M) / P(S) = 0.5 · (1/50,000) / (1/20) = 1/5,000 = 0.0002 • So a stiff neck alone is only weak evidence of meningitis.
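Plugging the numbers into Bayes' Theorem directly (a sketch of the arithmetic above):

    # Bayes' Theorem: P(M|S) = P(S|M) * P(M) / P(S)
    p_S_given_M = 0.5
    p_M = 1 / 50_000
    p_S = 1 / 20

    p_M_given_S = p_S_given_M * p_M / p_S
    print(p_M_given_S)  # 0.0002 -> a stiff neck alone is weak evidence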

  20. Other useful relations: • p(x) = Σ_{y∈Y} p(x|y) · p(y), or equivalently p(x) = Σ_{y∈Y} p(x, y) • Chain rule: p(x1, x2, …, xn) = p(x1) · p(x2|x1) · p(x3|x1, x2) · … · p(xn|x1, x2, …, xn−1) • The demonstration is easy, through successive reductions. Consider the event y as the co-occurrence of events x1, x2, …, xn−1: p(x1, x2, …, xn) = p(y, xn) = p(y) · p(xn|y) = p(x1, x2, …, xn−1) · p(xn|x1, x2, …, xn−1) • Similarly for an event z: p(x1, x2, …, xn−1) = p(z, xn−1) = p(z) · p(xn−1|z) = p(x1, x2, …, xn−2) · p(xn−1|x1, x2, …, xn−2) • … • p(x1, x2, …, xn) = p(x1) · p(x2|x1) · p(x3|x1, x2) · … · p(xn|x1, x2, …, xn−1) • p(x1) is the prior; the conditional factors correspond to bigram, trigram, …, n-gram probabilities.
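A minimal sketch of the chain rule in code; the conditional-probability table here is hypothetical, with made-up numbers purely for illustration:

    def chain_rule(tokens, cond_prob):
        """p(x1, ..., xn) = p(x1) * p(x2|x1) * ... * p(xn|x1, ..., xn-1)."""
        p = 1.0
        for i, word in enumerate(tokens):
            p *= cond_prob(word, tuple(tokens[:i]))  # p(xi | x1..xi-1)
        return p

    # Hypothetical conditional probabilities (illustrative values only).
    table = {
        ("the", ()): 0.2,
        ("dog", ("the",)): 0.1,
        ("barks", ("the", "dog")): 0.3,
    }
    print(chain_rule(["the", "dog", "barks"], lambda w, h: table[(w, h)]))  # ~0.006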

  21. Objections • People don’t compute probabilities. • Why would computers? • Or do they? • John went to … – candidate continuations: “the market”, “go”, “red”, “if”, “number”

  22. Objections • Statistics only count words and co-occurrences. • These are two different concepts: the statistical model and the statistical method. • The first does not need the second. • A person who uses intuition to reason is using a statistical model without statistical methods. • Objections refer mainly to the accuracy of statistical models.
