
CPSC 7373: Artificial Intelligence Lecture 13: Natural Language Processing



  1. CPSC 7373: Artificial Intelligence, Lecture 13: Natural Language Processing. Jiang Bian, Fall 2012, University of Arkansas at Little Rock

  2. Natural Language Processing • Understanding natural languages: • Philosophically: we humans have defined ourselves in terms of our ability to speak with and understand each other. • Application-wise: we want to be able to talk to computers. • Learning: we want computers to be smarter and to learn human knowledge from textbooks.

  3. Language Models • Two types of language models: • Probabilistic: language represented as a sequence of letters/words, with the model giving the probability of a sequence, P(word1, word2, …); mostly word-based and learned from data. • Logical: language defined as a set of legal sentences, L = {S1, S2, …}; abstraction into trees/categories; typically hand-coded. (Example tree: S → NP VP, NP → Name → Sam, VP → Verb → slept.)

  4. Bag of Words • A bag rather than a sequence: word order is ignored. • Unigram, Naïve Bayes model: each individual word is treated as a separate factor, independent of all the other words. • It is also possible to take the sequence into account. (The slide scatters the words of "HONK IF YOU LOVE THE BAG OF WORDS MODEL" to illustrate the idea.)

  5. Probabilistic Models • Chain rule: P(w1 w2 w3 … wn) = P(w1:n) = ∏i P(wi | w1:i-1) • Markov Assumption: the effect of one variable on another is local; the i-th word depends only on its previous k words: P(wi | w1:i-1) = P(wi | wi-k:i-1). For a first-order Markov model: P(wi | wi-1). • Stationarity Assumption: the conditional probabilities do not change with position; a word's probability depends only on its neighboring words, not on where in the sentence (or in which sentence) it occurs: P(wi | wi-1) = P(wj | wj-1).
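To make the two assumptions concrete, here is a minimal Python sketch (not from the slides): a single hypothetical bigram table P(wi | wi-1) is reused at every position (stationarity), and each word is conditioned only on its predecessor (first-order Markov).

# A minimal sketch (not from the slides) of the two assumptions.
# The bigram table below is hypothetical, for illustration only.
bigram = {
    ("<s>", "sam"): 0.5,
    ("sam", "slept"): 0.4,
    ("slept", "</s>"): 0.9,
}

def markov_prob(words, table):
    """P(w_1:n) ~= prod_i P(w_i | w_{i-1}): first-order Markov assumption.
    The same table is reused at every position: stationarity assumption."""
    tokens = ["<s>"] + [w.lower() for w in words] + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= table.get((prev, cur), 0.0)  # unseen pairs get probability 0 here
    return p

print(markov_prob(["Sam", "slept"], bigram))  # 0.5 * 0.4 * 0.9 = 0.18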

  6. Applications of Language Models • Classification (e.g., spam) • Clustering (e.g., news stories) • Input correction (spelling, segmentation) • Sentiment analysis (e.g., product reviews) • Information retrieval (e.g., web search) • Question answering (e.g., IBM’s Watson) • Machine translation (e.g., Chinese to English) • Speech recognition (e.g., Apple’s Siri)

  7. N-gram Model • An n-gram is a contiguous sequence of n items from a given sequence of text or speech. • Language Models (LM): unigrams, bigrams, trigrams, … • Applications: • Speech recognition / data compression • Predicting the next word • Information retrieval: retrieved documents are ranked by the probability of the query under the document's language model, P(Q | Md)

  8. N-gram examples • S = “I saw the red house” • Unigram: • P(S) = P(I, saw, the, red, house) = P(I)P(saw)P(the)P(red)P(house) • Bigram – Markov assumption • P(S) = P(I|<s>)P(saw|I)P(the|saw)P(red|the)P(house|red)P(</s>|house) • Trigram: • P(S) = P(I|<s>, <s>)P(saw|<s>, I)P(the|I, saw)P(red|saw, the)P(house|the, red)P(</s>|red, house)
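As a small illustration of these factorizations, the sketch below (not from the slides) lists the n-grams of the example sentence, using <s>/</s> padding as above; the helper name ngrams is hypothetical.

# Sketch (not from the slides) of the unigram/bigram/trigram views of
# S = "I saw the red house".
def ngrams(words, n):
    """Return the n-grams of a word list, padded with n-1 start symbols
    and one end symbol."""
    tokens = ["<s>"] * (n - 1) + words + ["</s>"]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "I saw the red house".split()
print(ngrams(sentence, 1))  # (I,), (saw,), (the,), (red,), (house,), (</s>,)
print(ngrams(sentence, 2))  # (<s>, I), (I, saw), ..., (house, </s>)
print(ngrams(sentence, 3))  # (<s>, <s>, I), (<s>, I, saw), ..., (red, house, </s>)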

  9. How do we train these models? • Very large corpora: collections of text and speech • Shakespeare • Brown Corpus • Wall Street Journal • AP newswire • Hansards • Timit • DARPA/NIST text/speech corpora (Call Home, Call Friend, ATIS, Switchboard, Broadcast News, Broadcast Conversation, TDT, Communicator) • TRAINS, Boston Radio News Corpus

  10. A Simple Bigram Example • Estimate the likelihood of the sentence I want to eat Chinese food. • P(I want to eat Chinese food) = P(I | <start>) P(want | I) P(to | want) P(eat | to) P(Chinese | eat) P(food | Chinese) P(<end>|food) • What do we need to calculate these likelihoods? • Bigram probabilities for each word pair sequence in the sentence • Calculated from a large corpus

  11. Early Bigram Probabilities from BERP • eat on .16 • eat some .06 • eat lunch .06 • eat dinner .05 • eat at .04 • eat a .04 • eat Indian .04 • eat today .03 • eat Thai .03 • eat breakfast .03 • eat in .02 • eat Chinese .02 • eat Mexican .02 • eat tomorrow .01 • eat dessert .007 • eat British .001

  12. (BERP bigram probabilities, continued) • <start> I .25 • <start> I'd .06 • <start> Tell .04 • <start> I'm .02 • I want .32 • I would .29 • I don't .08 • I have .04 • want to .65 • want a .05 • want some .04 • want Thai .01 • to eat .26 • to have .14 • to spend .09 • to be .02 • British food .60 • British restaurant .15 • British cuisine .01 • British lunch .01

  13. P(I want to eat British food) = P(I|<start>) P(want|I) P(to|want) P(eat|to) P(British|eat) P(food|British) = .25 × .32 × .65 × .26 × .001 × .60 ≈ .0000081 • Suppose P(<end>|food) = .2. • How would we calculate P(I want to eat Chinese food)? • These probabilities roughly capture "syntactic" facts and "world knowledge": eat is often followed by an NP; British food is not too popular. • N-gram models can be trained by counting and normalization.
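A quick arithmetic check of the product above, using the BERP bigram probabilities quoted on the previous slides:

# Arithmetic check of the product above (BERP bigram probabilities).
probs = [0.25, 0.32, 0.65, 0.26, 0.001, 0.60]  # P(I|<start>) ... P(food|British)
p = 1.0
for x in probs:
    p *= x
print(p)  # about 8.1e-06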

  14. Early BERP Bigram Counts (row = first word, column = second word):
              I     want   to    eat   Chinese  food  lunch
     I        8     1087   0     13    0        0     0
     want     3     0      786   0     6        8     6
     to       3     0      10    860   3        0     12
     eat      0     0      2     0     19       2     52
     Chinese  2     0      0     0     0        120   1
     food     19    0      17    0     0        0     0
     lunch    4     0      0     0     0        1     0

  15. Early BERP Bigram Probabilities • Unigram counts: I 3437, want 1215, to 3256, eat 938, Chinese 213, food 1506, lunch 459 • Normalization: divide each row's bigram counts by the appropriate unigram count for wn-1 • Computing the bigram probability of "I I": C(I, I) / C(I in all contexts) • P(I|I) = 8 / 3437 = .0023 • Maximum Likelihood Estimation (MLE): relative frequency
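A minimal sketch of the count-and-normalize (MLE) step, using a few of the BERP counts above; only a handful of counts are included, for illustration.

# MLE bigram estimation from the BERP counts above: P(w2 | w1) = C(w1 w2) / C(w1).
bigram_counts = {("i", "i"): 8, ("i", "want"): 1087, ("want", "to"): 786}
unigram_counts = {"i": 3437, "want": 1215, "to": 3256}

def mle_bigram(w1, w2):
    """Relative-frequency (maximum likelihood) estimate of P(w2 | w1)."""
    return bigram_counts.get((w1, w2), 0) / unigram_counts[w1]

print(round(mle_bigram("i", "i"), 4))     # 8 / 3437    -> 0.0023
print(round(mle_bigram("i", "want"), 2))  # 1087 / 3437 -> 0.32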

  16. What do we learn about the language? • What's being captured with ... • P(want | I) = .32 • P(to | want) = .65 • P(eat | to) = .26 • P(food | Chinese) = .56 • P(lunch | eat) = .055 • What about... • P(I | I) = .0023 • P(I | want) = .0025 • P(I | food) = .013

  17. What these numbers reflect in the corpus: • P(I | I) = .0023: repetitions such as "I I I I want …" • P(I | want) = .0025: "… I want I want …" • P(I | food) = .013: "the kind of food I want is …"

  18. Approximating Shakespeare • Generating sentences with random unigrams... • Every enter now severally so, let • Hill he late speaks; or! a more to leg less first you enter • With bigrams... • What means, sir. I confess she? then all sorts, he is trim, captain. • Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. • Trigrams • Sweet prince, Falstaff shall die. • This shall forbid it should be branded, if renown made it empty.

  19. Quadrigrams • What! I will go seek the traitor Gloucester. • Will you not tell me who I am? • What's coming out here looks like Shakespeare because it is Shakespeare • Note: As we increase the value of N, the accuracy of an n-gram model increases, since choice of next word becomes increasingly constrained
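A sketch (not from the slides) of how such random sentences are generated from a bigram model: repeatedly sample the next word from P(w | previous word) until the end symbol is drawn. The toy probability table is hypothetical; the slide's sentences come from bigrams estimated on Shakespeare.

# Generating a random sentence from a bigram model (toy, hypothetical table).
import random

bigram = {
    "<s>":    {"sweet": 0.5, "what": 0.5},
    "sweet":  {"prince": 1.0},
    "what":   {"means": 1.0},
    "prince": {"</s>": 1.0},
    "means":  {"</s>": 1.0},
}

def generate(table):
    word, sentence = "<s>", []
    while True:
        choices = list(table[word])
        nxt = random.choices(choices, weights=[table[word][w] for w in choices])[0]
        if nxt == "</s>":
            return " ".join(sentence)
        sentence.append(nxt)
        word = nxt

print(generate(bigram))  # e.g. "sweet prince" or "what means"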

  20. N-Gram Training Sensitivity • If we repeated the Shakespeare experiment but trained our n-grams on a Wall Street Journal corpus, what would we get? • Note: This question has major implications for corpus selection or design

  21. WSJ is not Shakespeare: Sentences Generated from WSJ

  22. Probabilistic Letter Models • The probability of a sequence of letters. • What can we do with letter models? • Language identification

  23. Language Identification Bigram Model:

  24. Language Identification Trigram Model:
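A minimal sketch (not from the slides) of language identification with letter-bigram models: score the new text under each language's character model and pick the best. The tiny training strings below are placeholders; real models are trained on large corpora.

# Letter-bigram language identification (MLE character bigrams, toy data).
import math
from collections import Counter

def train_letter_bigrams(text):
    """MLE letter-bigram model P(c2 | c1), with spaces as word boundaries."""
    text = " " + text.lower() + " "
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def log_score(text, model, floor=1e-6):
    text = " " + text.lower() + " "
    return sum(math.log(model.get(pair, floor)) for pair in zip(text, text[1:]))

models = {"EN": train_letter_bigrams("hello world this is a file full of english words"),
          "DE": train_letter_bigrams("hallo welt dies ist eine datei voll von deutschen worten")}
print(max(models, key=lambda lang: log_score("this is new english text", models[lang])))  # expected: EN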

  25. Classification • Standard classifiers: Naïve Bayes, k-Nearest Neighbor, Support Vector Machine, Logistic Regression • … or the gzip command???

  26. Gzip • EN: "Hello world! This is a file full of English words…" • AZ: "Salam Dunya! Bu fayl AzƏrbaycan tam sozlƏr…" • DE: "Hallo Welt! Dies ist eine Datei voll von deutschen Worte…" • new: "This is a new piece of text to be classified." • Concatenate the new text with each language's training file, compress, and pick the language with the smallest compressed size:
  (echo `cat new EN | gzip | wc -c` EN; \
   echo `cat new DE | gzip | wc -c` DE; \
   echo `cat new AZ | gzip | wc -c` AZ) \
  | sort -n | head -1
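The same gzip trick can be written as a short Python sketch: compress each training file with the new text appended and pick the language with the smallest result. The file names EN, DE, AZ, and new mirror the slide's example; any per-language text files would do.

# Gzip-based classification, mirroring the shell pipeline above.
import gzip
from pathlib import Path

def gzip_classify(new_path, training_paths):
    new_text = Path(new_path).read_bytes()
    sizes = {lang: len(gzip.compress(Path(path).read_bytes() + new_text))
             for lang, path in training_paths.items()}
    return min(sizes, key=sizes.get)  # the language that compresses the new text best

print(gzip_classify("new", {"EN": "EN", "DE": "DE", "AZ": "AZ"}))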

  27. Segmentation • Given a sequence of characters with no spaces, how do we break it up into meaningful segments (words)? • e.g., 羽西中国新锐画家大奖 • Written English has spaces between words: e.g., "words have spaces" • But not in speech recognition, or in run-together URLs: choosespain.com → "Choose Spain" or "Chooses pain"

  28. Segmentation • The best segmentation S* is the one that maximizes the joint probability of the segmented words: S* = argmax P(w1:n) = argmax ∏i P(wi | w1:i-1) • Markov assumption: S* ≈ argmax ∏i P(wi | wi-1) • Naïve Bayes assumption (words don't depend on each other): S* ≈ argmax ∏i P(wi)

  29. Segmentation • "nowisthetime": 12 letters • How many possible segmentations? • n-1 • (n-1)^2 • (n-1)! • 2^(n-1) • (Answer: 2^(n-1); there is an independent boundary/no-boundary choice between each pair of adjacent letters.) • With the Naïve Bayes assumption, the best segmentation decomposes recursively: S* = argmax over splits S = f + r of P(f) · P(S*(r)), where f is the first word and r the rest of the string • 1) Computationally easy • 2) Learning is easier: it's easier to estimate the unigram probabilities

  30. Best Segmentation • S* = argmax over splits S = f + r of P(f) · P(S*(r)) • Example: "nowisthetime" (see the sketch below)
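Here is a minimal sketch of that recursion (not from the slides), with memoization over suffixes: try every possible first word f, segment the remainder r recursively, and keep the split with the highest P(f) · P(S*(r)). The unigram probabilities P are hypothetical placeholders.

# Best segmentation via the argmax-over-splits recursion (toy unigram model).
from functools import lru_cache

P = {"now": 0.01, "is": 0.02, "the": 0.05, "time": 0.01, "no": 0.01}

def Pw(word):
    return P.get(word, 1e-10)  # tiny probability for unknown words

@lru_cache(maxsize=None)
def segment(text):
    """Return (probability, word list) of the best segmentation of `text`."""
    if not text:
        return (1.0, [])
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        p_rest, words = segment(rest)
        candidates.append((Pw(first) * p_rest, [first] + words))
    return max(candidates)  # max on the probability, the first tuple element

print(segment("nowisthetime")[1])  # ['now', 'is', 'the', 'time']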

  31. Segmentation Examples • Trained on a corpus of 4 billion words • e.g., • "baseratesoughtto": "base rate sought to" vs. "base rates ought to" • "smallandinsignificant": "small and in significant" vs. "small and insignificant" • "ginormousego": "G in or mouse go" vs. "ginormous ego" • What can we do to improve? More data??? Markov assumption??? Smoothing???

  32. Spelling Correction • Given a misspelled word w, find the best correction c: • C* = argmaxc P(c|w) • Bayes' theorem: C* = argmaxc P(w|c) P(c) • P(c): estimated from corpus counts • P(w|c): estimated from spelling-error data

  33. Spelling Data • c:w pairs used to estimate P(w|c) • pulse: pluse • elegant: elagent, elligit • second: secand, sexeon, secund, seconnd, seond, sekon • sailed: saled, saild • blouse: boludes • thunder: thounder • cooking: coking, chocking, kooking, cocking • fossil: fosscil • We cannot list every common misspelling, so we also use letter-based models, e.g., ul:lu (the transposition behind pulse → pluse)

  34. Correction Example • w = "thew": rank candidate corrections c by P(w|c) P(c)
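A minimal sketch of scoring candidate corrections for w = "thew" by P(w|c) P(c). The candidate list and all numbers below are hypothetical placeholders; in practice P(c) comes from corpus counts and P(w|c) from spelling-error data such as the pairs on the previous slide.

# Rank candidate corrections by the noisy-channel score P(w|c) * P(c).
def correct(w, candidates, p_w_given_c, p_c):
    return max(candidates, key=lambda c: p_w_given_c.get((w, c), 0.0) * p_c.get(c, 0.0))

candidates = ["the", "thew", "thaw", "threw"]
p_c = {"the": 0.03, "thew": 1e-7, "thaw": 1e-5, "threw": 1e-4}        # language model P(c)
p_w_given_c = {("thew", "the"): 1e-4, ("thew", "thew"): 0.9,          # error model P(w|c)
               ("thew", "thaw"): 1e-3, ("thew", "threw"): 1e-3}

print(correct("thew", candidates, p_w_given_c, p_c))  # "the" under these numbers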

  35. Sentence Structure • P(Fed raises interest rates) = ??? • The slide shows two parse trees for "Fed raises interest rates": one with "Fed" as the subject NP and "raises" as the verb (VP = "raises [interest rates]"), and one with "Fed raises" as the subject NP and "interest" as the verb (VP = "interest [rates]").

  36. Context-Free Grammar Parsing • Sentence structure trees are constructed according to a grammar. • A grammar is a list of rules, e.g.: • S -> NP VP • NP -> N | D N (D = determiner, e.g., the, a) | N N | N N N (e.g., mortgage interest rates), etc. • VP -> V NP | V | V NP NP (e.g., give me the money) • N -> interest | Fed | rates | raises • V -> interest | rates | raises • D -> the | a

  37. Ambiguity • How many parsing options do I have? • The Fed raises interest rates (?) • The Fed raises raises (?) • Raises raises interest raises (?)

  38. Ambiguity • How many parsing options do I have? • The Fed raises interest rates (2) • The Fed (NP) raises (V) interest rates (NP) • The Fed raises (NP) interest (V) rates (NP) • The Fed raises raises (1) • The Fed (NP) raises (V) raises (NP) • Raises raises interest raises (4) • Raises (NP) raises (V) interest raises (NP) • Raises (NP) raises (V) interest (NP) raises (NP) • Raises raises (NP) interest (V) raises (NP) • Raises raises interest (NP) raises (V)

  39. Problems and Solutions Problems: Solutions:

  40. Problems and Solutions Problems: Solutions:

  41. Problems of writing grammars • Natural languages are messy, unorganized things that evolved through human history in a variety of contexts. • It is inherently hard to specify a set of grammar rules that covers all possibilities without introducing errors. • Ambiguity is the "enemy"…

  42. Probabilistic Context-Free Grammar • S -> NP VP (1) • NP -> N (.3) | DN (.4) | NN (.2) | NNN (.1) • VP -> V NP (.4) | V (.4) | V NP NP (.2) • N -> interest (.3) | Fed (.3) | rates (.3) | raises (.1) • V -> interest (.1) | rates (.3) | raises (.6) • D -> the (.5) | a (.5)

  43. Probabilistic Context-Free Grammar • S -> NP VP (1) • NP -> N (.3) | DN (.2) | NN (.2) | NNN (.1) • VP -> V NP (.4) | V (.4) | V NP NP (.2) • N -> interest (.3) | Fed (.3) | rates (.3) | raises (.1) • V -> interest (.1) | rates (.3) | raises (.6) • D -> the (.5) | a (.5) • Parse tree for "Fed raises interest rates": S -> NP VP (1); NP -> N (.3), N -> Fed (.3); VP -> V NP (.4), V -> raises (.6); NP -> NN (.2), N -> interest (.3), N -> rates (.3) • P(tree) = 1 × .3 × .3 × .4 × .6 × .2 × .3 × .3 = 0.0003888 ≈ 0.039%
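The tree probability above can be reproduced by multiplying the probability of every rule used in the derivation; the sketch below encodes just the rules needed for this one parse.

# Reproducing P(tree) for "Fed raises interest rates" under the grammar above.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("N",)): 0.3, ("NP", ("N", "N")): 0.2,
    ("VP", ("V", "NP")): 0.4,
    ("N", ("Fed",)): 0.3, ("N", ("interest",)): 0.3, ("N", ("rates",)): 0.3,
    ("V", ("raises",)): 0.6,
}

derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)), ("N", ("Fed",)),
    ("VP", ("V", "NP")), ("V", ("raises",)),
    ("NP", ("N", "N")), ("N", ("interest",)), ("N", ("rates",)),
]

p = 1.0
for rule in derivation:
    p *= rules[rule]
print(round(p, 7))  # 0.0003888, i.e. about 0.039%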

  44. Probabilistic Context-Free Grammar • S -> NP VP (1) • NP -> N (.3) | DN (.2) | NN (.2) | NNN (.1) • VP -> V NP (.4) | V (.4) | V NP NP (.2) • N -> interest (.3) | Fed (.3) | rates (.3) | raises (.1) • V -> interest (.1) | rates (.3) | raises (.6) • D -> the (.5) | a (.5) • Quiz: two parse trees for "Raises raises interest rates", one with "raises" as the subject NP and "raises" as the verb, the other with "raises raises" as the subject NP and "interest" as the verb • P() = ???% P() = ???%

  45. Probabilistic Context-Free Grammar • S -> NP VP (1) • NP -> N (.3) | DN (.2) | NN (.2) | NNN (.1) • VP -> V NP (.4) | V (.4) | V NP NP (.2) • N -> interest (.3) | Fed (.3) | rates (.3) | raises (.1) • V -> interest (.1) | rates (.3) | raises (.6) • D -> the (.5) | a (.5) • For "Raises raises interest rates": P(tree with "raises" as subject NP and "raises" as verb) = .012%; P(tree with "raises raises" as subject NP and "interest" as verb) = .00072%

  46. Statistical Parsing • Where do these probabilities come from? • They are trained from a large annotated corpus, e.g., The Penn Treebank Project (1990), which annotates naturally occurring text for linguistic structure. • S -> NP VP (1) • NP -> N (.3) | DN (.2) | NN (.2) | NNN (.1) • VP -> V NP (.4) | V (.4) | V NP NP (.2) • N -> interest (.3) | Fed (.3) | rates (.3) | raises (.1) • V -> interest (.1) | rates (.3) | raises (.6) • D -> the (.5) | a (.5)

  47. The Penn Treebank Project • ( (S • (NP-SBJ (NN Stock-market) (NNS tremors) ) • (ADVP-TMP (RB again) ) • (VP (VBD shook) • (NP (NN bond) (NNS prices) ) • (, ,) • (SBAR-TMP (IN while) • (S • (NP-SBJ (DT the) (NN dollar) ) • (VP (VBD turned) • (PRT (RP in) ) • (NP-PRD (DT a) (VBN mixed) (NN performance) ))))) • (. .) ))

  48. Resolving Ambiguity • Ambiguity: • Syntactic: more than one possible structure for the same string of words. • e.g., "We need more intelligent leaders." ("need more" or "more intelligent"?) • Lexical (homonymy): a word form has more than one meaning. • e.g., "Did you see the bat?" • e.g., "Where is the bank?"

  49. "The boy saw the man with the telescope" • Parse 1: the PP "with the telescope" attaches to the VP (VP -> V NP PP), i.e., the boy used the telescope to see the man.

  50. "The boy saw the man with the telescope" • Parse 2: the PP "with the telescope" attaches to the object NP (NP -> Det N PP), i.e., the man who has the telescope.
