
Advanced Artificial Intelligence




  1. Advanced Artificial Intelligence Part II. Statistical NLP Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting Some slides taken from Helmut Schmid, Rada Mihalcea, Bonnie Dorr, Leila Kosseim, Peter Flach and others

  2. Topic • Statistical Natural Language Processing • Applies Machine Learning / Statistics to Natural Language Processing • Learning: the ability to improve one's behaviour at a specific task over time; involves the analysis of data (statistics) • Follows parts of the book Foundations of Statistical Natural Language Processing (Manning and Schütze), MIT Press, 1999.

  3. Contents • Motivation • Zipf’s law • Some natural language processing tasks • Non-probabilistic NLP models • Regular grammars and finite state automata • Context-Free Grammars • Definite Clause Grammars • Motivation for statistical NLP • Overview of the rest of this part

  4. Rationalism versus Empiricism • Rationalist • Noam Chomsky: innate language structures • AI: hand-coded NLP • Dominant view 1960-1985 • Cf. e.g. Steven Pinker's The Language Instinct (popular science book) • Empiricist • The ability to learn is innate • AI: language is learned from corpora • Dominant 1920-1960 and becoming increasingly important

  5. Rationalism versus Empiricism • Noam Chomsky: "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term." • Fred Jelinek (IBM, 1988): "Every time a linguist leaves the room the recognition rate goes up." (Alternative: "Every time I fire a linguist the recognizer improves.")

  6. This course • Empiricist approach • Focus will be on probabilistic models for learning of natural language • No time to treat natural language in depth ! • (though this would be quite useful and interesting) • Deserves a full course by itself • Covered in more depth in Logic, Language and Learning (SS 05, prob. SS 06)

  7. Ambiguity

  8. NLP and Statistics • Statistical Disambiguation • Define a probability model for the data • Compute the probability of each alternative • Choose the most likely alternative

  9. NLP and Statistics Statistical methods deal with uncertainty. They predict the future behaviour of a system based on the behaviour observed in the past. Statistical methods require training data. The data in statistical NLP are the corpora.

  10. Corpora • Corpus: a text collection used for linguistic purposes • Tokens: how many words are contained in Tom Sawyer? 71,370 • Types: how many different words are contained in Tom Sawyer? 8,018 • Hapax legomena: words appearing only once
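The token/type/hapax distinction above can be sketched in a few lines of Python. The toy sentence and the naive whitespace tokenization are illustrative assumptions, not the Tom Sawyer setup.

```python
from collections import Counter

def corpus_stats(text):
    """Count tokens, types, and hapax legomena in a text."""
    tokens = text.lower().split()   # naive whitespace tokenization
    counts = Counter(tokens)
    hapaxes = [w for w, f in counts.items() if f == 1]
    return len(tokens), len(counts), hapaxes

tokens, types, hapaxes = corpus_stats("the cat saw the dog and the dog slept")
# tokens = 9, types = 6, hapaxes = ['cat', 'saw', 'and', 'slept']
```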

  11. Word Counts  The most frequent words are function words

  12. Word Counts
  How many words appear f times?
  f        n_f
  1        3993
  2        1292
  3         664
  4         410
  5         243
  6         199
  7         172
  8         131
  9          82
  10         91
  11-50     540
  51-100     99
  > 100     102
  About half of the word types occur just once. About half of the text consists of the 100 most common words.

  13. Word Counts (Brown corpus)

  14. Word Counts (Brown corpus)

  15. Zipf's Law
  word     f     r     f*r       word          f     r     f*r
  the    3332    1    3332       turned       51   200   10200
  and    2972    2    5944       you'll       30   300    9000
  a      1775    3    5235       name         21   400    8400
  he      877   10    8770       comes        16   500    8000
  but     410   20    8400       group        13   600    7800
  be      294   30    8820       lead         11   700    7700
  there   222   40    8880       friends      10   800    8000
  one     172   50    8600       begin         9   900    8100
  about   158   60    9480       family        8  1000    8000
  more    138   70    9660       brushed       4  2000    8000
  never   124   80    9920       sins          2  3000    6000
  Oh      116   90   10440       Could         2  4000    8000
  two     104  100   10400       Applausive    1  8000    8000
  Zipf's Law: f ~ 1/r (f*r = const). Minimize effort.
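The relation f ~ 1/r is easy to probe programmatically. A minimal sketch, using synthetic word counts rather than the Tom Sawyer data: if Zipf's law holds, the products r*f stay roughly constant across ranks.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, frequency, rank*frequency) per word type, most frequent first."""
    counts = Counter(tokens)
    ranked = sorted(counts.values(), reverse=True)
    return [(r, f, r * f) for r, f in enumerate(ranked, start=1)]

# Synthetic corpus whose counts follow f ~ 1/r exactly: 6, 3, 2 occurrences.
tokens = ["the"] * 6 + ["and"] * 3 + ["a"] * 2 + ["he"]
# rank_frequency(tokens) -> [(1, 6, 6), (2, 3, 6), (3, 2, 6), (4, 1, 4)]
```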

  16. Language and sequences • Natural language processing • Is concerned with the analysis of sequences of words / sentences • Construction of language models • Two types of models • Non-probabilistic • Probabilistic

  17. Key NLP Problem: Ambiguity • Human language is highly ambiguous at all levels • acoustic level: recognize speech vs. wreck a nice beach • morphological level: saw: to see (past), saw (noun), to saw (present, infinitive) • syntactic level: I saw the man on the hill with a telescope • semantic level: One book has to be read by every student

  18. Language Model • A formal model about language • Two types • Non-probabilistic • Allows one to compute whether a certain sequence (sentence or part thereof) is possible • Often grammar based • Probabilistic • Allows one to compute the probability of a certain sequence • Often extends grammars with probabilities

  19. Example of bad language model

  20. A bad language model

  21. A bad language model

  22. A good language model • Non-Probabilistic • “I swear to tell the truth” is possible • “I swerve to smell de soup” is impossible • Probabilistic • P(I swear to tell the truth) ~ .0001 • P(I swerve to smell de soup) ~ 0

  23. Why language models ? • Consider a Shannon Game • Predicting the next word in the sequence • Statistical natural language …. • The cat is thrown out of the … • The large green … • Sue swallowed the large green … • … • Model at the sentence level
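The Shannon game above amounts to asking a language model for its most likely continuation. A minimal bigram sketch, with a made-up three-sentence corpus standing in for real training data:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word bigrams over a list of sentences."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for w1, w2 in zip(words, words[1:]):
            model[w1][w2] += 1
    return model

def predict_next(model, word):
    """Return the most frequent successor of `word` seen in training."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

model = train_bigrams(["the cat is thrown out of the house",
                       "the cat is sleeping",
                       "sue swallowed the large green pill"])
# predict_next(model, "cat") -> "is"   (seen twice after "cat")
```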

  24. Applications • Spelling correction • Mobile phone texting • Speech recognition • Handwriting recognition • Disabled users • …

  25. Spelling errors • They are leaving in about fifteen minuets to go to her house. • The study was conducted mainly be John Black. • Hopefully, all with continue smoothly in my absence. • Can they lave him my messages? • I need to notified the bank of…. • He is trying to fine out.

  26. Handwriting recognition • Assume a note is given to a bank teller, which the teller reads as I have a gub. (cf. Woody Allen) • NLP to the rescue …. • gub is not a word • gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
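The disambiguation step can be caricatured as picking the candidate word with the highest context-dependent count; the counts below are invented for illustration, not taken from any corpus.

```python
# Candidate real words for the misread "gub", with hypothetical counts of how
# often each occurs in bank-related contexts (made-up numbers).
CANDIDATES = {"gun": 50, "gum": 5, "gus": 1, "gull": 1}

def best_correction(candidates):
    """Pick the candidate with the highest context count."""
    return max(candidates, key=candidates.get)

# best_correction(CANDIDATES) -> "gun"
```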

  27. For Spell Checkers • Collect a list of commonly substituted words • piece/peace, whether/weather, their/there ... • Example: "On Tuesday, the whether …" should read "On Tuesday, the weather …"

  28. Another dimension in language models • Do we mainly want to infer (probabilities) of legal sentences / sequences ? • So far • Or, do we want to infer properties of these sentences ? • E.g., parse tree, part-of-speech-tagging • Needed for understanding NL • Let’s look at some tasks

  29. Sequence Tagging
  • Part-of-speech tagging:
    He drives with his bike
    N  V      PR   PN  N      (noun, verb, preposition, pronoun, noun)
  • Text extraction:
    The job is that of a programmer
    X   X   X  X    X  X  JobType
    The seminar is taking place from 15.00 to 16.00
    X   X       X  X      X     X    Start  X  End

  30. Sequence Tagging • Predicting the secondary structure of proteins, mRNA, … • X = A,F,A,R,L,M,M,A,… • Y = he,he,st,st,st,he,st,he, …

  31. Parsing • Given a sentence, find its parse tree • Important step in understanding NL

  32. Parsing • In bioinformatics, parsing allows one to predict (elements of) structure from sequence

  33. Language models based on Grammars • Grammar Types • Regular grammars and Finite State Automata • Context-Free Grammars • Definite Clause Grammars • A particular type of Unification Based Grammar (Prolog) • Distinguish lexicon from grammar • The lexicon (dictionary) contains information about words, e.g. word - possible tags (and possibly additional information), as in flies - V(erb) - N(oun) • The grammar encodes the rules

  34. Grammars and parsing • Syntactic level best understood and formalized • Derivation of grammatical structure: parsing (more than just recognition) • The result of parsing is mostly a parse tree: showing the constituents of a sentence, e.g. verb or noun phrases • Syntax usually specified in terms of a grammar consisting of grammar rules

  35. Regular Grammars and Finite State Automata
  Lexical information: which words are
  Det(erminer)     Vi (intransitive verb): takes no argument
  N(oun)           Vt (transitive verb): takes an argument
  Pn (pronoun)     Adj (adjective)
  Now accept:
  The cat slept
  Det N   Vi
  As regular grammar:
  S  -> [Det] S1    % [ ]: terminal
  S1 -> [N] S2
  S2 -> [Vi]
  Lexicon:
  the   - Det
  cat   - N
  slept - Vi
  …
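The regular grammar above corresponds directly to a finite state automaton. A minimal Python sketch; the lexicon and the three states come from the slide, while the dictionary encoding of the transitions is an illustrative choice:

```python
# Three-word lexicon from the slide.
LEXICON = {"the": "Det", "cat": "N", "slept": "Vi"}

# Transitions of the FSA for S -> [Det] S1, S1 -> [N] S2, S2 -> [Vi].
TRANSITIONS = {("S", "Det"): "S1", ("S1", "N"): "S2", ("S2", "Vi"): "end"}

def accepts(sentence):
    """Run the automaton over the tagged words; accept iff it halts in the final state."""
    state = "S"
    for word in sentence.lower().split():
        tag = LEXICON.get(word)                 # look up the word's tag
        state = TRANSITIONS.get((state, tag))   # follow the transition, if any
        if state is None:
            return False
    return state == "end"

# accepts("the cat slept") -> True ; accepts("cat the slept") -> False
```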

  36. Finite State Automaton • Sentences accepted:
  John smiles            - Pn Vi
  The cat disappeared    - Det N Vi
  These new shoes hurt   - Det Adj N Vi
  John liked the old cat - Pn Vt Det Adj N

  37. Phrase structure
  The dog chased a cat into the garden:
  [S [NP [D the] [N dog]]
     [VP [V chased]
         [NP [D a] [N cat]]
         [PP [P into] [NP [D the] [N garden]]]]]

  38. Notation • S: sentence • D or Det: Determiner (e.g., articles) • N: noun • V: verb • P: preposition • NP: noun phrase • VP: verb phrase • PP: prepositional phrase

  39. Context Free Grammar
  S  -> NP VP
  NP -> D N
  VP -> V NP
  VP -> V NP PP
  PP -> P NP
  D  -> [the]
  D  -> [a]
  N  -> [dog]
  N  -> [cat]
  N  -> [garden]
  V  -> [chased]
  V  -> [saw]
  P  -> [into]
  Terminals ~ Lexicon

  40. Phrase structure • Formalism of context-free grammars • Nonterminal symbols: S, NP, VP, ... • Terminal symbols: dog, cat, saw, the, ... • Recursion: "The girl thought the dog chased the cat"
  VP -> V, S
  N  -> [girl]
  V  -> [thought]

  41. Top-down parsing • S -> NP VP • S -> Det N VP • S -> The N VP • S -> The dog VP • S -> The dog V NP • S -> The dog chased NP • S -> The dog chased Det N • S -> The dog chased the N • S -> The dog chased the cat
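The derivation above can be automated with a small recursive-descent recognizer. A sketch over the toy grammar of slide 39 (the backtracking strategy is one simple choice, not the only one):

```python
# Toy CFG from the slides; nonterminals are rule keys, terminals are lowercase words.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["chased"], ["saw"]],
}

def parse(symbols, words):
    """Top-down, left-to-right: expand the leftmost symbol, backtrack on failure."""
    if not symbols:
        return not words                       # success iff all input consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                       # nonterminal: try each rule body
        return any(parse(body + rest, words) for body in GRAMMAR[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

# parse(["S"], "the dog chased the cat".split()) -> True
```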

  42. Context-free grammar
  S --> NP, VP.
  NP --> PN.          % proper noun
  NP --> Art, Adj, N.
  NP --> Art, N.
  VP --> VI.          % intransitive verb
  VP --> VT, NP.      % transitive verb
  Art --> [the].
  Adj --> [lazy].
  Adj --> [rapid].
  PN --> [achilles].
  N --> [turtle].
  VI --> [sleeps].
  VT --> [beats].

  43. Parse tree
  The rapid turtle beats Achilles:
  [S [NP [Art the] [Adj rapid] [N turtle]]
     [VP [Vt beats] [NP [PN achilles]]]]

  44. Definite Clause Grammars: non-terminals may have arguments
  Number agreement:
  S --> NP(N), VP(N).
  NP(N) --> Art(N), N(N).
  VP(N) --> VI(N).
  Art(singular) --> [a].
  Art(singular) --> [the].
  Art(plural) --> [the].
  N(singular) --> [turtle].
  N(plural) --> [turtles].
  VI(singular) --> [sleeps].
  VI(plural) --> [sleep].

  45. DCGs • Non-terminals may have arguments • Variables (start with capital) • E.g. Number, Any • Constants (start with lower case) • E.g. singular, plural • Structured terms (start with lower case, and take arguments themselves) • E.g. vp(V,NP) • Parsing needs to be adapted • Using unification

  46. Unification in a nutshell (cf. AI course) • Substitutions, e.g. {Num / singular}, {T / vp(V,NP)} • Applying a substitution: simultaneously replace variables by the corresponding terms • S(Num) {Num / singular} = S(singular)

  47. Unification • Take two non-terminals with arguments and compute (most general) substitution that makes them identical, e.g., • Art(singular) and Art(Num) • Gives { Num / singular } • Art(singular) and Art(plural) • Fails • Art(Num1) and Art(Num2) • {Num1 / Num2} • PN(Num, accusative) and PN(singular, Case) • {Num/singular, Case/accusative}
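The cases above can be reproduced with a small term-unification routine. A sketch in which variables are capitalized strings and compound terms are tuples; this encoding is an illustrative choice, and the occurs check is omitted for brevity.

```python
def is_var(t):
    """A variable is a capitalized string, as in Prolog."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings to their current value."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(x, y, subst=None):
    """Return the substitution making x and y identical, or None on failure."""
    if subst is None:
        subst = {}
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return {**subst, x: y}
    if is_var(y):
        return {**subst, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):        # unify argument by argument
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None                          # distinct constants: fail

# unify(("art", "singular"), ("art", "Num"))    -> {"Num": "singular"}
# unify(("art", "singular"), ("art", "plural")) -> None
```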

  48. Parsing with DCGs • Now require successful unification at each step • S -> NP(N), VP(N) • S -> Art(N), N(N), VP(N) {N/singular} • S -> a N(singular), VP(singular) • S -> a turtle VP(singular) • S -> a turtle sleeps • S -> a turtle sleep fails

  49. Case Marking
  PN(singular, nominative) --> [he] ; [she]
  PN(singular, accusative) --> [him] ; [her]
  PN(plural, nominative) --> [they]
  PN(plural, accusative) --> [them]
  S --> NP(Number, nominative), VP(Number)
  VP(Number) --> V(Number), NP(Any, accusative)
  NP(Number, Case) --> PN(Number, Case)
  NP(Number, Any) --> Det, N(Number)
  He sees her. She sees him. They see her. But not: Them see he.

  50. DCGs • Are strictly more expressive than CFGs • Can represent, for instance, the language a^n b^n c^n:
  S(N) -> A(N), B(N), C(N)
  A(0) -> []
  B(0) -> []
  C(0) -> []
  A(s(N)) -> A(N), [a]
  B(s(N)) -> B(N), [b]
  C(s(N)) -> C(N), [c]
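The grammar above generates a^n b^n c^n, which is beyond context-free power (the counting argument N enforces that all three blocks have equal length). A direct membership check makes the language concrete:

```python
def is_anbncn(s):
    """Accept exactly the strings a^n b^n c^n (n >= 0), the language the
    DCG above generates via its counting argument."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

# is_anbncn("aabbcc") -> True ; is_anbncn("aabbc") -> False
```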
