
Computational Lexicology, Morphology and Syntax




  1. Computational Lexicology, Morphology and Syntax Diana Trandabăț 2013-2014

  2. NLP with Regular Expressions
     Transcript with Eliza, simulation of a Rogerian psychotherapist (Weizenbaum, 1966)
     User: Men are all alike
     ELIZA: IN WHAT WAY
     User: They’re always bugging us about something or other
     ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?
     User: Well, my boyfriend made me come here
     ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
     User: He says I’m depressed much of the time
     ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

  3. How did it work?
     • .* all .* → IN WHAT WAY
     • .* always .* → CAN YOU THINK OF A SPECIFIC EXAMPLE
     • .* I’m (depressed|sad) .* → I AM SORRY TO HEAR YOU ARE \1
     • .* I’m (depressed|sad) .* → WHY DO YOU THINK YOU ARE \1?
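The pattern-to-response rules above can be tried out with a minimal Python sketch. It is not Weizenbaum's program: the `respond` function, the word-boundary anchors, the choice of the first of the two "depressed" responses, and the fallback reply `PLEASE GO ON` are all assumptions made for illustration.

```python
import re

# Ordered (pattern, response) pairs mirroring the rules on the slide.
# \b word boundaries are an added assumption so that "all" does not fire inside "always".
RULES = [
    (re.compile(r".*\ball\b.*", re.IGNORECASE), "IN WHAT WAY"),
    (re.compile(r".*\balways\b.*", re.IGNORECASE), "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (re.compile(r".*\bI[’']m (depressed|sad)\b.*", re.IGNORECASE),
     r"I AM SORRY TO HEAR YOU ARE \1"),
]


def respond(utterance, default="PLEASE GO ON"):
    """Return the response of the first matching rule; `default` is a hypothetical fallback."""
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            # expand() substitutes \1 with the captured word (depressed or sad).
            return match.expand(template).upper()
    return default


print(respond("Men are all alike"))
print(respond("He says I’m depressed much of the time"))
```

The program has no model of meaning: each reply is produced by string matching and substitution alone.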

  4. Aside… • What is intelligence? • What does Eliza tell us about intelligence?

  5. The sentence as a string of words. E.g. I saw the lady with the binoculars: string = a b c d e c f

  6. The relations of parts of a string to each other may be different. 'I saw the lady with the binoculars' is structurally ambiguous: who has the binoculars?

  7. [I] saw the lady [with the binoculars] = [a] b c d [e c f]
     I saw [the lady with the binoculars] = a b [c d e c f]

  8. How can we represent the difference? By assigning them different structures. We can represent structures with 'trees'. I read the book

  9. a. I saw the lady with the binoculars
     [S [NP I] [VP [V saw] [NP [NP the lady] [PP with the binoculars]]]]
     I saw [the lady with the binoculars]

  10. b. I saw the lady with the binoculars
      [S [NP I] [VP [VP saw the lady] [PP with the binoculars]]]
      I [saw the lady] with the binoculars
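One way to make the two readings concrete is to store each tree as nested tuples. This is a sketch only: the tuple encoding, the names `NP_ATTACH` and `VP_ATTACH`, and the helper `leaves` are assumptions for illustration, and the trees follow the bracketings on slides 9 and 10.

```python
# Each tree is (label, child, child, ...); a leaf is a plain string.
NP_ATTACH = ("S", ("NP", "I"),
             ("VP", ("V", "saw"),
              ("NP", ("NP", "the", "lady"),
               ("PP", "with", "the", "binoculars"))))

VP_ATTACH = ("S", ("NP", "I"),
             ("VP", ("VP", "saw", "the", "lady"),
              ("PP", "with", "the", "binoculars")))


def leaves(tree):
    """Read the words off a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the node label
        words.extend(leaves(child))
    return words


print(leaves(NP_ATTACH) == leaves(VP_ATTACH))  # same string of words
print(NP_ATTACH == VP_ATTACH)                  # different structures
```

Both trees yield the same string, so the ambiguity is visible only in the structure, not in the word sequence.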

  11. birds fly
      [S [NP [N birds]] [VP [V fly]]]
      Syntactic rules: S → NP VP, NP → N, VP → V

  12. [S [NP birds] [VP fly]]
      birds = a, fly = b; ab = string

  13. [S [A a] [B b]] = ab
      S → A B, A → a, B → b

  14. Rules
      Assumption: natural language grammars are rule-based systems
      What kind of grammars describe natural language phenomena?
      What are the formal properties of grammatical rules?

  15. The Chomsky Hierarchy

  16. Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton.
      Chomsky, N. and G.A. Miller (1958) Finite-state languages. Information and Control 1, 91-112.
      Chomsky, N. (1959) On certain formal properties of grammars. Information and Control 2, 137-167.

  17. Rules in Linguistics
      1. PHONOLOGY
      /s/ → [θ] / V ___ V
      Rewrite /s/ as [θ] when /s/ occurs in context V ___ V
      With: V = auxiliary nodes, θ = terminal nodes

  18. Rules in Linguistics
      2. SYNTAX
      S → NP VP
      VP → V
      NP → N
      Rewrite S as NP VP in any context
      With: S, NP, VP = auxiliary nodes; V, N = terminal nodes

  19. SYNTAX (phrase/sentence formation)
      sentence: The boy kissed the girl
      subject = noun phrase = art + noun
      predicate = verb phrase = verb + noun phrase
      S → NP VP
      VP → V NP
      NP → ART N

  20. Chomsky Hierarchy
      0. Type 0 (recursively enumerable) languages: the only restriction on rules is that the left-hand side cannot be the empty string (*Ø → ...)
      1. Context-Sensitive languages: Context-Sensitive (CS) rules
      2. Context-Free languages: Context-Free (CF) rules
      3. Regular languages: regular (right-linear or left-linear) rules
      0 ⊃ 1 ⊃ 2 ⊃ 3
      a ⊃ b means that a properly includes b (a is a proper superset of b), i.e. b is a proper subset of a

  21. Generative power
      0. Type 0 (recursively enumerable) languages
         • only restriction on rules: left-hand side cannot be the empty string (*Ø → ...)
         • is the most powerful system
      3. Type 3 (regular languages)
         • is the least powerful

  22. Superset/subset relation
      S1 = {a, b}, S2 = {a, b, c, d, f, g}
      S1 is a subset of S2; S2 is a superset of S1

  23. Rule Type – 3
      Name: Regular
      Example: Finite State Automata (Markov-process Grammar)
      Rule type:
      a) right-linear: A → x B or A → x, with A, B = auxiliary nodes and x = terminal node
      b) or left-linear: A → B x or A → x
      Generates: a^m b^n with m, n ≥ 1
      Cannot guarantee that there are as many a’s as b’s; no embedding
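Since a^m b^n is a regular language, a regular expression can recognise it. The short sketch below uses Python's `re` module; the name `in_language` is an assumption for illustration.

```python
import re

# One or more a's followed by one or more b's: a^m b^n with m, n >= 1.
A_THEN_B = re.compile(r"a+b+")


def in_language(s):
    """True if the whole string has the shape a^m b^n."""
    return A_THEN_B.fullmatch(s) is not None


print(in_language("aab"))   # accepted although the counts differ
print(in_language("ba"))    # rejected: wrong order
```

The pattern fixes the order of the two blocks but cannot tie the number of a's to the number of b's, which is the limitation named on the slide.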

  24. A regular grammar for natural language sentences
      S → the A
      A → cat B, A → mouse B, A → duck B
      B → bites C, B → sees C, B → eats C
      C → the D
      D → boy, D → girl, D → monkey
      Example sentences: the cat bites the boy; the mouse eats the monkey; the duck sees the girl
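The right-linear grammar on this slide can be written down as a table and enumerated. In this sketch the dictionary layout, the use of `None` to mark a rule with no following auxiliary node, and the function name `sentences` are assumptions for illustration.

```python
# Each rule A -> word B is stored as (word, B); None marks a rule A -> word.
GRAMMAR = {
    "S": [("the", "A")],
    "A": [("cat", "B"), ("mouse", "B"), ("duck", "B")],
    "B": [("bites", "C"), ("sees", "C"), ("eats", "C")],
    "C": [("the", "D")],
    "D": [("boy", None), ("girl", None), ("monkey", None)],
}


def sentences(symbol="S"):
    """Yield every word list derivable from `symbol`."""
    for word, next_symbol in GRAMMAR[symbol]:
        if next_symbol is None:
            yield [word]
        else:
            for rest in sentences(next_symbol):
                yield [word] + rest


all_sentences = [" ".join(words) for words in sentences()]
print(len(all_sentences))
print(all_sentences[0])
```

Every rule emits one word and moves to at most one further symbol, so a derivation only ever grows at the right edge of the string.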

  25. Regular grammars
      Grammar 1: A → a, A → a B, B → b A
      Grammar 2: A → a, A → B a, B → A b
      Grammar 3: A → a, A → a B, B → b, B → b A
      Grammar 4: A → a, A → B a, B → b, B → A b
      Grammar 5: S → a A, S → b B, A → a S, B → b b S, S → ε
      Grammar 6: A → A a, A → B a, B → b, B → A b, A → a

  26. Grammars: non-regular
      Grammar 6: S → A B, S → b B, A → a S, B → b b S, S → ε (S → A B has two auxiliary nodes on the right-hand side)
      Grammar 7: A → a, A → B a, B → b, B → b A (mixes left-linear and right-linear rules)
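The difference between the regular and non-regular grammars above is purely a matter of rule shape, so it can be checked mechanically. The sketch below is one possible checker under stated assumptions: rules are given as (left-hand side, list of right-hand-side symbols), every symbol that appears on a left-hand side counts as an auxiliary node, and several terminals before or after the single auxiliary node are allowed (as in B → b b S). The names `rule_shape` and `is_regular` are hypothetical.

```python
def rule_shape(rhs, nonterminals):
    """Classify a right-hand side as 'right', 'left', 'both' or 'neither'."""
    positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
    if not positions:
        return "both"        # only terminals, or empty: fits either form
    if len(positions) > 1:
        return "neither"     # two auxiliary nodes, e.g. S -> A B
    if len(rhs) == 1:
        return "both"        # a lone auxiliary node
    if positions[0] == len(rhs) - 1:
        return "right"       # terminals first, auxiliary node last
    if positions[0] == 0:
        return "left"        # auxiliary node first, terminals after
    return "neither"


def is_regular(rules):
    """True if all rules are right-linear, or all rules are left-linear."""
    nonterminals = {lhs for lhs, _ in rules}
    shapes = {rule_shape(rhs, nonterminals) for _, rhs in rules}
    shapes.discard("both")
    return shapes in (set(), {"right"}, {"left"})


grammar_5 = [("S", ["a", "A"]), ("S", ["b", "B"]), ("A", ["a", "S"]),
             ("B", ["b", "b", "S"]), ("S", [])]
grammar_7 = [("A", ["a"]), ("A", ["B", "a"]), ("B", ["b"]), ("B", ["b", "A"])]
print(is_regular(grammar_5), is_regular(grammar_7))
```

The check concerns the form of the grammar only; a language generated by a grammar that fails it may still have some other grammar that is regular.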

  27. Finite-State Automaton
      States: NP, NP1, NP2
      Transitions: NP --article--> NP1, NP1 --adjective--> NP1, NP1 --noun--> NP2

  28. NP --article--> NP1, NP1 --adjective--> NP1, NP1 --noun--> NP2
      NP → article NP1
      NP1 → adjective NP1
      NP1 → noun NP2
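The noun-phrase automaton and its three rules translate directly into a transition table. In this sketch the inputs are category labels rather than words, and treating NP2 as the only accepting state is an assumption, since the slide does not mark one; `accepts` is a hypothetical name.

```python
# (state, input category) -> next state, one entry per rule on the slide.
TRANSITIONS = {
    ("NP", "article"): "NP1",
    ("NP1", "adjective"): "NP1",
    ("NP1", "noun"): "NP2",
}
FINAL_STATES = {"NP2"}  # assumption: NP2 is the accepting state


def accepts(categories, start="NP"):
    """Run the automaton over a sequence of category labels."""
    state = start
    for category in categories:
        state = TRANSITIONS.get((state, category))
        if state is None:   # no transition defined: reject
            return False
    return state in FINAL_STATES


print(accepts(["article", "adjective", "adjective", "noun"]))
print(accepts(["article", "adjective"]))
```

The loop on NP1 is what allows any number of adjectives, and the automaton needs no memory beyond its current state.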

  29. A parse tree
      [S [NP N] [VP V [NP DET N]]]
      S = root node; NP, VP = non-terminal nodes; N, V, DET = terminal nodes

  30. Rule Type – 2
      Name: Context Free
      Example: Phrase Structure Grammars / Push-Down Automata
      Rule type: A → γ, with A = auxiliary node and γ = any number of terminal or auxiliary nodes
      Recursiveness (centre embedding) allowed: A → α A β

  31. CF Grammar
      A Context Free grammar consists of:
      a) a finite terminal vocabulary VT
      b) a finite auxiliary vocabulary VA
      c) an axiom S ∈ VA
      d) a finite number of context free rules of form A → γ, where A ∈ VA and γ ∈ (VA ∪ VT)*
      In natural language syntax S is interpreted as the start symbol for sentence, as in S → NP VP

  32. Natural language
      Is English regular or CF? If centre embedding is required, then it cannot be regular.
      Centre Embedding:
      1. [The cat] [likes tuna fish] = a b
      2. The cat the dog chased likes tuna fish = a a b b
      3. The cat the dog the rat bit chased likes tuna fish = a a a b b b
      4. The cat the dog the rat the elephant admired bit chased likes tuna fish = a a a a b b b b
      ⇒ ab, aabb, aaabbb, aaaabbbb

  33. 1. [The cat] [likes tuna fish] = a b
      2. [The cat] [the dog] [chased] [likes ...] = a a b b

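  34. Centre embedding
      [S [NP the cat] [VP likes tuna]]
      the cat = a, likes tuna = b; = ab

  35. [S [NP [NP the cat] [S [NP the dog] [VP chased]]] [VP likes tuna]]
      the cat = a, the dog = a, chased = b, likes tuna = b; = aabb

  36. [S [NP [NP the cat] [S [NP [NP the dog] [S [NP the rat] [VP bit]]] [VP chased]]] [VP likes tuna]]
      the cat = a, the dog = a, the rat = a, bit = b, chased = b, likes tuna = b; = aaabbb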
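The pattern ab, aabb, aaabbb can be produced and recognised with one level of recursion per level of embedding. The sketch below assumes the context-free grammar S → a S b, S → a b, which is not given on the slides; `derive` and `is_centre_embedded` are hypothetical names.

```python
def derive(n):
    """Apply S -> a S b (n - 1 times), then S -> a b."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "ab" if n == 1 else "a" + derive(n - 1) + "b"


def is_centre_embedded(s):
    """Recognise a^n b^n by peeling off one a and one b per level of embedding."""
    if s == "ab":
        return True
    return (len(s) > 2 and s[0] == "a" and s[-1] == "b"
            and is_centre_embedded(s[1:-1]))


print(derive(3))
print(is_centre_embedded("aabb"), is_centre_embedded("aab"))
```

Each recursive call pairs one a with its matching b, which is the kind of unbounded bookkeeping a finite-state automaton cannot do.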

  37. Natural language 2
      More Centre Embedding:
      1. If S1, then S2 = a a
      2. Either S3, or S4 = b b
      Sentence with embedding: If either the man is arriving today or the woman is arriving tomorrow, then the child is arriving the day after.
      a = [if
        b = [either the man is arriving today]
        b = [or the woman is arriving tomorrow]]
      a = [then the child is arriving the day after]
      = abba

  38. CS languages
      The following language cannot be generated by a CF grammar (by the pumping lemma): a^n b^m c^n d^m
      Swiss German: a string of dative nouns (e.g. aa), followed by a string of accusative nouns (e.g. bbb), followed by a string of dative-taking verbs (cc), followed by a string of accusative-taking verbs (ddd) = aabbbccddd = a^n b^m c^n d^m

  39. Swiss German:
      Jan sait das (Jan says that) … mer em Hans es Huus hälfed aastriiche
      we Hans/DAT the house/ACC helped paint
      'we helped Hans paint the house'
      em Hans = a, es Huus = b, hälfed = c, aastriiche = d; = abcd
      NPdat NPdat NPacc NPacc Vdat Vdat Vacc Vacc = a a b b c c d d
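The crossing pattern a^n b^m c^n d^m can be checked by splitting the string into its four blocks and comparing lengths. In this sketch the regular expression finds only the block order; the two length comparisons are done in ordinary code. The name `is_cross_serial` and the restriction to n, m ≥ 1 are assumptions for illustration.

```python
import re

# Four non-empty blocks in the order a, b, c, d.
BLOCKS = re.compile(r"(a+)(b+)(c+)(d+)")


def is_cross_serial(s):
    """True if s has the shape a^n b^m c^n d^m with n, m >= 1."""
    match = BLOCKS.fullmatch(s)
    if match is None:
        return False
    a, b, c, d = match.groups()
    # The dependencies cross: a pairs with c, b pairs with d.
    return len(a) == len(c) and len(b) == len(d)


print(is_cross_serial("aabbbccddd"))
print(is_cross_serial("aabbbcccdd"))
```

Unlike centre embedding, the paired blocks here overlap rather than nest, so a single stack that matches the most recent open item first does not suffice.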

  40. Context Free Grammars (CFGs)
      Sets of rules expressing how symbols of the language fit together, e.g.
      S -> NP VP
      NP -> Det N
      Det -> the
      N -> dog

  41. What Does Context Free Mean?
      • LHS of rule is just one symbol.
      • Can have NP -> Det N
      • Cannot have X NP Y -> X Det N Y

  42. Grammar Symbols • Non Terminal Symbols • Terminal Symbols • Words • Preterminals

  43. Non Terminal Symbols
      • Symbols which have definitions
      • Symbols which appear on the LHS of rules
      S -> NP VP
      NP -> Det N
      Det -> the
      N -> dog

  44. Non Terminal Symbols
      • Same Non Terminals can have several definitions
      S -> NP VP
      NP -> Det N
      NP -> N
      Det -> the
      N -> dog

  45. Terminal Symbols
      • Symbols which appear in final string
      • Correspond to words
      • Are not defined by the grammar
      S -> NP VP
      NP -> Det N
      Det -> the
      N -> dog

  46. Parts of Speech (POS)
      • NT Symbols which produce terminal symbols are sometimes called pre-terminals
      S -> NP VP
      NP -> Det N
      Det -> the
      N -> dog
      • Sometimes we are interested in the shape of sentences formed from pre-terminals
      Det N V
      Aux N V D N

  47. CFG - formal definition
      A CFG is a tuple (N, Σ, R, S)
      • N is a set of non-terminal symbols
      • Σ is a set of terminal symbols disjoint from N
      • R is a set of rules, each of the form A → γ, where A is a non-terminal
      • S is a designated start symbol

  48. CFG - Example
      grammar: S → NP VP, NP → N, VP → V NP
      lexicon: V → kicks, N → John, N → Bill
      N = {S, NP, VP, N, V}
      Σ = {kicks, John, Bill}
      R = the grammar and lexicon rules listed above
      S = “S”
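The example grammar can be written out as the four components of the tuple and expanded exhaustively. This is a sketch under stated assumptions: the dictionary encoding of R and the function `expand` are illustrative, and the exhaustive expansion terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

N = {"S", "NP", "VP", "N", "V"}        # non-terminal symbols
SIGMA = {"kicks", "John", "Bill"}      # terminal symbols
R = {                                  # rules, grouped by left-hand side
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}
START = "S"


def expand(symbol):
    """Return every terminal word list derivable from `symbol`."""
    if symbol in SIGMA:
        return [[symbol]]
    results = []
    for rhs in R[symbol]:
        # Combine every expansion of each right-hand-side symbol, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


language = [" ".join(words) for words in expand(START)]
print(language)
```

With two nouns and one verb the grammar generates exactly four sentences, one for each choice of subject and object.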

  49. Exercise
      • Write grammars that generate the following languages, for m > 0: (ab)^m, a^n b^m, a^n b^n
      • Which of these are Regular?
      • Which of these are Context Free?

  50. (ab)^m for m > 0
      S -> a b
      S -> a b S
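The two rules for (ab)^m can be run as a small recursive function and compared against a regular expression for the same language. The names `derive_ab` and `AB_REPEATED` are assumptions for illustration.

```python
import re


def derive_ab(m):
    """Apply S -> a b S (m - 1 times), then S -> a b."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return "ab" if m == 1 else "ab" + derive_ab(m - 1)


# The same language as a regular expression: one or more repetitions of ab.
AB_REPEATED = re.compile(r"(?:ab)+")

print(derive_ab(3))
print(AB_REPEATED.fullmatch(derive_ab(3)) is not None)
```

Both rules are right-linear, with the auxiliary node S appearing only at the right edge, so the grammar is regular as well as context free.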
