
Models of Grammar Learning


Presentation Transcript


  1. Models of Grammar Learning CS 182 Lecture April 24, 2008

  2. What constitutes learning a language? • What are the sounds (Phonology) • How to make words (Morphology) • What do words mean (Semantics) • How to put words together (Syntax) • Social use of language (Pragmatics) • Rules of conversation (Pragmatics)

  3. Language Learning Problem • Prior knowledge • Initial grammar G (set of ECG constructions) • Ontology (category relations) • Language comprehension model (analysis/resolution) • Hypothesis space: new ECG grammar G’ • Search = processes for proposing new constructions • Relational Mapping, Merge, Compose

  4. Language Learning Problem • Performance measure • Goal: Comprehension should improve with training • Criterion: need some objective function to guide learning… • Probability of model given data: P(G|D) ∝ P(G) · P(D|G) • Minimum description length: cost(G|D) = α · size(G) + β · complexity(D|G)

  5. Minimum Description Length • Choose grammar G to minimize cost(G|D): • cost(G|D) = α · size(G) + β · complexity(D|G) • Approximates Bayesian learning: minimizing cost(G|D) ≈ maximizing the posterior P(G|D) • Size of grammar size(G) ≈ prior P(G) • favor fewer/smaller constructions/roles; isomorphic mappings • Complexity of data given grammar ≈ likelihood P(D|G) • favor simpler analyses (fewer, more likely constructions) • based on derivation length + score of derivation
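Under the standard MDL reading (a restatement, with the α and β weights assumed to absorb the coding constants), the two criteria coincide up to a constant:

\[
\mathrm{cost}(G \mid D) \;=\; \alpha\,\mathrm{size}(G) \;+\; \beta\,\mathrm{complexity}(D \mid G)
\;\approx\; -\log P(G) \;-\; \log P(D \mid G)
\;=\; -\log P(G \mid D) \;+\; \mathrm{const},
\]

so minimizing cost(G|D) corresponds to maximizing the posterior probability of the grammar given the data.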

  6. Size Of Grammar • Size of the grammar G is the sum of the size of each construction: size(G) = Σc size(c), summing over constructions c in G • Size of each construction c is: size(c) = nc + mc + Σe length(e), where • nc = number of constituents in c, • mc = number of constraints in c, • length(e) = slot chain length of element reference e
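Assuming the grammar is available as a list of construction objects, a minimal Python sketch of these two quantities follows; the Construction fields, the complexity term, and the α, β weights are illustrative stand-ins, not the actual ECG learner's data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Construction:
    """Hypothetical stand-in for an ECG construction."""
    constituents: List[str] = field(default_factory=list)      # n_c items
    constraints: List[str] = field(default_factory=list)       # m_c items
    references: List[List[str]] = field(default_factory=list)  # slot chains

def construction_size(c: Construction) -> int:
    # size(c) = n_c + m_c + sum of slot-chain lengths of element references
    return (len(c.constituents) + len(c.constraints)
            + sum(len(chain) for chain in c.references))

def grammar_size(grammar: List[Construction]) -> int:
    # size(G) = sum of size(c) over constructions c in G
    return sum(construction_size(c) for c in grammar)

def mdl_cost(grammar, data_complexity, alpha=1.0, beta=1.0):
    # cost(G|D) = alpha * size(G) + beta * complexity(D|G);
    # data_complexity must come from the analyzer (derivation lengths/scores)
    return alpha * grammar_size(grammar) + beta * data_complexity
```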

  7. What do we know about language development? (focusing mainly on first language acquisition of English-speaking, normal population)

  8. Children are amazing learners • Approximate timeline from birth to age 5: cooing, reduplicated babbling, first word, two-word combinations, multi-word utterances, and then questions, complex sentence structures, and conversational principles

  9. Phonology: Non-native contrasts • Werker and Tees (1984) • Thompson (Nthlakampx): velar vs. uvular, /k'i/–/q'i/ • Hindi: retroflex vs. dental, /ʈa/–/ta/ • Young infants discriminate these non-native contrasts, but the sensitivity declines by 10–12 months

  10. Finding words: Statistical learning • The problem: segmenting running speech like "prettybaby" into "pretty" + "baby" • Saffran, Aslin and Newport (1996) • Nonsense words /bidaku/, /padoti/, /golabu/ concatenated into a continuous stream: /bidakupadotigolabubidaku/ • 2 minutes of this continuous speech stream • By 8 months infants detect the words (vs. non-words and part-words), apparently by tracking transitional probabilities between syllables (sketched below)
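A minimal sketch of the statistic involved: forward transitional probability between adjacent syllables, which is high inside a word and lower across word boundaries. The stream construction below is illustrative (random concatenation that allows immediate repeats, unlike the actual stimuli).

```python
import random
from collections import Counter

WORDS = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]

# Build a continuous syllable stream by concatenating the nonsense words in
# random order with no pauses, roughly as in the familiarization phase.
random.seed(0)
stream = [syl for _ in range(300) for syl in random.choice(WORDS)]

pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])

def transitional_probability(a, b):
    # Forward TP: P(next syllable = b | current syllable = a)
    return pairs[(a, b)] / firsts[a] if firsts[a] else 0.0

print(transitional_probability("bi", "da"))  # within a word: ~1.0
print(transitional_probability("ku", "pa"))  # across a word boundary: ~1/3
```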

  11. Word order: agent and patient • Hirsh-Pasek and Golinkoff (1996) • Children aged 1;4–1;7, mostly still in the one-word stage • Test sentence: "Where is CM tickling BB?"

  12. Early syntax • agent + action ‘Daddy sit’ • action + object ‘drive car’ • agent + object ‘Mommy sock’ • action + location ‘sit chair’ • entity + location ‘toy floor’ • possessor + possessed ‘my teddy’ • entity + attribute ‘crayon big’ • demonstrative + entity ‘this telephone’

  13. From Single Words To Complex Utterances (Sachs corpus, CHILDES)
FATHER: Nomi are you climbing up the books? NAOMI: up. NAOMI: climbing. NAOMI: books. (1;11.3)
MOTHER: what are you doing? NAOMI: I climbing up. MOTHER: you're climbing up? (2;0.18)
FATHER: what's the boy doing to the dog? NAOMI: squeezing his neck. NAOMI: and the dog climbed up the tree. NAOMI: now they're both safe. NAOMI: but he can climb trees. (4;9.3)

  14. Gold’s Theorem: No superfinite class of language is identifiable in the limit from positive data only Principles & Parameters Babies are born as blank slates but acquire language quickly (with noisy input and little correction) → Language must be innate: Universal Grammar + parameter setting But babies aren’t born as blank slates! And they do not learn language in a vacuum! How Can Children Be So Good At Learning Language?

  15. Modifications of Gold’s Result • (Weakly) Ordered Examples, implicit negatives • Loosened Identification Conditions • Complexity Measures, Best Fit No Theorems will resolve these issues

  16. Modeling the acquisition of grammar: Theoretical assumptions

  17. Language Acquisition • Opulence of the substrate • Prelinguistic children already have rich sensorimotor representations and sophisticated social knowledge • intention inference, reference resolution • language-specific event conceptualizations (Bloom 2000, Tomasello 1995, Bowerman & Choi, Slobin, et al.) • Children are sensitive to statistical information • Phonological transitional probabilities • Even dependencies between non-adjacent items (Saffran et al. 1996, Gomez 2002)

  18. Language Acquisition • Basic Scenes • Simple clause constructions are associated directly with scenes basic to human experience (Goldberg 1995, Slobin 1985) • Verb Island Hypothesis • Children learn their earliest constructions (arguments, syntactic marking) on a verb-specific basis (Tomasello 1992) • Illustration: "throw frisbee", "throw ball", … → "throw OBJECT"; "get ball", "get bottle", … → "get OBJECT" (this should be reminiscent of your model merging assignment; see the sketch below)
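A minimal sketch of the kind of generalization the illustration depicts, in the spirit of the Merge operation from slide 3 and the model merging assignment. The observed examples, the toy ontology, and the merging criterion (more than one distinct filler of the same category) are all illustrative assumptions.

```python
from collections import defaultdict

# Observed verb-specific constructions (illustrative two-word utterances).
observed = [("throw", "frisbee"), ("throw", "ball"),
            ("get", "ball"), ("get", "bottle")]

# Hypothetical ontology: every filler seen here is categorized as OBJECT.
ontology = {"frisbee": "OBJECT", "ball": "OBJECT", "bottle": "OBJECT"}

def merge_by_verb(examples, ontology):
    """Generalize item-specific constructions to verb + category schemas once
    a verb has been seen with more than one filler of the same category."""
    fillers = defaultdict(set)
    for verb, arg in examples:
        fillers[(verb, ontology[arg])].add(arg)
    return {f"{verb} {cat}" for (verb, cat), args in fillers.items()
            if len(args) > 1}

print(merge_by_verb(observed, ontology))  # {'throw OBJECT', 'get OBJECT'} (order may vary)
```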

  19. Comprehension is partial (not just for dogs)

  20. What children pick up from what they hear • Children use rich situational context / cues to fill in the gaps • They also have at their disposal embodied knowledge and statistical correlations (i.e. experience) • Sample input around "throw": what did you throw it into? they're throwing this in here. they're throwing a ball. don't throw it Nomi. well you really shouldn't throw things Nomi you know. remember how we told you you shouldn't throw things.

  21. Language Learning Hypothesis: Children learn constructions that bridge the gap between what they know from language and what they know from the rest of cognition

  22. Modeling the acquisition of (early) grammar: Comprehension-driven, usage-based

  23. Natural Language Processing at Berkeley Dan Klein EECS Department UC Berkeley

  24. NLP: Motivation • It’d be great if machines could • Read text and understand it • Translate languages accurately • Help us manage, summarize, and aggregate information • Use speech as a UI • Talk to us / listen to us • But they can’t • Language is complex • Language is ambiguous • Language is highly structured

  25. Machine Translation • Syntactic MT • Learn grammar mappings between languages • Fully data-driven

  26. Information Extraction • Unsupervised Coreference Resolution • Take in lots of text • Learn what the entities are and how they corefer • Fully unsupervised, but gets supervised performance! • General research goal: unsupervised learning of meaning

  27. Syntactic Learning • Grammar Induction • Raw text in • Learned grammars out • Big result: this can be done! • Grammar Refinement • Coarse grammars in • Detailed grammars out • Gives top parsing systems

  28. Syntactic Inference • Natural language is very ambiguous • Grammars are huge • Billions of parses to consider • Milliseconds to do it • Example: "Influential members of the House Ways and Means Committee introduced legislation that would restrict how the new S&L bailout agency can raise capital, creating another potential obstacle to the government's sale of sick thrifts."

  29. Idea: Learn PCFGs with EM • Classic experiments on learning PCFGs with Expectation-Maximization [Lari and Young, 1990] • Full binary grammar over n symbols { X1, X2, …, Xn }, with rules of the form Xi → Xj Xk • Parse uniformly/randomly at first • Re-estimate rule expectations off of parses • Repeat • Their conclusion: it doesn't really work (a toy version of the loop is sketched below)
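This is not the Lari and Young inside-outside implementation: below is a brute-force toy of the same EM loop that does the E-step by enumerating every labeled binary tree, which is only tractable for very short sentences. The symbol set, sentences, root symbol, and smoothing constant are made-up illustrations.

```python
import itertools
import math
from collections import defaultdict

# Toy EM for a PCFG: full binary grammar over SYMBOLS with rules Xi -> Xj Xk
# and Xi -> word, started from uniform rule weights.
SYMBOLS = ["X1", "X2"]
SENTENCES = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
VOCAB = sorted({w for s in SENTENCES for w in s})
ROOT = "X1"

def normalize(counts):
    # Turn counts into probabilities that sum to 1 per left-hand-side symbol.
    totals = defaultdict(float)
    for rule, v in counts.items():
        totals[rule[0]] += v
    return {r: v / totals[r[0]] for r, v in counts.items()}

def init_uniform():
    counts = {(a, b, c): 1.0 for a in SYMBOLS for b in SYMBOLS for c in SYMBOLS}
    counts.update({(a, w): 1.0 for a in SYMBOLS for w in VOCAB})
    return normalize(counts)

def labeled_trees(words, sym):
    """Yield the list of rules used by every labeled binary tree over `words`
    whose root symbol is `sym`."""
    if len(words) == 1:
        yield [(sym, words[0])]
        return
    for split in range(1, len(words)):
        for lsym, rsym in itertools.product(SYMBOLS, repeat=2):
            for left in labeled_trees(words[:split], lsym):
                for right in labeled_trees(words[split:], rsym):
                    yield [(sym, lsym, rsym)] + left + right

def em_step(rules):
    counts = defaultdict(float, {r: 1e-6 for r in rules})  # tiny smoothing
    for sent in SENTENCES:
        trees = list(labeled_trees(sent, ROOT))
        scores = [math.prod(rules[r] for r in tree) for tree in trees]
        z = sum(scores)
        for tree, score in zip(trees, scores):
            for rule in tree:
                counts[rule] += score / z  # expected rule counts
    return normalize(counts)

rules = init_uniform()
for _ in range(10):
    rules = em_step(rules)
print(sorted(rules.items(), key=lambda kv: -kv[1])[:5])
```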

  30. Re-estimation of PCFGs • Basic quantity needed for re-estimation with EM: the expected number of times each rule Xi → Xj Xk is used, for each span of the sentence, under the current grammar • Can calculate in cubic time with the Inside-Outside algorithm • Consider an initial grammar where all productions have equal weight: then all trees have equal probability initially • Therefore, after one round of EM, the posterior over trees will (in the absence of random perturbation) be approximately uniform over all trees, and symmetric over symbols
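In standard inside-outside notation (β for inside probabilities, α for outside probabilities; this particular indexing is an assumed convention, not taken from the lecture), the expected count of a binary rule for a sentence w1…wn is:

\[
\mathbb{E}\big[\mathrm{count}(X \to Y\,Z)\big]
= \frac{1}{P(w_{1..n})} \sum_{1 \le i \le k < j \le n}
  \alpha(X, i, j)\; P(X \to Y\,Z)\; \beta(Y, i, k)\; \beta(Z, k{+}1, j)
\]

Each summand is the probability that X spans positions i..j and rewrites as Y over i..k and Z over k+1..j; summing over spans and normalizing by the sentence probability gives the rule's expected count, computable in cubic time as the slide notes.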

  31. Problem: "Uniform" Posteriors • [Figure contrasting the "tree uniform" and "split uniform" distributions]

  32. Overview: NLP at UCB • Lots of research and resources: • Dan Klein: Statistical NLP / ML • Marti Hearst: Stat NLP / HCI • Jerry Feldman: Language and Mind • Michael Jordan: Statistical Methods / ML • Tom Griffiths: Statistical Learning / Psychology • ICSI Speech and AI groups (Morgan, Stolcke, Shriberg, Fillmore, Kay, Narayanan…) • Great linguistics and stats departments! • No better place to solve the hard NLP problems!

  33. Other Approaches • Evaluation: fraction of nodes in gold trees correctly posited in proposed trees (unlabeled recall; see the sketch below) • Some recent work in learning constituency: • [Adriaans, 99] Language grammars aren't general PCFGs • [Clark, 01] Mutual-information filters detect constituents, then an MDL-guided search assembles them • [van Zaanen, 00] Finds low edit-distance sentence pairs and extracts their differences
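A minimal sketch of that evaluation metric, assuming constituents are represented as (start, end) spans over the sentence; the example spans are made up.

```python
def unlabeled_recall(gold_spans, predicted_spans):
    """Fraction of gold constituents, as (start, end) spans, that also appear
    in the proposed tree, ignoring labels."""
    gold = set(gold_spans)
    if not gold:
        return 1.0
    return len(gold & set(predicted_spans)) / len(gold)

# Example: the gold tree has spans (0,5), (0,2), (2,5); the proposed tree
# recovers two of them.
print(unlabeled_recall({(0, 5), (0, 2), (2, 5)}, {(0, 5), (2, 5), (3, 5)}))  # 0.666...
```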
