
The origin of syntactic bootstrapping: A computational model


Presentation Transcript


  1. The origin of syntactic bootstrapping: A computational model Michael Connor, Cynthia Fisher, & Dan Roth University of Illinois at Urbana-Champaign

  2. How do children begin to understand sentences? Topid rivvo den marplox.

  3. Two problems of ambiguity "the world" "the language" Topid rivvo den marplox. Candidate meanings: eating(sheep, food); feed(Johanna, sheep); help(Daddy, Johanna); scared(Johanna, sheep); are-nice(sheep); want(sheep, food); getting-away(brother)

  4. Typical view: knowledge of meaning drives knowledge of syntax "the world" "the sentence" Topid rivvo den marplox. feed (Johanna, sheep)

  5. Typical view: knowledge of meaning drives knowledge of grammar "the world" "the language" Topid rivvo den marplox. Sentence-meaning pairs feed (Johanna, sheep) With or without assumed innate constraints on syntax-semantics linking (e.g. Pinker, 1984, 1989; Tomasello, 2003)

  6. The problem of sentence and verb meanings • Predicate terms don't label events, but adopt different perspectives on them (e.g., Clark, 1990; Fillmore, 1977; Gillette, Gleitman, Gleitman, & Lederer, 1999; Rispoli, 1989; Slobin, 1985) "I'll put this here." versus "This goes here."

  7. If we take seriously the ambiguity of scenes ... "the world" "the sentence" Topid rivvo den marplox. Candidate meanings: eating(sheep, food); feed(Johanna, sheep); help(Daddy, Johanna); scared(Johanna, sheep); are-nice(sheep); want(sheep, food); getting-away(brother)

  8. Syntactic bootstrapping (Gleitman, 1990; Gleitman et al., 2005; Landau & Gleitman, 1985; Naigles, 1990): sentence structure guides inferences about verbs' semantic predicate-argument structure. "I'll put this here." versus "This goes here."

  9. How could this be? How could aspects of syntactic structure guide early sentence interpretation ... before the child has learned much about the syntax and morphology of this language?

  10. Three main claims: (1) Structure-mapping: Syntactic bootstrapping begins with a bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan, Fisher & Snedeker, in press)

  11. Three main claims: (1) Structure-mapping: Syntactic bootstrapping begins with a bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan, Fisher & Snedeker, in press) (2) Early abstraction: Children are biased toward abstract representations of language experience. (Gertner, Fisher & Eisengart, 2006; Thothathiri & Snedeker, 2008) (3) Independent encoding of sentence structure: Children gather distributional facts about verbs from listening experience. (Arunachalam & Waxman, 2010; Scott & Fisher, 2009; Yuan & Fisher, 2009)

  12. A fourth prediction: (4) 'Real' syntactic bootstrapping: Children can learn which words are verbs by tracking their syntactic argument-taking behavior in sentences.

  13. A computational model of syntactic bootstrapping • Computational experiments simulate a learner whose knowledge of sentence structure is under our control • We equip the model with an unlearned bias to map nouns onto abstract semantic roles, and ask: • Are partial representations based on sets of nouns useful for learning more about the native language? • First case study: learning English word order • Could a learner identify verbs as verbs by noting their tendency to occur with particular numbers of arguments?

  14. Computational Model of Semantic Role Labeling (SRL) • A semantic analysis of sentences at the level of who does what to whom • For each verb in the sentence: • SRL system tries to identify all constituents that fill a semantic role • and to assign them roles (agent, patient, goal, etc.)
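
As a concrete illustration of the target representation (my own sketch, not the authors' data format), a PropBank-style analysis of a simple sentence can be pictured as a small predicate-argument record:

```python
# Hypothetical PropBank-style SRL analysis of "I like dark bread.":
# for each predicate, list the constituents that fill a semantic role.
srl_analysis = {
    "sentence": ["I", "like", "dark", "bread", "."],
    "predicates": [
        {
            "verb": "like",
            "arguments": [
                {"span": (0, 0), "text": "I",          "role": "A0"},  # agent-like role
                {"span": (2, 3), "text": "dark bread", "role": "A1"},  # patient/theme-like role
            ],
        }
    ],
}
```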

  15. Semantic Role Labeling (SRL): The basic idea [Pipeline diagram for "I like dark bread.": 1. Parsing; 2. Argument identification; 3. Predicate identification; 4. Feature extraction (arg=I, arg=bread, verb=like, NP < S, NP < VP); 5. Argument classification (A0, A1). External labeled feedback: "I" is an agent (A0), "bread" is a patient (A1).]

  16. The nature of the semantic roles • A key assumption of the SRL is that the semantic roles are abstract • This is a property of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (Palmer, Gildea, & Kingsbury, 2005; cf. Dowty, 1991) • Give: Arg0 = Giver, Arg1 = Thing given, Arg2 = Recipient • Have: Arg0 = Owner, Arg1 = Possession • Like: Arg0 = Liker, Arg1 = Object of affection

  17. The nature of the semantic roles • A key assumption of the SRL is that the semantic roles are abstract • This is a result of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (e.g., Dowty, 1991): the Arg0 roles (Giver, Owner, Liker) are grouped as Agent • Give: Arg0 = Giver, Arg1 = Thing given, Arg2 = Recipient • Have: Arg0 = Owner, Arg1 = Possession • Like: Arg0 = Liker, Arg1 = Object of affection

  18. The nature of the semantic roles • A key assumption of the SRL is that the semantic roles are abstract • This is a result of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (e.g., Dowty, 1991): the Arg1 roles (Thing given, Possession, Object of affection) are grouped as Patient/Theme • So, like children, an SRL learns to identify agents and patients in sentences, not givers and things-given • Give: Arg0 = Giver, Arg1 = Thing given, Arg2 = Recipient • Have: Arg0 = Owner, Arg1 = Possession • Like: Arg0 = Liker, Arg1 = Object of affection

  19. This SRL knows the grammar ... & reads minds [Same pipeline diagram as slide 15, for "I like dark bread.", with external labeled feedback.]

  20. This SRL knows the grammar ... & reads minds [Same pipeline as slide 15, now applied to "Topid rivvo den marplox.": the parse, predicate, and argument features are all unknown (?), and the only available feedback is "what are they talking about?".]

  21. Baby SRL [Pipeline diagram for "I like dark bread.": 1. HMM trained on child-directed speech assigns each word a state (58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk); 2. Argument identification from seed nouns marks the noun states (N); 3. Predicate identification marks the verb state (V) using per-state argument-count statistics (78: 13% 1-noun, 54% 2-noun; 50: 41% 1-noun, 31% 2-noun); 4. Feature extraction (arg=I, arg=bread, verb=like, '1st of two', '2nd of two', 'before verb', 'after verb'); 5. Argument classification (A0, A1). External labeled feedback: "I" is an agent (A0), "bread" is a patient (A1).]

  22. Baby SRL [Same Baby SRL pipeline diagram as slide 21, but with internal animacy feedback in place of external labeled feedback: "I" is animate, so it is the agent (A0); "bread" is of unknown animacy, so not A0.]

  23. Unsupervised Part-of-Speech Clustering using a Hidden Markov Model (HMM) • Yields context-sensitive clustering of words based on sequential dependencies (cf. Bannard et al., 2004; Chang et al., 2006; Mintz, 2003; Solan et al., 2005) • A priori split between content and function words based on phonological differences (e.g., Shi et al., 1998; Connor et al., 2010) • 80 hidden states, 30 pre-allocated to function words • Trained on ~1 million words of unlabeled text from CHILDES [Diagram: the HMM assigns a state to each word of "I like dark bread." (58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk).]
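
A minimal sketch of this clustering step, assuming a recent version of hmmlearn (CategoricalHMM), a toy corpus in place of the ~1 million CHILDES words, and no pre-allocated function-word states:

```python
# Sketch of unsupervised HMM-based word clustering (assumes hmmlearn is installed).
# Unlike the model on the slide, this does not split content from function words a priori.
import numpy as np
from hmmlearn import hmm

# Toy stand-in for child-directed speech from CHILDES.
corpus = [
    "i like dark bread".split(),
    "you want more juice".split(),
    "i got nice milk".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
word_to_id = {w: i for i, w in enumerate(vocab)}

# hmmlearn expects one column of symbol ids plus per-sequence lengths.
X = np.concatenate([[word_to_id[w] for w in sent] for sent in corpus]).reshape(-1, 1)
lengths = [len(sent) for sent in corpus]

model = hmm.CategoricalHMM(n_components=5, n_iter=50, random_state=0)  # 80 states in the real model
model.fit(X, lengths)

# Each word token is assigned a hidden state; states act as unlabeled word classes.
states = model.predict(X, lengths)
for word_id, state in zip(X.ravel(), states):
    print(f"{vocab[word_id]:>6s} -> state {state}")
```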

  24. Linking rules • But these clusters are unlabeled • How do we link unlabeled clusters with particular grammatical categories, and thus particular roles in sentence interpretation (SRL)? • nouns, which are candidate arguments • and verbs, which take arguments [Same HMM clustering diagram as slide 23.]

  25. Linking rules: Nouns first • Syntactic bootstrapping is grounded in noun learning (e.g., Gillette et al., 1999) • Children learn the meanings of some nouns without syntactic knowledge • Each noun is assumed to be a candidate argument • A simple form of semantic bootstrapping (Pinker, 1984) • 10 to 75 seed nouns sampled from the M-CDI [Diagram: stages 1-2 of the pipeline — the HMM states for "I like dark bread.", with the seed-noun states marked N (candidate arguments).]
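
A minimal sketch of this nouns-first linking rule, under the simplifying assumption that any HMM state emitting a known seed noun counts as a noun/argument state (the seed-noun list and state contents below are illustrative):

```python
# Sketch of the nouns-first linking rule: HMM states that emit known seed nouns
# are treated as noun states, and tokens in those states become candidate arguments.
seed_nouns = {"i", "you", "bread", "juice", "milk", "daddy"}   # hypothetical M-CDI sample

def noun_states(state_to_words, seed_nouns):
    """Return the set of HMM states that emit at least one known seed noun."""
    return {s for s, words in state_to_words.items() if words & seed_nouns}

def identify_arguments(tagged_sentence, arg_states):
    """tagged_sentence: list of (word, hmm_state); return indices of candidate arguments."""
    return [i for i, (word, state) in enumerate(tagged_sentence) if state in arg_states]

# Which word types each HMM state was assigned during training (illustrative).
state_to_words = {58: {"i", "you", "ya", "kent"}, 78: {"like", "got", "did", "had"},
                  50: {"many", "nice", "good"}, 48: {"more", "juice", "milk", "bread"}}
arg_states = noun_states(state_to_words, seed_nouns)

sentence = [("i", 58), ("like", 78), ("dark", 50), ("bread", 48)]
print(identify_arguments(sentence, arg_states))   # -> [0, 3]
```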

  26. Linking rules: What about verbs? • Verbs take noun arguments • Via syntactic bootstrapping, children identify a word as a verb by tracking its argument-taking behavior in sentences [Diagram: stages 1-3 of the pipeline — HMM states, seed-noun argument identification, and predicate identification for "I like dark bread."]

  27. Linking rules: What about verbs? • Collect statistics over the HMM training corpus: how often each state occurs with a particular number of arguments • For each SRL input sentence, choose the word whose state is most likely to appear with the number of arguments found in the current sentence. [Diagram: stages 1-3, with the per-state argument-count statistics used for predicate identification (78: 13% 1-noun, 54% 2-noun; 50: 41% 1-noun, 31% 2-noun); state 78 is marked V.]
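
A rough sketch of this predicate-identification rule; the per-state argument-count probabilities echo the numbers on the slide, but the rest is illustrative:

```python
# Sketch of predicate identification: pick the (non-argument) word whose HMM state is most
# likely to co-occur with the number of arguments found in the current sentence.
# state_arg_probs[state][n] = P(state occurs in a clause with n nouns); numbers are illustrative.
state_arg_probs = {
    78: {1: 0.13, 2: 0.54},   # "like, got, did, had" -> mostly 2-noun contexts
    50: {1: 0.41, 2: 0.31},   # "many, nice, good"
}

def identify_predicate(tagged_sentence, arg_indices, state_arg_probs):
    """tagged_sentence: list of (word, hmm_state); arg_indices: candidate argument positions."""
    n_args = len(arg_indices)
    candidates = [(i, state) for i, (word, state) in enumerate(tagged_sentence)
                  if i not in arg_indices]
    # Choose the candidate whose state most often appears with exactly n_args arguments.
    best = max(candidates, key=lambda c: state_arg_probs.get(c[1], {}).get(n_args, 0.0))
    return best[0]

sentence = [("i", 58), ("like", 78), ("dark", 50), ("bread", 48)]
print(identify_predicate(sentence, arg_indices=[0, 3], state_arg_probs=state_arg_probs))  # -> 1 ("like")
```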

  28. Baby SRL [Full Baby SRL pipeline diagram (as on slide 21) without feedback, labeled: "This is a partial sentence representation."]
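
A minimal sketch of feature extraction over this partial sentence representation (feature names are my own shorthand for the features shown in the diagram):

```python
# Sketch of feature extraction: lexical features plus "noun pattern" features
# (1st/2nd of two nouns) and "verb position" features (before/after the predicted verb).
def extract_features(tagged_sentence, arg_indices, pred_index):
    words = [w for w, _ in tagged_sentence]
    n_args = len(arg_indices)
    features = []
    for rank, i in enumerate(arg_indices, start=1):
        feats = {
            f"arg={words[i]}",                                   # lexical argument feature
            f"verb={words[pred_index]}",                         # lexical predicate feature
            f"noun_{rank}_of_{n_args}",                          # noun pattern feature
            "before_verb" if i < pred_index else "after_verb",   # verb position feature
        }
        features.append((i, feats))
    return features

sentence = [("i", 58), ("like", 78), ("dark", 50), ("bread", 48)]
for idx, feats in extract_features(sentence, arg_indices=[0, 3], pred_index=1):
    print(idx, sorted(feats))
# 0 ['arg=i', 'before_verb', 'noun_1_of_2', 'verb=like']
# 3 ['after_verb', 'arg=bread', 'noun_2_of_2', 'verb=like']
```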

  29. Baby SRL [Full Baby SRL pipeline diagram with external labeled feedback ("I" is an agent, A0; "bread" is a patient, A1), labeled: "This is a partial sentence representation."]
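
To show how role feedback could train the argument classifier, here is a simple multiclass perceptron over those features; the original Baby SRL used a linear classifier, but this particular update rule is an illustrative stand-in:

```python
# Sketch of argument classification trained from role feedback, using a simple
# multiclass perceptron over extracted features.
from collections import defaultdict

ROLES = ["A0", "A1"]                       # agent-like vs. patient-like macro-roles
weights = {r: defaultdict(float) for r in ROLES}

def score(role, feats):
    return sum(weights[role][f] for f in feats)

def classify(feats):
    return max(ROLES, key=lambda r: score(r, feats))

def update(feats, gold_role, lr=1.0):
    """Perceptron update: adjust weights only when the prediction is wrong."""
    pred = classify(feats)
    if pred != gold_role:
        for f in feats:
            weights[gold_role][f] += lr
            weights[pred][f] -= lr

# External labeled feedback for "I like dark bread.": I -> A0, bread -> A1.
update({"arg=i", "verb=like", "noun_1_of_2", "before_verb"}, "A0")
update({"arg=bread", "verb=like", "noun_2_of_2", "after_verb"}, "A1")
```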

  30. [Johanna rivvo den sheep.] Semantic-role feedback??? • Theories of language acquisition assume learning to understand sentences is a partially-supervised task. • use existing knowledge of words & syntax to assign a meaning to a sentence • the appropriateness of the meaning in context then provides the feedback

  31. Baby SRL with Animacy-based Feedback Internal animacy feedback: "I" is animate, so it is the agent (A0); "bread" is of unknown animacy, so not A0. Bootstrapped animacy feedback (a list of concrete nouns): • Animates are agents; inanimates are non-agents • Each verb has at most one agent • If there are two known animate nouns, use the current SRL classifier to choose the best agent (and use this decision as feedback to train the SRL) • Note: Animacy feedback trains a simplified classifier, with only an agent/non-agent role distinction
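
A minimal sketch of this animacy-based feedback rule; the animate/inanimate noun lists and the agent-scoring hook are assumptions for illustration:

```python
# Sketch of bootstrapped animacy feedback: known animates are agents, known
# inanimates are non-agents, and at most one argument per verb is an agent.
ANIMATE = {"i", "you", "daddy", "mommy", "johanna"}     # hypothetical known-animate list
INANIMATE = {"juice", "milk", "swing"}                  # hypothetical known-inanimate list

def animacy_feedback(arg_words, classify_agent_score):
    """arg_words: argument word forms in order; classify_agent_score: the current SRL's
    agent score for an argument index (used only to break ties between two animates).
    Returns a dict from argument index to 'agent' / 'non-agent' / None (no feedback)."""
    animates = [i for i, w in enumerate(arg_words) if w in ANIMATE]
    feedback = {}
    for i, w in enumerate(arg_words):
        if w in INANIMATE:
            feedback[i] = "non-agent"
        elif w not in ANIMATE:
            feedback[i] = None                 # unknown animacy: no feedback
    if len(animates) == 1:
        feedback[animates[0]] = "agent"        # a lone animate is the agent
    elif len(animates) > 1:
        best = max(animates, key=classify_agent_score)   # let the current SRL pick one agent
        for i in animates:
            feedback[i] = "agent" if i == best else "non-agent"
    return feedback

# "I like dark bread.": one known animate argument, one argument of unknown animacy.
print(animacy_feedback(["i", "bread"], classify_agent_score=lambda i: 0.0))
# -> {1: None, 0: 'agent'}
```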

  32. Baby SRL [Full Baby SRL pipeline diagram with internal animacy feedback; argument classification is reduced to A0 vs. not-A0.]

  33. Training the Baby SRL • 3 corpora of child-directed speech: parental utterances annotated with the PropBank annotation scheme • Speech to Adam, Eve, and Sarah (Brown, 1973) • Adam 01-23 (2;3 - 3;2): train on 01-20, 3951 propositions, 8107 arguments • Eve 01-20 (1;6 - 2;3): train on 01-18, 4029 propositions, 8499 arguments • Sarah 01-90 (2;3 - 4;1): train on 01-83, 8570 propositions, 15599 arguments

  34. Testing the Baby SRL • Constructed test sentences like those used in experiments with children • Unknown verbs & two animate nouns force the SRL to rely on syntactic knowledge: "Adam krads Daddy!" [Diagram: the novel verb "krad" paired with familiar animate nouns (Adam, Mommy, Daddy, Ursula, ...).]

  35. Testing the Baby SRL • Compare systems trained on different syntactic features derived from the partial sentence representation • Lexical baseline: verb and argument word features only • Noun Pattern: lexical features plus features such as '1st of 2 nouns', '2nd of 2 nouns' • Verb Position: lexical features plus 'before verb', 'after verb' • Combined: all feature types [Example feature sets for the argument "I": Lexical baseline: arg=I, verb=like; Noun Pattern: arg=I, verb=like, 1st of two; Verb Position: arg=I, verb=like, before verb; Combined: arg=I, verb=like, 1st of two, before verb.]

  36. Question 1: Are partial sentence representations useful, in principle, for learning new syntactic knowledge? Learning English word order Start with "gold standard" part-of-speech tagging and "gold standard" semantic role feedback

  37. Are partial sentence representations useful as a starting point for learning? • We assume that representations as simple as an ordered set of nouns can yield useful information about verb argument structure and sentence interpretation. • Is this true in ordinary samples of language use, despite all the noise this simple representation will add to the data?

  38. Result 1: Partial sentence representations are useful for further syntax learning [Graph: test performance on novel-verb sentences of the form "A krads B".]

  39. Question 2: Can partial sentence representations be built via unsupervised clustering ... and the proposed linking rules? [Diagram: stages 1-3 of the Baby SRL pipeline — HMM clustering, seed-noun argument identification, and argument-count-based predicate identification for "I like dark bread."]

  40. Result 2: Partial sentence representations can be built via unsupervised clustering [Graph: performance as a function of the number of known seed nouns.]

  41. Question 3: Now for the Baby SRL Are partial sentence representations built via unsupervised clustering useful for learning new syntactic knowledge? With animacy-based feedback? Learning English word order Starting from scratch ...

  42. Result 3: Partial sentence representations based on unsupervised clustering are useful

  43. SRL Summary • Result 1: Partial sentence representations are useful for further syntax learning (gold arguments & feedback) • Result 2: Partial sentence representations can be built via unsupervised clustering, and the proposed linking rules: • Semantic bootstrapping for nouns, 'real' syntactic bootstrapping for verbs • Result 3: Partial sentence representations based on unsupervised clustering are useful for further syntax learning • Robust learning based on noisy unsupervised 'parse' of sentences, and noisy animacy-based feedback • verbs precede objects in English

  44. Three main claims: (1) Structure-mapping: Syntactic bootstrapping begins with an unlearned bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (2) Early abstraction: Children are biased toward usefully abstract representations of language experience. (3) Independent encoding of sentence structure: Children gather distributional facts about new verbs from listening experience.

  45. A fourth prediction: (4) 'Real' syntactic bootstrapping: Children learn which words are the verbs by tracking their syntactic argument-taking behavior in sentences. Verb = push(noun1, noun2), a 2-participant relation: "Hey, she pushed her." "Will you push me on the swing?" "John pushed the cat off the sofa ..."

  46. Acknowledgements: Yael Gertner, Jesse Snedeker, Lila Gleitman, Soondo Baek, Kate Messenger, Sylvia Yuan, Rose Scott; NSF; NICHD
