
Learning Language from its Perceptual Context




  1. Learning Language from its Perceptual Context Ray Mooney Department of Computer Sciences University of Texas at Austin Joint work with David Chen Rohit Kate Yuk Wah Wong

  2. Current State of Natural Language Learning • Most current state-of-the-art NLP systems are constructed by training on large supervised corpora. • Syntactic Parsing: Penn Treebank • Word Sense Disambiguation: SenseEval • Semantic Role Labeling: PropBank • Machine Translation: Hansards corpus • Constructing such annotated corpora is difficult, expensive, and time-consuming.

  3. Semantic Parsing • A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: a logical form or meaning representation (MR). • For many applications, the desired output is immediately executable by another program. • Two application domains: • GeoQuery: A Database Query Application • CLang: RoboCup Coach Language

  4. GeoQuery: A Database Query Application • Query application for a U.S. geography database [Zelle & Mooney, 1996] • Example: the user asks "How many states does the Mississippi run through?"; semantic parsing produces answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A)), which is executed against the database to return 10.
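To make "immediately executable" concrete, here is a rough Python sketch (not the actual GeoQuery/Prolog backend) that evaluates a hand-coded equivalent of the MR above against a toy database; the table contents and function name are invented for illustration.

```python
# Minimal sketch, assuming a toy geography table; not the real GeoQuery backend.
# It evaluates a hand-coded equivalent of
#   answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))

# Hypothetical facts: which states each river traverses.
TRAVERSE = {
    "mississippi": {"minnesota", "wisconsin", "iowa", "illinois", "missouri",
                    "kentucky", "tennessee", "arkansas", "mississippi", "louisiana"},
}
STATES = {"minnesota", "wisconsin", "iowa", "illinois", "missouri", "kentucky",
          "tennessee", "arkansas", "mississippi", "louisiana", "texas"}

def answer_count_states_traversed_by(river_id):
    """Count the B such that state(B) and traverse(riverid(river_id), B)."""
    return len({b for b in TRAVERSE.get(river_id, set()) if b in STATES})

print(answer_count_states_traversed_by("mississippi"))  # -> 10 on this toy data
```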

  5. CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated soccer players. • The coaching instructions are given in a formal language called CLang. • Example: the coach's instruction "If the ball is in our penalty area, then all our players except player 4 should stay in our half" is parsed into the CLang expression ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our)))). [Diagram: simulated soccer field; coach utterance, semantic parsing, CLang output]

  6. Learning Semantic Parsers • Manually programming robust semantic parsers is difficult due to the complexity of the task. • Semantic parsers can be learned automatically from sentences paired with their logical form. [Diagram: NL/MR training examples → Semantic-Parser Learner → Semantic Parser; Natural Language → Semantic Parser → Meaning Rep]

  7. Our Semantic-Parser Learners • CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003) • Separates parser-learning and semantic-lexicon learning. • Learns a deterministic parser using ILP techniques. • COCKTAIL (Tang & Mooney, 2001) • Improved ILP algorithm for CHILL. • SILT (Kate, Wong & Mooney, 2005) • Learns symbolic transformation rules for mapping directly from NL to MR. • SCISSOR (Ge & Mooney, 2005) • Integrates semantic interpretation into Collins' statistical syntactic parser. • WASP (Wong & Mooney, 2006; 2007) • Uses syntax-based statistical machine translation methods. • KRISP (Kate & Mooney, 2006) • Uses a series of SVM classifiers employing a string kernel to iteratively build semantic representations.

  8. WASP: A Machine Translation Approach to Semantic Parsing • Uses statistical machine translation techniques • Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) • Word alignments (Brown et al., 1993; Och & Ney, 2003) • Hence the name: Word Alignment-based Semantic Parsing

  9. A Unifying Framework for Parsing and Generation [Diagram: Natural Languages, related by machine translation]

  10. A Unifying Framework for Parsing and Generation [Diagram adds Formal Languages and semantic parsing, from Natural Languages to Formal Languages]

  11. A Unifying Framework for Parsing and Generation [Diagram adds tactical generation, from Formal Languages to Natural Languages]

  12. A Unifying Framework for Parsing and Generation [Diagram adds synchronous parsing]

  13. A Unifying Framework for Parsing and Generation [Diagram adds compiling, between Formal Languages: Aho & Ullman (1972)]

  14. Synchronous Context-Free Grammars (SCFG) • Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase. • Generates a pair of strings in a single derivation.

  15. Synchronous Context-Free Grammar: Production Rule • QUERY → What is CITY / answer(CITY) (natural-language side / formal-language side)

  16. Synchronous Context-Free Grammar Derivation • Rules used: QUERY → What is CITY / answer(CITY); CITY → the capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio') • Derived pair: "What is the capital of Ohio" / answer(capital(loc_2(stateid('ohio'))))
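As a rough illustration of how a synchronous derivation produces a (sentence, MR) pair, the sketch below encodes the four rules from this slide and expands both right-hand sides in lockstep. The rule encoding, the `derive` function, and the `choices` bookkeeping are my own simplifications, not WASP's implementation; it assumes each nonterminal occurs at most once per rule, which holds here.

```python
# Minimal SCFG sketch: each production pairs an NL right-hand side with an MR
# right-hand side that share nonterminals; expanding both with the same rule
# choices yields a (sentence, MR) pair.

RULES = {
    "QUERY": [(["What", "is", "CITY"], ["answer(", "CITY", ")"])],
    "CITY":  [(["the", "capital", "CITY"], ["capital(", "CITY", ")"]),
              (["of", "STATE"], ["loc_2(", "STATE", ")"])],
    "STATE": [(["Ohio"], ["stateid('ohio')"])],
}

def derive(symbol, choices):
    """Expand `symbol` on both sides; `choices` maps each nonterminal to the
    list of rule indices to use, consumed left to right."""
    if symbol not in RULES:                      # terminal token
        return [symbol], [symbol]
    nl_rhs, mr_rhs = RULES[symbol][choices[symbol].pop(0)]
    sub, nl_out, mr_out = {}, [], []
    for tok in nl_rhs:                           # expand the NL side
        if tok in RULES:
            sub[tok] = derive(tok, choices)
            nl_out += sub[tok][0]
        else:
            nl_out.append(tok)
    for tok in mr_rhs:                           # reuse the same expansions on the MR side
        mr_out += sub[tok][1] if tok in sub else [tok]
    return nl_out, mr_out

# The derivation from slide 16.
nl, mr = derive("QUERY", {"QUERY": [0], "CITY": [0, 1], "STATE": [0]})
print(" ".join(nl))   # What is the capital of Ohio
print("".join(mr))    # answer(capital(loc_2(stateid('ohio'))))
```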

  17. CITY capital CITY / capital(CITY) CITY of STATE / loc_2(STATE) Probabilistic Parsing Model d1 CITY CITY capital capital ( CITY ) CITY of loc_2 ( STATE ) STATE Ohio stateid ( 'ohio' ) STATE Ohio / stateid('ohio')

  18. CITY capital CITY / capital(CITY) CITY of RIVER / loc_2(RIVER) Probabilistic Parsing Model d2 CITY CITY capital capital ( CITY ) CITY of loc_2 ( RIVER ) RIVER Ohio riverid ( 'ohio' ) RIVER Ohio / riverid('ohio')

  19. CITY capital CITY / capital(CITY) CITY capital CITY / capital(CITY) CITY of STATE / loc_2(STATE) CITY of RIVER / loc_2(RIVER) + + Probabilistic Parsing Model d1 d2 CITY CITY capital ( CITY ) capital ( CITY ) loc_2 ( STATE ) loc_2 ( RIVER ) stateid ( 'ohio' ) riverid ( 'ohio' ) 0.5 0.5 λ λ 0.3 0.05 0.5 0.5 STATE Ohio / stateid('ohio') RIVER Ohio / riverid('ohio') Pr(d1|capital of Ohio) =exp( ) / Z 1.3 Pr(d2|capital of Ohio) = exp( ) / Z 1.05 normalization constant

  20. Overview of WASP • Training: an unambiguous CFG of the MRL and a training set {(e, f)} go through lexical acquisition to produce a lexicon L (an SCFG); parameter estimation then yields an SCFG parameterized by λ. • Testing: semantic parsing maps an input sentence e' to an output MR f'.

  21. Tactical Generation • Can be seen as the inverse of semantic parsing. • Example: semantic parsing maps "The goalie should always stay in our half" to ((true) (do our {1} (pos (half our)))); tactical generation maps the MR back to the sentence.

  22. Generation by Inverting WASP • The same synchronous grammar is used for both semantic parsing (NL input, MRL output) and tactical generation (MRL input, NL output), e.g. QUERY → What is CITY / answer(CITY).
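As a toy illustration of this symmetry (not WASP's actual decoder), the snippet below applies a single rule pair in both directions: match the NL side and emit the MR side for parsing, or match the MR side and emit the NL side for generation. The regex-based matching and the `rewrite` helper are invented for illustration.

```python
import re

# One synchronous rule pair: QUERY -> <"What is CITY", "answer(CITY)">.
NL_SIDE, MR_SIDE = "What is CITY", "answer(CITY)"

def rewrite(text, src, tgt, var="CITY"):
    """Match `text` against the `src` side of the rule and emit the `tgt` side,
    carrying the material bound to the shared nonterminal across."""
    pattern = re.escape(src).replace(var, r"(?P<x>.+)")
    m = re.fullmatch(pattern, text)
    return tgt.replace(var, m.group("x")) if m else None

# Semantic parsing: NL in, MR out.
print(rewrite("What is the capital of Ohio", NL_SIDE, MR_SIDE))
# -> answer(the capital of Ohio)   (the remaining CITY material would be handled by further rules)

# Tactical generation: MR in, NL out, using the *same* rule.
print(rewrite("answer(capital(loc_2(stateid('ohio'))))", MR_SIDE, NL_SIDE))
# -> What is capital(loc_2(stateid('ohio')))   (further rules would verbalize the rest)
```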

  23. Learning Language from Perceptual Context • Children do not learn language from annotated corpora. • Neither do they learn language from just reading the newspaper, surfing the web, or listening to the radio (unsupervised language learning; DARPA Learning by Reading Program). • The natural way to learn language is to perceive language in the context of its use in the physical and social world. • This requires inferring the meaning of utterances from their perceptual context.

  24. Language Grounding • The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc. (Symbol Grounding: Harnad, 1990). • Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc. (Lakoff and Johnson's Metaphors We Live By); e.g., "It's difficult to put my ideas into words." • Most NLP work represents meaning without any connection to perception, circularly defining the meanings of words in terms of other words or meaningless symbols with no firm foundation.

  25. ??? “Mary is on the phone”

  26. Ambiguous Supervision for Learning Semantic Parsers • A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics. • We consider ambiguous training data of sentences associated with multiple potential MRs. • Siskind (1996) uses this type of "referentially uncertain" training data to learn meanings of words. • Extracting meaning representations from perceptual data is a difficult unsolved problem. • Our system works directly with symbolic MRs.

  27. ??? “Mary is on the phone”

  28. ??? “Mary is on the phone”

  29. ??? Ironing(Mommy, Shirt) “Mary is on the phone”

  30. ??? Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone”

  31. ??? Ironing(Mommy, Shirt) Carrying(Daddy, Bag) Working(Sister, Computer) “Mary is on the phone”

  32. ??? Ambiguous Training Example Ironing(Mommy, Shirt) Carrying(Daddy, Bag) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone”

  33. Next Ambiguous Training Example Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) ??? Sitting(Mary, Chair) “Mommy is ironing a shirt”

  34. Ambiguous Supervision for Learning Semantic Parsers (cont.) • Our model of ambiguous supervision corresponds to the type of data that will be gathered from a temporal sequence of perceptual contexts with occasional language commentary. • We assume each sentence has exactly one meaning in its perceptual context. • Recently extended to handle sentences with no meaning in their perceptual context. • Each meaning is associated with at most one sentence.

  35. Sample Ambiguous Corpus gave(daisy, clock, mouse) ate(mouse, orange) Daisy gave the clock to the mouse. ate(dog, apple) Mommy saw that Mary gave the hammer to the dog. saw(mother, gave(mary, dog, hammer)) broke(dog, box) The dog broke the box. gave(woman, toy, mouse) gave(john, bag, mouse) John gave the bag to the mouse. threw(dog, ball) runs(dog) The dog threw the ball. saw(john, walks(man, dog)) Forms a bipartite graph

  36. KRISPER: KRISP with EM-like Retraining • Extension of KRISP that learns from ambiguous supervision. • Uses an iterative EM-like self-training method to gradually converge on a correct meaning for each sentence.

  37. KRISPER’s Training Algorithm 1. Assume every possible meaning for a sentence is correct gave(daisy, clock, mouse) ate(mouse, orange) Daisy gave the clock to the mouse. ate(dog, apple) Mommy saw that Mary gave the hammer to the dog. saw(mother, gave(mary, dog, hammer)) broke(dog, box) The dog broke the box. gave(woman, toy, mouse) gave(john, bag, mouse) John gave the bag to the mouse. threw(dog, ball) runs(dog) The dog threw the ball. saw(john, walks(man, dog))

  38. KRISPER’s Training Algorithm 1. Assume every possible meaning for a sentence is correct gave(daisy, clock, mouse) ate(mouse, orange) Daisy gave the clock to the mouse. ate(dog, apple) Mommy saw that Mary gave the hammer to the dog. saw(mother, gave(mary, dog, hammer)) broke(dog, box) The dog broke the box. gave(woman, toy, mouse) gave(john, bag, mouse) John gave the bag to the mouse. threw(dog, ball) runs(dog) The dog threw the ball. saw(john, walks(man, dog))

  39. KRISPER's Training Algorithm 2. Resulting NL-MR pairs are weighted and given to KRISP: each of a sentence's k candidate MRs receives weight 1/k (1/2, 1/4, 1/5, 1/3, and 1/3 for the five sentences in the example corpus).

  40. KRISPER's Training Algorithm 3. Estimate the confidence of each NL-MR pair using the resulting trained parser. [Figure: the example corpus edges receive confidence scores such as 0.92, 0.11, 0.32, 0.88, ...]

  41. KRISPER's Training Algorithm 4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]. [Figure: the example corpus with its confidence-weighted sentence-MR edges, e.g. 0.92, 0.11, 0.32, 0.88, ...]

  42. KRISPER's Training Algorithm 4 (cont.). [Figure: the same confidence-weighted graph; the matching selects the best-scoring MR for each sentence]

  43. KRISPER’s Training Algorithm 5. Give the best pairs to KRISP in the next iteration, and repeat until convergence gave(daisy, clock, mouse) ate(mouse, orange) Daisy gave the clock to the mouse. ate(dog, apple) Mommy saw that Mary gave the hammer to the dog. saw(mother, gave(mary, dog, hammer)) broke(dog, box) The dog broke the box. gave(woman, toy, mouse) gave(john, bag, mouse) John gave the bag to the mouse. threw(dog, ball) runs(dog) The dog threw the ball. saw(john, walks(man, dog))
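A schematic sketch of steps 1-5 above, assuming the parser is abstracted behind `train_parser` and `score_pair` callbacks (these stand in for KRISP, which is not shown) and using SciPy's Hungarian-algorithm routine in place of the Munkres matching step:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian / Munkres algorithm

def krisper_style_training(corpus, train_parser, score_pair, n_iters=10):
    """EM-like retraining over an ambiguous corpus (schematic, not KRISPER's code).

    corpus:       list of (sentence, [candidate MRs]) pairs
    train_parser: function(list of (sentence, mr, weight)) -> parser
    score_pair:   function(parser, sentence, mr) -> confidence in [0, 1]
    """
    # Steps 1-2: assume every candidate MR is correct, each weighted 1/k, and train.
    weighted = [(s, m, 1.0 / len(mrs)) for s, mrs in corpus for m in mrs]
    parser = train_parser(weighted)

    sentences = [s for s, _ in corpus]
    all_mrs = sorted({m for _, mrs in corpus for m in mrs})
    for _ in range(n_iters):
        # Step 3: score every sentence/candidate-MR edge with the current parser.
        conf = np.full((len(sentences), len(all_mrs)), -np.inf)
        for i, (s, mrs) in enumerate(corpus):
            for m in mrs:
                conf[i, all_mrs.index(m)] = score_pair(parser, s, m)
        # Step 4: maximum weighted bipartite matching (each sentence gets one MR,
        # each MR at most one sentence), via the Hungarian algorithm on -conf.
        cost = -np.where(np.isinf(conf), -1e6, conf)
        rows, cols = linear_sum_assignment(cost)
        best = [(sentences[i], all_mrs[j], 1.0) for i, j in zip(rows, cols)
                if np.isfinite(conf[i, j])]
        # Step 5: retrain on the best pairs and repeat until convergence.
        parser = train_parser(best)
    return parser
```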

  44. Results on Ambig-ChildWorld Corpus

  45. New Challenge: Learning to Be a Sportscaster • Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e., speech and vision). • Solution: Learn from textually annotated traces of activity in a simulated environment. • Example: Traces of games in the RoboCup simulator paired with textual sportscaster commentary.

  46. Grounded Language Learning in RoboCup [Diagram: the RoboCup Simulator provides simulated perception (perceived facts) and sportscaster commentary ("Score!!!!"); the Grounded Language Learner uses these to learn an SCFG, which is shared by a Semantic Parser and a Language Generator]

  47. Robocup Sportscaster Trace Natural Language Commentary Meaning Representation badPass ( Purple1, Pink8 ) turnover ( Purple1, Pink8 ) Purple goalie turns the ball over to Pink8 kick ( Pink8) pass ( Pink8, Pink11 ) Purple team is very sloppy today kick ( Pink11 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate kick ( Pink11 ) ballstopped kick ( Pink11 ) Pink11 makes a long pass to Pink8 pass ( Pink11, Pink8 ) kick ( Pink8 ) pass ( Pink8, Pink11 ) Pink8 passes back to Pink11

  48. Robocup Sportscaster Trace Natural Language Commentary Meaning Representation badPass ( Purple1, Pink8 ) turnover ( Purple1, Pink8 ) Purple goalie turns the ball over to Pink8 kick ( Pink8) pass ( Pink8, Pink11 ) Purple team is very sloppy today kick ( Pink11 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate kick ( Pink11 ) ballstopped kick ( Pink11 ) Pink11 makes a long pass to Pink8 pass ( Pink11, Pink8 ) kick ( Pink8 ) pass ( Pink8, Pink11 ) Pink8 passes back to Pink11

  49. Robocup Sportscaster Trace Natural Language Commentary Meaning Representation badPass ( Purple1, Pink8 ) turnover ( Purple1, Pink8 ) Purple goalie turns the ball over to Pink8 kick ( Pink8) pass ( Pink8, Pink11 ) Purple team is very sloppy today kick ( Pink11 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate kick ( Pink11 ) ballstopped kick ( Pink11 ) Pink11 makes a long pass to Pink8 pass ( Pink11, Pink8 ) kick ( Pink8 ) pass ( Pink8, Pink11 ) Pink8 passes back to Pink11

  50. Robocup Sportscaster Trace Natural Language Commentary Meaning Representation P6 ( C1, C19 ) P5 ( C1, C19 ) Purple goalie turns the ball over to Pink8 P1( C19 ) P2 ( C19, C22 ) Purple team is very sloppy today P1 ( C22 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate P1 ( C22 ) P0 P1 ( C22 ) Pink11 makes a long pass to Pink8 P2 ( C22, C19 ) P1 ( C19 ) P2 ( C19, C22 ) Pink8 passes back to Pink11
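The trace above is the kind of input the learner sees: a stream of extracted events plus occasional commentary, where each comment is ambiguous among the events occurring around the time it was uttered. Below is a rough sketch of how such ambiguous (sentence, candidate MRs) pairs might be assembled from a timestamped trace; the time window, timestamps, and data layout are assumptions for illustration, not details given on these slides.

```python
# Rough sketch: turn a timestamped game trace plus commentary into ambiguous
# (sentence, candidate-MR set) training pairs. The window size and layout are
# illustrative assumptions.

TRACE = [                        # (time in seconds, meaning representation)
    (10, "badPass(Purple1, Pink8)"),
    (10, "turnover(Purple1, Pink8)"),
    (12, "kick(Pink8)"),
    (13, "pass(Pink8, Pink11)"),
]
COMMENTS = [                     # (time in seconds, sportscaster utterance)
    (11, "Purple goalie turns the ball over to Pink8"),
    (14, "Pink8 passes the ball to Pink11"),
]

def ambiguous_pairs(trace, comments, window=5.0):
    """Each comment's candidate MRs are the events within `window` seconds before it."""
    pairs = []
    for t_c, sentence in comments:
        candidates = [mr for t_e, mr in trace if t_c - window <= t_e <= t_c]
        pairs.append((sentence, candidates))
    return pairs

for sentence, candidates in ambiguous_pairs(TRACE, COMMENTS):
    print(sentence, "->", candidates)
```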
