
Learning Language from its Perceptual Context


Presentation Transcript


  1. Learning Language from its Perceptual Context. Ray Mooney, Department of Computer Sciences, University of Texas at Austin. Joint work with David Chen, Joohyun Kim, and Rohit Kate.

  2. Current State of Natural Language Learning • Most current state-of-the-art NLP systems are constructed by training on large supervised corpora. • Syntactic Parsing: Penn Treebank • Word Sense Disambiguation: SenseEval • Semantic Role Labeling: Propbank • Machine Translation: Hansards corpus • Constructing such annotated corpora is difficult, expensive, and time-consuming.

  3. Semantic Parsing • A semantic parser maps a natural-language (NL) sentence to a complete, detailed formal semantic representation: a logical form or meaning representation (MR). • For many applications, the desired output is a computer language that is immediately executable by another program.

  4. CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated soccer players. • The coaching instructions are given in a formal language called CLang. • Example: “If the ball is in our penalty area, then all our players except player 4 should stay in our half.” → ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our)))) [Figure: the coach’s instruction is semantically parsed into CLang for the simulated soccer field.]

  5. Learning Semantic Parsers • Manually programming robust semantic parsers is difficult due to the complexity of the task. • Semantic parsers can be learned automatically from sentences paired with their logical forms. [Figure: NL/MR training examples are fed to a semantic-parser learner, which produces a semantic parser mapping natural language to meaning representations.]

  6. Our Semantic-Parser Learners • CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999) • Separates parser-learning and semantic-lexicon learning. • Learns a deterministic parser using ILP techniques. • COCKTAIL (Tang & Mooney, 2001) • Improved ILP algorithm for CHILL. • SILT (Kate, Wong & Mooney, 2005) • Learns symbolic transformation rules for mapping directly from NL to MR. • SCISSOR (Ge & Mooney, 2005) • Integrates semantic interpretation into Collins’ statistical syntactic parser. • WASP (Wong & Mooney, 2006; 2007) • Uses syntax-based statistical machine translation methods. • KRISP (Kate & Mooney, 2006) • Uses a series of SVM classifiers employing a string kernel to iteratively build semantic representations. • SynSem (Ge & Mooney, 2009) • Uses an existing statistical syntactic parser & word alignment.

  7. Learning Language from Perceptual Context • Children do not learn language from annotated corpora. • Neither do they learn language from just reading the newspaper, surfing the web, or listening to the radio (unsupervised language learning; cf. the DARPA Learning by Reading program). • The natural way to learn language is to perceive language in the context of its use in the physical and social world. • This requires inferring the meaning of utterances from their perceptual context.

  8. Language Grounding • The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc. (Symbol Grounding: Harnad, 1990). • Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc. (Lakoff and Johnson’s Metaphors We Live By), e.g. “It’s difficult to put my ideas into words.” • Most NLP work represents meaning without any connection to perception, circularly defining the meanings of words in terms of other words or meaningless symbols with no firm foundation.

  9. Sample Circular Definitions from WordNet • sleep (v): “be asleep” • asleep (adj): “in a state of sleep”

  10. ??? “Mary is on the phone”

  11. ??? “Mary is on the phone”

  12. ??? “Mary is on the phone”

  13. ??? Ironing(Mommy, Shirt) “Mary is on the phone”

  14. ??? Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone”

  15. ??? Ironing(Mommy, Shirt) Carrying(Daddy, Bag) Working(Sister, Computer) “Mary is on the phone”

  16. ??? Ambiguous Training Example Ironing(Mommy, Shirt) Carrying(Daddy, Bag) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone”

  17. Next Ambiguous Training Example Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) ??? Sitting(Mary, Chair) “Mommy is ironing a shirt”

  18. Ambiguous Supervision for Learning Semantic Parsers • Our model of ambiguous supervision corresponds to the type of data that will be gathered from a temporal sequence of perceptual contexts with occasional language commentary. • We assume each sentence has exactly one meaning in its perceptual context. • Recently extended to handle sentences with no meaning in their perceptual context. • Each meaning is associated with at most one sentence.

  19. Sample Ambiguous Corpus • Sentences: “Daisy gave the clock to the mouse.” / “Mommy saw that Mary gave the hammer to the dog.” / “The dog broke the box.” / “John gave the bag to the mouse.” / “The dog threw the ball.” • Candidate meanings: gave(daisy, clock, mouse), ate(mouse, orange), ate(dog, apple), saw(mother, gave(mary, dog, hammer)), broke(dog, box), gave(woman, toy, mouse), gave(john, bag, mouse), threw(dog, ball), runs(dog), saw(john, walks(man, dog)) • Each sentence is linked to the candidate meanings in its context, forming a bipartite graph.
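A minimal sketch of how such an ambiguous corpus might be represented, assuming each sentence simply carries the set of candidate MRs from its perceptual context (the data structure and the particular candidate sets below are illustrative, not taken from the actual systems):

```python
# Ambiguous corpus: each sentence is paired with the candidate meaning
# representations (MRs) that were observed in its perceptual context.
ambiguous_corpus = [
    ("Daisy gave the clock to the mouse.",
     ["gave(daisy, clock, mouse)", "ate(mouse, orange)"]),
    ("The dog broke the box.",
     ["ate(dog, apple)", "broke(dog, box)", "gave(woman, toy, mouse)"]),
    ("John gave the bag to the mouse.",
     ["gave(john, bag, mouse)", "threw(dog, ball)"]),
]

# Working assumptions from the talk: exactly one candidate MR per sentence
# is correct, and each MR explains at most one sentence.
```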

  20. KRISPER (Kate & Mooney, 2007): KRISP with EM-like Retraining • Extension of KRISP that learns from ambiguous supervision. • Uses an iterative EM-like self-training method to gradually converge on a correct meaning for each sentence.

  21. KRISPER’s Training Algorithm, step 1: Assume every possible meaning for a sentence is correct. [Figure: the sample ambiguous corpus drawn as a bipartite graph of sentences and candidate MRs.]

  22. KRISPER’s Training Algorithm, step 1 (continued): every sentence is tentatively linked to each of its candidate meanings in the bipartite graph.

  23. KRISPER’s Training Algorithm, step 2: The resulting NL–MR pairs are weighted and given to KRISP. [Figure: each sentence’s candidate edges receive equal weight 1/k, where k is its number of candidate meanings; the weights shown are 1/2, 1/4, 1/5, and 1/3.]
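A brief sketch of this weighting step, reusing the `ambiguous_corpus` structure from the earlier example (names are illustrative):

```python
def initial_weighted_pairs(ambiguous_corpus):
    """Step 2 (sketch): give every candidate NL-MR pair of a sentence an
    equal weight 1/k, where k is that sentence's number of candidates."""
    weighted_pairs = []
    for sentence, candidate_mrs in ambiguous_corpus:
        k = len(candidate_mrs)
        for mr in candidate_mrs:
            weighted_pairs.append((sentence, mr, 1.0 / k))
    return weighted_pairs
```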

  24. KRISPER’s Training Algorithm, step 3: Estimate the confidence of each NL–MR pair using the resulting trained parser. [Figure: the bipartite graph with each candidate edge labeled by the parser’s confidence, e.g. 0.92, 0.11, 0.32, 0.88, 0.22, 0.24, 0.71, 0.18, 0.85, 0.14, 0.95, 0.24, 0.89, 0.33, 0.97, 0.81, 0.34.]

  25. KRISPER’s Training Algorithm, step 4: Use maximum weighted matching on the bipartite graph to find the best NL–MR pairs [Munkres, 1957]. [Figure: the confidence-labeled bipartite graph from the previous slide.]

  26. KRISPER’s Training Algorithm, step 4 (continued): [Figure: the same graph, with the edges selected by the maximum weighted matching highlighted.]

  27. KRISPER’s Training Algorithm, step 5: Give the best pairs to KRISP in the next iteration, and repeat until convergence. [Figure: the bipartite graph with only the selected NL–MR pairs retained.]
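Putting the five steps together, here is a compact sketch of the retraining loop. It assumes a trainable parser object with hypothetical `train` and `confidence` methods, and it uses SciPy's Hungarian-algorithm routine for the maximum weighted matching; this illustrates the idea, it is not the actual KRISPER implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def krisper_style_training(ambiguous_corpus, parser, iterations=10):
    """EM-like retraining loop (sketch): start from all candidate NL-MR
    pairs, train the parser, score every candidate pair, keep the best
    pairs via maximum weighted bipartite matching, and repeat."""
    sentences = [s for s, _ in ambiguous_corpus]
    mrs = sorted({m for _, cands in ambiguous_corpus for m in cands})
    mr_index = {m: j for j, m in enumerate(mrs)}

    # Step 1: initially treat every candidate meaning as correct.
    # (The per-pair 1/k weights from step 2 are omitted here for brevity.)
    pairs = [(s, m) for s, cands in ambiguous_corpus for m in cands]

    for _ in range(iterations):
        parser.train(pairs)                      # Step 2: train on current pairs

        # Step 3: confidence of each candidate pair under the trained parser.
        scores = np.full((len(sentences), len(mrs)), -np.inf)
        for i, (s, cands) in enumerate(ambiguous_corpus):
            for m in cands:
                scores[i, mr_index[m]] = parser.confidence(s, m)

        # Step 4: maximum weighted matching (Hungarian / Munkres algorithm);
        # linear_sum_assignment minimizes cost, so negate the scores.
        cost = np.where(np.isfinite(scores), -scores, 1e9)
        rows, cols = linear_sum_assignment(cost)

        # Step 5: keep only valid matched pairs for the next iteration.
        pairs = [(sentences[i], mrs[j]) for i, j in zip(rows, cols)
                 if np.isfinite(scores[i, j])]
    return parser
```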

  28. New Challenge: Learning to Be a Sportscaster • Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e. speech and vision). • Solution: Learn from textually annotated traces of activity in a simulated environment. • Example: Traces of games in the Robocup simulator paired with textual sportscaster commentary.

  29. Tactical Generation • Learn how to generate NL from MR. • Example: Pass(Pink2, Pink3) → “Pink2 kicks the ball to Pink3”

  30. WASP / WASP⁻¹ (Wong & Mooney, 2006; 2007) • Supervised system for learning both a semantic parser and a tactical language generator. • Uses a probabilistic version of a synchronous context-free grammar (SCFG) that generates two corresponding strings (NL & MR) simultaneously.
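To make the SCFG idea concrete, here is a tiny hand-made illustration (not WASP's actual grammar formalism or learned rules) of how a synchronous rule yields an NL string and an MR string in parallel:

```python
# Toy illustration of a synchronous rule: one rule pairs an NL template with
# an MR template, and shared nonterminals are expanded in both at once.
SYNC_RULES = {
    "EVENT": [("PLAYER_1 kicks the ball to PLAYER_2",
               "pass ( PLAYER_1 , PLAYER_2 )")],
    "PLAYER": [("Pink2", "Pink2"), ("Pink3", "Pink3")],
}

def expand(nl_template, mr_template, bindings):
    """Substitute the player nonterminals in the NL and MR strings together."""
    for slot, value in bindings.items():
        nl_template = nl_template.replace(slot, value)
        mr_template = mr_template.replace(slot, value)
    return nl_template, mr_template

nl_tmpl, mr_tmpl = SYNC_RULES["EVENT"][0]
nl, mr = expand(nl_tmpl, mr_tmpl, {"PLAYER_1": "Pink2", "PLAYER_2": "Pink3"})
# nl == "Pink2 kicks the ball to Pink3"; mr == "pass ( Pink2 , Pink3 )"
```

Because the two sides are derived together, the same grammar can in principle be run in either direction: parsing (NL→MR) or generation (MR→NL).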

  31. Grounded Language Learning in Robocup [Architecture figure: the Robocup simulator provides simulated perception (perceived facts) and the sportscaster provides commentary (e.g. “Score!!!!”); a grounded language learner uses both to learn an SCFG, which serves as both a semantic parser and a language generator.]

  32. Sample Human Sportscast in Korean

  33. Robocup Sportscaster Trace (meaning-representation events and natural-language commentary, in time order): badPass(Purple1, Pink8); turnover(Purple1, Pink8) / “Purple goalie turns the ball over to Pink8” / kick(Pink8); pass(Pink8, Pink11) / “Purple team is very sloppy today” / kick(Pink11) / “Pink8 passes the ball to Pink11” / “Pink11 looks around for a teammate” / kick(Pink11); ballstopped; kick(Pink11) / “Pink11 makes a long pass to Pink8” / pass(Pink11, Pink8); kick(Pink8); pass(Pink8, Pink11) / “Pink8 passes back to Pink11”

  34. Robocup Sportscaster Trace (the same trace as on the previous slide).

  35. Robocup Sportscaster Trace (the same trace again).

  36. Robocup Sportscaster Trace, with predicates and constants as uninterpreted symbols: P6(C1, C19); P5(C1, C19) / “Purple goalie turns the ball over to Pink8” / P1(C19); P2(C19, C22) / “Purple team is very sloppy today” / P1(C22) / “Pink8 passes the ball to Pink11” / “Pink11 looks around for a teammate” / P1(C22); P0; P1(C22) / “Pink11 makes a long pass to Pink8” / P2(C22, C19); P1(C19); P2(C19, C22) / “Pink8 passes back to Pink11”

  37. Strategic Generation • Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation). • For automated sportscasting, one must be able to effectively choose which events to describe.

  38. Example of Strategic Generation • Event trace: pass(purple7, purple6), ballstopped, kick(purple6), pass(purple6, purple2), ballstopped, kick(purple2), pass(purple2, purple3), kick(purple3), badPass(purple3, pink9), turnover(purple3, pink9)

  39. Example of Strategic Generation (the same event trace as above).

  40. Robocup Data • Collected human textual commentary for the 4 Robocup championship games from 2001-2004. • Avg # events/game = 2,613 • Avg # English sentences/game = 509 • Avg # Korean sentences/game = 499 • Each sentence is matched to all events within the previous 5 seconds. • Avg # MRs/sentence = 2.5 (min 1, max 12) • Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
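A small sketch of how those candidate sets can be built with the 5-second window (the timestamps and event representation below are illustrative):

```python
def candidate_mrs_for(sentence_time, events, window=5.0):
    """Return the MRs of all events that occurred within `window` seconds
    before the commentary sentence was uttered."""
    return [mr for event_time, mr in events
            if sentence_time - window <= event_time <= sentence_time]

# Example: events as (time in seconds, MR string) pairs
events = [(95.0, "ballstopped"),
          (101.2, "kick ( Pink8 )"),
          (101.9, "pass ( Pink8 , Pink11 )")]
candidate_mrs_for(104.0, events)  # -> ['kick ( Pink8 )', 'pass ( Pink8 , Pink11 )']
```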

  41. WASPER • WASP with EM-like retraining to handle ambiguous training data. • The same augmentation that was added to KRISP to create KRISPER.

  42. KRISPER-WASP • First train KRISPER to disambiguate the data • Then train WASP on the resulting unambiguously supervised data.

  43. WASPER-GEN • Determines the best matching based on generation (MR→NL). • Scores each potential NL/MR pair using the currently trained WASP⁻¹ generator: generate a sentence from the MR and compute the NIST MT score [NIST report, 2002] between the generated sentence and the potential matching sentence.
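A sketch of that pair-scoring step, where `generator.generate` is a hypothetical MR→NL method and NLTK's sentence-level NIST implementation stands in for the original NIST evaluation script:

```python
from nltk.translate.nist_score import sentence_nist

def generation_score(generator, sentence, mr):
    """WASPER-GEN-style scoring (sketch): generate a sentence from the MR
    and compare it to the candidate commentary sentence with a NIST score."""
    generated = generator.generate(mr)       # hypothetical MR -> NL method
    try:
        return sentence_nist([sentence.split()], generated.split())
    except ZeroDivisionError:                # no n-gram overlap at all
        return 0.0
```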

  44. Strategic Generation Learning • For each event type (e.g. pass, kick), estimate the probability that it is described by the sportscaster. • This requires a correct NL/MR matching: either use the estimated matching from tactical generation, or use Iterative Generation Strategy Learning (IGSL).

  45. Iterative Generation Strategy Learning (IGSL) • Estimates the likelihood of commenting on each event-type directly from the ambiguous training data. • Uses EM-like self-training iterations to compute estimates.
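A rough sketch of this kind of estimate, written as a simple EM-style loop over the ambiguous data (a plausible reading of the slide, not necessarily the exact IGSL procedure; the data layout is assumed):

```python
from collections import defaultdict

def estimate_comment_probs(ambiguous_data, event_type_counts, iterations=10):
    """Estimate the probability that an event of each type is commented on.
    ambiguous_data: list of (sentence, [candidate event types]) pairs;
    event_type_counts: total occurrences of each event type in the games."""
    probs = defaultdict(lambda: 0.5)                 # initial guess
    for _ in range(iterations):
        commented = defaultdict(float)
        for _sentence, candidate_types in ambiguous_data:
            total = sum(probs[t] for t in candidate_types) or 1e-9
            for t in candidate_types:                # soft credit (E-like step)
                commented[t] += probs[t] / total
        for t in event_type_counts:                  # re-estimate (M-like step)
            probs[t] = commented[t] / event_type_counts[t]
    return dict(probs)
```

The intended outcome is that event types the sportscaster frequently describes (e.g. pass) receive high estimates, while rarely described ones (e.g. ballstopped) receive low estimates.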

  46. English Demo Game clip commentated using WASPER-GEN with IGSL strategic generation, since this gave the best results for generation. FreeTTS was used to synthesize speech from textual output.

  47. Machine Sportscast in English

  48. Experimental Evaluation • Generated learning curves by training on all combinations of 1 to 3 games and testing on all games not used for training. • Baselines: • Random Matching: WASP trained on random choice of possible MR for each comment. • Gold Matching: WASP trained on correct matching of MR for each comment. • Metrics: • Precision: % of system’s annotations that are correct • Recall: % of gold-standard annotations correctly produced • F-measure: Harmonic mean of precision and recall
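For reference, here is how the three metrics could be computed over predicted and gold-standard sentence–MR matchings (a generic sketch, not the talk's actual evaluation script):

```python
def precision_recall_f(predicted, gold):
    """predicted, gold: sets of (sentence, MR) matchings."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```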

  49. Evaluating NL-MR Matching • How well does the learner figure out which event (if any) each sentence refers to? [Figure: an excerpt of the Robocup sportscaster trace from slide 33, pairing commentary sentences with candidate events.]

  50. Matching Results (F-Measure)
