
Learning to Interpret Natural Language Navigation Instructions from Observation

Ray Mooney, Department of Computer Science, University of Texas at Austin. Joint work with David Chen, Joohyun Kim, and Lu Guo.


Presentation Transcript


  1. Learning to Interpret Natural Language Navigation Instructions from Observation. Ray Mooney, Department of Computer Science, University of Texas at Austin. Joint work with David Chen, Joohyun Kim, and Lu Guo.

  2. Challenge Problem: Learning to Follow Directions in a Virtual World • Learn to interpret navigation instructions in a virtual environment by simply observing humans giving and following such directions (Chen & Mooney, AAAI-11). • Eventual goal: virtual agents in video games and educational software that automatically learn to take and give instructions in natural language.

  3. Sample Environment (MacMahon et al., AAAI-06) [Map figure. Legend: H – Hat Rack, L – Lamp, E – Easel, S – Sofa, B – Barstool, C – Chair.]

  4. Sample Instructions (route from position 3 to position 4) • Take your first left. Go all the way down until you hit a dead end. • Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4. • Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4. • Walk forward once. Turn left. Walk forward twice.

  5. Sample Instructions (continued) • Same instructions as the previous slide; the observed primitive actions for this route are: Forward, Left, Forward, Forward.

  6. Observed Training Instance in Chinese

  7. Executing Test Instance in English

  8. Formal Problem Definition. Given: {(e1, w1, a1), (e2, w2, a2), …, (en, wn, an)}, where ei is a natural language instruction, wi is a world state, and ai is an observed action sequence. Goal: build a system that produces the correct aj given a previously unseen (ej, wj).
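To make the setup concrete, here is a minimal sketch of how one training example could be represented in Python. The class and field names are illustrative assumptions, not identifiers from the actual system.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class WorldState:
    position: Tuple[int, int]   # agent's location on the map grid (assumed encoding)
    orientation: str            # e.g. "NORTH", "EAST", ...

@dataclass
class TrainingExample:
    instruction: str            # e_i: natural-language instruction
    world_state: WorldState     # w_i: state when the instruction was given
    actions: List[str]          # a_i: observed primitive actions

# The learner receives {(e_1, w_1, a_1), ..., (e_n, w_n, a_n)} and must
# produce the correct action sequence a_j for an unseen (e_j, w_j).
example = TrainingExample(
    instruction="Walk forward once. Turn left. Walk forward twice.",
    world_state=WorldState(position=(3, 0), orientation="NORTH"),
    actions=["FORWARD", "LEFT", "FORWARD", "FORWARD"],
)
```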

  9.–14. Learning system for parsing navigation instructions [System diagram, built up across slides 9–14.] Training: each observation pairs an Instruction with the World State and the observed Action Trace; a Navigation Plan Constructor converts the world state and action trace into a plan, and the Semantic Parser Learner trains a semantic parser from the instructions paired with these plans. Testing: a new Instruction and World State are given to the learned Semantic Parser, and the resulting plan is executed by the Execution Module (MARCO) to produce an Action Trace.

  15. Representing Linguistic Context. Context is represented by the sequence of observed actions, each followed by verifying all observable aspects of the resulting world state. Example: Turn(LEFT), Verify(front: BLUE HALL, front: SOFA), Travel(steps: 2), Verify(at: SOFA).

  16. Possible Plans. An instruction can refer to a combinatorial number of possible plans, each composed of some subset of this full contextual description: Turn(LEFT), Verify(front: BLUE HALL, front: SOFA), Travel(steps: 2), Verify(at: SOFA).
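As a rough illustration of why the space of candidate plans blows up, the sketch below encodes the example context as a flat list of steps (an assumed simplification of the real graph representation) and counts the plans obtainable by keeping a subset of steps and, for each kept step, a subset of its properties.

```python
from itertools import combinations

# Flat encoding of the example context: (action, [properties]) steps.
context = [
    ("Turn",   [("direction", "LEFT")]),
    ("Verify", [("front", "BLUE HALL"), ("front", "SOFA")]),
    ("Travel", [("steps", 2)]),
    ("Verify", [("at", "SOFA")]),
]

def count_candidate_plans(ctx):
    """Count plans formed by keeping a non-empty subset of steps and, for
    each kept step, any subset of its properties (possibly none)."""
    total = 0
    for k in range(1, len(ctx) + 1):
        for kept in combinations(range(len(ctx)), k):
            n = 1
            for i in kept:
                n *= 2 ** len(ctx[i][1])  # property subsets for this step
            total += n
    return total

# Even this tiny four-step context admits well over a hundred candidate
# plans, and real contexts are much larger.
print(count_candidate_plans(context))  # 134
```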

  17. Possible Plan #1: “Turn and walk to the couch.” [Figure: the corresponding subset of the full context is highlighted.]

  18. Possible Plan #2: “Face the blue hall and walk 2 steps.” [Figure: the corresponding subset of the full context is highlighted.]

  19. Possible Plan #3: “Turn left. Walk forward twice.” [Figure: the corresponding subset of the full context is highlighted.]

  20. Disambiguating Sentence Meaning • Too many meanings to tractably enumerate them all. • Therefore, we cannot use EM to align sentences with enumerated meanings and thereby disambiguate the training data.

  21.–24. Learning system for parsing navigation instructions (revisited) [System diagram, built up across slides 21–24.] The Navigation Plan Constructor is replaced by a Context Extractor, and a Lexicon Learner plus a Plan Refinement step are inserted between the extracted contexts and the Semantic Parser Learner. Testing is unchanged: Instruction and World State go to the Semantic Parser, whose output is executed by the Execution Module (MARCO) to produce an Action Trace.

  25. Lexicon Learning • Learn meanings of words and short phrases by finding correlations with meaning fragments. [Figure: example alignments, e.g. “face” ↔ Turn, “blue hall” ↔ Verify(front: BLUE HALL), “walk” ↔ Travel, “2 steps” ↔ Travel(steps: 2).]

  26. Lexicon Learning Algorithm. To learn the meaning of a word or short phrase w: • Collect all landmark plans that co-occur with w and add them to the set PosMean(w). • Repeatedly take intersections of all possible pairs of members of PosMean(w) and add any new entries g to PosMean(w). • Rank the entries by the scoring function.
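The following Python sketch mirrors the three steps above under a simplifying assumption: plans are encoded as flat sets of components, so graph intersection reduces to set intersection. The scoring function from the slide is not reproduced here; a simple coverage-based score stands in for it, and all names are illustrative.

```python
from itertools import combinations

def learn_lexicon(cooccurrences, min_score=0.0):
    """Sketch of the lexicon-learning loop.  `cooccurrences` maps each
    word/short phrase w to the list of landmark plans observed with it;
    plans are simplified to frozensets of components."""
    lexicon = {}
    for w, plans in cooccurrences.items():
        # Step 1: candidate meanings start as the co-occurring plans.
        pos_mean = set(plans)
        # Step 2: repeatedly add pairwise intersections until closure.
        added = True
        while added:
            added = False
            for g1, g2 in combinations(list(pos_mean), 2):
                g = g1 & g2
                if g and g not in pos_mean:
                    pos_mean.add(g)
                    added = True
        # Step 3: rank candidates.  Stand-in score: fraction of the
        # co-occurring plans that contain the candidate meaning g.
        def score(g):
            return sum(g <= p for p in plans) / len(plans)
        best = max(pos_mean, key=score)
        if score(best) > min_score:
            lexicon[w] = best
    return lexicon

# Toy usage with two hypothetical plans co-occurring with "turn left".
cooccurrences = {
    "turn left": [frozenset({("Turn", "LEFT"), ("Verify", "front: BLUE HALL")}),
                  frozenset({("Turn", "LEFT"), ("Travel", "steps: 3")})],
}
print(learn_lexicon(cooccurrences))  # {'turn left': frozenset({('Turn', 'LEFT')})}
```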

  27. Graph Intersection. Graph 1, “Turn and walk to the sofa.”: Turn(LEFT), Verify(front: BLUE HALL, front: SOFA), Travel(steps: 2), Verify(at: SOFA). Graph 2, “Walk to the sofa and turn left.”: Travel(steps: 1), Verify(at: SOFA), Turn(LEFT), Verify(front: BLUE HALL).

  28. Graph Intersection (continued). Intersecting the two graphs first yields: Turn(LEFT), Verify(front: BLUE HALL).

  29. Graph Intersection (continued). Full set of intersections: Turn(LEFT), Verify(front: BLUE HALL); and Travel, Verify(at: SOFA).
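Continuing the simplified set encoding from the sketch above, the two example graphs intersect as follows; the encoding is an assumption and loses some structure, as noted in the comments.

```python
# The two example plans from slides 27-29, in the simplified set encoding.
graph1 = frozenset({("Turn", "LEFT"), ("Verify", "front: BLUE HALL"),
                    ("Verify", "front: SOFA"), ("Travel", "steps: 2"),
                    ("Verify", "at: SOFA")})      # "Turn and walk to the sofa."
graph2 = frozenset({("Travel", "steps: 1"), ("Verify", "at: SOFA"),
                    ("Turn", "LEFT"),
                    ("Verify", "front: BLUE HALL")})  # "Walk to the sofa and turn left."

# Set intersection keeps what the two plans share.
print(sorted(graph1 & graph2))
# [('Turn', 'LEFT'), ('Verify', 'at: SOFA'), ('Verify', 'front: BLUE HALL')]

# Note: the graph intersection on the slide also recovers a bare Travel
# node (both plans travel, just different distances); this flat encoding
# drops it because the step counts differ.
```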

  30.–34. Plan Refinement • Use the learned lexicon to determine the subset of the context representing the sentence meaning. Example sentence: “Face the blue hall and walk 2 steps.” Full context: Turn(LEFT), Verify(front: BLUE HALL, front: SOFA), Travel(steps: 2), Verify(at: SOFA). [Slides 31–34 repeat this example, stepping through the figure.]
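A minimal sketch of the refinement step, again using the simplified set encoding and a hypothetical lexicon; real phrase matching and graph handling are richer than this.

```python
def refine_plan(sentence, context, lexicon):
    """Keep only those parts of the full context whose meanings come from
    lexicon entries matching the sentence (simplified assumption)."""
    text = sentence.lower()
    selected = set()
    for phrase, meaning in lexicon.items():
        if phrase in text:
            selected |= meaning & context   # only components actually present
    return selected

# Hypothetical lexicon entries, mirroring the alignments sketched earlier.
lexicon = {
    "face":      {("Turn",)},
    "blue hall": {("Verify", "front: BLUE HALL")},
    "walk":      {("Travel",)},
    "2 steps":   {("Travel", "steps: 2")},
}
# Full context for the example sentence.
context = {("Turn",), ("Turn", "LEFT"),
           ("Verify", "front: BLUE HALL"), ("Verify", "front: SOFA"),
           ("Travel",), ("Travel", "steps: 2"), ("Verify", "at: SOFA")}

print(sorted(refine_plan("Face the blue hall and walk 2 steps", context, lexicon)))
# [('Travel',), ('Travel', 'steps: 2'), ('Turn',), ('Verify', 'front: BLUE HALL')]
```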

  35. Evaluation Data Statistics • 3 maps, 6 instructors, 1–15 followers per direction • Hand-segmented into single-sentence steps

  36. End-to-End Execution Evaluation • Test how well the system follows novel directions. • Leave-one-map-out cross-validation. • Strict metric: only correct if the final position exactly matches the goal location. • Lower baselines: a simple probabilistic generative model of executed plans without language, and a semantic parser trained on full context plans. • Upper baselines: a semantic parser trained on human-annotated plans, and human followers.
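A sketch of the evaluation protocol just described: leave-one-map-out cross-validation with the strict exact-goal metric. The `train_fn` and `execute_fn` parameters stand in for the parser learner and the MARCO execution module, and the example record layout is an assumption.

```python
def strict_success(final_position, goal_position):
    """Strict metric: a run counts as correct only if the follower's final
    position exactly matches the goal location."""
    return final_position == goal_position

def leave_one_map_out(examples, train_fn, execute_fn):
    """Leave-one-map-out cross-validation (sketch).  `examples` is a list
    of dicts with 'map', 'instruction', 'start', and 'goal' keys."""
    maps = {ex["map"] for ex in examples}
    accuracies = []
    for held_out in sorted(maps):
        train = [ex for ex in examples if ex["map"] != held_out]
        test = [ex for ex in examples if ex["map"] == held_out]
        model = train_fn(train)
        correct = sum(
            strict_success(execute_fn(model, ex["instruction"], ex["start"]),
                           ex["goal"])
            for ex in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```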

  37. End-to-End Execution Accuracy

  38. Sample Successful Parse

  39. Mandarin Chinese Experiment • Translated all the instructions from English to Chinese.

  40. Problem with Purely Correlational Lexicon Learning • The correlation between an n-gram w and a graph g can be affected by the context. • Example: the bigram “the wall”, as in “turn so the wall is on your right side” and “with your back to the wall turn left”. • Co-occurring aspects of context: TURN(), VERIFY(direction: WALL). • But “the wall” simply denotes an object and involves no action.

  41. Syntactic Bootstrapping • Children sometimes use syntactic information to guide learning of word meanings (Gleitman, 1990). • Complement to Pinker’s semantic bootstrapping, in which semantics is used to help learn syntax.

  42. Using POS to Aid Lexicon Learning • Annotate each n-gram w with POS tags, e.g. dead/JJ end/NN. • Annotate each node in the meaning graph g with a semantic-category tag, e.g. TURN/Action, VERIFY/Action, FORWARD/Action. • Reason this pairing arises: “dead end” is often followed by the action of turning around to face another direction so that there is a way to go forward.

  43. Constraints on Lexicon Entry (w, g) • The n-gram w should contain a noun if and only if the graph g contains an Object. • The n-gram w should contain a verb if and only if the graph g contains an Action. • Example: the entry pairing dead/JJ end/NN with TURN/Action VERIFY/Action FORWARD/Action violates the rules, so remove it; the entry pairing dead/JJ end/NN with front/Relation WALL/Object satisfies them, so retain it.
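The two constraints are easy to state as a filter over candidate lexicon entries; the sketch below assumes a simple (token, POS) / (node, category) representation, which is illustrative rather than the system's actual interface.

```python
def satisfies_pos_constraints(tagged_ngram, tagged_graph):
    """Check the two constraints on a lexicon entry (w, g).
    `tagged_ngram` is a list of (token, POS) pairs; `tagged_graph` is a
    list of (node, semantic category) pairs."""
    has_noun = any(pos.startswith("NN") for _, pos in tagged_ngram)
    has_verb = any(pos.startswith("VB") for _, pos in tagged_ngram)
    has_object = any(cat == "Object" for _, cat in tagged_graph)
    has_action = any(cat == "Action" for _, cat in tagged_graph)
    # noun in w  <=>  Object in g,  and  verb in w  <=>  Action in g
    return has_noun == has_object and has_verb == has_action

# "dead end" paired with pure actions violates the constraints (remove);
# "dead end" paired with a wall Object satisfies them (retain).
print(satisfies_pos_constraints([("dead", "JJ"), ("end", "NN")],
                                [("TURN", "Action"), ("VERIFY", "Action"),
                                 ("FORWARD", "Action")]))             # False
print(satisfies_pos_constraints([("dead", "JJ"), ("end", "NN")],
                                [("front", "Relation"), ("WALL", "Object")]))  # True
```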

  44. Experimental Results

  45. PCFG Induction Model for Grounded Language Learning (Borschinger et al. 2011) • PCFG rules describe the generative process from MR components to the corresponding NL words.

  46. Series of Grounded Language Learning Papers that Build Upon Each Other • Kate & Mooney, AAAI-07 • Chen & Mooney, ICML-08 • Liang, Jordan, & Klein, ACL-09 • Kim & Mooney, COLING-10 (also integrates Lu, Ng, Lee, & Zettlemoyer, EMNLP-08) • Borschinger, Jones, & Johnson, EMNLP-11 • Kim & Mooney, EMNLP-12

  47. PCFG Induction Model for Grounded Language Learning (Borschinger et al. 2011) • Generative process: select a complete MR to describe, generate its atomic MR constituents in order, and let each atomic MR generate NL words by a unigram Markov process. • Parameters learned using EM (Inside-Outside). • Parse new NL sentences by reading the top MR nonterminal from the most probable parse tree. • Output MRs are limited to those included in the PCFG rule set constructed from the training data.

  48. Limitations of the Borschinger et al. 2011 PCFG Approach • Only works in low-ambiguity settings, where each sentence can refer to only a few possible MRs. • Only outputs MRs explicitly included in the PCFG constructed from the training data. • Produces intractably large PCFGs for complex MRs with high ambiguity: it would require ~10^18 productions for our navigation data.

  49. Our Enhanced PCFG Model (Kim & Mooney, EMNLP-2012) • Use the learned semantic lexicon to constrain the constructed PCFG: limit each MR to generate only the words and phrases paired with it in the lexicon. • Only ~18,000 productions are produced for the navigation data, compared to ~33,000 produced by Borschinger et al. for the far simpler RoboCup data. • Outputs novel MRs not appearing in the PCFG by composing subgraphs from the overall context.
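A minimal sketch of the lexicon constraint, assuming a toy lexicon and flat MR components: each MR nonterminal is allowed to generate only the phrases the lexicon pairs with it, with placeholder uniform probabilities where the real model would estimate them by EM. All names and the rule format are assumptions.

```python
from collections import defaultdict

def build_constrained_productions(mr_components, lexicon):
    """Restrict PCFG construction so each MR component generates only the
    words/phrases paired with it in the learned lexicon, rather than the
    whole vocabulary (sketch)."""
    productions = defaultdict(list)
    for mr in mr_components:
        phrases = [w for w, meaning in lexicon.items() if mr in meaning]
        for phrase in phrases:
            # uniform probabilities as a placeholder; the real model
            # estimates them with EM (Inside-Outside)
            productions[mr].append((phrase, 1.0 / len(phrases)))
    return dict(productions)

lexicon = {
    "face":      {("Turn",)},
    "turn left": {("Turn", "LEFT")},
    "blue hall": {("Verify", "front: BLUE HALL")},
    "walk":      {("Travel",)},
}
print(build_constrained_productions([("Turn",), ("Travel",)], lexicon))
# {('Turn',): [('face', 1.0)], ('Travel',): [('walk', 1.0)]}
```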

  50. End-to-End Execution Evaluations
