
SEGUE: a Hybrid Case-Based Surface Natural Language Generator

Shimei Pan and James Shaw, IBM T.J. Watson Research Center


Presentation Transcript


  1. SEGUE: a Hybrid Case-Based Surface Natural Language Generator • Shimei Pan and James Shaw • IBM T.J. Watson Research Center

  2. Overview of the Talk • Motivation • Video demonstration • System overview • The hybrid algorithm • Phase 1: Case-based retrieval • Phase 2: Rule-based adaptation • Phase 3: Learning • Evaluation • Related Work • Conclusion

  3. Motivation • Small training corpus to enable reuse • High accuracy – conversational systems • Extensible - easy to increase coverage • Variety in output • Efficient in execution

  4. Video Demonstration

  5. SEGUE (Spoken English Generation Using Examples) • Hybrid • Case-based retrieval • for extensibility, variety, and speed • Rule-based adaptation • for reuse and high accuracy

  6. SEGUE Overview • [Figure: the input target semantic graph (a House node with numBedroom, numBathroom, style, and yearBuilt propositions) is matched against the sentence corpus; the retrieved sentence is adapted by substitution, deletion, and insertion; the result is sent to TTS] • Example output: “This 3 bedroom, 2 bathroom colonial house was built in 1890.”

  7. Multi-level Adaptation • Input: No.bedroom: 3, No.bath: 2, Style: Colonial, Year: 1890 • Output: “This 3 bedroom, 2 bathroom colonial home was built in 1890.” • Exact match, with a sentence corpus containing: “This new home was sold for 500K.” and “This 3 bedroom, 2 bathroom colonial home was built in 1890.” • Substitution, with a sentence corpus containing: “This new home was sold for 500K.” and “This 2 bedroom, 1 bathroom ranch home was built in 1990.” • Rule-based deletion and insertion, with a sentence corpus containing: “This apartment was built in 1983.” and “This 2 bedroom, 1 bathroom colonial home was sold for 500k.”

  8. Overview of the Talk • Motivation • Video demonstration • System overview • The hybrid algorithm • Phase 1: Case-based retrieval • Phase 2: Rule-based adaptation • Phase 3: Learning • Evaluation • Related Work • Conclusion

  9. Phase One: Adaptation-Guided Retrieval • Given a new SemGraph, identify a ranked list of similar examples • Compute similarity measure • all-pairs comparison of propositions between SemGraphtarget and SemGraphCorpus • Create a list of adaptation operators, each with a cost • Substitute: $ • Delete: $$ • Insert: $$$
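The retrieval step on this slide can be sketched roughly as follows. This is a minimal illustration, not SEGUE's implementation: the `Proposition` class, the function names, and the numeric costs (1, 2, 3) are assumptions. The slide only states that substitution is cheaper than deletion, which is cheaper than insertion, and the sketch ignores the other ranking features (speech act, sentence structure).

```python
from dataclasses import dataclass

# Hypothetical numeric costs; the slide only gives the ordering
# substitute ($) < delete ($$) < insert ($$$).
COSTS = {"substitute": 1, "delete": 2, "insert": 3}


@dataclass(frozen=True)
class Proposition:
    relation: str  # e.g. "numBedroom"
    value: str     # e.g. "3"


def adaptation_operators(target, corpus):
    """All-pairs comparison of target and corpus propositions.

    Returns the operators needed to turn the corpus case into the target.
    """
    ops = []
    matched = set()
    for t in target:
        same_relation = [c for c in corpus if c.relation == t.relation]
        if not same_relation:
            ops.append(("insert", t))      # target has it, corpus does not
            continue
        matched.update(same_relation)
        if all(c.value != t.value for c in same_relation):
            ops.append(("substitute", t))  # same relation, different value
    for c in corpus:
        if c not in matched:
            ops.append(("delete", c))      # corpus has it, target does not
    return ops


def rank_cases(target, cases):
    """Rank (sentence, graph) cases by total adaptation cost, cheapest first."""
    scored = []
    for sentence, graph in cases:
        ops = adaptation_operators(target, graph)
        cost = sum(COSTS[name] for name, _ in ops)
        scored.append((cost, sentence, ops))
    return sorted(scored, key=lambda item: item[0])
```

With the house example from the earlier slides, an exact match costs nothing, a case needing four substitutions costs 4, and a case needing one substitution and three insertions costs 10 under these assumed costs, so cases that are cheaper to adapt are ranked first.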

  10. Ranking Retrieved Cases • SemGraph features – feature similarity • Speech act, theme/rheme etc. “Can you tell me about this colonial house?” ≠ “The style of this house is colonial.” • Adaptability (adaptation guided retrieval) • Operator cost – also captures semantic similarity • Sentence structure SemGraphtarget: The {1995} house is a {Colonial}. SemGraphCorpus: The {1995} {Colonial} house is {in Ardsley}. ReaTree: *The 1995 Colonial house.

  11. Phase Two: Rule-based Adaptation • Adaptation Operators • Substitute • Delete • Insert

  12. Substitute • Correct minor differences between SemGraphtarget and SemGraphCorpus • Examples: 2 houses → 1 house • golf course → park

  13. Delete • Remove propositions that do not exist in SemGraphtarget • A reverse-aggregation process • Hypotactic • Remove modifying phrase structure by recursive traversal • Paratactic • Delete/shift phrase structure and conjunctor (A, B, and C → A and B) • Adaptation-guided retrieval ensures the soundness of the main sentence structure
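The paratactic case (A, B, and C → A and B) can be illustrated with a small sketch. It works on a flat list of conjunct strings, which is a simplification assumed here for brevity; the slides describe deletion over phrase structure, and the function names below are hypothetical.

```python
def render_conjunction(items, conjunctor="and"):
    """Render a list of conjuncts, placing the conjunctor before the last one."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {conjunctor} {items[1]}"
    return ", ".join(items[:-1]) + f", {conjunctor} {items[-1]}"


def delete_conjuncts(items, unwanted, conjunctor="and"):
    """Drop the unwanted conjuncts, then shift the conjunctor to the new last item."""
    remaining = [item for item in items if item not in unwanted]
    return render_conjunction(remaining, conjunctor)


print(delete_conjuncts(["A", "B", "C"], {"C"}))  # prints "A and B"
```

Removing a conjunct is not enough on its own: the conjunctor and the commas have to move as well, which is why the slide lists both "delete" and "shift".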

  14. Insert • Insert propositions in SemGraphtarget that do not exist in SemGraphCorpus • Incorporate phrases from various instances • Two types of aggregation operators • Paratactic • Hypotactic • Paratactic operators are applied first because they have more restrictive preconditions

  15. Insert (Paratactic) • SEGUE currently supports • Quantification “3 houses are Colonials. 1 house is a Tudor.” • Simple Conjunction “The names of the school districts are Lakeland School District and Panas School District.”

  16. Insert (Hypotactic) • Hypotactic Aggregation in SEGUE: two-step procedure • Extract all the phrases expressing the new proposition • Remember the heads they modify • Whether a premodifier or a post-modifier • Attach the substituted phrase to the head constituent being modified • Aggregation in traditional systems • Transform new proposition into a modifying constituent through a complex lexical process • Attach the transformed phrase to the head constituent being modified (the same as in SEGUE)
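The two-step hypotactic procedure can be sketched as below. This is an illustration under stated assumptions rather than SEGUE's code: the `Modifier` and `NounPhrase` classes, the `{value}` slot syntax, and the small modifier list are all hypothetical, and a noun phrase is modelled as flat lists instead of a realization tree.

```python
from dataclasses import dataclass, field


@dataclass
class Modifier:
    relation: str  # proposition the phrase expresses, e.g. "style"
    template: str  # phrase with a value slot, e.g. "in {value}"
    head: str      # head word the phrase modified in the corpus
    position: str  # "pre" or "post", relative to the head


@dataclass
class NounPhrase:
    determiner: str
    head: str
    pre: list = field(default_factory=list)
    post: list = field(default_factory=list)

    def render(self):
        return " ".join([self.determiner, *self.pre, self.head, *self.post])


def extract_modifiers(corpus_modifiers, relation, head):
    """Step 1: find corpus phrases expressing `relation` on the same head."""
    return [m for m in corpus_modifiers
            if m.relation == relation and m.head == head]


def attach(np, modifier, value):
    """Step 2: substitute the new value and attach at the remembered side."""
    phrase = modifier.template.format(value=value)
    if modifier.position == "pre":
        np.pre.append(phrase)
    else:
        np.post.append(phrase)
    return np


# Hypothetical modifiers harvested from annotated corpus sentences.
corpus_modifiers = [
    Modifier("style", "{value}", "house", "pre"),
    Modifier("city", "in {value}", "house", "post"),
]

np = NounPhrase("the", "house")
for relation, value in [("style", "Colonial"), ("city", "Ardsley")]:
    candidates = extract_modifiers(corpus_modifiers, relation, np.head)
    if candidates:
        attach(np, candidates[0], value)
print(np.render())  # prints "the Colonial house in Ardsley"
```

Because the phrase is copied from the corpus together with its head and position, no lexical transformation of the proposition is needed. The sketch appends premodifiers in arrival order and does nothing about premodifier ordering, so it would not prevent an output such as "A Colonial 1995 house".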

  17. Phase Three: Learning • Non-trivial SemGraphtarget and its adapted ReaTree are added to a temporary repository • After manual verification, “learned” cases are added to repository • SEGUE learns from past experience and becomes faster and more accurate over time. • Less chance to make mistakes because SEGUE does not always start from scratch for complex sentences
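The learning loop on this slide amounts to a two-stage store, sketched below. The class and method names are hypothetical, and treating "non-trivial" as "at least one adaptation operator was applied" is an assumption; the slides do not define the term.

```python
class CaseRepository:
    """Verified cases plus a temporary store awaiting manual verification."""

    def __init__(self):
        self.cases = {}    # verified: SemGraph key -> sentence
        self.pending = {}  # adapted cases not yet checked by a person

    def propose(self, key, sentence, operators):
        # Assumption: a case is "non-trivial" if adaptation was needed
        # and the repository does not already cover this SemGraph.
        if operators and key not in self.cases:
            self.pending[key] = sentence

    def verify(self, key, approved):
        """Record a human judgement; only approved cases are learned."""
        sentence = self.pending.pop(key)
        if approved:
            self.cases[key] = sentence

    def lookup(self, key):
        """Return a verified sentence for this SemGraph, or None."""
        return self.cases.get(key)


# A SemGraph is keyed here by its set of (relation, value) pairs.
repo = CaseRepository()
key = frozenset({("numBedroom", "3"), ("yearBuilt", "1890")})
repo.propose(key, "This 3 bedroom home was built in 1890.", ["substitute"])
repo.verify(key, approved=True)
```

Once a case has been verified, a later request with the same SemGraph becomes an exact match and needs no adaptation, which is one way to read the slide's claim that the system gets faster and more accurate over time.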

  18. Evaluation • Corpus • Assertive sentences related to houses • Houses have 20 main attributes, e.g., asking price, property tax, city location, school district • 100 sentences randomly selected from 21699 synthesized SemGraphs, containing 1 to 5 propositions • Two judges manually evaluated each sentence

  19. Evaluation (2) • Result • Major grammatical/pragmatic errors involved • Multiple sentences being generated • The lack of a referring expression module “I found 3 Colonial houses with 0.2 acre of land. The house has an asphalt roof.” • Baseline: Direct template matching, with 210 sentences in training corpus

  20. Evaluation (3) • Major grammatical/pragmatic errors • Sentences that are too long • Incorrect referring expressions • Inadequate adaptation operators • ?A Colonial 1995 house • ?I found 2 2-bedroom houses ?The house in a city with population less than 4000 in a school district with over 95 percent of seniors attending college has 3 bedrooms and 3 bathrooms. *I found 3 houses with 3 bathrooms in cities with population less than 4000. The house has a crawlspace.

  21. Related Work • Rule-based NLG • Robin93, Lavoie97, Shaw98 • Statistical NLG • Langkilde&Knight98 • Bangalore&Rambow00 • Ratnaparkhi00 • Instance-based NLG • Varges&Mellish01 • Example-based Machine Translation (EBMT) • Brown99, Somers99, Somers01

  22. Significant Distinctions from other Corpus-based Approaches • Required corpus is much smaller, but more richly annotated • Ranking is performed first, not last • The adaptation operations always guarantee grammatical correctness • Do not always generate from scratch

  23. Conclusion • A case-based NL generator with high accuracy requiring only a small annotated corpus • The generated sentences are guaranteed to be grammatically correct by performing rule-based adaptation (in one sentence) • First to incorporate adaptation-guided retrieval in a case-based NLG system • Handles paraphrasing and idioms naturally • Performs faster and more accurately as solutions accumulate

  24. Thank you.
