Share and Share Alike: Resources for Language Generation

Prof. Marilyn Walker

University of Sheffield. NSF, 20 April 2007


What type of resource is needed for generation?

  • What type of scientific problem is generation?

  • An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that generation has no single right answer;

  • Language Productivity Assumption: An optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output
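
As a concrete illustration of the Language Productivity Assumption, a single corpus record might pair one input with several rated outputs. A minimal sketch in Python; the texts and scores are invented, not from any actual resource:

```python
# Hypothetical record shape: one generation input, several candidate
# outputs, each carrying a human quality score (all values invented).
record = {
    "input": "RECOMMEND(Babbo)",
    "outputs": [
        {"text": "Babbo has the best overall value among the selected restaurants.", "rating": 4.5},
        {"text": "Babbo is the best.", "rating": 3.0},
        {"text": "Since the food is superb, Babbo has the best overall value.", "rating": 4.8},
    ],
}

# The quality metric makes the alternative outputs directly rankable:
best_first = sorted(record["outputs"], key=lambda o: o["rating"], reverse=True)
print(best_first[0]["text"])
```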


Dialogue vs. generation?

  • Dialogue is like generation in that there is no single right answer for how to do a task in dialogue;

  • Information gathering and information presentation in dialogue systems are generation problems;

  • DARPA evaluation for dialogue systems;

  • Fixed domain “TRAVEL PLANNING”

  • First: ATIS evaluations compared dialogue system behaviour against human behaviour in a corpus of human-wizard dialogues (Hirschman 2000);

  • This approach could not accommodate mixed initiative, different dialogue strategies, divergence of context, or user modeling;


Dialogue vs. generation?

  • Second: define context, evaluate on system response to user utterance in a particular context;

  • Much more like generation: the context is defined and the system ‘communicative goal’ is defined

  • Form: How is ‘the same response’ defined? Some forms for identical content may be better than others;

  • Content: user models, definitions of context. The dialogue system should also be able to decide on its communicative goal.


Dialogue vs. generation?

  • Third: Communicator evaluation: given user task (NYC to LHR, Continental, April 22nd, 2007), collect metrics (time to completion, ASR error, utterance output quality, concept understanding, user satisfaction);

  • Corpus semi-automatically labelled with dialogue acts (quality/strategy metrics) for system utterances (8 or more different instantiations from different systems for particular communicative goals);

  • Try to understand which metrics contribute to user satisfaction (PARADISE; see the regression sketch after this list);

  • User utterance labelled subsequently, used in RL experiments comparing dialogue strategies;

  • Hard to compare particular scientific techniques for particular modules in systems; plug and play never worked
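
The PARADISE step is essentially a multiple linear regression from dialogue metrics to user satisfaction. A minimal numpy sketch with invented numbers (these are placeholders, not Communicator data):

```python
import numpy as np

# Rows: dialogues. Columns: task completion (0/1), time to
# completion (s), ASR word error rate. All values invented.
X = np.array([
    [1.0, 120.0, 0.10],
    [0.0, 300.0, 0.35],
    [1.0, 180.0, 0.20],
    [1.0,  90.0, 0.05],
])
satisfaction = np.array([4.5, 1.5, 3.5, 5.0])   # e.g. mean user survey scores

# Ordinary least squares with an intercept term (PARADISE additionally
# normalizes each metric to z-scores before fitting).
A = np.hstack([X, np.ones((len(X), 1))])
weights, *_ = np.linalg.lstsq(A, satisfaction, rcond=None)
print(dict(zip(["completion", "time", "wer", "intercept"], weights)))
```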


Dialogue vs. generation: Conclusions?

  • Just having a fixed task (TRAVEL) by itself does not necessarily lead to scientific progress;

  • Want to compare particular scientific techniques for particular modules in systems;

  • Plug and play is the only way to do this;

  • BUT: very hard to define for a whole community what interfaces between modules should be


Position

  • What type of resources would be useful for scientific advancement in language generation??

  • Almost anything!!

  • “If you build it, they will come”; “If it’s useful, people will use it”

  • Can we leverage what we already have in our own research groups, share it, and make it better?


What is needed to incentivize data sharing

  • Many different domains/problems/modules => NEED LOTS OF DIFFERENT RESOURCES;

  • Resources costly (developing group not ‘finished’ yet) => FINANCIAL INCENTIVE; SCIENTIFIC INCENTIVE; CITATION INCENTIVE;

  • Costs too much to support resource preparation, maintenance, distribution and re-use => NSF/LDC FINANCIAL/SUPPORT

  • NOTE: MANY LDC RESOURCES ARE “FOUND DATA” (not explicitly commissioned)


A proposal for one shared resource


Information presentation of one or more database entities

  • Natural Language Interfaces/SDS (McKeown 85, McCoy 89, Cooperative Response literature, Carenini & Moore 01, Polifroni et al. 03, COGENTEX w/ active buyers website, Walker et al. 04, Demberg & Moore 06, etc.)

  • Different communicative goals; Summarize, Recommend, Compare, Describe (DB entities)

  • Representation not controversial (attributes and values for DB entities, relations between entity and attribute; see the sketch after this list)

  • Application not dependent on NLU
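
A sketch of what such an uncontroversial representation looks like: a DB entity as a flat set of attribute-value pairs. The attributes and values below are illustrative only:

```python
# A restaurant entity as attribute-value pairs (values invented).
babbo = {
    "name": "Babbo",
    "cuisine": "Italian",
    "foodquality": "superb",
    "decor": "excellent",
    "service": "excellent",
}
```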



We could make available a resource of:

  • INPUT-1: Speech ACT, SET of DB Entities

    • SUMMARIZE(SET), DESCRIBE(ENTITY), RECOMMEND(ENTITY, SET), COMPARE(SET)

  • INPUT-2: user model, discourse/dialogue context, style parameters, etc.

  • OUTPUT-1: a set of alternative outputs possibly with TTS markup

  • OUTPUT-2: human-generated ratings or rankings for the outputs, oriented to the criteria specified by INPUT-2
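
One possible typing of the proposed records, sketched as Python dataclasses; the field names here are mine, not part of the proposal:

```python
from dataclasses import dataclass, field

@dataclass
class SpeechAct:                       # INPUT-1
    act: str                           # "SUMMARIZE", "DESCRIBE", "RECOMMEND", "COMPARE"
    entities: list[str]                # DB entity ids the act ranges over

@dataclass
class RatedOutput:                     # OUTPUT-1 + OUTPUT-2
    text: str                          # candidate realization, possibly with TTS markup
    ratings: dict[str, float]          # metric name -> human rating

@dataclass
class GenerationExample:
    input_act: SpeechAct
    context: dict[str, str] = field(default_factory=dict)      # INPUT-2: user model, style, ...
    outputs: list[RatedOutput] = field(default_factory=list)
```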


A Content Plan for a Recommend

  • strategy: recommend

  • relations: justify(nuc:1, sat:2);

    justify(nuc:1, sat:3);

    justify(nuc:1, sat:4)

  • content: 1. assert(best(Babbo))

    2. assert(has-att(Babbo, foodquality(superb)))

    3. assert(has-att(Babbo, decor(excellent)))

    4. assert(has-att(Babbo, service(excellent)))
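
The same plan transcribed into a plain data structure; the encoding is mine, while the assertions and relations follow the slide:

```python
# Content plan for RECOMMEND(Babbo), as on the slide.
content_plan = {
    "strategy": "recommend",
    # Satellites 2-4 each justify the nucleus claim 1 (RST 'justify').
    "relations": [
        ("justify", {"nuc": 1, "sat": 2}),
        ("justify", {"nuc": 1, "sat": 3}),
        ("justify", {"nuc": 1, "sat": 4}),
    ],
    "content": {
        1: "assert(best(Babbo))",
        2: "assert(has-att(Babbo, foodquality(superb)))",
        3: "assert(has-att(Babbo, decor(excellent)))",
        4: "assert(has-att(Babbo, service(excellent)))",
    },
}
```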


Human Feedback for Ranking

  • The ratings can represent any metric associated with the possible response, e.g. coherence, information quality, social appropriateness, personality.

  • Informational Coherence

    • SPARKY, a generator for MATCH

    • SPOT, a generator for AT&T COMMUNICATOR

  • Users are shown response variants then told:

    • For each variant, please rate to what extent you agree with this statement.

    • The utterance is easy to understand, well-formed and appropriate to the dialogue context.
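
Such agreement judgments can be aggregated into OUTPUT-2-style rankings by, for instance, averaging per variant. A sketch with invented judges and scores:

```python
from statistics import mean

# judge -> per-variant agreement with the statement above (invented scores).
judgments = {
    "judge1": {"variant_a": 5, "variant_b": 2, "variant_c": 4},
    "judge2": {"variant_a": 4, "variant_b": 3, "variant_c": 4},
}

variants = next(iter(judgments.values())).keys()
avg = {v: mean(j[v] for j in judgments.values()) for v in variants}
ranking = sorted(avg, key=avg.get, reverse=True)   # best variant first
print(avg, ranking)
```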


Examples: Learned Rules applied to test fold


Individual Differences (Sentence Planning Preferences)


Human Feedback for Ranking (2)

  • Ten-Item Personality Inventory (TIPI) questionnaire (Gosling 2003)

    • PERSONAGE

  • Users are shown response variants then told:

    • For each variant, rate on a scale of 1 to 7 whether:

    • The speaker is quiet, reserved;

    • The speaker is enthusiastic;
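
In the TIPI, ‘quiet, reserved’ is a reverse-scored extraversion item, so the two judgments above combine into a single extraversion score roughly as follows (a sketch; the example scores are invented):

```python
def extraversion(quiet_reserved: int, enthusiastic: int) -> float:
    """TIPI-style extraversion on a 1-7 scale: reverse-score the
    'quiet, reserved' item (8 - score), then average with 'enthusiastic'."""
    return ((8 - quiet_reserved) + enthusiastic) / 2

# e.g. a rater hears a variant as not at all quiet (2) and quite enthusiastic (6):
print(extraversion(quiet_reserved=2, enthusiastic=6))   # 6.0
```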


Personality judgments: ‘Recommend Le Marais’


What else is out there?

  • COCONUT corpus: referring expression generation, but add alternatives and ratings?

  • Boston directions corpus (NSF funded early 1990s)

  • Communicator corpus (8 different system outputs for dialogue contexts that can be characterized)

  • Tools: Halogen, Penman, FUF-SURGE, RealPro

  • Library of text plans, content plans, sentence planners?

