
Question Generation: Proposed Challenge Tasks and Their Evaluation


Presentation Transcript


  1. Question Generation: Proposed Challenge Tasks and Their Evaluation
  Rodney D. Nielsen
  Boulder Language Technologies, Boulder, CO
  Center for Computational Language and Education Research, CU, Boulder

  2. The Nature of Automatic QG
  • Application dependent
    • Educational assessment: evaluate
    • Socratic tutoring: guide
    • Etc.: gather information

  3. Defining the QG Tasks
  • QG can be viewed as a 3-step process:
    • Concept Selection
    • Question Type Determination
    • Question Construction
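To make the three-step decomposition concrete, here is a minimal, purely illustrative Python sketch of the pipeline. Every function name, signature, and heuristic below is an assumption added for illustration; the talk defines the tasks but prescribes no implementation.

```python
# Illustrative three-step QG pipeline; all names and heuristics are
# hypothetical placeholders, not taken from the talk.

def select_concepts(document: str, track: str) -> list[str]:
    """Step 1 (Concept Selection): pick spans worth asking about."""
    # Naive placeholder: treat every sentence as a candidate concept.
    return [s.strip() for s in document.split(".") if s.strip()]

def determine_question_type(snippet: str, document: str, track: str) -> str:
    """Step 2 (Question Type Determination): choose a question type."""
    # Naive placeholder: causal snippets get "why", everything else "what".
    return "why" if "because" in snippet else "what"

def construct_question(snippet: str, qtype: str, document: str) -> str:
    """Step 3 (Question Construction): realize a natural language question."""
    return f"({qtype}) {snippet}?"

def generate_questions(document: str, track: str) -> list[str]:
    return [
        construct_question(s, determine_question_type(s, document, track), document)
        for s in select_concepts(document, track)
    ]

print(generate_questions(
    "The brass ring would not stick to the nail because the ring is not iron",
    track="educational assessment"))
```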

  4. Key Concept Identification
  • Givens:
    • The full text document
    • The application track
  • Objective:
    • Identify key spans of text for which questions are likely to be generated

  5. Question Type Determination
  • Givens:
    • Source text snippets
    • The full text
    • The application track
  • Objective:
    • Identify the most likely types of questions to be generated

  6. Question Construction
  • Application independent
  • Givens:
    • Source text snippets
    • A question type
    • The full text
  • Objective:
    • Construct a natural language question

  7. Evaluating Key Concept Identification
  • K experts annotate a set of documents
    • Tag spans of text regarding key concepts
    • Adjudicate and tag as vital or optional
  • Instance recall for each vital snippet
  • Instance precision based on all snippets
  • F-measure
  • Fully automatic
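The asymmetry in this scheme (recall over vital snippets only, precision over all annotated snippets) can be made concrete with a short sketch. The span-overlap matching rule below is an assumption; the talk does not specify how system spans are aligned to annotated spans.

```python
# Sketch of the key-concept evaluation scheme: recall over vital snippets,
# precision over all annotated snippets. The overlap test is an assumed
# matching rule, not one specified in the talk.

def span_overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two half-open character spans (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def evaluate_key_concepts(system, vital, optional, beta=1.0):
    """system/vital/optional: lists of (start, end) character spans."""
    # Recall: fraction of vital snippets matched by some system span.
    recall = (
        sum(any(span_overlaps(v, s) for s in system) for v in vital) / len(vital)
        if vital else 0.0
    )
    # Precision: fraction of system spans matching any annotated snippet.
    annotated = vital + optional
    precision = (
        sum(any(span_overlaps(s, a) for a in annotated) for s in system) / len(system)
        if system else 0.0
    )
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f
```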

  8. Evaluating Question Construction
  • Compare the system question to K expert questions (similar to MT and AS evaluation)
  • Average question F-measure based on facet entailment
    • Use the most similar expert question
    • Recall: proportion of facets in the expert question entailed by the system question
    • Precision: proportion of facets in the system question entailed by the expert question
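A sketch of this scoring scheme follows, assuming facets are represented as hashable items and approximating facet entailment by exact set membership; the actual proposal would rely on entailment judgments rather than exact match.

```python
# Facet-entailment F-measure: per-question F against the most similar of K
# expert questions, macro-averaged over questions. Entailment is approximated
# here by exact facet match, which is an assumption for illustration.

def facet_f(system_facets: set, expert_facets: set) -> float:
    if not system_facets or not expert_facets:
        return 0.0
    # Recall: expert facets "entailed" (here: contained) by the system question.
    recall = len(expert_facets & system_facets) / len(expert_facets)
    # Precision: system facets "entailed" by the expert question.
    precision = len(system_facets & expert_facets) / len(system_facets)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def score_question(system_facets: set, expert_question_facets: list[set]) -> float:
    # Use the most similar expert question (highest F).
    return max(facet_f(system_facets, e) for e in expert_question_facets)

def average_f(all_system, all_experts) -> float:
    # Macro-average the per-question scores across the test set.
    scores = [score_question(s, e) for s, e in zip(all_system, all_experts)]
    return sum(scores) / len(scores) if scores else 0.0
```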

  9. Facet Representation
  • Original dependency parse [figure: a labeled dependency graph (relations det, vc, vmod, sbar, prd, nmod, sub, pmod) over the sentence "The brass ring would not stick to the nail because the ring is not iron."]
  • Final semantic representation [figure: relations theme_not, cause_because, be_prd_not, nmod, and destination_to_not over the same sentence]
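One plausible way to encode such facets in code is as (governor, relation, dependent) triples, with polarity and connectives folded into the relation label as in the slide's example labels. Both this encoding and the specific governor/dependent assignments below are assumptions for illustration; the original representation may differ.

```python
# Hypothetical encoding of semantic facets as (governor, relation, dependent)
# triples. Relation labels come from the slide; the governor/dependent
# assignments are guesses for illustration.

from typing import NamedTuple

class Facet(NamedTuple):
    governor: str
    relation: str
    dependent: str

# Facets for: "The brass ring would not stick to the nail
#              because the ring is not iron."
facets = [
    Facet("stick", "theme_not", "ring"),            # the ring would not stick
    Facet("stick", "destination_to_not", "nail"),   # ... to the nail
    Facet("stick", "cause_because", "be"),          # because ...
    Facet("be", "be_prd_not", "iron"),              # the ring is not iron
    Facet("ring", "nmod", "brass"),                 # the brass ring
]
```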

  10. Evaluating Question Construction
  • Prior work:
    • Analysis of n-gram size effects (Soricut and Brill, 2004)
    • Dependency-based evaluation metrics (Owczarzak et al., 2007)
    • F-measure in similar evaluations (Turian et al., 2003)
    • N-gram inadequacy in entailment (Perez and Alfonseca, 2005)
    • Macro-averaging over nuggets (Lin and Demner-Fushman, 2005)
    • Facet entailment results (Nielsen et al., 2008)

  11. Summary
  • QG can be viewed as a 3-step process:
    • Concept Selection
    • Question Type Determination
    • Question Construction
  • The ultimate goal should be highly context-specific question generation
    • E.g., incorporating a learner model with the learner's goals and a history of interactions

  12. Thanks!
  • Thanks to Wayne Ward, Steve Bethard, James Martin, Martha Palmer, Philipp Wetzler, the CU Computational Semantics Group, and the anonymous reviewers for helpful feedback.
  • This work was partially funded by Award Numbers NSF 0551723, IES R305B070434, and NSF DRL-0733323.

  13. Evaluating Question Construction
  • "A Unified Framework for Automatic Evaluation using N-gram Co-Occurrence Statistics" (Soricut and Brill, 2004)
    • MT: 4-grams to ensure fluency
    • AS: unigrams; little syntactic construction
    • QG: bigram level; uses question stems and extraction of key phrases, but more syntactic composition than typical AS
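For concreteness, here is a minimal sketch of a bigram co-occurrence score in the spirit of Soricut and Brill (2004), computed as a bigram-overlap F-measure between a system question and one reference question. This illustrates the bigram-level idea only; it is not their exact formulation.

```python
# Bigram co-occurrence score between a system question and a reference
# question, as an F-measure over clipped bigram counts. Illustrative only.

from collections import Counter

def bigrams(text: str) -> Counter:
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def bigram_f(system: str, reference: str) -> float:
    sys_bg, ref_bg = bigrams(system), bigrams(reference)
    overlap = sum((sys_bg & ref_bg).values())  # clipped bigram matches
    if not overlap:
        return 0.0
    p = overlap / sum(sys_bg.values())
    r = overlap / sum(ref_bg.values())
    return 2 * p * r / (p + r)

print(bigram_f("Why would the ring not stick to the nail?",
               "Why did the brass ring not stick to the nail?"))
```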
