RTE Planning Session

Presentation Transcript


  1. RTE Planning Session Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo

  2. Discussion items
     • What’s done so far: RTE 1-7
     • What’s next: what, where, when?
       open discussion and audience feedback

  3. Where we have got to
     7 years of RTE challenges (sponsored by PASCAL – finishing in 2011)
     • RTE 1-5: balanced data sets based on the output of NLP applications
     • RTE 6-7: moving toward more realistic scenarios
     • Main task: TE performed against a real corpus, focused on the SUM setting (after experimentation in the RTE-5 pilot)
     • KBP Validation experiment
     • A considerable number of datasets have been created in 7 RTE campaigns

  4. What next?
     • SUM and IE (KBP) have already been investigated in RTE-6 and RTE-7
     Proposal: investigate the potential of RTE systems for another NLP application setting

  5. What next?
     • RTE-8 will not be at TAC 2012
     • Co-locate with a major conference to get wider engagement with the NLP community
     • NIST will continue to support the activities and contribute to the organization of challenges
     • No RTE-8 in 2012
       • to allow the shift to an earlier time in the year
       • to prepare datasets for a new setting

  6. Future directions for RTE: new NLP application scenarios
     • QA appears to be the most natural direction
       • open domain, unsupervised setting
     Possible QA scenarios: Answer Validation
     • QA4MRE scenario
     • QA from Textbooks scenario
     • AVE on traditional QA tracks data

  7. Answer Validation
     • Deciding whether an answer is correct or not according to a given text
     • AV as a Textual Entailment problem:
       • H: question + answer (turned into a declarative sentence)
       • T: the text supporting the answer
       • T entails H = the answer is correct according to the supporting text
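
The reduction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not an RTE system: `AVItem`, `toy_entails`, `answer_is_correct` and the stopword list are hypothetical names introduced here, and the word-containment check is only a stand-in for whatever entailment engine would really be plugged in.

```python
import re
from dataclasses import dataclass
from typing import Callable, Set

# Assumption: a tiny stopword list, chosen only so the toy check works on the example.
STOPWORDS = {"a", "an", "the", "is", "are", "of"}


@dataclass
class AVItem:
    text: str        # T: the text supporting the answer
    hypothesis: str  # H: question + answer as a declarative sentence


def content_words(sentence: str) -> Set[str]:
    """Lowercased alphanumeric tokens minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {t for t in tokens if t not in STOPWORDS}


def toy_entails(text: str, hypothesis: str) -> bool:
    """Placeholder for a real RTE engine: H content words must all occur in T."""
    return content_words(hypothesis) <= content_words(text)


def answer_is_correct(item: AVItem,
                      entails: Callable[[str, str], bool] = toy_entails) -> bool:
    """The answer counts as correct exactly when T entails H."""
    return entails(item.text, item.hypothesis)
```

Any function with the same `(text, hypothesis) -> bool` shape can be passed as `entails`, which keeps the validation step independent of the entailment method.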

  8. Answer validation – An Example
     AV input:
     • Question: Which is the capital of Croatia?
     • Answer: Zagreb
     • Text: The capital of Croatia, Zagreb, has a population of around 700,000 citizens and it is known for …
     RTE input:
     1) T: Text (The capital of Croatia, Zagreb, has a population...)
        H: Q + A (Zagreb is the capital of Croatia)
        => H created manually or with automatic tools
     2) Original AV triplet: <T> <Q> <A>
        => Requires automatic H generation
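
For the second input format, automatic H generation could start from simple rewrite rules. The sketch below handles only one question shape ("Which/What/Who is ...?") and returns `None` otherwise; `make_hypothesis` and its pattern are hypothetical and are not the tooling used in any RTE or AVE campaign.

```python
import re
from typing import Optional

# Assumption: only copular wh-questions such as "Which is X?" are covered.
_WH_COPULA = re.compile(
    r"^(?:which|what|who)\s+(is|was|are|were)\s+(.+?)\s*\?$",
    re.IGNORECASE,
)


def make_hypothesis(question: str, answer: str) -> Optional[str]:
    """Turn a question + answer into a declarative H, or None if unsupported."""
    match = _WH_COPULA.match(question.strip())
    if match is None:
        return None  # a real system would need many more rules (or a parser)
    verb = match.group(1).lower()
    rest = match.group(2)
    return f"{answer.strip()} {verb} {rest}"


print(make_hypothesis("Which is the capital of Croatia?", "Zagreb"))
# Zagreb is the capital of Croatia
```

Questions outside the covered shape fall through to `None`, so a caller can route them to manual H creation instead.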

  9. 1. The QA4MRE scenario
     • Focuses on the Validation step of the QA pipeline
     • Formulated as a multiple choice reading comprehension test
     • Questions about 1 given text
     • Candidate answers provided
     • + Reference collection of documents
       • to allow systems to acquire the same background knowledge
       • used to assist with answering some questions
     • End of the roadmap: full QA setting

  10. QA4MRE Reading Test
      Text
      Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
      Multiple Choice Test
      According to the text… What company owns wells in Surat Basin?
      • Australia
      • Coal seam gas wells
      • Transfield Services
      • Santos Ltd.
      • Ausam Energy Corporation

  11. QA4MRE-based RTE task
      T(ext)
      Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
      Hs (Q + given A)
      • Australia owns wells in Surat Basin (NO ENTAILMENT)
      • Coal seam gas wells owns wells in Surat Basin (NO ENTAILMENT)
      • Transfield Services owns wells in Surat Basin (NO ENTAILMENT)
      • Santos Ltd. owns wells in Surat Basin (ENTAILMENT)
      • Ausam Energy Corporation owns wells in Surat Basin (NO ENTAILMENT)
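
The conversion from one multiple-choice question to five labelled hypotheses can be sketched as follows. `build_hypotheses` is a hypothetical helper, and it assumes the wh-phrase ("What company") is supplied by hand and sits at the start of the question; it substitutes each option mechanically, which is why "Coal seam gas wells owns wells ..." comes out ungrammatical, as on the slide.

```python
from typing import List, Tuple


def build_hypotheses(question: str, wh_phrase: str, options: List[str],
                     correct_option: str) -> List[Tuple[str, str]]:
    """Replace the wh-phrase with each option and label the resulting H."""
    if not question.startswith(wh_phrase):
        raise ValueError("question must start with the given wh-phrase")
    # "What company owns wells in Surat Basin?" -> "owns wells in Surat Basin"
    predicate = question[len(wh_phrase):].strip().rstrip("?").strip()
    pairs = []
    for option in options:
        label = "ENTAILMENT" if option == correct_option else "NO ENTAILMENT"
        pairs.append((f"{option} {predicate}", label))
    return pairs


options = ["Australia", "Coal seam gas wells", "Transfield Services",
           "Santos Ltd.", "Ausam Energy Corporation"]
pairs = build_hypotheses("What company owns wells in Surat Basin?",
                         "What company", options, "Santos Ltd.")
for hypothesis, label in pairs:
    print(f"{hypothesis} ({label})")
```

Each resulting H would then be paired with the same T, giving one T-H pair per answer option.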

  12. QA4MRE-based RTE task
      • Interesting data:
        • questions are posed so that various kinds of textual inference are required (lexical, syntactic, discourse)
      • Available datasets:
        • QA4MRE@CLEF 2011: up to 600 Hs
          • 12 reading tests, 120 questions, 600 options
        • The task will also be proposed @CLEF 2012
      • When the full QA setting is reached => AV of QA4MRE systems
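
The "up to 600 Hs" figure follows from one H per answer option. The check below uses only the three counts on the slide; the per-test and per-question figures are derived under the assumption that questions and options are spread evenly.

```python
# Counts stated for QA4MRE@CLEF 2011.
reading_tests = 12
questions = 120
options = 600

# Assumption: an even spread across tests and questions.
questions_per_test = questions // reading_tests
options_per_question = options // questions

# One hypothesis (Q + given A) per option gives the upper bound on Hs.
max_hypotheses = reading_tests * questions_per_test * options_per_question
print(questions_per_test, options_per_question, max_hypotheses)
```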

  13. 2. QA from a Textbook (e.g., Biology)
      Textbooks as natural source of Q&A pairs:
      • T = a paragraph / chapter / book
      • Hs = revision/test questions from teachers and/or the end of the chapter:
        • True/false questions
        • Turn «find-a-value» questions into declarative sentences
      • A natural and interesting challenge
        • established task, ready supply of data

  14. QA from a Textbook (cont.)
      Example (Biology)
      T(ext) – from Biology textbook
      …Normally, the genetic material in the nucleus is in a loosely bundled coil called chromatin. At the onset of prophase, chromatin condenses together into a highly ordered structure called a chromosome. Since the genetic material has already been duplicated earlier in S phase, the replicated chromosomes have two sister chromatids, bound together at the centromere by the cohesin protein complex….
      Hs
      Which of the following statement(s) are true?
      • Genetic material is duplicated during prophase (NO ENTAILMENT)
      • During prophase, chromosomes form from chromatin. (ENTAILMENT)
      • S phase follows prophase. (NO ENTAILMENT)
      • Chromatin is a form of genetic material. (ENTAILMENT)
      • Cohesin keeps the sister chromatid pairs connected with each other (ENTAILMENT)
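
A true/false set like this one maps directly onto labelled T-H pairs, so a system's yes/no judgments can be compared against the gold labels. The sketch below assumes plain accuracy as the score, which is an illustrative choice rather than a metric fixed by the slides; `GOLD` and `accuracy` are hypothetical names.

```python
from typing import List, Tuple

# Gold labels copied from the Biology example (True = ENTAILMENT).
GOLD: List[Tuple[str, bool]] = [
    ("Genetic material is duplicated during prophase", False),
    ("During prophase, chromosomes form from chromatin.", True),
    ("S phase follows prophase.", False),
    ("Chromatin is a form of genetic material.", True),
    ("Cohesin keeps the sister chromatid pairs connected with each other", True),
]


def accuracy(predicted: List[bool], gold: List[Tuple[str, bool]]) -> float:
    """Fraction of hypotheses whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("one prediction per hypothesis is required")
    correct = sum(p == label for p, (_, label) in zip(predicted, gold))
    return correct / len(gold)


# A trivial baseline that answers ENTAILMENT for everything gets 3 of 5 right.
always_yes = [True] * len(GOLD)
print(accuracy(always_yes, GOLD))
```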

  15. 3. AVE on «traditional» QA data
      • Answer Validation Exercise (Peñas et al., 2006)
      • Validating the correctness of answers given by QA systems, according to the supporting documents returned by the systems.
      • Like RTE 6-7 KBP Validation Task
      Data available from past QA campaigns (TREC & CLEF)

  16. Pilot task: RTE on Specialized Datasets
      Possible pilot task using specialized datasets, where all T-H pairs contain one or more specific phenomena that affect inference:
      • Temporal expressions
      • Numerical expressions
      • Focus on temporal and quantitative reasoning
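
One way such a specialized dataset might be assembled is to filter existing T-H pairs by surface cues. The patterns below are deliberately crude and purely illustrative (a few year, month and deictic-word forms for temporal expressions, digit strings for numerical ones); `phenomena` and `select_pairs` are hypothetical helpers, not part of any proposed RTE pipeline.

```python
import re
from typing import List, Set, Tuple

# Assumption: years 1500-2099, some month names and a few deictic words.
TEMPORAL = re.compile(
    r"\b(?:(?:1[5-9]|20)\d{2}|january|february|march|april|june|july|august|"
    r"september|october|november|december|yesterday|today|tomorrow)\b",
    re.IGNORECASE,
)
# Digit strings, optionally with thousands separators or decimals.
NUMBER = re.compile(r"\b\d[\d,.]*\b")


def phenomena(sentence: str) -> Set[str]:
    """Return which of the two phenomena the sentence appears to contain."""
    found = set()
    if TEMPORAL.search(sentence):
        found.add("temporal")
    # Blank out temporal matches so a year is not also counted as a number.
    remaining = TEMPORAL.sub(" ", sentence)
    if NUMBER.search(remaining):
        found.add("numerical")
    return found


def select_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep T-H pairs in which T or H shows at least one phenomenon."""
    return [(t, h) for t, h in pairs if phenomena(t) or phenomena(h)]
```

Pattern matching of this kind would over- and under-select in practice, so manual checking of the selected pairs would still be needed.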

  17. TE-related initiatives for 2012
      • @SemEval 2012:
        • Task #6: Semantic Textual Similarity
        • Task #8: Cross-Lingual Textual Entailment
      • @CLEF 2012:
        • QA4MRE

  18. POSSIBLE VENUES FOR RTE-8 IN 2013
      • Semantics conferences are trying to join their efforts: *Sem 2012
        • The first joint conference on lexical and computational semantics
        • Co-located with NAACL-HLT 2012
      • PROPOSAL: co-locate RTE-8 with
        • SIGLEX Event @ NAACL-HLT or ACL (summer 2013)
        • IWCS (winter or spring 2013)

  19. Thank you See you all in 2013!