Overview of the KBP 2013 Slot Filler Validation Track

Hoa Trang Dang, National Institute of Standards and Technology

Slot Filler Validation (SFV)
• Track Goals
  • Allow teams without a full slot-filling system to participate, focusing on answer validation rather than document retrieval
  • Evaluate the contribution of RTE systems to KBP slot filling
  • Allow teams to experiment with system voting and global
• SFV input:
  • Candidate slot filler
  • Possibly additional information about candidate slot fillers
• SFV output:
  • Binary classification (Correct / Incorrect) of each candidate slot filler
  • Can only improve precision, not recall, of full slot-filling systems
• Evaluation metric depends on SFV use case and availability of additional information about candidate fillers
  • TAC RTE KBP Validation task (2011)
  • TAC KBP Slot Filler Validation task (2012)

TAC RTE KBP Validation task (2011)
• Each slot filler returned by SF systems yields 1 RTE evaluation pair, where:
  • T is the entire document supporting the slot filler
  • H is a set of synonymous sentences, representing different realizations of the slot filler

Use Case 1: SFV as Textual Entailment (2011)
• SFV input:
  • All regular English slot filling input (slot definitions, queries, source documents)
  • Individual candidate slot fillers (filler, provenance)
• Local Approach:
  • Generic textual entailment: H is the relation implied by the candidate slot filler (e.g., “Barack Obama has lived in Chicago”), T is the provenance (entire document, or smaller regions defined by justification offsets)
  • Tailored textual entailment: train on different slot types; could be a validation module for a full slot filling system
• Evaluation:
  • F score on entire pool of candidate slot fillers (unique slot filler, provenance)
  • Baseline: all T’s classified as entailing the corresponding H: P=R=percentage of entailing pairs in the pooled SF responses
  • Weak baseline, easily beaten by all SFV systems; not a direct measure of the utility of SFV to SF
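The generic textual entailment setup above can be sketched in a few lines of Python. This is only an illustration: the template wording, the helper names (`build_hypothesis`, `overlap_score`), and the example text T are assumptions made here, and the token-overlap score is a crude stand-in for a real entailment system.

```python
import re

# Hypothetical templates. The slot names follow KBP conventions, but the
# wording of each template is an assumption made for this sketch.
TEMPLATES = {
    "per:cities_of_residence": "{entity} has lived in {filler}",
    "per:spouse": "{entity} is or was married to {filler}",
}


def build_hypothesis(entity, slot, filler):
    """Turn a candidate slot filler into a hypothesis sentence H."""
    return TEMPLATES[slot].format(entity=entity, filler=filler)


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(text, hypothesis):
    """Fraction of hypothesis tokens that also appear in the text T."""
    h_tokens = tokens(hypothesis)
    if not h_tokens:
        return 0.0
    return len(h_tokens & tokens(text)) / len(h_tokens)


# H for the example used on the slide.
h = build_hypothesis("Barack Obama", "per:cities_of_residence", "Chicago")

# Made-up provenance text T, used only to exercise the functions.
t = "Barack Obama moved to Chicago in 1985 and has lived there for many years."
score = overlap_score(t, h)
```

A validator built this way would label the candidate Correct when the score clears some tuned threshold; a tailored approach would instead train a separate model or threshold per slot type.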

Use Case 2: SFV impact on single SF systems
• SFV input:
  • All regular English slot filling input (slot definitions, queries, source documents)
  • Individual candidate slot fillers (filler, provenance, confidence)
  • Broken out into individual slot filling runs
• Global Approach:
  • System voting, leveraging features across multiple SF runs
• Evaluation:
  • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
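The filter-then-rescore evaluation can be illustrated with a simplified sketch. It assumes each candidate is a `(query_id, slot, filler)` tuple and that correctness is exact membership in a gold set; the official SF scorer is more involved (for example, it handles equivalent fillers), so the function names and data below are hypothetical.

```python
def prf(num_correct, num_returned, num_gold):
    """Precision, recall, and F1 from raw counts."""
    p = num_correct / num_returned if num_returned else 0.0
    r = num_correct / num_gold if num_gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def score_run(run, gold):
    """Score a run (list of candidate tuples) against a set of gold tuples."""
    num_correct = sum(1 for cand in run if cand in gold)
    return prf(num_correct, len(run), len(gold))


def apply_sfv(run, sfv_labels):
    """Drop candidates the validator labeled Incorrect (False)."""
    return [cand for cand in run if sfv_labels.get(cand, True)]


# Toy data, invented for illustration only.
gold = {("Q1", "s1", "a"), ("Q1", "s2", "b"), ("Q2", "s1", "c"), ("Q2", "s2", "d")}
run = [("Q1", "s1", "a"), ("Q1", "s2", "b"), ("Q2", "s1", "x"), ("Q2", "s2", "y")]

# The validator rejects one wrong filler and keeps everything else.
sfv_labels = {("Q2", "s1", "x"): False}

p_before, r_before, f_before = score_run(run, gold)
p_after, r_after, f_after = score_run(apply_sfv(run, sfv_labels), gold)
```

In this toy case precision rises from 1/2 to 2/3 while recall stays at 1/2, which matches the point that filtering can only improve precision. Had the validator rejected a correct filler, recall and possibly F1 would have dropped.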

Slot Filler Validation (SFV) 2012
• SFV input:
  • All regular English slot filling input (slot definitions, queries, source documents)
  • Individual candidate slot fillers (filler, provenance, confidence)
  • Broken out into individual slot filling runs
  • System profile for each SF run
  • Preliminary assessment of 10% of KBP 2013 Slot Filling queries
• SFV output:
  • Binary classification (Correct / Incorrect) of each candidate slot filler
• Evaluation:
  • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
  • One SFV submission; it decreased F1 of almost all SF runs except the poorest-performing ones

Slot Filler Validation (SFV) 2013
• SFV input:
  • All regular English slot filling input (slot definitions, queries, source documents)
  • Individual candidate slot fillers (filler, provenance, confidence)
  • Broken out into individual slot filling runs
  • System profile for each SF run
  • Preliminary assessment of 10% of KBP 2013 Slot Filling queries
• SFV output:
  • Binary classification (Correct / Incorrect) of each candidate slot filler
• Evaluation:
  • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
  • Score only on the 90% of KBP 2013 slot filling queries that didn’t have preliminary assessments released as part of SFV input
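Restricting scoring to the queries without released assessments amounts to a simple partition of each run. The sketch below assumes candidates are dictionaries with a `query_id` field; the field name, the query IDs, and the function name are hypothetical.

```python
def split_by_release(run, released_query_ids):
    """Separate candidates to be scored from those whose queries had
    preliminary assessments released as SFV input."""
    scored = [c for c in run if c["query_id"] not in released_query_ids]
    excluded = [c for c in run if c["query_id"] in released_query_ids]
    return scored, excluded


# Ten invented queries with one candidate each; one query was released.
run = [{"query_id": f"Q{i:02d}", "filler": f"filler_{i}"} for i in range(1, 11)]
released_query_ids = {"Q03"}

scored, excluded = split_by_release(run, released_query_ids)
```

Only `scored` would then be passed to the scorer, so a validator cannot gain credit simply by copying the assessments it was given.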

SF System Profile
• SF Team ranks in KBP 2009-2012
• Did the system extract fillers from the KBP 2013 source corpus?
• Do the Confidence Values have meaning?
• Is the Confidence Value a probability?
• Tools or methods for:
  • Query expansion
  • Document retrieval
  • Sentence retrieval
  • NER nominal tagging
  • Coreference resolution
  • Third-party relation/event extraction
  • Dependency/Constituent parsing
  • POS tagging
  • Chunking
  • Main slot filling algorithm
  • Learning algorithm
  • Ensemble model
  • External resources

Slot Filler Validation Teams and Approaches
• BIT: Beijing Institute of Technology [local]
  • Generic RTE approach based on word overlap, cosine similarity, and token edit distance
• Stanford: Stanford University [local]
  • Based on Stanford’s full slot-filling system, especially the component for checking consistency and validity of candidate fillers
• UI_CCG: University of Illinois at Urbana-Champaign [local]
  • Tailored RTE approach; check candidate for slot-specific constraints
• jhuapl: Johns Hopkins University Applied Physics Laboratory [weak global]
  • Consider only the confidence value associated with each candidate filler and aggregate confidence values across systems
• RPI_BLENDER: Rensselaer Polytechnic Institute [strong global]
  • Based on RPI_BLENDER full slot-filling system (like Stanford), but also leveraged full set of SFV input (including SF system profile and preliminary assessments) to rank systems and apply tier-specific filtering
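A weak global approach of the kind described for jhuapl can be sketched as follows. The slides do not say how confidence values were aggregated, so summing them and applying a threshold is purely an assumption for illustration, as are the run names, data, and function names.

```python
from collections import defaultdict


def aggregate_confidence(runs):
    """Sum confidence per (query, slot, filler) across all runs.

    Fillers are lowercased so that trivially different strings pool together.
    Summation is an illustrative choice, not a documented method.
    """
    totals = defaultdict(float)
    for run in runs.values():
        for (query_id, slot, filler), conf in run.items():
            totals[(query_id, slot, filler.lower())] += conf
    return dict(totals)


def classify(runs, threshold):
    """Label each pooled candidate Correct (True) or Incorrect (False)."""
    totals = aggregate_confidence(runs)
    return {key: total >= threshold for key, total in totals.items()}


# Invented runs: two systems agree on one filler, one system alone
# proposes another with low confidence.
runs = {
    "run_a": {
        ("Q1", "per:spouse", "Michelle Obama"): 0.9,
        ("Q1", "per:cities_of_residence", "Boston"): 0.3,
    },
    "run_b": {
        ("Q1", "per:spouse", "michelle obama"): 0.8,
    },
}

labels = classify(runs, threshold=1.0)
```

Because confidence values are not always probabilities and may not be comparable across systems (both are questions in the SF system profile), a practical version would likely need per-system calibration or weighting before aggregation.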

Impact of RPI_BLENDER2 SFV on SF Runs
[Figure panels: Top 10 SF runs; Negatively impacted SF runs]

Conclusion
• Leveraging global features boosts scores of individual SF runs, if done discriminately
  • Don’t treat all slot filling systems the same
  • Even weak global features (e.g., raw confidence values) may help in some cases
• Caveat: other evaluation metrics are also valid, depending on use case
  • RTE KBP validation (2011) metric may be appropriate if the goal is to make assessment more efficient
