CLEF Interactive Track Overview
CLEF Interactive Track Overview (iCLEF)
Douglas W. Oard, UMD, USA
Julio Gonzalo, UNED, Spain
CLEF 2003
Outline
• Goals
• Track design
• Participating teams
• Results
[Diagram: the CLEF batch retrieval pipeline: Query → Search → Ranked List]
[Diagram: the iCLEF interactive loop: Query Formulation → Query Translation → Translated Query → Search → Ranked List → Selection → Document Examination → Document Use, with Query Reformulation feeding back into Query Formulation]
iCLEF Goals
• Track design
  • Component evaluation (since 2001)
  • End-to-end evaluation (since 2002)
• System evaluation
  • Support for interactive document selection
  • Support for query creation
  • Support for iterative refinement
Document Selection Experiments
[Diagram: Topic Description + Standard Ranked List → Interactive Selection → F-measure (α = 0.8)]
End-to-End Experiments
[Diagram: Topic Description → Query Formulation → Automatic Retrieval → Interactive Selection, measured by Average Precision and F-measure (α = 0.8)]
Topics
• Eight “broad” (multifaceted) topics:
  1. The Ames espionage case (C100)
  2. European car industry (C106)
  3. Computer security (C109)
  4. Computer animation (C111)
  5. Economic policies of Edouard Balladur (C120)
  6. The Jackson-Presley marriage (C123)
  7. German armed forces out-of-area (C133)
  8. EU fishing quotas (C139)
• Selected from the CLEF 2002 topic set:
  • Not too easy (e.g., a proper name alone is not perfectly predictive)
  • Not too hard (e.g., not requiring specialized expertise)
  • (Relevant documents in every collection?)
Test Collection
• Any CLEF-2002 language collection
• Systran baseline translations
  • Spanish to English, English to Spanish
• Augmented relevance judgments
  • Start with the CLEF-2002 judgments
  • Enrich the pools with:
    • Top 20 documents from every iteration
    • Every document judged by a user
  • Judge all additions to the pools
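A minimal sketch of this pool-enrichment step, assuming rankings and judgments arrive as plain document-ID collections; the function and argument names are hypothetical, not the actual CLEF assessment tooling:

```python
def enrich_pool(clef2002_pool, run_rankings, user_judged_docs, depth=20):
    """Start from the CLEF-2002 judgment pool, then add the top-`depth`
    documents from every search iteration plus every document a user
    judged; the returned additions are what assessors must still judge."""
    pool = set(clef2002_pool)
    additions = set()
    for ranking in run_rankings:          # one ranking per search iteration
        for doc_id in ranking[:depth]:
            if doc_id not in pool:
                additions.add(doc_id)
    for doc_id in user_judged_docs:       # every document a user judged
        if doc_id not in pool:
            additions.add(doc_id)
    return pool | additions, additions
```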
Measures of Effectiveness
• Query formulation: uninterpolated average precision
  • Expected value of precision over relevant document positions
  • Interpreted based on the query content at each iteration
• Document selection: unbalanced F-measure, F_α = 1 / (α/P + (1 - α)/R)
  • P = precision, R = recall
  • α = 0.8 favors precision
  • Models expensive human translation
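Both measures are straightforward to compute. Below is a minimal Python sketch, assuming relevance is given as one boolean per rank position; the helper names are illustrative, not the official evaluation code:

```python
def average_precision(ranked_rel, total_relevant):
    """Uninterpolated average precision: the mean of the precision values
    measured at each relevant document's position in the ranking."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / total_relevant if total_relevant else 0.0

def f_measure(precision, recall, alpha=0.8):
    """Unbalanced F-measure; alpha = 0.8 weights precision more heavily,
    modeling the high cost of human translation of selected documents."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)
```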
Variation in Automatic Measures
• System: what we seek to measure
• Topic: sample the topic space, compute the expected value
• Topic+System: pair by topic and compute statistical significance
• Collection: repeat the experiment using several collections
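For the Topic+System case, pairing by topic means comparing two systems on the same topics before testing significance, which removes topic difficulty from the comparison. A sketch using a paired t-test, one reasonable choice rather than a test the track mandates:

```python
from scipy import stats

def paired_by_topic(scores_a, scores_b):
    """scores_a[t] and scores_b[t] hold the per-topic measure (e.g.,
    average precision) for the same topic t under systems A and B."""
    topics = sorted(set(scores_a) & set(scores_b))
    a = [scores_a[t] for t in topics]
    b = [scores_b[t] for t in topics]
    t_stat, p_value = stats.ttest_rel(a, b)  # paired t-test over topics
    return t_stat, p_value
```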
Additional Effects in iCLEF
• Learning: vary topic presentation order
• Fatigue: vary system presentation order
• Topic+User (expertise): ask about prior knowledge of each topic
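Presentation order is typically counterbalanced with a Latin square so that every topic and system appears in every position equally often. The cyclic construction below is a standard way to build one; it is an assumption about the design, not the exact iCLEF matrix:

```python
def latin_square(items):
    """Each row is the presentation order for one participant (group);
    every item appears once in every position, spreading learning and
    fatigue effects evenly across conditions."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

# Example: orders for two systems across two participant groups.
for order in latin_square(["SystemA", "SystemB"]):
    print(order)   # ['SystemA', 'SystemB'] then ['SystemB', 'SystemA']
```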
iCLEF 2003 Research Questions
• SICS (Sweden): What happens when Swedes read English?
• Alicante (Spain): Can NLP-based summaries beat passages?
• BBN/Maryland (USA): Can NLP-based summaries beat passages?
• Maryland (USA): Is user-assisted translation helpful?
• UNED (Spain): Is searching summaries as good as full text?
(These questions span the track's two conditions: Document Selection and End-to-End.)
2003 Results
[Results chart not preserved in the transcript]