Planning for the TREC 2008 Legal Track. Douglas Oard, Stephen Tomlinson, Jason Baron
Thursday’s Discussion • Deciding on a document collection • “Beating Boolean” • Handling nasty OCR • Making the best use of the metadata • Ad hoc task design • Interactive task design • Relevance feedback task design
Choosing a Collection • FERC Enron (w/attachments, full headers) • Email is high-interest for E-discovery practice! • IIT CDIP version 1.0 (same as 2006/07) • Same 83 topics, plus some new ones • State Department Cables • Task: Freedom of Information Act requests
Plans for 2008 • Some things stay the same: • Same collection • Same three tasks (Ad Hoc, RF, Interactive) • Some new things • Deep assessment (fewer new topics) • Additional ranking-sensitive eval measures
Handling Nasty OCR • Index pruning • Error estimation • Character n-grams • Duplicate detection • Expansion using a cleaner collection
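Character n-grams are one of the listed remedies because they degrade gracefully under OCR noise: a single misrecognized character corrupts only a few n-grams, so a clean query term still overlaps heavily with its corrupted form. A minimal sketch (function names are illustrative, not from the track):

```python
def char_ngrams(word, n=3):
    """Set of character n-grams of a word, with '#' boundary padding."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

# A classic OCR error merges 'rn' into 'm', but most trigrams survive:
sim = ngram_similarity("government", "govemment")
```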
How to “Beat Boolean” • Work from the reference Boolean result set? • Swap low-ranked documents inside the Boolean set for highly ranked documents outside it • Relax the Boolean constraints somehow? • Cover density, proximity perturbation, …
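Cover density, mentioned above, is one way to relax a strict Boolean AND: instead of excluding documents that fail the conjunction, rank them by the shortest token window (the "cover") containing all query terms. A rough sketch with illustrative names, not the track's actual scoring:

```python
from itertools import product

def min_cover(positions_by_term):
    """Length of the smallest token window containing at least one
    occurrence of every query term, or None if any term is absent.
    Brute force over occurrence combinations -- fine for a sketch."""
    if not positions_by_term or any(not p for p in positions_by_term.values()):
        return None
    return min(max(combo) - min(combo) + 1
               for combo in product(*positions_by_term.values()))

def cover_score(positions_by_term):
    """Shorter covers score higher; a document missing a term scores 0
    (strict Boolean AND would have excluded it outright)."""
    span = min_cover(positions_by_term)
    return 0.0 if span is None else 1.0 / span

# 'fraud' at tokens 3 and 40, 'memo' at token 5: best cover is tokens 3..5
score = cover_score({"fraud": [3, 40], "memo": [5]})
```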
Using Metadata • Title (term match) • Author (social network) • Bates number (sequence)
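Bates numbers encode production sequence, so sorting on the numeric part, rather than on the raw string, recovers document order. A small illustrative sketch; the prefix-dash-digits stamp format is an assumption:

```python
import re

def bates_key(bates):
    """Split a Bates stamp like 'EDRM-000123' into (prefix, int sequence)
    so that numeric order wins over lexicographic order."""
    m = re.match(r"([A-Za-z]*)-?(\d+)$", bates)
    if not m:
        raise ValueError(f"unrecognized Bates number: {bates!r}")
    return (m.group(1), int(m.group(2)))

# Lexicographic sort would put 'EDRM-10' before 'EDRM-2'; this does not:
ordered = sorted(["EDRM-10", "EDRM-2"], key=bates_key)
```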
Ad Hoc Task Design • Evaluation measures • R@B?, P@R?, Index size? • Error bars / Statistical significance testing • Limits on post-hoc use of the collection? • What are “meaningful” differences? • Topic design • Negotiation transcript? • Inter-annotator agreement
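One reading of the R@B measure above: recall at depth B, where B is the hit count of the reference Boolean query for the topic. It falls directly out of a ranked list and the relevance judgments; a minimal sketch with illustrative names:

```python
def recall_at_b(ranked, relevant, b):
    """Fraction of all relevant documents found in the top B of the
    ranking, where B is the reference Boolean query's result-set size."""
    if not relevant:
        return 0.0
    return len(set(ranked[:b]) & set(relevant)) / len(relevant)

# 3 relevant docs, 2 of them in the top B=3 of the system's ranking:
score = recall_at_b(["d1", "d2", "d3", "d4"], {"d1", "d3", "d4"}, b=3)
```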
Interactive Task Design • Evaluation measure • Precision-oriented? • Recall-oriented? • Effect of assessor disagreement
Relevance Feedback Task • Evaluation measure • Residual recall at B_Residual? • Two-stage feedback?
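Residual-collection evaluation is the standard way to score relevance feedback without rewarding resubmission of already-judged documents: strip the feedback-round judgments from both the new ranking and the relevant set, then evaluate what remains. A sketch under that reading of "residual recall" (names are illustrative):

```python
def residual_ranking(ranked, judged):
    """Drop documents already shown to the assessor in the feedback round."""
    return [d for d in ranked if d not in judged]

def residual_recall_at(ranked, judged, relevant, depth):
    """Recall over the residual collection: only unjudged relevant
    documents count, and only the residual ranking is scored."""
    residual_rel = set(relevant) - set(judged)
    if not residual_rel:
        return 0.0
    top = set(residual_ranking(ranked, judged)[:depth])
    return len(top & residual_rel) / len(residual_rel)

# 'a' was judged in round one, so only 'c' and 'd' can still earn recall:
score = residual_recall_at(["a", "b", "c", "d"], {"a"}, {"a", "c", "d"}, depth=2)
```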