
Planning for the TREC 2008 Legal Track

Douglas Oard, Stephen Tomlinson, Jason Baron


Presentation Transcript


  1. Planning for the TREC 2008 Legal Track • Douglas Oard • Stephen Tomlinson • Jason Baron

  2. Thursday’s Discussion • Deciding on a document collection • “Beating Boolean” • Handling nasty OCR • Making the best use of the metadata • Ad hoc task design • Interactive task design • Relevance feedback task design

  3. Choosing a Collection • FERC Enron (w/attachments, full headers) • Email is high-interest for E-discovery practice! • IIT CDIP version 1.0 (same as 2006/07) • Same 83 topics, plus some new ones • State Department Cables • Task: Freedom of Information Act requests

  4. Plans for 2008 • Some things stay the same: • Same collection • Same three tasks (Ad Hoc, RF, Interactive) • Some new things • Deep assessment (fewer new topics) • Additional ranking-sensitive eval measures

  5. Backup Slides

  6. Handling Nasty OCR • Index pruning • Error estimation • Character n-grams • Duplicate detection • Expansion using a cleaner collection
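
To make the "character n-grams" bullet concrete, here is a minimal sketch of why n-gram matching tolerates OCR errors that break whole-word matching. The function names, the padding convention, and the Jaccard overlap are illustrative assumptions, not something the slides specify.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a token, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_overlap(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings (assumed measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# An OCR error ("o" read as "0") defeats exact term matching,
# but most trigrams of the word still survive.
clean, noisy = "tobacco", "tobacc0"
exact_match = clean == noisy
partial_match = ngram_overlap(clean, noisy)
```

Here `exact_match` is false while `partial_match` stays above one half, which is the property that makes n-gram indexing attractive for noisy OCR text.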

  7. How to “Beat Boolean” • Work from reference Boolean? • Swap out low-ranked-in for high-ranked-out • Relax Boolean somehow? • Cover density, proximity perturbation, …
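
One reading of "swap out low-ranked-in for high-ranked-out" is sketched below: start from the reference Boolean result set, drop its k lowest-ranked members, and add the k highest-ranked documents that the Boolean query missed. The function name, the parameter k, and the assumption that the ranking covers every Boolean-matched document are all hypothetical.

```python
def swap_boolean(ranking, boolean_set, k):
    """Replace the k lowest-ranked Boolean matches with the k
    highest-ranked non-matches, keeping the result set size fixed.

    ranking     -- document ids, best first (assumed to cover boolean_set)
    boolean_set -- ids returned by the reference Boolean query
    k           -- number of documents to swap (hypothetical parameter)
    """
    ranked_in = [d for d in ranking if d in boolean_set]
    ranked_out = [d for d in ranking if d not in boolean_set]
    k = min(k, len(ranked_in), len(ranked_out))
    kept = ranked_in[:len(ranked_in) - k]   # drop the low-ranked-in tail
    added = ranked_out[:k]                  # take the high-ranked-out head
    return set(kept) | set(added)


ranking = ["d1", "d2", "d3", "d4", "d5"]
boolean_set = {"d1", "d3", "d5"}
result = swap_boolean(ranking, boolean_set, k=1)
```

With k = 0 the result is the Boolean set itself, so the Boolean baseline is the starting point and k controls how far the system departs from it.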

  8. Using Metadata • Title (term match) • Author (social network) • Bates number (sequence)

  9. Ad Hoc Task Design • Evaluation measures • R@B?, P@R?, Index size? • Error bars / Statistical significance testing • Limits on post-hoc use of the collection? • What are “meaningful” differences? • Topic design • Negotiation transcript? • Inter-annotator agreement
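
The two candidate measures can be sketched as follows, assuming that B denotes the number of documents matched by the reference Boolean query and R the number of relevant documents for the topic; the slide does not spell out either symbol, so treat both definitions as assumptions. The function names are illustrative.

```python
def recall_at(ranking, relevant, depth):
    """Fraction of the relevant documents found in the top `depth` results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:depth]) & relevant) / len(relevant)


def precision_at(ranking, relevant, depth):
    """Fraction of the top `depth` results that are relevant."""
    if depth <= 0:
        return 0.0
    return len(set(ranking[:depth]) & relevant) / depth


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
B = 3                      # assumed: size of the reference Boolean result set
R = len(relevant)          # assumed: number of relevant documents

r_at_b = recall_at(ranking, relevant, B)      # R@B
p_at_r = precision_at(ranking, relevant, R)   # P@R
```

Cutting the ranked list at depth B lets a ranked system be compared with the Boolean result set at equal size.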

  10. Interactive Track Design • Evaluation measure • Precision-oriented? • Recall-oriented? • Effect of assessor disagreement

  11. Relevance Feedback Task • Evaluation measure • Residual recall at B_Residual? • Two-stage feedback?
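
A residual measure can be sketched like this, assuming the usual residual-collection convention: documents already judged and supplied as feedback are removed from both the ranking and the relevant set before recall is computed. Reading B_Residual as the cutoff depth on the residual ranking is an assumption, as are the function and parameter names.

```python
def residual_recall(ranking, relevant, judged, b_residual):
    """Recall at depth b_residual after removing already-judged documents.

    ranking    -- document ids, best first
    relevant   -- ids of all relevant documents for the topic
    judged     -- ids whose judgments were given to the system as feedback
    b_residual -- cutoff depth on the residual ranking (assumed meaning)
    """
    residual_ranking = [d for d in ranking if d not in judged]
    residual_relevant = relevant - judged
    if not residual_relevant:
        return 0.0
    hits = set(residual_ranking[:b_residual]) & residual_relevant
    return len(hits) / len(residual_relevant)


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
judged = {"d1", "d2"}
score = residual_recall(ranking, relevant, judged, b_residual=2)
```

Removing the judged documents keeps a system from earning credit for simply returning the relevant documents it was told about.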
