Interactive Task of the TREC Legal Track: Theory meets Practice

Presentation Transcript

  1. Interactive Task of the TREC Legal Track: Theory meets Practice • Making the world better for lawyers • Douglas W. Oard, College of Information Studies and Institute for Advanced Computer Studies, University of Maryland, College Park • Joint work with Jason Baron (NARA), Bruce Hedin (H5), Stephen Tomlinson (Open Text)

  2. National Archives E-Discovery Clinton White House search request Tobacco Policy 32 million emails 80,000 hired 25 persons for 6 months … 200,000

  3. Federal Rules of Civil Procedure Rule 26(f): At the parties’ planning meeting, issues expected to be discussed include: • “Any issues relating to disclosure or discovery of electronically stored information, including the form or forms in which it should be produced” • “Any issues relating to preserving discoverable information”

  4. “all keyword searches are not created equal; and there is a growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search” Victor Stanley, Inc. v. Creative Pipe, Inc., ---F.Supp.2d---, 2008 WL 2221841, * 3 & n.9 (D. Md. May 29, 2008) Judge Grimm, writing for the U.S. District Court for the District of Maryland

  5. The Design Space “Method” “Features” “Specification” “Result”

  6. What Does “Better” Mean? [Figure: INCREASING SUCCESS (finding relevant documents) plotted against INCREASING EFFORT (time, resources expended, etc.) for a “Baseline” Technique and a “Better” Technique, with points A, B, C, D and offsets x, y marked]

  7. Other Desiderata • Two-party • Negotiated information needs • Comprehensive • “Smoking gun detection” + completeness • Justifiable • Quantifiable comparison to present practice • Affordable • Minimize amount of human review

  8. Text Retrieval Conference (TREC) • Goals • Foster development of research communities • Create “benchmark” evaluation resources • Establish baseline results • History • Sponsored by NIST since 1992 • “Legal Track” started in 2006; E-Discovery focus • Annual evaluation cycle

  9. Evaluation Design Scanned Docs Interactive Task

  10. 2008 Interactive Task Participants • 4 research teams submitted 7 runs • Each run: YES/NO for all 7 million documents for a single production request • Clearwell Systems • H5 • University at Buffalo • University of Pittsburgh

  11. “Complaint” and “Production Request” …12. On January 1, 2002, Echinoderm announced record results for the prior year, primarily attributed to strong demand growth in overseas markets, particularly China, for its products. The announcement also touted the fact that Echinoderm was unique among U.S. tobacco companies in that it had seen no decline in domestic sales during the prior three years. 13. Unbeknownst to shareholders at the time of the January 1, 2002 announcement, defendants had failed to disclose the following facts which they knew at the time, or should have known: a. The Company's success in overseas markets resulted in large part from bribes paid to foreign government officials to gain access to their respective markets; b. The Company knew that this conduct was in violation of the Foreign Corrupt Practices Act and therefore was likely to result in enormous fines and penalties; c. The Company intentionally misrepresented that its success in overseas markets was due to superior marketing. d. Domestic demand for the Company's products was dependent on pervasive and ubiquitous advertising, including outdoor, transit, point of sale and counter top displays of the Company's products, in key markets. Such advertising violated the marketing and advertising restrictions to which the Company was subject as a party to the Attorneys General Master Settlement Agreement ("MSA"). e. The Company knew that it could be ordered at any time to cease and desist from advertising practices that were not in compliance with the MSA and that the inability to continue such practices would likely have a material impact on domestic demand for its products. … • All documents which describe, refer to, report on, or mention any “in-store,” “on-counter,” “point of sale,” or other retail marketing campaigns for cigarettes.

  12. ~7 Million Documents Scanned OCR Metadata Philip Moxx's. U.S.A. x.dr~am~c. cvrrespoaa.aa Benffrts Departmext Rieh>pwna, Yfe&ia Ta: Dishlbutfon Data aday 90,1997. From: Lisa Fislla Sabj.csr CIGNA WeWedng Newsbttsr -Yntsre StratsU During our last CIGNA Aatfoa Plan meadng, tlu iasuo of wLetSae to i0op per'Irw+ng artieles aod discontinue mndia6 CIGNA Well-Being aawslener to om employees was a msiter of disanision . I Imvm done somme reaearc>>, and wanted to pruedt you with my Sadings and pcdiminary recwmmeadatioa for PM's atratezy Ieprding l4aas aewelattee* . I believe .vayone'a input is valusble, and would epproolate hoarlng fmaa aaeh of you on whetlne you concur with my reeommendatioa … Title:CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY Organization Authors:PMUSA, PHILIP MORRIS USA Person Authors:HALLE, L Document Date:19970530 Document Type:MEMO, MEMORANDUM Bates Number:2078039376/9377 Page Count:2 Collection:Philip Morris

  13. Relevance Assessment • Volunteer assessors • Mostly from 13 law schools • Web-based assessment system • Based on document images + metadata

  14. Estimating Retrieval Effectiveness

  15. Everyone Gets High Precision • Precision = RelRet / Ret • Recall = RelRet / Rel • High OCR-accuracy documents only [Figure labels: Rel, Ret]
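
The two ratios on this slide can be illustrated with a minimal Python sketch. It assumes a run is a YES/NO decision per document (as described on slide 10) and that the set of relevant documents is fully known. The document IDs, the `precision_recall` helper, and the toy data are hypothetical and used only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs.

    Precision = RelRet / Ret, Recall = RelRet / Rel, where RelRet is the
    number of documents that are both relevant and retrieved.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    rel_ret = retrieved & relevant
    precision = len(rel_ret) / len(retrieved) if retrieved else 0.0
    recall = len(rel_ret) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: one YES/NO decision per document for a production request.
run = {"d1": True, "d2": True, "d3": False, "d4": True, "d5": False}
# Hypothetical ground truth: documents judged relevant to the request.
relevant = {"d1", "d3", "d4", "d6"}

retrieved = {doc for doc, yes in run.items() if yes}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case the run retrieves three documents, two of which are relevant, and four documents are relevant overall, so precision is 2/3 and recall is 1/2.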

  16. Interaction Time Effect All documents

  17. Takeaway Messages • Leverage guided interactive refinement • Factor of two in comprehensiveness • Vibrant research community • 22 research teams in 7 countries • Unique test collection • Sampling for “recall-oriented” evaluation
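
With about 7 million documents, assessing every document is not practical, which is why sampling matters for recall-oriented evaluation. The sketch below shows one generic way to estimate recall from assessed random samples; it is an illustrative assumption, not the estimator actually used in the track. The function name `estimate_recall`, the two-stratum split (retrieved versus not retrieved), and all numbers are hypothetical.

```python
def estimate_recall(n_retrieved, n_not_retrieved,
                    sample_retrieved, sample_not_retrieved):
    """Estimate recall from assessor judgments on two random samples.

    n_retrieved / n_not_retrieved: sizes of the two strata.
    sample_*: lists of booleans (True = judged relevant) for documents
    drawn uniformly at random from each stratum.
    """
    def estimated_relevant(stratum_size, judgments):
        # Scale the sampled relevance rate up to the whole stratum.
        if not judgments:
            return 0.0
        return stratum_size * sum(judgments) / len(judgments)

    rel_ret = estimated_relevant(n_retrieved, sample_retrieved)
    rel_missed = estimated_relevant(n_not_retrieved, sample_not_retrieved)
    total_relevant = rel_ret + rel_missed
    return rel_ret / total_relevant if total_relevant else 0.0


# Hypothetical numbers: 1,000 retrieved and 9,000 not retrieved documents,
# with 10 assessed documents sampled from each stratum.
recall = estimate_recall(
    n_retrieved=1000,
    n_not_retrieved=9000,
    sample_retrieved=[True] * 8 + [False] * 2,
    sample_not_retrieved=[True] * 1 + [False] * 9,
)
print(f"estimated recall={recall:.3f}")
```

Here an estimated 800 relevant documents were retrieved and an estimated 900 were missed, so the recall estimate is 800/1700. Even a low relevance rate among unretrieved documents can dominate the estimate when that stratum is large, so samples this small would give a very noisy result in practice.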

  18. Some Useful References • TREC Legal Track • http://trec-legal.umiacs.umd.edu • Papers at http://trec.nist.gov • Mailing list (contact oard@umd.edu) • DESI-3 Workshop on “Global E-Discovery and E-Disclosure” • June 8, 2009 in Barcelona • http://www.law.pitt.edu/DESI3_Workshop