STIR: PowerPoint Presentation

Presentation Transcript

  1. STIR: Simultaneous Achievement of high Precision and high Recall through Socio-Technical Information Retrieval • Robert S. Bauer and Teresa Jade, www.H5technologies.com; Mitchell P. Marcus, www.cis.upenn.edu/~mitch/ • June 7, 2007

  2. The e-Discovery IDEAL: High P with High R • Find every relevant document, and only those documents that are relevant • Desired: P = 0.8 (or better) at R = 0.8 (or better) • Acceptable: P = 2/3 (or better) at R = 2/3 (or better)
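
A minimal Python sketch of how the two targets above could be checked for a single topic. It assumes the standard set-based definitions, precision P = |retrieved ∩ relevant| / |retrieved| and recall R = |retrieved ∩ relevant| / |relevant|, which the slides do not spell out; the function names `precision_recall` and `classify` and the document IDs are hypothetical. `Fraction` keeps the 2/3 threshold exact, so a topic that lands exactly on 2/3 is not misjudged by floating-point rounding.

```python
from fractions import Fraction

# Targets from the "e-Discovery IDEAL" slide: P and R must BOTH meet the bar.
DESIRED = Fraction(4, 5)     # P = 0.8 and R = 0.8 (or better)
ACCEPTABLE = Fraction(2, 3)  # P = 2/3 and R = 2/3 (or better)


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[Fraction, Fraction]:
    """Return (precision, recall) for one topic, given sets of document IDs."""
    hits = len(retrieved & relevant)
    # Empty denominators are treated as 0 here by convention (an assumption).
    precision = Fraction(hits, len(retrieved)) if retrieved else Fraction(0)
    recall = Fraction(hits, len(relevant)) if relevant else Fraction(0)
    return precision, recall


def classify(precision: Fraction, recall: Fraction) -> str:
    """Label a (P, R) pair against the slide's Desired and Acceptable targets."""
    if precision >= DESIRED and recall >= DESIRED:
        return "desired"
    if precision >= ACCEPTABLE and recall >= ACCEPTABLE:
        return "acceptable"
    return "below target"


# Example topic with hypothetical document IDs.
p, r = precision_recall({"d1", "d2", "d3", "d4", "d5"},
                        {"d1", "d2", "d3", "d4", "d6", "d7"})
print(p, r, classify(p, r))  # prints: 4/5 2/3 acceptable
```

Here 4 of the 5 retrieved documents are relevant (P = 4/5) and 4 of the 6 relevant documents were found (R = 2/3), so the topic meets the Acceptable target but not the Desired one.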

  3. The e-Discovery REALITY (chart label: Text REtrieval Conference (TREC)) • High P & Low R = RISK (important documents not retrieved) • Low P & High R = COST (many more documents must be reviewed)

  4. Agenda • Results: TREC ad hoc (= typical); queries typifying Communities of Practice (CoPs) • e-Discovery Approaches: 5 Dimensions; Linguistics of CoPs • Research Issues: TREC; AI; Linguists; Lawyers

  5. Typical Results – ad hoc queries, 22-topic average • Desired is rare • Acceptable < 10% (from Chapter 3, “Retrieval System Evaluation” by Chris Buckley and Ellen M. Voorhees, in TREC: Experiment and Evaluation in Information Retrieval, Voorhees & Harman, eds., MIT Press, 2005, p. 62, Fig. 3.1)

  6. Accuracy Metrics: TREC avg compared with STIR topical avg in 4 cases (I-IV) encompassing 42 topics • F1 = 2(P·R)/(P + R) • Chart: bars for TREC avg and cases I, II, III, IV against the Ideal and Acceptable levels • TREC avg: most accurate TREC results for 20 of 22 topics in one test case
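
The F1 formula on this slide is the harmonic mean of P and R. A short Python check (the `f1` helper is hypothetical) shows how the slide's targets map onto F1 levels: when P = R, F1 equals that common value, while a lopsided result such as high P with low R is pulled well below its arithmetic mean.

```python
from fractions import Fraction


def f1(precision: Fraction, recall: Fraction) -> Fraction:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return Fraction(0)  # convention: F1 is 0 when both P and R are 0
    return 2 * precision * recall / (precision + recall)


# With P == R, F1 equals the common value: Desired -> 4/5, Acceptable -> 2/3.
print(f1(Fraction(4, 5), Fraction(4, 5)))   # 4/5
print(f1(Fraction(2, 3), Fraction(2, 3)))   # 2/3
# High P with low R scores far below the arithmetic mean of 0.55.
print(f1(Fraction(9, 10), Fraction(1, 5)))  # 18/55 (about 0.33)
```

Because F1 blends the two metrics, an F1 of 0.8 or more does not by itself show that both P and R reached 0.8: P = 1.0 with R = 0.7 gives F1 ≈ 0.82 but misses the Desired recall.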

  7. Average P & R for each case: STIR compared with TREC IR • Chart: Precision and Recall for STIR and TREC • Topical P & R results for one TREC and 4 STIR cases
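
Assuming that a case's "topical average" weights every topic equally (macro-averaging), the per-case averages on this slide could be computed as below. The helper name and the three per-topic values are hypothetical placeholders, not data from the presentation.

```python
from statistics import mean


def macro_average(per_topic: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Average precision and recall over topics, each topic weighted equally."""
    precisions = [p for p, _ in per_topic.values()]
    recalls = [r for _, r in per_topic.values()]
    return mean(precisions), mean(recalls)


# Hypothetical (P, R) per topic for one case; placeholder values only.
case_topics = {
    "topic_a": (0.9, 0.7),
    "topic_b": (0.8, 0.8),
    "topic_c": (0.7, 0.9),
}
avg_p, avg_r = macro_average(case_topics)
print(f"case average: P = {avg_p:.2f}, R = {avg_r:.2f}")
```

Macro-averaging keeps a topic with few relevant documents from being swamped by a large one; pooling all documents across topics (micro-averaging) would instead weight each topic by its size.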

  8. Sampled Corpus Tests for 12 Topics in Case I during STIR Training ● STIR training provides substantial Recall improvement with acceptable Precision reduction • Retrieval acceptable to the lowest limit of statistical uncertainty • Chart: Precision and Recall, with the Recall Improvement marked
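
The slide does not say how the sampled corpus tests were scored. One common approach, assumed here, is to draw a random sample from the corpus, have reviewers judge relevance, and estimate recall as the share of sampled relevant documents that the system retrieved, with a binomial confidence interval expressing the statistical uncertainty. This sketch uses the Wilson score interval; the function name and the sample counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 ~ 95% confidence)."""
    if trials == 0:
        raise ValueError("need at least one sampled relevant document")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical sample: reviewers found 50 relevant documents in a random sample
# of the corpus, and the system had retrieved 40 of them.
relevant_in_sample = 50
retrieved_among_them = 40
recall_estimate = retrieved_among_them / relevant_in_sample
low, high = wilson_interval(retrieved_among_them, relevant_in_sample)
print(f"recall ~ {recall_estimate:.2f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The lower bound is the conservative figure to compare against a target: with 40 of 50 sampled relevant documents retrieved, the point estimate is 0.80 but the 95% lower bound is roughly 0.67, so at this sample size the Desired recall is not established with confidence even though the point estimate meets it.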

  9. Agenda • Results: TREC ad hoc (= typical); queries typifying Communities of Practice (CoPs) • e-Discovery Approaches: 5 Dimensions; Linguistics of CoPs • Research Issues: TREC; AI; Linguists; Lawyers

  10. Dimensions of e-Discovery: Documents • Community • Linguistics • Subject Matter • Legal Case

  11. Dimensions of e-Discovery: Document Review (dimensions: Documents, Legal Case) Example Systems: • Manual (human) review conducted by attorneys • Basic keyword searches targeted to legal issues • Supervised learning with relevance feedback

  12. Dimensions of e-Discovery: Expert Search (dimensions: Documents, Subject Matter, Legal Case) Example Systems: • Subject matter experts review results under legal team direction ● Domain-specific lexicons used

  13. Dimensions of e-Discovery: Model Meaning (dimensions: Documents, Linguistics, Subject Matter, Legal Case) Example Systems: • Supervised learning with relevance feedback and semantic analysis ● Semantic search

  14. Dimensions of e-Discovery: Model Communities (dimensions: Documents, Community, Linguistics, Subject Matter, Legal Case) Example System: ● Socio-Technical-IR

  15. Dimensions of e-Discovery: Socio-Technical-IR (dimensions: Community, Linguistics) • Non-computational linguistic disciplines: Pragmatics; Sociolinguistics; Ethnomethodology; Discourse Analysis • A community of practice is a diverse group of people engaged in real work over a significant period of time, during which they build things, solve problems, learn, and invent, developing their own tools, language, and processes and evolving a practice that is highly skilled and highly creative

  16. Agenda • Results: TREC ad hoc (= typical); queries typifying Communities of Practice (CoPs) • e-Discovery Approaches: 5 Dimensions; Linguistics of CoPs • Research Issues: TREC; AI; Linguists; Lawyers

  17. Research Issues • TREC: nature of the relatively rare high-P-with-high-R queries; measuring both recall and precision effectively • AI: Knowledge-Based (Expert) Systems that codify linguistic expertise; characterize practice communities of subject matter experts; investigate combination systems applied to different types of topics • Linguists: identify and characterize different types of topics and map them to system types; language patterns in communities as well as in subject matter fields; defining categories in concrete terms • Lawyers: defining categories in concrete terms; integration of technology and processes

  18. Back-Up

  19. STIR Analysis: CoPs’ Enunciatory Language • Diagram elements: Object; Relevant Document Text; Process; State of Affairs; Event; Action; Fact