STIR: Simultaneous Achievement of high Precision and high Recall through Socio-Technical Information Retrieval. Robert S. Bauer, Teresa Jade (www.H5technologies.com) & Mitchell P. Marcus (www.cis.upenn.edu/~mitch/). June 7, 2007. The e-Discovery IDEAL: High P with High R.

Presentation Transcript

STIR:

Simultaneous Achievement of high Precision and high Recall through Socio-Technical Information Retrieval

Robert S. Bauer, Teresa Jade

www.H5technologies.com

&

Mitchell P. Marcus

www.cis.upenn.edu/~mitch/

June 7, 2007

The e-Discovery IDEAL: High P with High R
  • Find every relevant document & only those docs that are relevant
  • Desired: P = 0.8 (or better) @ R = 0.8 (or better)
  • Acceptable: P = 2/3 (or better) @ R = 2/3 (or better)
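
The Desired and Acceptable levels can be checked mechanically once a set of retrieved documents and a set of truly relevant documents are known. The sketch below is only an illustration: the names `RetrievalScore`, `score`, and `level` are hypothetical, and it assumes the standard set-based definitions of precision (hits / retrieved) and recall (hits / relevant), which the slides use but do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalScore:
    precision: float
    recall: float


def score(retrieved: set[str], relevant: set[str]) -> RetrievalScore:
    """Standard set-based precision and recall (assumed definitions)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return RetrievalScore(precision, recall)


def level(s: RetrievalScore) -> str:
    """Classify a result against the slide's thresholds (0.8 and 2/3)."""
    if s.precision >= 0.8 and s.recall >= 0.8:
        return "desired"
    if s.precision >= 2 / 3 and s.recall >= 2 / 3:
        return "acceptable"
    return "below acceptable"


# Toy data (invented for illustration): 10 relevant docs, 10 retrieved, 8 overlap.
relevant_docs = {f"d{i}" for i in range(10)}
retrieved_docs = {f"d{i}" for i in range(8)} | {"x1", "x2"}
result = score(retrieved_docs, relevant_docs)
print(result, level(result))
```

Both conditions must hold at once, which is the point of the slide: a result with high P and low R, or the reverse, falls below the Acceptable level.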

The e-Discovery REALITY

High P & Low R = RISK (important docs not retrieved)

Text REtrieval Conference (TREC)

Low P & High R = COST (many more documents must be reviewed)
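
The risk and cost trade-off can be made concrete with a little arithmetic. This is a rough sketch, not something from the slides: the function names and the example corpus of 1,000 relevant documents are assumptions chosen for illustration. It relies only on the definitions of P and R: a run with recall R finds R × (relevant docs), and at precision P that requires reviewing R × (relevant docs) / P retrieved documents.

```python
def documents_to_review(relevant_in_corpus: int, precision: float, recall: float) -> float:
    """Size of the retrieved set that reviewers must read (the COST side)."""
    if not 0 < precision <= 1 or not 0 <= recall <= 1:
        raise ValueError("precision must be in (0, 1] and recall in [0, 1]")
    return recall * relevant_in_corpus / precision


def relevant_missed(relevant_in_corpus: int, recall: float) -> float:
    """Relevant documents never retrieved (the RISK side)."""
    if not 0 <= recall <= 1:
        raise ValueError("recall must be in [0, 1]")
    return (1 - recall) * relevant_in_corpus


# Hypothetical corpus containing 1,000 relevant documents.
for p, r in [(0.8, 0.2), (0.2, 0.8), (0.8, 0.8)]:
    print(f"P={p} R={r}: review ~{documents_to_review(1000, p, r):.0f}, "
          f"miss ~{relevant_missed(1000, r):.0f}")
```

Under these assumed numbers, the low-precision run multiplies the review load several times over, while the low-recall run leaves most relevant documents unfound.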

Agenda
  • Results
    • TREC ad hoc (= typical)
    • Queries typifying Communities of Practice (CoPs)
  • e-Discovery Approaches
    • 5 Dimensions
    • Linguistics of CoPs
  • Research Issues
    • TREC
    • AI
    • Linguists
    • Lawyers

Typical Results – ad hoc queries

[Figure: results for 22 topics and their average]

  • Desired is rare
  • Acceptable < 10%

(from Chapter 3, “Retrieval System Evaluation” by Chris Buckley and Ellen M. Voorhees, in TREC: Experiment and Evaluation in Information Retrieval, Voorhees & Harman, eds., MIT Press, 2005, p. 62, Fig. 3.1)


Accuracy Metrics

F1 = 2 · (P · R) / (P + R)

[Figure: F1 at the Ideal and Acceptable levels and for the TREC avg, compared with STIR topical avg in 4 cases (I-IV) encompassing 42 topics]

Most accurate TREC results for 20 of 22 topics in one test case
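
The F1 formula on this slide is the harmonic mean of precision and recall. A minimal sketch follows; the function name `f1` and the zero-division convention (returning 0.0 when P + R = 0) are assumptions, not something the slides specify.

```python
def f1(precision: float, recall: float) -> float:
    """F1 = 2 * (P * R) / (P + R), the harmonic mean of P and R."""
    if precision + recall == 0:
        return 0.0  # assumed convention when nothing relevant is retrieved
    return 2 * precision * recall / (precision + recall)


# When P equals R, F1 equals that common value, so the slide's
# Desired (0.8) and Acceptable (2/3) levels map directly onto F1.
print(round(f1(0.8, 0.8), 3))
print(round(f1(2 / 3, 2 / 3), 3))

# An unbalanced result scores well below its arithmetic mean of 0.5.
print(round(f1(0.9, 0.1), 3))
```

Because the harmonic mean is pulled toward the smaller of the two values, a single F1 number penalizes both the high-P/low-R and the low-P/high-R failure modes.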

STIR compared with TREC IR: Average P & R for each case

[Figure: Precision and Recall for STIR and TREC. Topical P & R results for one TREC and 4 STIR cases]

Recall Improvement

  • STIR training provides substantial Recall improvement with acceptable Precision reduction
  • Retrieval Acceptable to lowest limit of statistical uncertainty

[Figure: Precision and Recall. Sampled Corpus Tests for 12 Topics in case I during STIR Training]
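
One way to read "Acceptable to lowest limit of statistical uncertainty" is that the lower end of a confidence interval on the sampled estimate still clears the 2/3 threshold. The slides do not say which interval was used; the sketch below assumes a Wilson score interval at roughly 95% confidence (z = 1.96), and the sample counts are invented for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion estimated from a sample of size n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def acceptable_at_lower_limit(successes: int, n: int, threshold: float = 2 / 3) -> bool:
    """True if even the lower limit of the interval meets the threshold."""
    lower, _ = wilson_interval(successes, n)
    return lower >= threshold


# Hypothetical sample: 100 relevant documents sampled, 80 of them retrieved.
low, high = wilson_interval(80, 100)
print(f"estimated recall 0.80, interval ({low:.3f}, {high:.3f})")
print("acceptable at lower limit:", acceptable_at_lower_limit(80, 100))
```

The same check applies to precision by sampling from the retrieved set instead of from the relevant set.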



Dimensions of e-Discovery

  • Documents
  • Community
  • Linguistics
  • Subject Matter
  • Legal Case


Dimensions of e-Discovery: Document Review

Dimensions: Documents, Legal Case

Example Systems:

  • Manual (human) review conducted by attorneys
  • Basic keyword searches targeted to legal issues
  • Supervised learning with relevance feedback


Dimensions of e-Discovery: Expert Search

Dimensions: Documents, Subject Matter, Legal Case

Example Systems:

  • Subject matter experts review results under legal team direction
  • Domain-specific lexicons used


Dimensions of e-Discovery: Model Meaning

Dimensions: Documents, Linguistics, Subject Matter, Legal Case

Example Systems:

  • Supervised learning with
    • relevance feedback
    • semantic analysis
  • Semantic search


Dimensions of e-Discovery: Model Communities

Dimensions: Documents, Community, Linguistics, Subject Matter, Legal Case

Example System:

  • Socio-Technical-IR


Dimensions of e-Discovery: Socio-Technical-IR

Dimensions: Community, Linguistics
  • Non-computational Linguistic Disciplines
    • Pragmatics
    • Socio-Linguistics
    • Ethno-Methodology
    • Discourse Analysis
  • A community of practice is
    • a diverse group of people
    • engaged in real work
    • over a significant period of time
    • developing their own tools, language, and processes
    • during which they build things, solve problems, learn and invent
    • evolving a practice that is highly skilled and highly creative


Research Issues
  • TREC
    • Nature of the relatively rare high P with high R queries
    • Measuring both recall and precision effectively
  • AI
    • Knowledge-Based (Expert) Systems that codify linguistic expertise
    • Characterize practice communities of subject matter experts
    • Investigate combination systems applied to different types of topics
  • Linguists
    • Identify and characterize different types of topics and map to system types
    • Language patterns in communities as well as subject matter fields
    • Defining categories in concrete terms
  • Lawyers
    • Defining categories in concrete terms
    • Integration of technology and processes

STIR Analysis: CoPs’ Enunciatory language

[Diagram labels: Object, Relevant, Document Text, Process, State of Affairs, Event, Action, Fact]