No, Not That PMI: Creating Search Technology for E-Discovery. Jason Baron, 1,4 Douglas W. Oard, 1,3 Tamer Elsayed 2,3 and Lidan Wang 2,3 1 College of Information Studies 2 Computer Science Department 3 Institute for Advanced Computer Studies University of Maryland, College Park.

No, Not That PMI: Creating Search Technology for E-Discovery

Jason Baron (1,4), Douglas W. Oard (1,3), Tamer Elsayed (2,3) and Lidan Wang (2,3)

(1) College of Information Studies

(2) Computer Science Department

(3) Institute for Advanced Computer Studies

University of Maryland, College Park

Plus thanks to: Simon Attfield, David Lewis, Paul Thompson, Stephen Tomlinson, Feng Zhou

iSchool Colloquium

U.S. v. Philip Morris et al.
  • Civil lawsuit brought by Clinton Administration against tobacco companies in 1999
  • Racketeering allegation that companies have conspired since 1953 to defraud the American public as to the true health effects of smoking
  • 1,726 Requests to Produce from tobacco companies for tobacco-related records (including email) from 30 federal agencies
  • 32 million Clinton-era email records held by National Archives
Query Terms (Round 1, Round 2)
  • Synar Amendment
  • Philip Morris
  • R.J. Reynolds
  • BAT Industries
  • Liggett Group
  • Brown and Williamson
  • PMI (Philip Morris Institute)
  • MSA (Master Settlement Agreement)
  • ETS (Environmental Tobacco Smoke)
  • B&W (Brown & Williamson)
  • TI (Tobacco Institute)

Suppressing False Positives
  • Upper Marlboro, Maryland
  • Presidential Management Intern (PMI) program
  • Medical Savings Accounts (MSA)
  • Metropolitan Standard Area (MSA)
  • Educational Testing Service (ETS)
  • Black & White photos (B&W)
  • TI . . .

Figure labels: White House Counsel; Smoking Policy Emails; VP Chief of Staff; Ron Klain; Office of the U.S. Trade Rep.

Final Boolean Query

((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR
s. 1415 OR
(ets AND NOT educational testing service) OR
(liggett AND NOT sharon a. liggett) OR
atco OR
lorillard OR
(pmi AND NOT presidential management intern) OR
pm usa OR
rjr OR
(b&w AND NOT photo*) OR
phillip morris OR batco OR ftc test method OR
star scientific OR vector group OR joe camel OR
(marlboro AND NOT upper marlboro)
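The "term AND NOT false-positive phrase" pattern used throughout this query can be sketched in a few lines of Python. This is only an illustration of the logic, not the system used in the case; the helper names `contains` and `hit` and the sample messages are hypothetical.

```python
import re

def contains(text: str, phrase: str) -> bool:
    """Case-insensitive match of a whole phrase, bounded by non-word characters."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def hit(text: str, term: str, false_positive: str) -> bool:
    """Evaluate (term AND NOT false_positive) over a single document."""
    return contains(text, term) and not contains(text, false_positive)

# Hypothetical messages, invented for illustration only.
emails = [
    "PMI is lobbying against the Synar Amendment",
    "Welcome to the Presidential Management Intern (PMI) program",
    "Directions to the courthouse in Upper Marlboro",
    "Marlboro ad spending rose last year",
]

pmi_hits = [e for e in emails if hit(e, "pmi", "presidential management intern")]
marlboro_hits = [e for e in emails if hit(e, "marlboro", "upper marlboro")]
```

Because AND NOT is applied to the whole document, a message that mentions both the tobacco sense and the excluded phrase is also dropped, so this kind of suppression trades some recall for precision.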






Figure labels: National Archives; Clinton White House; search request; Tobacco Policy; 32 million emails; hired 25 persons for 6 months …


Federal Rules of Civil Procedure (as amended 12/1/06)

Rule 26(f)

At the parties’ planning meeting, issues expected to be discussed include:

  • “Any issues relating to disclosure or discovery of electronically stored information, including the form or forms in which it should be produced”
  • “Any issues relating to preserving discoverable information”
Recent Case Law
  • Ameriwood Industries, Inc. v. Liberman, 2007 WL 685623 (E.D. Mo.) (court orders expert report with number of “hits” based on negotiated search terms, with expectation that parties will continue to meet and confer to refine search based on false positives)
  • Williams v. Taser Intern, Inc., 2007 WL 1630875 (N.D. Ga.) (court adjudicates search protocol with keywords plus use of simple Boolean operators)
  • 6/1/07: First published legal opinion in U.S. discussing difference between “keyword” and “concept” searching. Disability Rights Council of Greater Washington, et al. v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 (D.D.C. 2007)
Text Retrieval Conference (TREC)
  • Goals
    • Foster development of research communities
    • Create “benchmark” evaluation resources
    • Establish “baseline” results
  • History
    • Sponsored by NIST since 1992
    • “Legal Track” started in 2006 with an E-Discovery focus
Desiderata in the Legal Realm
  • Two-party
    • Negotiated (not one-sided) information needs
  • Recall-oriented
    • “Smoking gun detection” + completeness
  • Explainable
    • Quantifiable comparison to present best practice
  • Affordable
    • Minimize amount of human review on back end
IIT CDIP Document Collection
  • UCSF Legacy Tobacco Documents Library
    • 6,877,327 documents released in lawsuits
    • Variety of corporate document types
    • Range of printing technologies and handwriting
  • IIT CDIP v1.0 Document Collection
    • OCR: 50 GB
    • Metadata: 5 GB (XML)
  • Scanned documents used for assessment
    • 42 million TIFF page images: 1.5 TB
Example Document




Philip Moxx's. U.S.A. x.dr~am~c. cvrrespoaa.aa

Benffrts Departmext Rieh>pwna, Yfe&ia

Ta: Dishlbutfon Data aday 90,1997.

From: Lisa Fislla

Sabj.csr CIGNA WeWedng Newsbttsr -Yntsre StratsU

During our last CIGNA Aatfoa Plan meadng, tlu iasuo of wLetSae to i0op per'Irw+ng

artieles aod discontinue mndia6 CIGNA Well-Being aawslener to om employees was a

msiter of disanision . I Imvm done somme reaearc>>, and wanted to pruedt you with my

Sadings and pcdiminary recwmmeadatioa for PM's atratezy Ieprding l4aas aewelattee* .

I believe .vayone'a input is valusble, and would epproolate hoarlng fmaa aaeh of you on

whetlne you concur with my reeommendatioa


Organization Authors:PMUSA, PHILIP MORRIS USA

Person Authors:HALLE, L

Document Date:19970530


Bates Number:2078039376/9377

Page Count:2

Collection:Philip Morris

Drafted by The Sedona Conference® lawyers:
  • Wrongful death and products liability action based on the use of a certain type of radioactive phosphates resulting in contaminated candy as well as in drinking water
  • Patent infringement action on a device named “Suck out the Bad, Blow in the Good,” designed to ventilate smoke
  • Shareholder class action suit alleging securities fraud and false advertising in connection with a fictional “Smoke Longer, Feel Younger” campaign relying on ‘60s-era folk music
  • Fictional Justice Department antitrust investigation looking into a planned merger and acquisition of a casualty and property insurance company by a tobacco company



<ProductionRequest> <Complaint>

<ComplaintNumber>No. 2006-3</ComplaintNumber>

<Date>July 1, 2006</Date>


<Plaintiff>John Doe, et al.</Plaintiff>

<Defendant>Echinoderm Cigarettes, et al.</Defendant>

<Introduction><P>John Doe, on behalf of the Organization of Concerned Parents, brings this action to force the defendant tobacco companies to cease all placement of tobacco products, brands, and logos in television, film, live theater and rock concerts (collectively referred to as the "public media"). The historical placement of tobacco products and branding in the public media has forced an increase in product awareness, particularly among young adults and children, by providing consistent and recurring exposure to on-screen situations that generally glamorize smoking and other tobacco use.</P> </Introduction>


<Party> <PlaintiffParty>Plaintiff John Doe brings this action on behalf of a nationwide class of individuals injured in childhood and adulthood by defendants' actions. Mr. Doe resides at 1004 Public Avenue, Commonwealth of New Searchland.</PlaintiffParty>

<DefendantParty>Defendants are Echinoderm Cigarettes and other unnamed tobacco companies, with principal places of business in the Commonwealth of New Searchland.</DefendantParty>

<Coconspirator></Coconspirator> </Party>

<Jurisdiction> <P>This Court has jurisdiction pursuant to 1 Comm. New Searchland, Sec. 1956.</P> </Jurisdiction>


<Background> <P>According to information and belief, Echinoderm Cigarettes and other companies have a long history of placement of tobacco products and brand images in the public media. These media, including television (network and cable), film, live theater, and rock concerts, are regularly viewed by children, teen-agers, and young adults. Such individuals are at the most impressionable time of their lives, and are unknowingly exposed to de facto advertising for tobacco and tobacco-related products simply by watching such media.</P>

<P>In particular, the glamorous manner in which smoking and other tobacco use are portrayed on the screen adds a cachet to the habit that encourages young people to try smoking for the first time. Thus is exposed the true motivation for product placement - inducing non-smokers to become smokers with blatant disregard for the long term effects and public health risks associated with tobacco use.</P> </Background>


<CauseOfAction> <P>Echinoderm Cigarettes and other unnamed companies have represented that they do not pay for product placement in the public media. This representation is patently false. Tobacco concerns regularly pay for placement of their products via direct monetary compensation, exchange of goods and services, and other considerations.</P>

<P>COUNT I Defendants have engaged in a pattern of misleading practices in violation of state and federal statutes by providing compensation to television networks, production companies, film production companies, providers of live theater and rock concerts in exchange for placement of products and brand images.</P>

<P>COUNT II By exposing children and young adults to tobacco products in the public media, and by glamorizing the use of products known to cause health issues, defendants' actions are in violation of applicable law.</P> </CauseOfAction>


<RequestedRelief> <P>Declare that Echinoderm Cigarettes and other unnamed defendants are in violation of law by providing compensation for placement of products and brand images, and by exposing children and young adults to tobacco products in the media.</P>

<P>Enter an order requiring defendants to disgorge all monies, reimbursement, and payments received as a result of product placement in the public media.</P>

<P>Defendants to pay costs and expenses, including attorneys' fees, in connection with the investigation and litigation of this matter.</P>

</RequestedRelief> </Complaint> </ProductionRequest>

A “Production Request”

RequestNumber: 52

RequestText: Please produce any and all documents that discuss the use or introduction of high-phosphate fertilizers (HPF) for the specific purpose of boosting crop yield in commercial agriculture.

Proposal: "high-phosphate fertilizer!" AND (boost! w/5 "crop yield") AND (commercial w/5 agricultur!)

Rejoinder: (phosphat! OR hpf OR phosphorus OR fertiliz!) AND (yield! OR output OR produc! OR crop OR crops)

FinalQuery: (("high-phosphat! fertiliz!" OR hpf) OR ((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))) AND (boost! OR increas! OR rais! OR augment! OR affect! OR effect! OR multipl! OR doubl! OR tripl! OR high! OR greater) AND (yield! OR output OR produc! OR crop OR crops)

B: 3078
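The two operators that make these negotiated queries hard to read can be sketched in Python. The sketch assumes the common legal-search reading, in which a trailing "!" matches any word beginning with that stem and "w/N" requires two terms to occur within N words of each other. The slide does not define either operator, so both readings, the helper names, and the sample sentence are assumptions for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_matches(token, term):
    """'fertiliz!' matches any token starting with 'fertiliz'; other terms match exactly."""
    if term.endswith("!"):
        return token.startswith(term[:-1])
    return token == term

def positions(tokens, terms):
    """Token positions matching any of the alternative terms (an OR group)."""
    return [i for i, tok in enumerate(tokens)
            if any(term_matches(tok, t) for t in terms)]

def within(tokens, left_terms, right_terms, n):
    """(left OR ...) w/n (right OR ...): some pair of matches at most n tokens apart."""
    left = positions(tokens, left_terms)
    right = positions(tokens, right_terms)
    return any(abs(i - j) <= n for i in left for j in right)

# Hypothetical sentence, checked against a simplified slice of the FinalQuery.
doc = tokenize("Phosphates were added to the soil to increase the crop yield.")
near = within(doc, ["phosphat!", "phosphorus"], ["fertiliz!", "soil"], 15)
boosted = bool(positions(doc, ["boost!", "increas!", "rais!"]))
yielded = bool(positions(doc, ["yield!", "output", "produc!", "crop", "crops"]))
responsive = near and boosted and yielded
```

Each round of the negotiation changes how many documents come back: the Proposal's tight w/5 windows and quoted phrases narrow the match, while the Rejoinder's bare OR groups widen it.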

The Resulting “Topic”

- <ProductionRequest>


<RequestText>Please produce any and all documents that discuss the use or introduction of high-phosphate fertilizers (HPF) for the specific purpose of boosting crop yield in commercial agriculture.</RequestText>

- <BooleanQuery>

<FinalQuery>(("high-phosphat! fertiliz!" OR hpf) OR ((phosphat! OR phosphorus) w/15 (fertiliz! OR soil))) AND (boost! OR increas! OR rais! OR augment! OR affect! OR effect! OR multipl! OR doubl! OR tripl! OR high! OR greater) AND (yield! OR output OR produc! OR crop OR crops)</FinalQuery>

- <NegotiationHistory>

<ProposalByDefendant>"high-phosphate fertilizer!" AND (boost! w/5 "crop yield") AND (commercial w/5 agricultur!)</ProposalByDefendant>

<RejoinderByPlaintiff>(phosphat! OR hpf OR phosphorus OR fertiliz!) AND (yield! OR output OR produc! OR crop OR crops)</RejoinderByPlaintiff>





- <Instruction>

<P>1. These requests require the production of all responsive documents within the sole or joint possession, custody or control of the Defendant, including their agents, departments, attorneys, directors, officers, employees, consultants, investigators, insurance companies, or other persons subject to Defendant's custody or control.</P>

<P>2. All documents that respond, in whole or in part, to any portion of these Requests must be produced in their entirety, including all attachments and enclosures.</P>

<P>3. For purposes of these requests, the words used are considered to have, or should be understood to have their ordinary, everyday meanings. Plaintiffs refer Defendant to any dictionary in the event that Defendant asserts that the wording of a request is vague, ambiguous, unintelligible, or confusing.</P>


- <Definition>

<P>4. The words "and," "or," "each," "any," "all," "refer," and "discuss," shall be construed in their broadest form and the singular shall include the plural and the plural shall include the singular whenever necessary so as to bring within the scope of these Requests all documents (defined below) that might otherwise be construed to be outside their scope.</P>

<P>5. Solely for the purpose of the TREC 2007 legal track, the term "Defendant" shall include the named defendant companies in this complaint as well as all other companies whose records are found in the TREC collection database.</P>

<P>6. Solely for the purpose of the TREC 2007 legal track, "document" means all data, information or writings stored in the TREC legal database, including, without limitation: any written, electronic or computerized files, data or software; memoranda, emails correspondence, OCR scanned images, communications, reports, summaries, studies, analyses, evaluations, notes or notebooks, indices, spreadsheets, logs, books, pamphlets, binders, calendar or diary entries, ledger entries, press clippings, graphs, tables, charts, printouts, drawings, maps, meeting minutes, and transcripts. The term document encompasses all metadata associated with the document. The term also includes all drafts associated with any particular document. The term is also intended to include all electronically stored information as the term is used in the Federal Rules of Civil Procedure.</P>

<P>7. The terms "relating to," "regarding," "discussing," or "concerning," shall be synonymous and should be taken to mean in whole or in part constituting, containing, concerning, discussing, describing, analyzing, identifying or stating.</P>

<P>8. The term "high-phosphate fertilizers" (HPF) shall refer to any high phosphate fertilizer, including, but not limited to calcium phosphate fertilizers and superphosphate fertilizers. In some instances, "high-phosphate" fertilizers will be subsumed in the definition of "phosphatic fertilizers." However, phosphatic fertilizers are a more general term for fertilizers containing phosphate and the phosphate concentration of various phosphatic fertilizers is likely to vary.</P>

<P>9. The term "Maleic Hydrazide" (MH) refers to a pesticide that is sprayed on sugar beets for the purpose of decreasing sugar loss in beet roots.</P>


- <Complaint>


<Date>July 1, 2007</Date>


<Plaintiff>MR & MRS. N. EINHERJAR, individually and on behalf of the Estate of DRIFA EINHERJAR, a minor, and the CITY AND COUNTY OF VALHALLA, a government entity.</Plaintiff>

<Defendant>GULLINKAMBI CANDY CO., a Gladsheim corporation; VIKING SUGAR FARMS, a Gladsheim corporation; and U.S. BEET SUGAR ASSOCIATION, a nationwide association with local chapters in Gladsheim.</Defendant>

- <Introduction>

<P>1. Plaintiffs Mr. and Mrs. N. Einherjar bring this action individually and on behalf of the estate of their deceased daughter Drifa Einherjar. These plaintiffs and the City and County of Valhalla (collectively referred to as "Plaintiffs") bring this action against Defendants Gullinkambi Candy Co. (GCC), Viking Sugar Farms (VSF), and the U.S. Beet Sugar Association (BSA) (hereinafter referred to collectively as "Defendants," or individually by their respective acronyms). This complaint seeks equitable and injunctive relief for the use of lethal substances in the production of VSF sugar, resulting in the death of a child and contamination of the Valhalla County groundwater. This complaint additionally seeks damages for strict products liability and failure to warn against GCC for the use of and failure to disclose lethal substances contained in its candy. Finally, this complaint seeks treble and punitive damages for fraud and conspiracy in violation of the Racketeer Influenced and Corrupt Organizations Act (RICO), 18 U.S.C. (sec) 1962 for Defendants' collective and organized concealment of lethal substances from Plaintiffs, resulting in the death of a child and massive contamination of Valhalla County's sole source of drinking water.</P>


- <Party>

<PlaintiffParty>Plaintiffs, Mr. and Mrs. N. Einherjar, are residents of Valhalla, Gladsheim, and their deceased daughter, on whose behalf they are suing, was also a Valhalla resident.</PlaintiffParty>

<DefendantParty>2. Defendants GCC and VSF are both Gladsheim Corporations with principal places of business in Valhalla, Gladsheim. The U.S. Beet Sugar Association has local chapters in Valhalla, Gladsheim, and directs the actions of VSF.</DefendantParty>

<Coconspirator />


- <Jurisdiction>

<P>All events giving rise to this incident took place in Valhalla, Gladsheim. Therefore, jurisdiction of this court is proper.</P>


- <Background>

<P>3. Defendant VSF uses high-phosphate fertilizers (HPF) (sometimes referenced as phosphate fertilizers) to increase the flavor of its sugar beets. HPF contains traces of radioactive elements that remain as a byproduct of phosphate extraction. Phosphate used in HPF is taken from a rock mineral called Apatite which also contains radioactive radium. The resulting Apatite powder therefore contains traces of radioactive elements that become incorporated into HPF. Studies have shown that health problems caused by HPF include immune disorders, toxic myopathy, chronic fatigue syndrome, liver dysfunctions, irregular heart-beat, reactive depression, and memory loss. In addition to using HPF, VSF sprays its sugar beets with Maleic Hydrazide (MH) to decrease the loss of sugar content in its sugar beet crop. MH has been shown to cause renal dysfunction in laboratory mice and to eventually lead to death.</P>

<P>4. In 1933, the U.S. Beet Sugar Association conspired with cane-growers in Hawaii to form a powerful sugar cartel that controlled Congress through a strong sugar lobby. Together, the American sugar growers united to create an underground sugar-trade brotherhood secretly referred to as "The Sugar Program." Members of the brotherhood contributed large sums of money to hire sugar-interest lobbyists who successfully brought about a series of favorable Sugar Acts beginning in 1934 and continuing to the present day. The Sugar Program brotherhood has also been successful in preventing Congress from regulating HPF or MH.</P>

<P>5. For the past five years, the BSA has served as elected leader of The Sugar Program, and has been given the responsibility for regulating the actions of the brotherhood members and for approving all major contracts and actions taken by members under its control.</P>

<P>6. Defendant GCC is a candy company that uses VSF sugar in all of its candy. As part of its contract with VSF, GCC agreed to conceal the levels of HPF and MH contained in VSF sugar from its consumers in exchange for an exclusivity provision and a discount on the wholesale price of its sugar. GCC therefore omitted warnings about HPF and MH from its candy labels.</P>

<P>7. As a result of Defendants' collective actions and omissions an eight-year old girl died from consuming a piece of GCC candy and the Valhalla community as a whole has been harmed by the contamination of their drinking water with HPF and MH.</P>


- <CauseOfAction>


<P>Wrongful Death</P>

<P>8. On March 23, 2007, decedent Drifa Einherjar (hereinafter "Decedent") purchased a piece of GCC candy for $0.67 from the GCC store on Main Street, Valhalla, Gladsheim. At the time of purchase, Decedent was not warned or informed of any dangers of eating the candy and there were no warnings on the candy wrapper or labels of the candy bag.</P>

<P>9. GCC knew that VSF used HPF and MH in its sugar production process. Despite this knowledge, GCC contractually agreed to conceal the presence of HPF and MH in its candy as a condition of its agreement with VSF, in exchange for a discount on its bulk sugar purchases.</P>

<P>10. As a direct and proximate result of these stated acts and omissions, Decedent consumed a piece of GCC candy containing HPF and MH, resulting in her death on March 24, 2007. Decedent ate the candy in a manner in which it was intended to be eaten, and received no instructions from any agents of GCC to exercise caution or to eat the candy in any other way.</P>


<P>Strict Tort Liability</P>

<P>11. The aforementioned candy and VSF sugar used as a primary ingredient in the candy were unreasonably dangerous to human health due to their high content of HPF and MH.</P>

<P>12. Defendants GCC and VSF knew of this health risk and notwithstanding that knowledge, concealed these dangers from the consuming public.</P>

<P>13. As a result of the HPF and MH contained in GCC candy, Decedent died within 24 hours of consuming a single piece of GCC candy.</P>


<P>Public Nuisance (Against Defendant VSF only)</P>

<P>14. Defendant VSF's method of sugar beet farming creates a public nuisance that unreasonably endangers the health of all Valhalla residents by contaminating their groundwater.</P>

<P>15. By continuing to use HPF and MH in its sugar beet production and by failing to use the standard method of limestone quicklime phosphate precipitation in the treatment of its waste-water, VSF continues to contaminate the groundwater and will continue to endanger the health of Valhalla residents. The harm to Valhalla residents will continue until an injunction is issued to stop the use of HPF and MH or to require implementation of the limestone quicklime wastewater treatment to minimize contamination.</P>

<P>16. As a direct and proximate cause of Defendant's acts and omissions, residents of Valhalla have unknowingly ingested harmful substances from their contaminated water supply.</P>


<P>Failure to Warn</P>

<P>17. VSF, as a sugar beet farm that uses HPF and MH, had a duty to issue warnings to Plaintiffs and the general public about the presence of HPF and MH in its sugar and the corresponding health risks that these substances posed in groundwater or direct consumption.</P>

<P>18. Defendants VSF and GCC knew, or with the exercise of reasonable care, should have known that HPF contained radioactive substances and that MH added to the diet of mice, resulted in renal dysfunction and eventual death. Despite this knowledge, no information was offered to the Valhalla Community about the potential hazards of HPF, the lethal nature of MH used in VSF's sugar production, or the presence of HPF or MH in GCC candy.</P>

<P>19. At all times relevant to this litigation, Defendants VSF and GCC had actual and/or constructive knowledge of the dangers mentioned above. Despite this knowledge, VSF continued to operate its sugar beet plant with reckless disregard for the community around it by contaminating their groundwater and GCC continued to sell candy containing HPF and MH in reckless disregard for the life of children whom it targeted in its advertising campaigns and who therefore could be expected to purchase and consume GCC candy.</P>

<P>20. VSF breached its duty to warn the community about HPF and MH groundwater contamination and GCC breached its duty to warn consumers of the HPF and MH in its candy.</P>

<P>21. Defendant VSF's failure to warn has resulted in the contamination of Valhalla County's drinking water and the endangerment of the health of Valhalla residents.</P>

<P>22. GCC's failure to warn resulted in the death of a child and the illness of several others.</P>


<P>Conspiracy and Fraud in Violation of the Racketeer Influenced and Corrupt Organizations Act (RICO), 18 U.S.C. (sec) 1962, and Request for Treble Damages.</P>

<P>23. Defendants VSF, GCC, and BSA engaged in a conspiracy to defraud by collectively agreeing to conceal the presence and adverse health effects of HPF and MH from the American public, the Valhalla community and Plaintiffs in particular.</P>

<P>24. In 1933, Defendants formed a sugar cartel secretly known as "The Sugar Program" which successfully lobbied Congress in passing favorable sugar laws and prevented the regulation of HPF and MH in commercial agriculture.</P>

<P>25. All three Defendants contributed financially to a lobbying fund aimed at fighting HPF and MH regulation and obtaining the passage of favorable "Sugar Acts."</P>

<P>26. For the past five years, the BSA has led lobbying efforts and approved all actions of The Sugar Program brotherhood.</P>

<P>27. BSA spearheaded the movement to discourage written warnings about HPF and MH, and approved the VSF contract with GCC which provided for a reduction of GCC's wholesale sugar price, and a favorable exclusivity provision between VSF and GCC, under the condition that GCC refrain from publishing warnings about HPF and MH on its product labels.</P>

<P>28. As a result of this collective action to defraud the public, Plaintiffs have suffered injuries indicated above. Treble damages are therefore appropriate under RICO to punish the conspiratorial nature of Defendants' planned concealment of known health risks presented by HPF and MH from the Valhalla community and from Plaintiffs, resulting in the death of a child.</P>



<P>29. Defendant VSF had a duty to the Valhalla community and to Plaintiffs to refrain from contaminating their groundwater and to provide warnings about the known health hazards associated with HPF and MH which it used in the production of its sugar beets.</P>

<P>30. Defendant GCC had a duty to the Valhalla community and to Plaintiffs to disclose the known levels of HPF and MH in VSF sugar which it used as a primary ingredient in its candy.</P>

<P>31. Defendant BSA had a duty to compel members of the brotherhood under its control to require lawful disclosures of HPF and MH.</P>

<P>32. All Defendants breached their respective duties to the Valhalla community and to Plaintiffs. As a result, Plaintiffs have suffered damages indicated above.</P>

<P>Punitive Damages</P>

<P>33. The conduct of Defendants described above is outrageous. Defendants' conduct demonstrates a reckless disregard for human life and a conscious disregard for public safety. The acts and omissions described above were willful and performed with actual or implied malice. Punitive and exemplary damages are therefore appropriate and should be imposed in this instance.</P>


- <RequestedRelief>

<P>WHEREFORE, Plaintiffs respectfully pray for a judgment against Defendants for:</P>

<P>1. Injunctive and equitable relief as the Court deems appropriate including:</P>

<P>i) Requiring Defendant VSF to test and to monitor the water near its sugar plant;</P>

<P>ii) Requiring Defendant VSF to use the quicklime limestone method for processing wastewater to minimize phosphate contamination of Valhalla groundwater, if it is permitted to continue operation of its plant and to continue use of HPF and MH in its sugar beet production;</P>

<P>iii) Compelling Defendant VSF to remove existing HPF from the groundwater by any means necessary; and</P>

<P>2. Compensatory damages to be paid by all Defendants, according to proof at trial;</P>

<P>3. Punitive damages as the court deems appropriate;</P>

<P>4. Costs and attorneys fees of this lawsuit, with interest;</P>

<P>5. Any other relief as the court deems appropriate.</P>




2006/07 Research Teams
  • Carnegie Mellon U
  • Dartmouth College
  • Long Island U
  • Sabir Research, Inc.
  • U Iowa
  • U Massachusetts
  • U Maryland
  • U Missouri, Kansas City
  • U Washington
  • Ursinus College
  • Fudan U (CN)
  • National U of Singapore (SG)
  • Open Text Corporation (CA)
  • U Amsterdam (NL)
  • U Waterloo (CA)
Deconstructing “Concept Search”





Representing “Documents”
  • Count the words
  • Weight the words
  • Ascribe meaning to words
  • Who said this?
  • When was it said?
  • Who did they say it to?
  • What was said about it?
  • What was done with it?
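"Count the words" and "weight the words" can be sketched with term frequency and inverse document frequency. TF-IDF is one common weighting scheme, picked here for illustration; the slide does not name a specific one, and the three toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "tobacco settlement agreement tobacco",
    "d2": "medical savings account agreement",
    "d3": "tobacco institute memo",
}

# Count the words: term frequency per document.
counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

# Weight the words: terms that occur in fewer documents get a larger weight.
n_docs = len(docs)
doc_freq = Counter(term for c in counts.values() for term in c)
idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

weights = {
    doc_id: {term: tf * idf[term] for term, tf in c.items()}
    for doc_id, c in counts.items()
}
```

In this toy collection "settlement" outweighs "tobacco" in d1 even though "tobacco" occurs twice, because "settlement" appears in only one document.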


Controlling the Search System
  • “Keyword” query
  • “Boolean” query
  • Query by example
  • Ranked list selection
  • “More like this” query
  • Category exploration
  • Social network exploration
  • Query refinement
  • “Search within” query


Generating Results
  • Result set
  • Ranked list





2006 Experiments
  • 31 “official” runs from 6 sites
    • Judged top-100 main site run, top-10 for others
    • Scored top-5000
  • Reference Boolean run
    • Judged stratified sample of 200 documents
    • Judged to B
  • Expert manual searcher “run”
    • ~100 documents/topic
    • Tried to find documents systems would miss
2006/07 “Relevancy” Assessors
  • Bank of America
  • Department of Justice
  • FTI Consulting
  • H5 Technologies Inc.
  • Lewis & Roca LLP
  • New Mexico Attorney General
  • Preston Gates LLP
  • Reasonable Discovery LLC
  • Private individuals (CA, UK)
  • Law Schools
    • Boston University
    • Case Western Reserve
    • George Mason
    • George Washington
    • Loyola-Los Angeles
    • Loyola-New Orleans
    • U Dayton
    • U Indiana-Indianapolis
    • U Maryland
    • U Texas





2006: Nobody Finds Everything

Source: TREC 2006 Legal Track

2006: Precision@R
  • Manual run for pool enrichment
  • Automatic Ranked Runs
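For readers unfamiliar with the measure, Precision@R can be sketched as follows, assuming the usual R-precision convention in which R is the number of documents known to be relevant for the topic. The function name and the toy ranking are hypothetical.

```python
def precision_at_r(ranked_ids, relevant_ids):
    """Fraction of the top R ranked documents that are relevant, where R = len(relevant_ids)."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    top = ranked_ids[:r]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / r

# Hypothetical ranking and judgments for one topic.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d5"}
score = precision_at_r(ranked, relevant)
```

At depth R, precision and recall are the same number, which makes the measure convenient for comparing a manual run against automatic ranked runs on one scale.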

TREC 2007 Experiments
  • Making “pools”: 68 runs from 12 groups
    • Up to 25,000 documents per run per topic
    • Plus 100 random unsubmitted documents
    • Before sampling: 195,688-476,252 docs/topic
  • Bin 1 (“required”)
    • 500 documents, done by 43 of 50 assessors
  • Bins 2 through 6 (optional)
    • 100 documents each
    • 8 of 43 assessors did at least one, 5 did all
Estimated # of Rel Docs in Pool

Mean per Topic:

  • Relevant: 16,904
  • Non-rel.: 298,678
  • Gray: 4,303

Topic 71 (bromhidrosis):

  • Relevant: 77,467

Topic 63 (sugar contract):

  • Relevant: 18
Boolean Run Estimated Recall

Mean EstR@B: 0.22

  • Boolean run missed 78% of the relevant documents (on average per topic)

Topic 84 (1960’s films)


Topic 77 (smoke NOT tobacco)

EstR@B= 0%
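EstR@B is an estimate because only a sample of the pooled documents was judged. One standard way to estimate recall from a sample is to weight each judged relevant document by the inverse of its sampling probability. Whether the track's official measure was computed exactly this way is an assumption here, and the judgments and probabilities below are invented.

```python
def estimated_recall(judged, retrieved_ids):
    """Estimate recall from sampled judgments.

    judged: list of (doc_id, is_relevant, inclusion_probability) tuples.
    Each relevant sampled document stands in for 1/probability relevant documents.
    """
    est_relevant = sum(1.0 / p for _, rel, p in judged if rel)
    est_relevant_retrieved = sum(
        1.0 / p for doc_id, rel, p in judged if rel and doc_id in retrieved_ids
    )
    return est_relevant_retrieved / est_relevant if est_relevant else 0.0

# Hypothetical sample: (document, judged relevant?, probability of being sampled).
judged = [
    ("a", True, 0.5),
    ("b", True, 0.1),
    ("c", False, 0.5),
    ("d", True, 0.25),
]
boolean_hits = {"a", "c"}
est = estimated_recall(judged, boolean_hits)
missed = 1.0 - est
```

The slide's "missed 78%" is the same complement taken on the mean: 1 - 0.22 = 0.78.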

Median vs. Boolean (EstR@B)
  • Median won 8 of 43
  • Boolean won 31 of 43
  • (4 tied)

Topic 99: 0.31 vs. 0.21 (natural disasters)

Topic 58: 0.07 vs. 0.94 (phosphates and health)

Boolean run had higher mean EstR@B than all submitted runs.

Median Better

Boolean Better

Median vs. Boolean (EstR@25000)
  • Median won 33 of 43
  • Boolean won 9 of 43
  • (1 tied)

Topic 60: 0.91 vs. 0.07 (phosphate precip.)

Topic 58: 0.09 vs. 0.94 (phosphates and health)

Highest mean EstR@25000 47%

Median Better

Boolean Better

Marginal Precision by Depth Band

Depths 1- 5000: median Precision=18%

Depths 5001-10000: median Precision=13%

Depths 10001-15000: median Precision=11%

Depths 15001-20000: median Precision=10%

Depths 20001-25000: median Precision=10%

  • 3 of 446 (0.7%) random (unsubmitted) documents were judged relevant
    • On average, another 50,000 relevant docs per topic?
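The two calculations on this slide can be sketched in Python. Marginal precision is taken to be the precision within one depth band. The extrapolation assumes that the unsubmitted population is roughly the collection size from the IIT CDIP slide minus the mean pool size (relevant + non-relevant + gray from the pool slide); that population estimate is an assumption of this sketch, not a figure stated in the deck, and the toy relevance list is hypothetical.

```python
def marginal_precision(relevance_by_rank, start, end):
    """Precision within ranks start..end (1-based, inclusive)."""
    band = relevance_by_rank[start - 1:end]
    return sum(band) / len(band) if band else 0.0

# Hypothetical judgments for a 10-document ranking (1 = relevant).
toy = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
first_band = marginal_precision(toy, 1, 5)
second_band = marginal_precision(toy, 6, 10)

# Extrapolating from the random sample of unsubmitted documents.
collection_size = 6_877_327                  # documents, from the IIT CDIP slide
mean_pool_size = 16_904 + 298_678 + 4_303    # mean per topic, from the pool slide
sampled, judged_relevant = 446, 3            # random unsubmitted documents

rate = judged_relevant / sampled
unsubmitted = collection_size - mean_pool_size   # assumed population size
extra_relevant = rate * unsubmitted
```

Under these assumptions the estimate comes out near 44,000 additional relevant documents per topic, the same order of magnitude as the slide's 50,000. With only 3 relevant documents in the sample, the uncertainty around that figure is large.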
Median “Run” Marginal Precision (Depths 20,001-25,000, by Topic)
  • Only 6 of 43 topics had marginal precision > 10%

Topic 69: MP = 100% (indoor smoke vent.)

Topic 74: MP = 46% (indoor air quality)

Topic 71: MP = 21% (bromhidrosis)

2008 Legal Track
  • Interactive task models commercial practice
    • Recall-oriented (classify every document)
    • “Topic authority” available for clarification
    • Fewer topics with much richer sampling
  • Relevance feedback task
    • Models multi-stage meet and confer
  • Third set of ad hoc task topics
    • Completes development of reusable(?) collection
Hill Climbing the Boolean Set
  • Boolean run
  • Ranked Run on OCR
  • Ranked Run on Metadata
  • Extract “good” metadata and add to query
  • Remove least likely from Boolean
  • Add most likely from Ranked
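One possible reading of the "remove least likely, add most likely" step is a swap that keeps the result set the same size as the Boolean set. The sketch below illustrates that reading only; the function name `swap_step`, the scores, and the choice of k are hypothetical and are not taken from the authors' system.

```python
def swap_step(boolean_set, scores, k):
    """Replace the k lowest-scoring Boolean hits with the k highest-scoring non-hits.

    scores maps doc_id -> ranked-retrieval score; unscored documents count as 0.0.
    The result has the same size as boolean_set.
    """
    inside = sorted(boolean_set, key=lambda d: scores.get(d, 0.0))
    outside = sorted(
        (d for d in scores if d not in boolean_set),
        key=lambda d: scores[d],
        reverse=True,
    )
    k = min(k, len(inside), len(outside))
    return (set(boolean_set) - set(inside[:k])) | set(outside[:k])

# Hypothetical scores from a ranked run.
scores = {"d1": 0.9, "d2": 0.1, "d3": 0.8, "d4": 0.7, "d5": 0.05}
boolean_set = {"d1", "d2", "d5"}
revised = swap_step(boolean_set, scores, k=1)
```

Holding the set size at B keeps the revised set directly comparable with the reference Boolean run under a measure such as EstR@B.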

Metadata-Based Expansion

Document image from archives



How to retrieve corrupted documents?

Expand query with author and recipient names
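A minimal sketch of the expansion idea: when OCR text is too corrupted to match query words, names taken from the cleaner metadata of top-ranked documents can be added to the query. The field names `authors` and `recipients`, the function name, and the sample records are hypothetical; "HALLE, L" is borrowed from the example document's metadata.

```python
from collections import Counter

def expand_with_names(query_terms, top_docs, max_names=2):
    """Add the most frequent author/recipient names among top-ranked documents to the query."""
    names = Counter()
    for doc in top_docs:
        names.update(doc.get("authors", []))
        names.update(doc.get("recipients", []))
    added = [name for name, _ in names.most_common(max_names)]
    return list(query_terms) + [n for n in added if n not in query_terms]

# Hypothetical metadata for the top-ranked documents of an initial search.
top_docs = [
    {"authors": ["HALLE, L"], "recipients": ["DISTRIBUTION"]},
    {"authors": ["HALLE, L"], "recipients": []},
    {"authors": ["SMITH, J"], "recipients": ["HALLE, L"]},
]
expanded = expand_with_names(["cigna", "newsletter"], top_docs, max_names=1)
```

The expanded query can then be run against the metadata fields, where a name such as "HALLE, L" is more likely to survive intact than in the OCR text.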




“Beating Boolean” (but not by much yet!)

TREC-2006/07 “Training Topics” for 2008

Incremental Disclosure Benefit

50 Topics, Title Queries, TREC-2005 Robust Track Collection

Figure label: 0.56 0-then-0

Other Recent Developments
  • ICAIL Workshop on Discovery of Electronically Stored Information (DESI), Stanford, June 2007
  • Sedona Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (August 2007 public draft)
  • DESI-2 Workshop, London, June 2008
Taking the Larger View

Jack G. Conrad, “E-Discovery Revisited: A Broader Perspective for Researchers,” DESI-1

E-Discovery as Sensemaking

Simon Attfield and Ann Blandford, “E-Discovery Viewed as Integrated Human-Computer Sensemaking,” DESI-2

Identity Resolution in Email

Date: Wed Dec 20 08:57:00 EST 2000

From: Kay Mann <>

To: Suzanne Adams <>

Subject: Re: GE Conference Call has be rescheduled

Did Sheila want Scott to participate? Looks like the

call will be too late for him.




Posterior Distribution

3-Step Solution

(1) Identity Modeling

(2) Context Reconstruction

(3) Mention Resolution
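The "posterior distribution" over candidate identities for a mention such as "Sheila" can be illustrated with a plain Bayes update: a prior over candidates multiplied by how well each candidate fits the reconstructed context, then normalized. This is a generic sketch, not the model from the cited work; the function name and all probabilities are invented for illustration.

```python
def posterior(priors, context_likelihood):
    """Normalize prior * likelihood over the candidate identities."""
    unnormalized = {c: priors[c] * context_likelihood.get(c, 0.0) for c in priors}
    total = sum(unnormalized.values())
    if total == 0:
        return {c: 0.0 for c in priors}
    return {c: v / total for c, v in unnormalized.items()}

# Hypothetical numbers: both candidates are equally likely before looking at context,
# and the surrounding conversation fits one candidate better than the other.
priors = {"Sheila Tweed": 0.5, "Sheila Walton": 0.5}
context_likelihood = {"Sheila Tweed": 0.8, "Sheila Walton": 0.2}
dist = posterior(priors, context_likelihood)
best = max(dist, key=dist.get)
```

Keeping the whole distribution, rather than only the top candidate, lets a later stage weigh an ambiguous mention instead of committing to a single identity.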

Where to Look for Evidence
  • This Conversation
  • This Message
  • Socially-related Conversations
  • Contextual Space

Contextual Resolution
  • “Sheila Tweed”
  • “Sheila Walton”

Elsayed, Oard and Namata, ACL/HLT 2008










Test Collections





Which Context is the best?





  • Unique test collection
    • 7 million documents with OCR and metadata
    • 83 rich topics (Boolean, free text, context)
    • Recall-oriented evaluation measure
  • Moderately robust research community
    • 16 research teams from 4 countries
    • Attracting attention (and investment) in the law