Searching through e-Discovery • Greg Buckles • Julie Wade • April 9, 2009
Thank you! Benchmark Legal Solutions 5101 Navigation, Houston, TX 77011 Telephone: 713.528.0002 email@example.com
Greg Buckles • Owner of Reason-eD LLC • 19 years of experience in forensics, discovery, record systems and software design to support Fortune 1000 companies. • Started career as forensic criminalist working for the Houston Police Department’s Crime Lab. • Mr. Buckles has worked for industry leaders such as Arnold, White & Durkee, El Paso Corporation, Symantec Corporation and Attenex Corporation.
Julie Wade • Contract Paralegal with Donovan and Watkins, Marketing Consultant for In2itive Technologies. • Over 25 years of experience in the legal profession, having worked on complex litigation cases in state, federal and multi-jurisdictional courts of law. • Received Advanced Certification in Electronic Discovery from Kroll OnTrack, Paralegal Certificate from University of North Texas. • Member of the State Bar of Texas Paralegal Division (Chair District CLE, 2008-09); Women in eDiscovery (Secretary, Houston Chapter, 2008-09); ALSP; AIIM.
Agenda • Information Management Challenges • Court’s View of Search • Search Basics • Types of Search • Search Applications • Tips & Tricks • Resources
Structured and Unstructured Data • Structured data is data that sits in a database and can be mined and searched for information. • Unstructured data consists of emails, Word documents, instant messages, blogs, PDFs, videos, and audio recordings – all data that falls outside the traditional database. • In 1998, Merrill Lynch estimated that 80% of all potentially usable business information originated in unstructured form.
Information Management Challenges • Unstructured data resides in different applications, databases, email exchanges and archives. • Unstructured file-type data is the fastest-growing area of all data types. • “not go into business to begin with” – structured or unstructured, how do you search that anyway?
Dealing with unstructured data • Enterprise Content Management systems provide solutions to managing unstructured data content. • Data mining software and other text analytics are used to find patterns in, or otherwise interpret, unstructured information. • Common techniques for structuring text also involve manual tagging with metadata or crawling and indexing the data.
Getting a Handle on File Management • Litigation support and e-discovery are two examples of current applications requiring existing data to be indexed and searched – which is relatively easy to do with structured and semi-structured data, but has proven daunting with unstructured file-based data. • Paralegals must acquire their client’s data maps and interview custodians.
Court’s View of Search • Peskoff v. Faber, 2006 WL 1933483 (D.D.C. July 11, 2006) • United States v. O’Keefe, No. 06-249 (D.D.C. Feb. 18, 2008). • Victor Stanley v. Creative Pipe, Civil Action No. MJG-06-2662 (D. Md. May 29, 2008). • Diabetes Centers of America, Inc. v. Healthpia America, Inc., 2008 WL 336382 (S.D. Tex. Feb. 5, 2008).
Peskoff v. Faber • Defendant asserted computer disks contained “all electronic files and email there were,” but the production did not include two years’ worth of emails received or authored by the plaintiff from 2001 to 2003. • Court ordered defendant to search again and provide a detailed affidavit within 10 days specifying the nature of the search used in locating the responsive emails.
United States v. O’Keefe • First opinion to suggest that judicial review of alleged search deficiencies requires expert testimony. • “Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”
Victor Stanley v. Creative Pipe • “[A]ll keyword searches are not created equal…. The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling…”
Diabetes Centers of America, Inc. v. Healthpia America, Inc. • “[S]anctions may be appropriate in other cases where evidence is lost because important searches were recklessly entrusted to untrained, unsupervised personnel.”
Define the Goal/Results • Inclusion vs. Exclusion, Find vs. Filter • Pinpoint identification of a particular document • Identify privileged documents • Identify responsive materials to discovery requests • Cost constraints & budgetary considerations • Identify who is best positioned to conduct and implement the search (vendor, paralegal)
Realistic Search Goals • No search is perfect. • [Diagram labels: ESI Collection, Search Results, False Negatives, False Positives, True Positives]
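The three categories on this slide can be counted once a reviewed set of responsive documents exists. The sketch below is illustrative only: the document IDs are hypothetical, and precision and recall are the standard information-retrieval measures, not terms taken from the slides.

```python
def search_quality(retrieved, relevant):
    """Split search results into true/false positives and false negatives."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = retrieved & relevant    # returned and responsive
    false_pos = retrieved - relevant   # returned but not responsive
    false_neg = relevant - retrieved   # responsive but missed
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        # Share of returned documents that are responsive.
        "precision": len(true_pos) / len(retrieved) if retrieved else 0.0,
        # Share of responsive documents that were returned.
        "recall": len(true_pos) / len(relevant) if relevant else 0.0,
    }


# Hypothetical IDs: four hits, three documents known to be responsive.
result = search_quality(
    retrieved={"doc1", "doc2", "doc3", "doc4"},
    relevant={"doc2", "doc3", "doc5"},
)
print(result["precision"])         # 0.5
print(round(result["recall"], 2))  # 0.67
```

A search tuned only to reduce false positives tends to raise false negatives, and vice versa, which is why both numbers are worth reporting.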
Structure the Search • Plan and structure the search • Identify the scope of data to be searched • Identify who will be performing the search • Identify the technology to be deployed • Identify the processes to be implemented
Execute the Search • All your definitional planning work is now put to the test • Monitor the search as it is being conducted • Document the results of your search, including data hits, data samples, and your other search protocols
Validate the Search • The validation steps you undertake will support the reliability of the search methods you deployed • Did the search include all the records that were to be searched? • Did you achieve the goals established during the definitional phase?
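One validation approach, in line with the sampling recommended in Victor Stanley, is to review a random sample of the documents the search did not return and estimate how many responsive items were missed. This is a minimal sketch under stated assumptions: the document IDs, the reviewer calls and both function names are hypothetical, and a real protocol would also set a sample size and confidence level.

```python
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Pick a reproducible random sample of document IDs for manual review."""
    rng = random.Random(seed)        # fixed seed so the sample can be re-created
    pool = sorted(doc_ids)
    return rng.sample(pool, min(sample_size, len(pool)))


def estimate_missed(review_calls, population_size):
    """Project the responsive rate found in the sample onto the whole set.

    review_calls maps each sampled document ID to True (responsive) or False.
    """
    if not review_calls:
        return 0.0, 0
    rate = sum(review_calls.values()) / len(review_calls)
    return rate, round(rate * population_size)


# Hypothetical: 1,000 documents were NOT returned by the search.
not_returned = range(1, 1001)
sample = draw_sample(not_returned, 100)

# Hypothetical reviewer calls on a four-document sample from a set of 100.
rate, projected = estimate_missed(
    {"d1": False, "d2": True, "d3": False, "d4": False}, population_size=100
)
print(rate, projected)  # 0.25 25
```

Recording the seed, the sample and the reviewer calls gives the report something concrete to cite when the search is challenged.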
Report • The final step of the search process is the report. • Attorneys and clients depend on the report to assess success and completeness. • Known exceptions and errors must be declared. • Reports are iterative, e.g., the results may require the search to be re-run.
How Search Works • Build an Index: 10-30% additional storage; static copy; run once – search many • Crawl/Streaming Text: no storage; dynamic selection
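The two approaches on this slide can be contrasted in a few lines. This is a simplified sketch, not how any particular product works: the file names and text are hypothetical, and real indexes store far more than a term-to-document map.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexed approach: read every document once and store term -> doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_index(index, term):
    """Look the term up in the stored index (no document is re-read)."""
    return set(index.get(term.lower(), set()))


def search_stream(docs, term):
    """Streaming approach: re-read every document for each query, store nothing."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in tokenize(text)}


# Hypothetical collection.
docs = {
    "memo.txt": "Quarterly forecast attached for review",
    "mail1.eml": "Please review the contract before Friday",
    "mail2.eml": "Lunch on Friday?",
}
index = build_index(docs)  # built once, costs extra storage
print(sorted(search_index(index, "review")))  # ['mail1.eml', 'memo.txt']
print(sorted(search_stream(docs, "Friday")))  # ['mail1.eml', 'mail2.eml']
```

The index is a static copy: documents added or changed after it was built are invisible until it is rebuilt, while the streaming scan always reads the current data.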
Types of Search Methodologies • Full Text • Boolean – Keywords • Natural Language – hidden risks • Expanded Words: synonyms, grouping, related words, thesaurus • Concept Clustering – folders v. visual analysis
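Of the methods listed, Boolean keyword search is the simplest to illustrate: once each keyword has a set of matching documents, AND, OR and NOT become set operations. The keywords and document IDs below are hypothetical.

```python
# Hypothetical hit lists: keyword -> IDs of documents containing it.
hits = {
    "contract": {"d1", "d2", "d4"},
    "breach": {"d2", "d3"},
    "draft": {"d4"},
}

both = hits["contract"] & hits["breach"]       # contract AND breach
either = hits["contract"] | hits["breach"]     # contract OR breach
not_draft = hits["contract"] - hits["draft"]   # contract AND NOT draft

print(sorted(both))       # ['d2']
print(sorted(either))     # ['d1', 'd2', 'd3', 'd4']
print(sorted(not_draft))  # ['d1', 'd2']
```

AND narrows the result set and OR widens it, so the choice of operator directly trades false positives against false negatives.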
Keyword Search Normal Parameters • The syntax in the search string • Use of the keywords with or without stemming • Use of keywords with certain wildcard specifications and their syntax • Case-sensitivity of keywords used • Consideration of target data sources
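The parameters above change which documents a keyword hits. The sketch below shows case sensitivity, a trailing wildcard and stemming on one line of text. It is illustrative only: `crude_stem` is a hypothetical stand-in for a real stemmer, and search tools differ in their wildcard syntax.

```python
import fnmatch
import re


def crude_stem(word):
    """Very rough suffix stripping, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_hits(text, keyword, case_sensitive=False, stem=False):
    """Return the original tokens in text that match the keyword pattern."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    compare = list(tokens)
    if not case_sensitive:
        compare = [t.lower() for t in compare]
        keyword = keyword.lower()
    if stem:
        compare = [crude_stem(t) for t in compare]
        keyword = crude_stem(keyword)
    # fnmatchcase treats '*' as "any run of characters", like a search wildcard.
    return [t for t, c in zip(tokens, compare) if fnmatch.fnmatchcase(c, keyword)]


text = "The Failed audit failing FAIL spec"
print(keyword_hits(text, "fail"))                       # ['FAIL']
print(keyword_hits(text, "fail", case_sensitive=True))  # []
print(keyword_hits(text, "fail*"))                      # ['Failed', 'failing', 'FAIL']
print(keyword_hits(text, "fail", stem=True))            # ['Failed', 'failing', 'FAIL']
```

The same keyword returns zero, one or three hits depending on settings that are often left at a tool's defaults, so those settings belong in the search documentation.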
Assumed Parameters • Character coding of the text – UTF-8, UTF-16, CP1252, Unicode/WideChar etc. • Language of the keyword, to select appropriate stemming • Special character sets • Tokenization schemes
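These assumed parameters are easy to overlook because nothing fails loudly when they are wrong. The sketch below shows a keyword missed after text is decoded with the wrong character encoding, a byte-level search missing UTF-16 text, and two tokenization schemes splitting the same string differently. The sample strings are hypothetical.

```python
import re

text = "résumé attached"
raw_utf8 = text.encode("utf-8")

decoded_right = raw_utf8.decode("utf-8")
decoded_wrong = raw_utf8.decode("cp1252")  # wrong guess at the encoding

print("résumé" in decoded_right)  # True
print("résumé" in decoded_wrong)  # False
print(decoded_wrong)              # rÃ©sumÃ© attached

# A raw byte search for an ASCII word misses UTF-16 text, where each
# character is stored with an extra zero byte.
raw_utf16 = text.encode("utf-16-le")
print(b"attached" in raw_utf8)    # True
print(b"attached" in raw_utf16)   # False

# Two tokenization schemes applied to the same string.
sample = "e-mail john.smith@example.com"
by_whitespace = sample.split()
by_alnum = re.findall(r"[a-z0-9]+", sample.lower())
print(by_whitespace)  # ['e-mail', 'john.smith@example.com']
print(by_alnum)       # ['e', 'mail', 'john', 'smith', 'example', 'com']
```

Under the second scheme a search for the full email address has nothing to match as a single token, which is one source of the email address issues that arise in practice.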
Phrase Searches • Double quoting: “smoking gun email” • Noise words: ‘a’, ‘and’, ‘the’, ‘from’, and ‘because’ • Boolean operators in phrases • Wildcard specifications: fail* & spec* • Truncation & Stemming specifications • Fuzzy searches, Booleans, Concept, Latent Semantic Indexing, Text Clustering, Bayesian Classifier, Concept Search Specification
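Noise words affect phrase searches in a way that is worth seeing concretely. The sketch below matches a quoted phrase against consecutive tokens, with and without dropping the noise words listed on the slide. It is a simplified illustration; the function names are hypothetical and real engines handle noise words in different ways.

```python
import re

# Noise words taken from the slide above.
NOISE_WORDS = {"a", "and", "the", "from", "because"}


def tokenize(text, drop_noise=False):
    """Lowercase, split into alphanumeric tokens, optionally drop noise words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if drop_noise:
        tokens = [t for t in tokens if t not in NOISE_WORDS]
    return tokens


def phrase_match(text, phrase, drop_noise=False):
    """True if the phrase's tokens appear consecutively and in order in text."""
    words = tokenize(text, drop_noise)
    target = tokenize(phrase, drop_noise)
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(phrase_match("This is the smoking gun email from legal.", "smoking gun email"))  # True
print(phrase_match("The smoking and gun email", "smoking gun email"))                   # False
print(phrase_match("The smoking and gun email", "smoking gun email", drop_noise=True))  # True
```

An index that silently drops noise words can therefore return phrase hits that do not contain the literal quoted phrase.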
Search Applications • Desktop Search: X1, Google, Isys, dtSearch, WDS, OmniFind • Enterprise Search: IDOL, StoredIQ, Recommind, Kazeon, Symantec • Processing Platforms: Cracker, Discover-e, Extractiva • Review Platforms: Summation, Concordance, Attenex • Forensic Search
Tips and Tricks • Foreign Languages • Exception Handling • Email Address Issues • Partial Non-Indexed File Types/Locations • Term Frequency Lists • Analytics and Sampling
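A term frequency list, one of the tips above, can be built with a simple counter. The sketch below is a minimal illustration on hypothetical text; in practice the list usually comes from the search tool's own index.

```python
import re
from collections import Counter


def term_frequencies(docs):
    """Count how often each token appears across a collection of texts."""
    counts = Counter()
    for text in docs:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


# Hypothetical collection.
docs = [
    "Contract review due Friday",
    "Review the contract draft",
    "Lunch Friday",
]
freq = term_frequencies(docs)
print(freq["contract"])  # 2
print(freq["lunch"])     # 1

# Terms sorted from most to least frequent, useful for spotting
# candidate keywords, misspellings and overly common terms.
for term, count in freq.most_common():
    print(term, count)
```

Scanning the list before finalizing keywords helps catch terms that are too common to be useful and variants that a keyword list would otherwise miss.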
Resources • EDRM Search Guide • Text REtrieval Conference • George L. Paul & Jason R. Baron, Information Inflation: Can the Legal System Adapt?, 13 RICH. J.L. & TECH. 10 (2007) • The Sedona Conference, Best Practices Commentary on the Use of Search and Information Retrieval, 8 THE SEDONA CONF. J. 189 (2008) • Information Organization & Access (IOA) Certificate Program – www.aiim.org