
What does Auto-categorization Have To Do With President Obama's Memorandum on Records Management? (Hint: Everything)

Auto-Classification: Taking A Closer Look. ARMA NOVA Spring 2012 Chapter Seminar, Falls Church, VA, March 6, 2012. Jason R. Baron, Director of Litigation


Presentation Transcript


  1. What does Auto-categorization Have To Do With President Obama's Memorandum on Records Management? (Hint: Everything) Auto-Classification: Taking A Closer Look. ARMA NOVA Spring 2012 Chapter Seminar, Falls Church, VA, March 6, 2012. Jason R. Baron, Director of Litigation, Office of General Counsel, National Archives and Records Administration

  2. A New Era of Government “[P]roper records management is the backbone of open Government.” President Obama’s Memorandum dated November 28, 2011 re “Managing Government Records” http://www.whitehouse.gov/the-press-office/2011/11/28/presidential-memorandum-managing-government-records

  3. Email is still the 800 lb. gorilla of ediscovery (see 36 CFR 1236.22 (2009))

  4. Beyond email: text messaging

  5. Voice mail captured in “Unified Messaging Systems”

  6. Emergence of Web 2.0 Social Media

  7. The Future: Public Records in the Clouds?

  8. Presidential Memorandum • From President Obama’s Memorandum, dated 11/28/11: “Decades of technological advances have transformed agency operations, creating challenges and opportunities for agency records management. Greater reliance on electronic communication and systems has radically increased the volume and diversity of information that agencies must manage. With proper planning, technology can make these records less burdensome to manage and easier to use and share. But if records management policies and practices are not updated for a digital age, the surge in information could overwhelm agency systems, leading to higher costs and lost records.”

  9. Agency Commitments to Records Management Reform 2(a) The head of each agency shall: (i) ensure that the successful implementation of records management requirements in law, regulation, and this memorandum is a priority for senior agency management; (ii) ensure that proper resources are allocated to the effective implementation of such requirements; (iii) within 30 days of the date of this memorandum, designate in writing to the Archivist of the United States (Archivist), a senior agency official to supervise the review required by subsection (b) of this section, in coordination with the agency’s Records Officer, Chief Information Officer, and General Counsel.

  10. Agency Commitments to Records Management Reform 2(b) Within 120 days of the date of this memorandum [i.e., March 2012], each agency head shall submit a report to the Archivist and the Director of the Office of Management and Budget (OMB) that: (i) describes the agency’s current plans for improving or maintaining its records management program, particularly with respect to managing electronic records, including email and social media, deploying cloud-based services or storage solutions, and meeting other records challenges; (ii) identifies any provisions in relevant statutes, regulations, or official NARA guidance that currently pose an obstacle to the agency’s adoption of sound, cost-effective records management policies and practices; and (iii) identifies policies or programs that, if included in the Records Management Directive required by section 3 of this memorandum or adopted or implemented by NARA, would assist the agency’s efforts to improve records management.

  11. Focal Points • creating a Government-wide records management framework that is more efficient and cost-effective; • promoting records management policies and practices that enhance the capability of agencies to fulfill their statutory missions; • maintaining accountability through documentation of agency actions; • increasing open government and appropriate public access to Government records; • supporting agency compliance with applicable legal requirements related to the preservation of information relevant to litigation; and • transitioning from paper-based records management to electronic records management where feasible.

  12. Records Management Directive 3(a) Within 120 days of the deadline for reports submitted pursuant to section 2(b) of this memorandum [i.e., by July 2012] the Director of OMB and the Archivist, in coordination with the Associate Attorney General, shall issue a Records Management Directive that directs agency heads to take specific steps to reform and improve records management policies and practices within their agency.

  13. Records Management Directive 3(b) In the course of developing the directive, the Archivist, in coordination with the Director of OMB and the Associate Attorney General, shall review relevant statutes, regulations, and official NARA guidance to identify opportunities for reforms that would facilitate improved Government-wide records management practices, particularly with respect to electronic records. The Archivist, in coordination with the Director of OMB and the Associate Attorney General, shall present to the President the results of this review, no later than the date of the directive's issuance, to facilitate potential updates to the laws, regulations, and policies governing the management of Federal records.

  14. Process Optimization Problem 1: The transactional toll of user-based recordkeeping schemes (“as is” RM)

  15. …. and the need for better, automated solutions ….

  16. Impact of Technology on E-Records Management: Snapshot 2012 (“As is”) • A universe of proprietary products exists in the marketplace: document management and records management applications (RMAs) • DoD 5015.2 version 3 compliant products • However, scalability issues exist • Agencies must prepare to confront significant front-end process issues when transitioning to electronic recordkeeping • Records schedule simplification is key

  17. RM wish list for 2012…. • RM’s “easy button”: the elusive goal of zero extra keystrokes to comply with RM requirements (capture) • A technology app that automatically tags records in compliance with RM policies and practices (categorize) • Supervised learning RM with minimal records officer or end user involvement (learn) • Rule-based and role-based RM • Advanced search

  18. Electronic Archiving As The First Step • What is it? 100% snapshot of (typically) email, plus in some cases other selected ESI applications • How does it differ from an RMA? Goal is preservation of evidence, not records management per se • NARA Bulletin 2008-05

  19. A Possible Path Forward? • Email archiving in short term, synced to existing proprietary software on email system • Designation of key senior officials as creating permanent records, consistent with existing records schedules • Additional designations of permanent records by agency component • “Smart” filters/categorical rules built in based on content, to the extent feasible • By default, all other records fall into designated temporary record buckets and are disposed of under existing records schedules.
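The rule ordering in the bullets above (role-based designation first, then component designation, then content filters, then a temporary default) can be sketched as a small decision function. This is only an illustration under stated assumptions: the addresses, component name, keywords, and the `Email` and `categorize` names are all hypothetical and do not come from the presentation or from any agency schedule.

```python
from dataclasses import dataclass

# Hypothetical inputs, for illustration only.
SENIOR_OFFICIALS = {"administrator@agency.example", "deputy@agency.example"}
PERMANENT_COMPONENTS = {"Office of the Secretary"}
CONTENT_RULES = {"policy decision": "permanent", "lunch": "non-record"}


@dataclass
class Email:
    sender: str
    component: str
    subject: str


def categorize(msg: Email) -> str:
    """Return a disposition bucket, applying the rules in priority order."""
    # Role-based rule: designated senior officials create permanent records.
    if msg.sender.lower() in SENIOR_OFFICIALS:
        return "permanent"
    # Additional designation of permanent records by agency component.
    if msg.component in PERMANENT_COMPONENTS:
        return "permanent"
    # Content-based filter: the first matching keyword decides the bucket.
    subject = msg.subject.lower()
    for keyword, bucket in CONTENT_RULES.items():
        if keyword in subject:
            return bucket
    # Default: temporary bucket, disposed of under existing schedules.
    return "temporary"
```

Because the role-based check runs first, a message from a designated official is kept as permanent even when a content rule would otherwise have matched; whether that ordering is the right policy is a decision for the records schedule, not the code.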

  20. A pyramid approach combines disposition policy with automated tools to bring FRA email under records management, preservation, and access. [Pyramid diagram with slider; legend: permanent or top officials; temporary or staff and support.] The position of the “set-point” for email capture depends on policy and resources: setting it higher allows use of tools now available to get 100% of email at lower volumes;* setting it lower means more records will be captured and smarter tools are needed to distinguish and disposition temporary- and non-record. * Implementing an email archiving policy is feasible now, since tools are readily available to capture 100% of email traffic at the individual or organizational level, in formats that can be archived.

  22. How To Avoid A Train Wreck With Email Archiving…. Capture E-mail But Utilize Records Management!

  23. Functional Requirements for Categorization Products in the Federal workplace Ease of use …. Scalability …. Archiving in native formats….. Metadata preservation … Seamless integration with existing software apps …. Versioning …. Compatibility with big bucket records schedules …. Advanced search capabilities …. Ease of training / machine learning using records officers or end users …. Cost

  24. Process Optimization Problem 2: The Coming Age of Dark Archives (and the inability to provide access) Summit 2012

  25. Searching the Haystack….

  26. to find relevant needles…

  27. ends up like searching in a maze…

  28. Example of Boolean search string from U.S. v. Philip Morris • (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
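To make the role of the `AND NOT` exclusions concrete, here is a minimal sketch that evaluates just two clauses of the string above against a message body. It assumes whole-word, case-insensitive phrase matching over the entire document; the search tool actually used in the case may have tokenized and matched differently, and the helper names `has` and `matches` are illustrative.

```python
import re


def has(text: str, phrase: str) -> bool:
    """True if the phrase occurs in text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches(text: str) -> bool:
    """Evaluate two clauses of the search string, OR-ed together."""
    # ((master settlement agreement OR msa) AND NOT (medical savings account
    #   OR metropolitan standard area))
    msa = (has(text, "master settlement agreement") or has(text, "msa")) and not (
        has(text, "medical savings account")
        or has(text, "metropolitan standard area")
    )
    # (marlboro AND NOT upper marlboro)
    marlboro = has(text, "marlboro") and not has(text, "upper marlboro")
    return msa or marlboro
```

The exclusions cut false positives such as a benefits notice about a medical savings account, but because they apply to the whole document they also drop a message that discusses the settlement and happens to mention a medical savings account. That trade-off is one reason long keyword strings are hard to tune.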

  29. U.S. v. Philip Morris E-mail Winnowing Process • 20 million email records → 200,000 hits based on keyword terms used (1%) → 100,000 relevant emails → 80,000 produced to opposing party → 20,000 placed on privilege logs • A PROBLEM: only a handful entered as exhibits at trial • A BIGGER PROBLEM: the 1% figure does not scale
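The scaling concern can be shown with simple arithmetic. The stage counts below are the ones on the slide; the one-billion-message corpus is a hypothetical figure chosen only to illustrate what a constant 1% hit rate would imply, not a number from the presentation.

```python
from fractions import Fraction

# Stage counts as given on the slide.
EMAIL_RECORDS = 20_000_000
KEYWORD_HITS = 200_000
RELEVANT = 100_000

# Exact ratios, using Fraction to avoid floating-point rounding.
hit_rate = Fraction(KEYWORD_HITS, EMAIL_RECORDS)  # 1/100, the slide's 1%
precision = Fraction(RELEVANT, KEYWORD_HITS)      # 1/2 of the hits were relevant


def projected_hits(corpus_size: int, rate: Fraction = hit_rate) -> int:
    """Messages needing human review if the same hit rate held at scale."""
    return int(corpus_size * rate)


# Hypothetical corpus of one billion messages (an assumption).
HYPOTHETICAL_CORPUS = 1_000_000_000
review_load = projected_hits(HYPOTHETICAL_CORPUS)  # 10,000,000 messages
```

At that size the keyword hits alone would be half as large as the entire original 20 million message collection, and only half of the hits in the original case turned out to be relevant.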

  30. Beyond Keywords: Alternative Search Methods • Greater Use Made of Boolean Strings • Fuzzy Search Models • Probabilistic models (Bayesian) • Statistical methods (clustering) • Machine learning approaches to semantic representation • Categorization tools: taxonomies and ontologies • Social network analysis • Hybrid approaches Reference: Appendix to The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (2007), available at http://www.thesedonaconference.org (link to publications)

  31. Bayesian Statistical Models Based on mathematical models of Statistical Probability to recognize documents of similar content. • Learns passively from the document content • Position, frequency and proximity of terms (language independent) combine to create a mathematical “thumbprint” of concepts contained in documents. • Useful to “cluster” documents by content • Can “learn” to build clusters from exemplar sets • Requires re-indexing and assessment can change
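As a rough illustration of learning categories from exemplar sets, the sketch below implements a multinomial naive Bayes classifier with add-one smoothing. This is a deliberately simpler technique than the one the slide describes: it uses word frequency only and ignores position and proximity. The exemplar texts, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (text, label) exemplar pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest smoothed log posterior."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior for the label.
        score = math.log(n_docs / total_docs)
        # Add-one (Laplace) smoothed log likelihood of each word.
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical exemplar set, as a records officer might supply.
EXEMPLARS = [
    ("budget policy decision memo", "permanent"),
    ("final policy directive signed", "permanent"),
    ("lunch meeting room booking", "temporary"),
    ("parking pass renewal reminder", "temporary"),
]
MODEL = train(EXEMPLARS)
```

Calling `classify("signed policy memo", *MODEL)` scores the text against each label's word counts. Adding exemplars changes the counts, so earlier assessments can shift, which matches the slide's point that assessment can change after re-indexing.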

  32. Latent Semantic Indexing (LSI) • SVD (Singular Value Decomposition) assigns each record to a place, creating “clusters” • “Query” documents are SVD analyzed and placed in the matrix • “Hits” and rankings are determined by the distance from clusters • Vector length = relevance ranking [Figure: 3-D plot with x, y, and z axes]
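For readers who want the underlying algebra, the standard textbook formulation of LSI is sketched below. The symbols are assumptions introduced here, not notation from the slide: A is the term-by-document matrix, k is the number of retained dimensions, q is a query's term vector, and the hatted vectors are coordinates in the reduced space. Cosine similarity is shown as one common ranking measure; the slide's "distance from clusters" may refer to a different metric.

```latex
% Rank-k truncated SVD of the term-by-document matrix A (m terms, n documents)
A \approx A_k = U_k \Sigma_k V_k^{\mathsf{T}}

% Document j in the reduced space: row j of V_k
\hat{d}_j = \Sigma_k^{-1} U_k^{\mathsf{T}} d_j

% A query q is "folded in" to the same space
\hat{q} = \Sigma_k^{-1} U_k^{\mathsf{T}} q

% Hits are ranked by similarity between the query and each document
\operatorname{sim}(\hat{q}, \hat{d}_j) =
  \frac{\hat{q} \cdot \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Because documents and queries are compared in the reduced space rather than term by term, a query can rank a document highly even when the two share no literal keywords.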

  33. Emerging New Strategies: “Predictive Analytics” • Improved review and case assessment: cluster docs thru use of software with minimal human intervention at front end to code “seeded” data set. Slide adapted from Gartner Conference, June 23, 2010, Washington, D.C.

  34. Visual Analysis Examples (Presentation by Dr. Victoria Lemieux, Univ. British Columbia, at Society of American Archivists Annual Mtg. 2010, Washington, D.C.) With acknowledgments to Jeffrey Heer, Exploring Enron, http://hci.stanford.edu/jheer/projects/enron/, Adam Perer, Contrasting Portraits, http://hcil.cs.umd.edu/trs/2006-08/2006-08.pdf, and Fernanda Viegas, Email Conversations, http://fernandaviegas.com/email.html

  35. Social Networking/Links Analysis Example From Marc Smith Posted on Flickr Under Creative Commons License

  36. Judicial second guessing of failure to use e-search capabilities: Capitol Records v. MP3 Tunes, 261 F.R.D. 44 (S.D.N.Y. 2009) • “In [a prior case] the Court notes its dismay that the party opposing discovery of its ESI had organized its files in a manner which seemed to serve no purpose other than ‘to discourage audits. . .’ Similarly, in this case, [the party] host[ed] no ediscovery software on their servers and apparently are unable to conduct centralized email searches of groups of users without downloading them to a separate file and relying on the services of an outside vendor.”

  37. Judicial second guessing of failure to use e-search capabilities: Capitol Records v. MP3 Tunes (con’t) Court went on to add: “The day undoubtedly will come when burden arguments based on a large organization’s lack of internal ediscovery software will be received about as well as the contention that a party should be spared from retrieving paper documents because it had filed them sequentially, but in no apparent groupings, in an effort to avoid the added expense of file folders or indices.”

  38. Problem 3: Innovative Thinking

  39. References • Background law review article referencing autocategorization and advanced search: J. Baron, “Law in the Age of Exabytes: Some Further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search,” 17 Richmond J. Law & Technology (2011), see http://law.richmond.edu • Latest “Predictive Coding” case law to follow in blogs online: Da Silva Moore v. Publicis Groupe & MSL Group, 11 Civ. 1279 (S.D.N.Y.) (Peck, M.J.) (Opinion dated Feb. 24, 2012); Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711 (N.D. Ill.) (Nolan, M.J.)

  40. Jason R. Baron Director of Litigation Office of General Counsel National Archives and Records Administration (301) 837-1499 Email: jason.baron@nara.gov
